Quantization for OpenAI's Whisper Models: A Comparative Analysis
Automated speech recognition (ASR) models have gained prominence for applications such as captioning, speech translation, and live transcription. This paper studies Whisper and two model variants: one optimized for live speech streaming and another for offline transcription. Notably, these models have been found to generate hallucinated content, reducing transcription reliability. Furthermore, larger model variants exhibit increased latency and pose challenges for deployment on resource-constrained devices. This study analyzes the similarities and differences between three Whisper models, qualitatively examining their distinct capabilities. It then quantifies the impact of model quantization on latency and evaluates its viability for edge deployment. Using the open-source LibriSpeech dataset, this paper evaluates the word error rate (WER) and latency of whisper.cpp under three quantization methods (INT4, INT5, INT8). Results show that quantization reduces latency by 19% and model size by 45% while preserving transcription accuracy. These findings provide insights into the optimal use cases of different Whisper models and the possibilities for edge-device deployment. All code, datasets, and implementation details are available in a public GitHub repository: https://github.com/allisonandreyev/WhisperQuantization.git
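As a companion to the WER metric used in this evaluation, here is a minimal, dependency-free sketch of word error rate computed as word-level Levenshtein distance; the function name and interface are illustrative, not taken from the paper's repository.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

A reported WER compares the recognizer output against a reference transcript, so quantized and full-precision models can be compared on identical inputs.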
Updated: 2025-03-12 23:50:35
Categories: cs.SD, cs.CL, cs.LG, eess.AS, 68T50, 68T10, I.2.7; I.5.4; H.5.1
A Semantic-Loss Function Modeling Framework With Task-Oriented Machine Learning Perspectives
The integration of machine learning (ML) has significantly enhanced the capabilities of Earth Observation (EO) systems by enabling the extraction of actionable insights from complex datasets. However, the performance of data-driven EO applications is heavily influenced by the data collection and transmission processes, where limited satellite bandwidth and latency constraints can hinder the full transmission of original data to the receivers. To address this issue, adopting the concepts of Semantic Communication (SC) offers a promising solution by prioritizing the transmission of essential data semantics over raw information. Implementing SC for EO systems requires a thorough understanding of the impact of data processing and communication channel conditions on semantic loss at the processing center. This work proposes a novel data-fitting framework to empirically model the semantic loss using real-world EO datasets and domain-specific insights. The framework quantifies two primary types of semantic loss: (1) source coding loss, assessed via a data quality indicator measuring the impact of processing on raw source data, and (2) transmission loss, evaluated by comparing practical transmission performance against the Shannon limit. Semantic losses are estimated by evaluating the accuracy of EO applications using four task-oriented ML models, EfficientViT, MobileViT, ResNet50-DINO, and ResNet8-KD, on lossy image datasets under varying channel conditions and compression ratios. These results underpin a framework for efficient semantic-loss modeling in bandwidth-constrained EO scenarios, enabling more reliable and effective operations.
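One side of this framework, transmission loss relative to the Shannon limit, can be sketched with the textbook AWGN capacity formula $C = B \log_2(1 + \mathrm{SNR})$; the function names and the normalization below are assumptions for illustration, not the paper's exact formulation.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    # Shannon limit for an AWGN channel: C = B * log2(1 + SNR)
    return bandwidth_hz * math.log2(1 + snr_linear)


def transmission_loss_indicator(practical_rate: float,
                                bandwidth_hz: float,
                                snr_db: float) -> float:
    # Fraction of the Shannon limit left unused by the practical scheme
    snr_linear = 10 ** (snr_db / 10)
    capacity = shannon_capacity(bandwidth_hz, snr_linear)
    return max(0.0, 1.0 - practical_rate / capacity)
```

At 0 dB SNR over 1 Hz the limit is exactly 1 bit/s, so a practical rate of 0.5 bit/s leaves half the capacity unused.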
Updated: 2025-03-12 23:45:11
Categories: cs.LG, math.OC
AI Rivalry as a Craft: How Resisting and Embracing Generative AI Reshape Writing Professions
Generative AI (GAI) technologies are disrupting professional writing, challenging traditional practices. Recent studies explore GAI adoption experiences of creative practitioners, but we know little about how these experiences evolve into established practices and how GAI resistance alters these practices. To address this gap, we conducted 25 semi-structured interviews with writing professionals who adopted and/or resisted GAI. Using the theoretical lens of Job Crafting, we identify four strategies professionals employ to reshape their roles. Writing professionals employed GAI resisting strategies to maximize human potential, reinforce professional identity, carve out a professional niche, and preserve credibility within their networks. In contrast, GAI-enabled strategies allowed writers who embraced GAI to enhance desirable workflows, minimize mundane tasks, and engage in new AI-managerial labor. These strategies amplified their collaborations with GAI while reducing their reliance on other people. We conclude by discussing implications of GAI practices on writers' identity and practices as well as crafting theory.
Updated: 2025-03-12 23:43:57
Categories: cs.HC, cs.AI
Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control
Practical control systems pose significant challenges in identifying optimal control policies due to uncertainties in the system model and external disturbances. While $H_\infty$ control techniques are commonly used to design robust controllers that mitigate the effects of disturbances, these methods often require complex and computationally intensive calculations. To address this issue, this paper proposes a reinforcement learning algorithm called Robust Deterministic Policy Gradient (RDPG), which formulates the $H_\infty$ control problem as a two-player zero-sum dynamic game. In this formulation, one player (the user) aims to minimize the cost, while the other player (the adversary) seeks to maximize it. We then employ deterministic policy gradient (DPG) and its deep reinforcement learning counterpart to train a robust control policy with effective disturbance attenuation. In particular, for practical implementation, we introduce an algorithm called robust deep deterministic policy gradient (RDDPG), which employs a deep neural network architecture and integrates techniques from the twin-delayed deep deterministic policy gradient (TD3) to enhance stability and learning efficiency. To evaluate the proposed algorithm, we implement it on an unmanned aerial vehicle (UAV) tasked with following a predefined path in a disturbance-prone environment. The experimental results demonstrate that the proposed method outperforms other control approaches in terms of robustness against disturbances, enabling precise real-time tracking of moving targets even under severe disturbance conditions.
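The two-player zero-sum structure can be illustrated on a toy quadratic objective, where a minimizing player $u$ and a maximizing adversary $w$ take simultaneous gradient steps; this is a hand-written sketch of the game-theoretic setup only, not the RDPG/RDDPG algorithm itself.

```python
def gradient_descent_ascent(steps: int = 2000, lr: float = 0.01, gamma: float = 1.0):
    """Toy zero-sum game: J(u, w) = (u - 1)^2 + 2*u*w - gamma^2 * w^2.

    The user u minimizes J; the adversary w maximizes it, mirroring the
    H-infinity disturbance-attenuation formulation in miniature.
    """
    u, w = 0.0, 0.0
    for _ in range(steps):
        du = 2 * (u - 1) + 2 * w           # dJ/du
        dw = 2 * u - 2 * gamma ** 2 * w    # dJ/dw
        u -= lr * du                       # descent step for the minimizer
        w += lr * dw                       # ascent step for the maximizer
    return u, w
```

For `gamma = 1` the saddle point is at `(u, w) = (0.5, 0.5)`, which the iterates spiral into; in RDPG the same min-max structure is solved with policy-gradient updates over neural policies instead of scalar variables.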
Updated: 2025-03-12 23:39:47
Categories: cs.RO, cs.AI
A Rule Based Solution to Co-reference Resolution in Clinical Text
Objective: The aim of this study was to build an effective co-reference resolution system tailored for the biomedical domain. Materials and Methods: The experimental materials used in this study are provided by the 2011 i2b2 Natural Language Processing Challenge, which involves coreference resolution in medical documents. Concept mentions have been annotated in clinical texts, and the mentions that co-refer in each document are to be linked by coreference chains. There are normally two ways of constructing a system to automatically discover co-referent links: one is to manually build rules for co-reference resolution; the other is to use machine learning systems that learn automatically from training datasets and then perform the resolution task on testing datasets. Results: Experiments show that existing co-reference resolution systems are able to find some of the co-referent links, while our rule-based system performs well, finding the majority of them. Our system achieved 89.6% overall performance on multiple medical datasets. Conclusion: The experimental results show that manually crafted rules based on observation of training data are a valid way to achieve high performance on this coreference resolution task in the critical biomedical domain.
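A rule-based coreference linker can be as simple as a string-match rule over annotated mentions; the sketch below is a generic illustration of the rule-based approach, not the specific rules evaluated on the i2b2 data.

```python
def rule_based_chains(mentions):
    """Link mentions whose normalized surface text matches exactly.

    mentions: list of (position, text) pairs annotated in one document.
    Returns coreference chains as lists of positions (only chains of size > 1).
    """
    buckets = {}
    for pos, text in mentions:
        key = text.lower().strip()  # one toy rule: case-insensitive exact match
        buckets.setdefault(key, []).append(pos)
    return [chain for chain in buckets.values() if len(chain) > 1]
```

Real clinical systems layer further rules on top (pronoun resolution, head-word match, apposition), but the chain-building skeleton looks the same.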
Updated: 2025-03-12 23:29:08
Categories: cs.CL, cs.AI
Tracking the Best Expert Privately
We design differentially private algorithms for the problem of prediction with expert advice under dynamic regret, also known as tracking the best expert. Our work addresses three natural types of adversaries (stochastic with shifting distributions, oblivious, and adaptive) and designs algorithms with sub-linear regret for all three cases. In particular, under a shifting stochastic adversary where the distribution may shift $S$ times, we provide an $\epsilon$-differentially private algorithm whose expected dynamic regret is at most $O\left( \sqrt{S T \log (NT)} + \frac{S \log (NT)}{\epsilon}\right)$, where $T$ and $N$ are the time horizon and number of experts, respectively. For oblivious adversaries, we give a reduction from dynamic regret minimization to static regret minimization, resulting in an upper bound of $O\left(\sqrt{S T \log(NT)} + \frac{S T^{1/3}\log(T/\delta) \log(NT)}{\epsilon^{2/3}}\right)$ on the expected dynamic regret, where $S$ now denotes the allowable number of switches of the best expert. Finally, similar to static regret, we establish a fundamental separation between oblivious and adaptive adversaries for the dynamic setting: while our algorithms show that sub-linear regret is achievable for oblivious adversaries in the high-privacy regime $\epsilon \le \sqrt{S/T}$, we show that any $(\epsilon, \delta)$-differentially private algorithm must suffer linear dynamic regret under adaptive adversaries for $\epsilon \le \sqrt{S/T}$. To complement this lower bound, we give an $\epsilon$-differentially private algorithm that attains sub-linear dynamic regret under adaptive adversaries whenever $\epsilon \gg \sqrt{S/T}$.
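The non-private primitive underlying these results is prediction with expert advice; a minimal exponential-weights learner (without the privacy mechanism or the switching structure) can be sketched as follows, purely as an illustration of the base setting.

```python
import math


def exponential_weights(losses, eta: float = 0.5) -> float:
    """Run exponential weights over T rounds and return cumulative expected loss.

    losses: T x N matrix of per-round expert losses, each in [0, 1].
    eta: learning rate (an illustrative default, not a tuned value).
    """
    n = len(losses[0])
    weights = [1.0] * n
    total_loss = 0.0
    for round_losses in losses:
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # Expected loss of the randomized learner this round
        total_loss += sum(p * l for p, l in zip(probs, round_losses))
        # Multiplicative update: down-weight experts that did poorly
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return total_loss
```

With one perfect expert among the pool, the learner's cumulative loss stays bounded (logarithmic in the number of experts) rather than growing linearly with $T$; the private algorithms in the paper add noise to exactly this kind of weight-update state.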
Updated: 2025-03-12 23:17:03
Categories: cs.LG
On Generalization Across Environments In Multi-Objective Reinforcement Learning
Real-world sequential decision-making tasks often require balancing trade-offs between multiple conflicting objectives, making Multi-Objective Reinforcement Learning (MORL) an increasingly prominent field of research. Despite recent advances, existing MORL literature has narrowly focused on performance within static environments, neglecting the importance of generalizing across diverse settings. Conversely, existing research on generalization in RL has always assumed scalar rewards, overlooking the inherent multi-objectivity of real-world problems. Generalization in the multi-objective context is fundamentally more challenging, as it requires learning a Pareto set of policies addressing varying preferences across multiple objectives. In this paper, we formalize the concept of generalization in MORL and how it can be evaluated. We then contribute a novel benchmark featuring diverse multi-objective domains with parameterized environment configurations to facilitate future studies in this area. Our baseline evaluations of state-of-the-art MORL algorithms on this benchmark reveal limited generalization capabilities, suggesting significant room for improvement. Our empirical findings also expose limitations in the expressivity of scalar rewards, emphasizing the need for multi-objective specifications to achieve effective generalization. We further analyze the algorithmic complexities within current MORL approaches that could impede the transfer of performance from single- to multi-environment settings. This work fills a critical gap and lays the groundwork for future research that brings together two key areas in reinforcement learning: solving multi-objective decision-making problems and generalizing across diverse environments. We make our code available at https://github.com/JaydenTeoh/MORL-Generalization.
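Since MORL requires learning a Pareto set of policies, a core utility is extracting the non-dominated set of return vectors. A minimal sketch, assuming higher is better in every objective (the function names are illustrative, not from the paper's codebase):

```python
def dominates(q, p):
    # q dominates p if it is at least as good in every objective
    # and strictly better in at least one.
    return all(qi >= pi for qi, pi in zip(q, p)) and \
           any(qi > pi for qi, pi in zip(q, p))


def pareto_front(points):
    # Keep only the non-dominated return vectors.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Evaluating generalization then amounts to comparing the Pareto fronts a policy set achieves across differently parameterized environments, rather than a single scalar return.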
Updated: 2025-03-12 23:09:08
Categories: cs.LG
Learning-Augmented Competitive Algorithms for Spatiotemporal Online Allocation with Deadline Constraints
We introduce and study spatiotemporal online allocation with deadline constraints ($\mathsf{SOAD}$), a new online problem motivated by emerging challenges in sustainability and energy. In $\mathsf{SOAD}$, an online player completes a workload by allocating and scheduling it on the points of a metric space $(X, d)$ while subject to a deadline $T$. At each time step, a service cost function is revealed that represents the cost of servicing the workload at each point, and the player must irrevocably decide the current allocation of work to points. Whenever the player moves this allocation, they incur a movement cost defined by the distance metric $d(\cdot, \ \cdot)$ that captures, e.g., an overhead cost. $\mathsf{SOAD}$ formalizes the open problem of combining general metrics and deadline constraints in the online algorithms literature, unifying problems such as metrical task systems and online search. We propose a competitive algorithm for $\mathsf{SOAD}$ along with a matching lower bound establishing its optimality. Our main algorithm, \textsc{ST-CLIP}, is a learning-augmented algorithm that takes advantage of predictions (e.g., forecasts of relevant costs) and achieves an optimal consistency-robustness trade-off. We evaluate our proposed algorithms in a simulated case study of carbon-aware spatiotemporal workload management, an application in sustainable computing that schedules a delay-tolerant batch compute job on a distributed network of data centers. In these experiments, we show that \textsc{ST-CLIP} substantially improves on heuristic baseline methods.
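The $\mathsf{SOAD}$ cost structure (service cost plus metric movement cost) can be evaluated for any candidate schedule; the small sketch below uses names chosen for illustration and treats one unit of work per step.

```python
def schedule_cost(service_costs, allocation, dist):
    """Total cost of a schedule in a SOAD-style setting.

    service_costs[t][x]: cost of servicing the workload at point x in step t.
    allocation[t]: the point chosen at step t.
    dist[x][y]: metric distance, charged whenever the allocation moves.
    """
    service = sum(service_costs[t][allocation[t]] for t in range(len(allocation)))
    movement = sum(dist[allocation[t - 1]][allocation[t]]
                   for t in range(1, len(allocation)))
    return service + movement
```

The online difficulty is that `service_costs[t]` is revealed one step at a time while the deadline $T$ looms, so the player must trade cheap service elsewhere against the movement cost of getting there.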
Updated: 2025-03-12 22:56:43
Categories: cs.DS, cs.DC, cs.LG
Planning with Adaptive World Models for Autonomous Driving
Motion planning is crucial for safe navigation in complex urban environments. Historically, motion planners (MPs) have been evaluated with procedurally-generated simulators like CARLA. However, such synthetic benchmarks do not capture real-world multi-agent interactions. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic, effectively turning the fixed dataset into a reactive simulator. We analyze the characteristics of nuPlan's recorded logs and find that each city has its own unique driving behaviors, suggesting that robust planners must adapt to different environments. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN) that predicts reactive agent behaviors using features derived from recently-observed agent histories; intuitively, some aggressive agents may tailgate lead vehicles, while others may not. To model such phenomena, BehaviorNet predicts the parameters of an agent's motion controller rather than directly predicting its spacetime trajectory (as most forecasters do). Finally, we present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions. Our extensive experiments demonstrate that AdaptiveDriver achieves state-of-the-art results on the nuPlan closed-loop planning benchmark, improving over prior work by 2% on Test-14 Hard R-CLS, and generalizes even when evaluated on never-before-seen cities.
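Predicting controller parameters rather than raw trajectories can be illustrated with a standard car-following controller such as the Intelligent Driver Model (IDM), whose parameters (desired speed, time headway, and so on) are the kind of quantities a BehaviorNet-style predictor could output. The IDM step below is a textbook sketch, not the paper's actual controller.

```python
import math


def idm_accel(v, v_lead, gap, v0=30.0, T=1.5, a_max=1.5, b=2.0, s0=2.0):
    """One acceleration step of the Intelligent Driver Model (IDM).

    v, v_lead: ego and lead-vehicle speeds (m/s); gap: bumper-to-bumper gap (m).
    v0: desired speed, T: time headway, a_max/b: max accel / comfortable decel,
    s0: minimum jam gap. These parameters characterize driving style
    (e.g., a small T models a tailgating agent).
    """
    s_star = s0 + v * T + v * (v - v_lead) / (2 * math.sqrt(a_max * b))
    return a_max * (1 - (v / v0) ** 4 - (s_star / gap) ** 2)
```

Predicting `(v0, T, a_max, b, s0)` per agent lets a simulator unroll reactive behavior, which is the flavor of world model AdaptiveDriver conditions its MPC rollouts on.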
Updated: 2025-03-12 22:55:20
Categories: cs.RO, cs.LG
AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances
Large language models (LLMs) are being increasingly integrated into everyday products and services, such as coding tools and writing assistants. As these embedded AI applications are deployed globally, there is a growing concern that the AI models underlying these applications prioritize Western values. This paper investigates what happens when a Western-centric AI model provides writing suggestions to users from a different cultural background. We conducted a cross-cultural controlled experiment with 118 participants from India and the United States who completed culturally grounded writing tasks with and without AI suggestions. Our analysis reveals that AI provided greater efficiency gains for Americans compared to Indians. Moreover, AI suggestions led Indian participants to adopt Western writing styles, altering not just what is written but also how it is written. These findings show that Western-centric AI models homogenize writing toward Western norms, diminishing nuances that differentiate cultural expression.
Updated: 2025-03-12 22:40:12
Categories: cs.HC, cs.AI
The erasure of intensive livestock farming in text-to-image generative AI
Generative AI (e.g., ChatGPT) is increasingly integrated into people's daily lives. While it is known that AI perpetuates biases against marginalized human groups, its impact on non-human animals remains understudied. We found that ChatGPT's text-to-image model (DALL-E 3) introduces a strong bias toward romanticizing livestock farming, depicting dairy cows on pasture and pigs rooting in mud. This bias remained when we requested realistic depictions and was mitigated only when the automatic prompt revision was inhibited. Most farmed animals in industrialized countries are reared indoors with limited space per animal, conditions that fail to resonate with societal values. Inhibiting prompt revision resulted in images that more closely reflected modern farming practices; for example, cows housed indoors accessing feed through metal headlocks, and pigs behind metal railings on concrete floors in indoor facilities. While OpenAI introduced prompt revision to mitigate bias, in the case of farmed animal production systems it paradoxically introduces a strong bias toward unrealistic depictions of farming practices.
Updated: 2025-03-12 22:35:38
Categories: cs.CY, cs.AI
Revisiting the Predictability of Performative, Social Events
Social predictions do not passively describe the future; they actively shape it. They inform actions and change individual expectations in ways that influence the likelihood of the predicted outcome. Given these dynamics, to what extent can social events be predicted? This question was discussed throughout the 20th century by authors like Merton, Morgenstern, Simon, and others who considered it a central issue in social science methodology. In this work, we provide a modern answer to this old problem. Using recent ideas from performative prediction and outcome indistinguishability, we establish that one can always efficiently predict social events accurately, regardless of how predictions influence data. While achievable, we also show that these predictions are often undesirable, highlighting the limitations of previous desiderata. We end with a discussion of various avenues forward.
Updated: 2025-03-12 22:19:33
Categories: cs.CY, cs.LG, econ.TH, stat.ML
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Vision foundation models (VFMs) such as DINO have led to a paradigm shift in 2D camera-based perception towards extracting generalized features to support many downstream tasks. Recent works introduce self-supervised cross-modal knowledge distillation (KD) as a way to transfer these powerful generalization capabilities into 3D LiDAR-based models. However, they either rely on highly complex distillation losses, pseudo-semantic maps, or limit KD to features useful for semantic segmentation only. In this work, we propose CleverDistiller, a self-supervised, cross-modal 2D-to-3D KD framework introducing a set of simple yet effective design choices: unlike contrastive approaches relying on complex loss design choices, our method employs a direct feature similarity loss in combination with a multilayer perceptron (MLP) projection head to allow the 3D network to learn complex semantic dependencies throughout the projection. Crucially, our approach does not depend on pseudo-semantic maps, allowing for direct knowledge transfer from a VFM without explicit semantic supervision. Additionally, we introduce the auxiliary self-supervised spatial task of occupancy prediction to enhance the semantic knowledge, obtained from a VFM through KD, with 3D spatial reasoning capabilities. Experiments on standard autonomous driving benchmarks for 2D-to-3D KD demonstrate that CleverDistiller achieves state-of-the-art performance in both semantic segmentation and 3D object detection (3DOD), improving by up to 10% mIoU, especially when fine-tuning on very small amounts of data, which shows the effectiveness of our simple yet powerful KD strategy.
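A "direct feature similarity loss" can be sketched as the mean cosine distance between paired 2D and 3D features (after the MLP projection head); the dependency-free illustration below is an assumption about the loss family, not the paper's exact implementation.

```python
import math


def cosine_similarity_loss(feats_3d, feats_2d):
    """Mean (1 - cosine similarity) over paired feature vectors.

    feats_3d: projected 3D-network features; feats_2d: frozen VFM features.
    Both are lists of equal-length vectors paired via 3D-to-2D correspondence.
    """
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    return sum(1 - cos(a, b) for a, b in zip(feats_3d, feats_2d)) / len(feats_3d)
```

Because cosine distance ignores magnitude, the student only has to match feature directions, which keeps the objective simple compared with contrastive losses.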
Updated: 2025-03-12 22:18:29
Categories: cs.CV, cs.AI, cs.RO
Improving the Diffusability of Autoencoders
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements have primarily focused on scaling diffusion backbones and improving autoencoder reconstruction quality, the interaction between these components has received comparatively less attention. In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality. To mitigate the issue, we propose scale equivariance: a simple regularization strategy that aligns latent and RGB spaces across frequencies by enforcing scale equivariance in the decoder. It requires minimal code changes and only up to 20K autoencoder fine-tuning steps, yet significantly improves generation quality, reducing FID by 19% for image generation on ImageNet-1K 256x256 and FVD by at least 44% for video generation on Kinetics-700 17x256x256.
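The spectral analysis of latent spaces boils down to measuring how much energy sits in high frequencies. The toy 1-D version below uses a naive DFT and is illustrative only; the paper analyzes 2-D latent feature maps.

```python
import cmath


def high_freq_energy_fraction(signal, cutoff=1):
    """Fraction of spectral energy at frequency indices above `cutoff`.

    Counts both positive and negative frequencies via min(k, n - k),
    using a naive O(n^2) DFT for clarity.
    """
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    energy = [abs(c) ** 2 for c in spectrum]
    high = sum(energy[k] for k in range(n) if min(k, n - k) > cutoff)
    return high / sum(energy)
```

A smooth latent channel concentrates energy in low frequencies; an "inordinate high-frequency component" shows up as a large value of this fraction, which is the symptom the proposed scale-equivariance regularization suppresses.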
Updated: 2025-03-12 22:08:10
Categories: cs.CV, cs.AI, cs.LG
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Stance detection holds great potential to improve online political discussions through its deployment in discussion platforms for purposes such as content moderation, topic summarization, or facilitating more balanced discussions. Typically, transformer-based models are employed directly for stance detection, requiring vast amounts of data. However, the wide variety of debate topics in online political discussions makes data collection particularly challenging. LLMs have revived stance detection, but their deployment in online political discussions faces challenges like inconsistent outputs, biases, and vulnerability to adversarial attacks. We show how LLM-generated synthetic data can improve stance detection for online political discussions by using reliable traditional stance detection models for online deployment, while leveraging the text generation capabilities of LLMs for synthetic data generation in a secure offline environment. To achieve this, (i) we generate synthetic data for specific debate questions by prompting a Mistral-7B model and show that fine-tuning with the generated synthetic data can substantially improve the performance of stance detection, while remaining interpretable and aligned with real-world data. (ii) Using the synthetic data as a reference, we can improve performance even further by identifying the most informative samples in an unlabelled dataset, i.e., those samples which the stance detection model is most uncertain about and can benefit from the most. By fine-tuning with both synthetic data and the most informative samples, we surpass the performance of the baseline model that is fine-tuned on all true labels, while labelling considerably less data.
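"Most informative samples" are typically those with the highest predictive uncertainty; a minimal entropy-based selector looks like the following (the interface is an assumption for illustration, not the paper's code).

```python
import math


def most_informative(class_probs, k):
    """Rank unlabelled samples by predictive entropy; return the top-k indices.

    class_probs: one predicted class distribution per unlabelled sample.
    """
    def entropy(p):
        return -sum(pi * math.log(pi) for pi in p if pi > 0)

    ranked = sorted(range(len(class_probs)),
                    key=lambda i: entropy(class_probs[i]),
                    reverse=True)
    return ranked[:k]
```

The selected samples are then labelled (here, against the synthetic reference) and added to the fine-tuning set, concentrating annotation effort where the model is least certain.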
Updated: 2025-03-12 22:04:34
Categories: cs.CL, cs.LG
EquiPy: Sequential Fairness using Optimal Transport in Python
Algorithmic fairness has received considerable attention due to the failures of various predictive AI systems that have been found to be unfairly biased against subgroups of the population. Many approaches have been proposed to mitigate such biases in predictive systems; however, they often struggle to provide accurate estimates and transparent correction mechanisms when multiple sensitive variables, such as a combination of gender and race, are involved. This paper introduces a new open-source Python package, EquiPy, which provides an easy-to-use, model-agnostic toolbox for efficiently achieving fairness across multiple sensitive variables. It also offers comprehensive graphical utilities that enable the user to interpret the influence of each sensitive variable within a global context. EquiPy makes use of theoretical results that allow the complexity arising from the use of multiple variables to be broken down into easier-to-solve sub-problems. We demonstrate the ease of use for both mitigation and interpretation on publicly available data derived from the US Census and provide sample code for its use.
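Under the hood, fairness via optimal transport in one dimension amounts to quantile matching toward a Wasserstein barycenter of the group-wise score distributions. The sketch below illustrates that idea for two equal-sized groups; it is a hand-rolled illustration of the underlying mathematics, not EquiPy's actual API.

```python
def ot_repair(scores_a, scores_b):
    """Map two groups' scores onto their 1-D Wasserstein barycenter.

    Assumes equal group sizes for simplicity: the barycenter quantile at each
    rank is the average of the two groups' quantiles at that rank, so both
    repaired groups share one score distribution (statistical parity).
    """
    order_a = sorted(range(len(scores_a)), key=lambda i: scores_a[i])
    order_b = sorted(range(len(scores_b)), key=lambda i: scores_b[i])
    out_a = [0.0] * len(scores_a)
    out_b = [0.0] * len(scores_b)
    for i, j in zip(order_a, order_b):
        barycenter = (scores_a[i] + scores_b[j]) / 2
        out_a[i] = barycenter
        out_b[j] = barycenter
    return out_a, out_b
```

Handling several sensitive variables sequentially, as EquiPy's theoretical results allow, repeats this kind of repair one variable at a time on the already-repaired scores.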
Updated: 2025-03-12 21:53:22
Subjects: cs.LG
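The sub-problem decomposition rests on one-dimensional optimal-transport "repair", which pushes every group's score distribution onto their common Wasserstein barycenter. The sketch below illustrates that underlying operation for a single sensitive variable; it does not reproduce EquiPy's actual API:

```python
import numpy as np

def fair_repair(scores, groups):
    """Push each group's score distribution onto the 1-D Wasserstein
    barycenter of all group distributions (quantile averaging)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    qs = np.linspace(0.0, 1.0, 101)
    # Per-group quantile functions; their pointwise mean is the barycenter.
    group_q = {g: np.quantile(scores[groups == g], qs) for g in np.unique(groups)}
    bary_q = np.mean(list(group_q.values()), axis=0)
    repaired = np.empty_like(scores)
    for g in group_q:
        mask = groups == g
        srt = np.sort(scores[mask])
        # Empirical CDF rank within the group, pushed through the
        # barycenter quantile function.
        ranks = np.searchsorted(srt, scores[mask], side="right") / mask.sum()
        repaired[mask] = np.interp(ranks, qs, bary_q)
    return repaired

# Two groups with very different score levels (synthetic data).
scores = np.array([0, 1, 2, 3, 4, 10, 11, 12, 13, 14], dtype=float)
groups = np.array(["a"] * 5 + ["b"] * 5)
rep = fair_repair(scores, groups)
print(rep[groups == "a"].mean(), rep[groups == "b"].mean())  # now equal
```

Handling several sensitive variables sequentially amounts to applying repairs of this kind one variable at a time, which is the decomposition EquiPy exploits.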
Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models
Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. By systematically analyzing the cross-attention mechanisms within IP-Adapter, we uncover an implicit binding between textual color descriptors and reference image features. Leveraging this insight, our method rewires these bindings to enforce precise color attribution while preserving the generative capabilities of pretrained models. Our approach maintains generation quality and diversity, outperforming prior methods in accuracy and applicability across diverse object categories. Through extensive evaluations, we demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
Updated: 2025-03-12 21:49:52
Subjects: cs.GR,cs.CV,cs.LG
An Asymmetric Independence Model for Causal Discovery on Path Spaces
We develop the theory linking 'E-separation' in directed mixed graphs (DMGs) with conditional independence relations among coordinate processes in stochastic differential equations (SDEs), where causal relationships are determined by "which variables enter the governing equation of which other variables". We prove a global Markov property for cyclic SDEs, which naturally extends to partially observed cyclic SDEs, because our asymmetric independence model is closed under marginalization. We then characterize the class of graphs that encode the same set of independence relations, yielding a result analogous to the seminal 'same skeleton and v-structures' result for directed acyclic graphs (DAGs). In the fully observed case, we show that each such equivalence class of graphs has a greatest element as a parsimonious representation and develop algorithms to identify this greatest element from data. We conjecture that a greatest element also exists under partial observations, which we verify computationally for graphs with up to four nodes.
Updated: 2025-03-12 21:43:49
Subjects: cs.LG,stat.ML
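The causal convention above, "which variables enter the governing equation of which other variables", can be made concrete with a toy two-dimensional linear SDE (an illustrative example, not taken from the paper):

```latex
dX_t = -\alpha X_t \, dt + \sigma_X \, dW_t^X, \qquad
dY_t = (\beta X_t - \gamma Y_t) \, dt + \sigma_Y \, dW_t^Y
```

Here $X$ enters the drift of $Y$ but not conversely, so the associated directed mixed graph has the single directed edge $X \to Y$, and E-separation statements about the coordinate processes are read off that graph.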
Media and responsible AI governance: a game-theoretic and LLM analysis
This paper investigates the complex interplay between AI developers, regulators, users, and the media in fostering trustworthy AI systems. Using evolutionary game theory and large language models (LLMs), we model the strategic interactions among these actors under different regulatory regimes. The research explores two key mechanisms for achieving responsible governance, safe AI development, and adoption of safe AI: incentivising effective regulation through media reporting, and conditioning user trust on the commentariat's recommendations. The findings highlight the crucial role of the media in providing information to users, potentially acting as a form of "soft" regulation by investigating developers or regulators, as a substitute for institutional AI regulation (which is still absent in many regions). Both game-theoretic analysis and LLM-based simulations reveal conditions under which effective regulation and trustworthy AI development emerge, emphasising the importance of considering the influence of different regulatory regimes from an evolutionary game-theoretic perspective. The study concludes that effective governance requires managing the incentives and costs of high-quality commentary.
Updated: 2025-03-12 21:39:38
Subjects: cs.AI,cs.GT,cs.MA,nlin.CD
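The evolutionary game-theoretic machinery can be illustrated with a discrete-time replicator update, in which strategies earning above-average payoff grow in frequency. The payoff matrix and update scheme below are generic stand-ins, not the paper's model:

```python
import numpy as np

def replicator_step(x, payoff, dt=0.1):
    """One discrete-time replicator-dynamics step.

    x: strategy frequencies (sums to 1); payoff: square payoff matrix.
    """
    fitness = payoff @ x             # expected payoff of each strategy
    avg = x @ fitness                # population-average payoff
    x_next = x + dt * x * (fitness - avg)
    return x_next / x_next.sum()     # renormalise against numerical drift

# Toy 2-strategy game in which the first strategy strictly dominates.
A = np.array([[3.0, 3.0],
              [1.0, 1.0]])
x = np.array([0.5, 0.5])
for _ in range(200):
    x = replicator_step(x, A)
print(x)  # frequency of the dominant strategy approaches 1
```

In the paper's setting, the strategies would correspond to choices by developers, regulators, users, and media, with payoffs shaped by the regulatory regime.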
Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting
Large Language Models (LLMs) have recently demonstrated significant potential in time series forecasting, offering impressive capabilities in handling complex temporal data. However, their robustness and reliability in real-world applications remain under-explored, particularly concerning their susceptibility to adversarial attacks. In this paper, we introduce a targeted adversarial attack framework for LLM-based time series forecasting. By employing both gradient-free and black-box optimization methods, we generate minimal yet highly effective perturbations that significantly degrade the forecasting accuracy across multiple datasets and LLM architectures. Our experiments, which include models such as LLMTime with GPT-3.5, GPT-4, LLaMa, and Mistral, as well as TimeGPT and TimeLLM, show that adversarial attacks lead to much more severe performance degradation than random noise, and demonstrate the broad effectiveness of our attacks across different LLMs. The results underscore the critical vulnerabilities of LLMs in time series forecasting, highlighting the need for robust defense mechanisms to ensure their reliable deployment in practical applications. The code repository can be found at https://github.com/JohnsonJiang1996/AdvAttack_LLM4TS.
Updated: 2025-03-12 21:35:52
Subjects: cs.LG,cs.AI,cs.CL,cs.CR
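A gradient-free attack of the kind described can be sketched as random search over norm-bounded perturbations against a black-box forecaster. The moving-average "model", budget, and data below are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def forecaster(history):
    """Stand-in black-box model: predicts the mean of the last 4 points."""
    return history[-4:].mean()

def black_box_attack(history, true_next, eps=0.1, iters=500):
    """Random search for an L-inf-bounded perturbation that maximises
    the forecaster's absolute error; no gradients are needed."""
    best_delta = np.zeros_like(history)
    best_err = abs(forecaster(history) - true_next)
    for _ in range(iters):
        delta = rng.uniform(-eps, eps, size=history.shape)
        err = abs(forecaster(history + delta) - true_next)
        if err > best_err:
            best_err, best_delta = err, delta
    return best_delta, best_err

hist = np.sin(np.linspace(0, 3, 16))
true_next = np.sin(3.2)
delta, adv_err = black_box_attack(hist, true_next)
clean_err = abs(forecaster(hist) - true_next)
print(clean_err, adv_err)  # adversarial error exceeds the clean error
```

Against an LLM forecaster, the query inside the loop would be an API call rather than a local function, which is exactly why gradient-free search is attractive.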
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
Robots interacting with humans through natural language can unlock numerous applications such as Referring Grasp Synthesis (RGS). Given a text query, RGS determines a stable grasp pose to manipulate the referred object in the robot's workspace. RGS comprises two steps: visual grounding and grasp pose estimation. Recent studies leverage powerful Vision-Language Models (VLMs) for visually grounding free-flowing natural language in real-world robotic execution. However, comparisons in complex, cluttered environments with multiple instances of the same object are lacking. This paper introduces HiFi-CS, featuring hierarchical application of Featurewise Linear Modulation (FiLM) to fuse image and text embeddings, enhancing visual grounding for complex attribute rich text queries encountered in robotic grasping. Visual grounding associates an object in 2D/3D space with natural language input and is studied in two scenarios: Closed and Open Vocabulary. HiFi-CS features a lightweight decoder combined with a frozen VLM and outperforms competitive baselines in closed vocabulary settings while being 100x smaller in size. Our model can effectively guide open-set object detectors like GroundedSAM to enhance open-vocabulary performance. We validate our approach through real-world RGS experiments using a 7-DOF robotic arm, achieving 90.33% visual grounding accuracy in 15 tabletop scenes. Our codebase is provided here: https://github.com/vineet2104/hifics
Updated: 2025-03-12 21:30:37
Subjects: cs.RO,cs.AI
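The FiLM operation at the core of HiFi-CS conditions image features on text via a per-channel scale and shift predicted from the text embedding. A minimal sketch, with hypothetical shapes and parameter values:

```python
import numpy as np

def film(image_feats, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each image-feature
    channel with parameters predicted from the text embedding."""
    # image_feats: (H, W, C); gamma, beta: (C,), broadcast over H and W.
    return gamma * image_feats + beta

# Toy example: 2x2 spatial grid, 3 channels.
feats = np.ones((2, 2, 3))
gamma = np.array([2.0, 0.5, 1.0])   # would come from a text encoder
beta  = np.array([0.0, 1.0, -1.0])
out = film(feats, gamma, beta)
print(out[0, 0])  # -> [2.  1.5 0. ]
```

Applying FiLM hierarchically, as the paper describes, means inserting such modulation at several depths of the visual backbone so the text query can steer features at multiple scales.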
TabNSA: Native Sparse Attention for Efficient Tabular Data Learning
Tabular data poses unique challenges for deep learning due to its heterogeneous features and lack of inherent spatial structure. This paper introduces TabNSA, a novel deep learning architecture leveraging Native Sparse Attention (NSA) specifically for efficient tabular data processing. TabNSA incorporates a dynamic hierarchical sparse strategy, combining coarse-grained feature compression with fine-grained feature selection to preserve both global context awareness and local precision. By dynamically focusing on relevant subsets of features, TabNSA effectively captures intricate feature interactions. Extensive experiments demonstrate that TabNSA consistently outperforms existing methods, including both deep learning architectures and ensemble decision trees, achieving state-of-the-art performance across various benchmark datasets.
Updated: 2025-03-12 21:13:41
Subjects: cs.LG
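The fine-grained feature-selection half of the strategy can be sketched as top-k sparse attention, where each query attends only to its k highest-scoring feature tokens. This is a toy illustration of the idea, not the TabNSA implementation:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=2):
    """Attend only to the k most relevant feature tokens; all other
    attention weights are dropped before the softmax."""
    scores = K @ q / np.sqrt(q.size)          # (n_features,)
    top = np.argsort(scores)[-k:]             # indices of the k best scores
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                              # softmax over the kept subset
    return w @ V[top]                         # (d_v,) weighted value mix

rng = np.random.default_rng(1)
q = rng.normal(size=4)        # query derived from the current representation
K = rng.normal(size=(6, 4))   # 6 tabular feature tokens
V = rng.normal(size=(6, 3))
out = topk_sparse_attention(q, K, V, k=2)
print(out.shape)  # (3,)
```

The coarse-grained branch described in the abstract would instead compress groups of feature tokens before attention; TabNSA combines both.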
Training Human-Robot Teams by Improving Transparency Through a Virtual Spectator Interface
After-action reviews (AARs) are professional discussions that help operators and teams enhance their task performance by analyzing completed missions with peers and professionals. Previous studies that compared different formats of AARs have mainly focused on human teams. However, the inclusion of robotic teammates brings along new challenges in understanding teammate intent and communication. Traditional AAR between human teammates may not be satisfactory for human-robot teams. To address this limitation, we propose a new training review (TR) tool, called the Virtual Spectator Interface (VSI), to enhance human-robot team performance and situational awareness (SA) in a simulated search mission. The proposed VSI primarily utilizes visual feedback to review subjects' behavior. To examine the effectiveness of VSI, we took elements from AAR to conduct our own TR and designed a 1 x 3 between-subjects experiment with experimental conditions: TR with (1) VSI, (2) screen recording, and (3) non-technology (only verbal descriptions). The results of our experiments demonstrated that the VSI did not result in significantly better team performance than the other conditions. However, the TR with VSI led to greater improvement in the subjects' SA than the other conditions.
Updated: 2025-03-12 21:13:34
Subjects: cs.HC,cs.AI,cs.RO,H.5.2; I.2.9
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints
We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make sequential decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of the loss and constraint functions. Our results show that we can improve the current best bounds of $ O(\sqrt{T}) $ regret and $ \tilde{O}(\sqrt{T}) $ cumulative constraint violations to $ O(\sqrt{E_T(f)}) $ and $ \tilde{O}(\sqrt{E_T(g^+)}) $, respectively, where $ E_T(f) $ and $E_T(g^+)$ represent the cumulative prediction errors of the loss and constraint functions. In the worst case, where $E_T(f) = O(T) $ and $ E_T(g^+) = O(T) $ (assuming bounded gradients of the loss and constraint functions), our rates match the prior $ O(\sqrt{T}) $ results. However, when the loss and constraint predictions are accurate, our approach yields significantly smaller regret and cumulative constraint violations. Finally, we apply this to the setting of adversarial contextual bandits with sequential risk constraints, obtaining optimistic bounds $O (\sqrt{E_T(f)} T^{1/3})$ regret and $O(\sqrt{E_T(g^+)} T^{1/3})$ constraints violation, yielding better performance than existing results when prediction quality is sufficiently high.
Updated: 2025-03-12 21:09:19
Subjects: stat.ML,cs.LG,math.OC
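For intuition, one common optimistic template is optimistic online gradient descent, which plays against a prediction $M_t$ of the loss gradient before the true gradient is revealed (a standard template; the paper's algorithm and its handling of constraints are more involved):

```latex
x_t = \Pi_{\mathcal{X}}\left(z_t - \eta M_t\right), \qquad
z_{t+1} = \Pi_{\mathcal{X}}\left(z_t - \eta \nabla f_t(x_t)\right)
```

The cumulative prediction error is then typically measured as $E_T(f) = \sum_{t=1}^{T} \lVert \nabla f_t(x_t) - M_t \rVert^2$ (the paper's exact definition may differ); accurate predictions shrink this quantity and hence the regret, recovering the $O(\sqrt{T})$ worst case when predictions carry no information.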
Predicting Tropical Cyclone Track Forecast Errors using a Probabilistic Neural Network
A new method for estimating tropical cyclone track uncertainty is presented and tested. This method uses a neural network to predict a bivariate normal distribution, which serves as an estimate for track uncertainty. We train the network and make predictions on forecasts from the National Hurricane Center (NHC), which currently uses static error distributions based on forecasts from the past five years for most applications. The neural network-based method produces uncertainty estimates that are dynamic and probabilistic. Further, the neural network-based method allows for probabilistic statements about tropical cyclone trajectories, including landfall probability, which we highlight. We show that our predictions are well calibrated using multiple metrics, that our method produces better uncertainty estimates than current NHC approaches, and that our method achieves similar performance to the Global Ensemble Forecast System. Once trained, the computational cost of predictions using this method is negligible, making it a strong candidate to improve the NHC's operational estimations of tropical cyclone track uncertainty.
Updated: 2025-03-12 21:00:31
Subjects: physics.ao-ph,cs.LG
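Training a network to output a bivariate normal typically means minimising its negative log-likelihood on observed track errors. A sketch of that loss; the parameterisation, units, and numbers below are hypothetical, not the paper's:

```python
import numpy as np

def bivariate_normal_nll(y, mu, sigma, rho):
    """Negative log-likelihood of a 2-D point under a bivariate normal
    given means, standard deviations, and correlation rho: the kind of
    loss used to train a network that outputs a distribution."""
    zx = (y[0] - mu[0]) / sigma[0]
    zy = (y[1] - mu[1]) / sigma[1]
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    log_norm = np.log(2 * np.pi * sigma[0] * sigma[1] * np.sqrt(1 - rho**2))
    return 0.5 * q + log_norm

# A forecast distribution centred near the observed error should score
# better (lower NLL) than one centred far away.
obs = np.array([10.0, -5.0])   # observed track error, hypothetical units
good = bivariate_normal_nll(obs, mu=np.array([12.0, -4.0]),
                            sigma=np.array([20.0, 20.0]), rho=0.1)
bad = bivariate_normal_nll(obs, mu=np.array([120.0, 80.0]),
                           sigma=np.array([20.0, 20.0]), rho=0.1)
print(good < bad)  # True
```

Once such a distribution is predicted, landfall probabilities follow by integrating the density over the relevant region, which is what makes the probabilistic output useful downstream.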
Extrapolated Urban View Synthesis Benchmark
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. However, their performance is commonly evaluated using an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views largely deviate from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. Meanwhile, we conduct both quantitative and qualitative evaluations of state-of-the-art NVS methods across different evaluation settings. Our results show that current NVS methods are prone to overfitting to training views. Besides, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We will release the data to help advance self-driving and urban robotics simulation technology.
Updated: 2025-03-12 20:57:59
Subjects: cs.CV,cs.AI,cs.LG,cs.RO
A Comprehensive Review on Understanding the Decentralized and Collaborative Approach in Machine Learning
The arrival of Machine Learning (ML) completely changed how we can unlock valuable information from data. Traditional methods, where everything was stored in one place, had big problems with keeping information private, handling large amounts of data, and avoiding unfair advantages. Machine Learning has become a powerful tool that uses Artificial Intelligence (AI) to overcome these challenges. We started by learning the basics of Machine Learning, including the different types like supervised, unsupervised, and reinforcement learning. We also explored the important steps involved, such as preparing the data, choosing the right model, training it, and then checking its performance. Next, we examined some key challenges in Machine Learning, such as models learning too much from specific examples (overfitting), not learning enough (underfitting), and reflecting biases in the data used. Moving beyond centralized systems, we looked at decentralized Machine Learning and its benefits, like keeping data private, getting answers faster, and using a wider variety of data sources. We then focused on a specific type called federated learning, where models are trained without directly sharing sensitive information. Real-world examples from healthcare and finance were used to show how collaborative Machine Learning can solve important problems while still protecting information security. Finally, we discussed challenges like communication efficiency, dealing with different types of data, and security. We also explored using a Zero Trust framework, which provides an extra layer of protection for collaborative Machine Learning systems. This approach is paving the way for a bright future for this groundbreaking technology.
Updated: 2025-03-12 20:54:22
Subjects: cs.LG
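Federated learning, discussed above, is characterised by its central aggregation step, federated averaging: clients train locally and only weight vectors are combined, so raw data never leaves the clients. A minimal sketch with hypothetical clients:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: combine locally trained weight vectors into a
    global model, weighting each client by its dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)          # (n_clients, n_params)
    return (sizes / sizes.sum()) @ stacked      # size-weighted mean

# Two hospitals with different data volumes (hypothetical numbers).
w_a = np.array([1.0, 0.0])   # model trained on hospital A's data
w_b = np.array([0.0, 1.0])   # model trained on hospital B's data
global_w = fedavg([w_a, w_b], client_sizes=[300, 100])
print(global_w)  # -> [0.75 0.25]
```

In a full round, the server would broadcast `global_w` back to the clients, who resume local training, which is where communication efficiency and security become the challenges the review discusses.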
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). EIA injects malicious content designed to adapt well to environments where the agents operate and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from the Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR for full user request. Additionally, by assessing the stealthiness and experimenting with a defensive system prompt, we indicate that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted for a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, extra attackers' efforts can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss the defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.
Updated: 2025-03-12 20:54:00
Subjects: cs.CR,cs.AI,cs.CL,cs.LG
On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective
Kolmogorov-Arnold Networks (KANs) have recently emerged as a novel approach to function approximation, demonstrating remarkable potential in various domains. Despite their theoretical promise, the robustness of KANs under adversarial conditions has yet to be thoroughly examined. In this paper we explore the adversarial robustness of KANs, with a particular focus on image classification tasks. We assess the performance of KANs against standard white-box and black-box adversarial attacks, comparing their resilience to that of established neural network architectures. Our experimental evaluation encompasses a variety of standard image classification benchmark datasets and investigates both fully connected and convolutional neural network architectures, of three sizes: small, medium, and large. We conclude that small- and medium-sized KANs (either fully connected or convolutional) are not consistently more robust than their standard counterparts, but that large-sized KANs are, by and large, more robust. This comprehensive evaluation of KANs in adversarial scenarios offers the first in-depth analysis of KAN security, laying the groundwork for future research in this emerging field.
Updated: 2025-03-12 20:45:25
Subjects: cs.CV,cs.CR
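The white-box attacks referred to above are typified by FGSM, which perturbs the input along the sign of the loss gradient. A self-contained sketch on a logistic-regression stand-in (not a KAN), with the gradient computed in closed form:

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method on a logistic-regression 'network':
    step the input in the direction that increases the loss."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))        # predicted P(y=1)
    grad_x = (p - y) * w                # d(cross-entropy)/dx, closed form
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])                # classified as class 1 (z = 1.5 > 0)
x_adv = fgsm(x, y=1.0, w=w, b=b, eps=0.5)
z_adv = w @ x_adv + b
print(z_adv)  # the decision margin shrinks after the attack
```

For a KAN the gradient would come from backpropagation through its spline activations rather than a closed form, but the perturbation rule is the same.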
Data Traceability for Privacy Alignment
This paper offers a new privacy approach for the growing ecosystem of services, ranging from open banking to healthcare, that depend on sensitive personal data sharing between individuals and third parties. While these services offer significant benefits, individuals want control over their data, transparency regarding how their data is used, and accountability from third parties for misuse. However, existing legal and technical mechanisms are inadequate for supporting these needs. A comprehensive approach to the modern privacy challenges of accountable third-party data sharing requires a closer alignment of technical system architecture and legal institutional design. In order to achieve this privacy alignment, we extend traditional security threat modeling and analysis to encompass a broader range of privacy notions than has been typically considered. In particular, we introduce the concept of covert-accountability, which addresses adversaries that may act dishonestly but face potential identification and legal consequences. As a concrete instance of this design approach, we present the OTrace protocol, designed to provide traceability, accountability, and consumer control in third-party data sharing ecosystems. OTrace empowers consumers with the knowledge of where their data is, who has it, what it is being used for, and whom it is being shared with. By applying our alignment framework to OTrace, we demonstrate that OTrace's technical affordances can provide more confident, scalable regulatory oversight when combined with complementary legal mechanisms.
Updated: 2025-03-12 20:42:23
Subjects: cs.CR,cs.CY
Generative AI for Named Entity Recognition in Low-Resource Language Nepali
Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), has significantly advanced Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), which involves identifying entities like person, location, and organization names in text. LLMs are especially promising for low-resource languages due to their ability to learn from limited data. However, the performance of GenAI models for Nepali, a low-resource language, has not been thoroughly evaluated. This paper investigates the application of state-of-the-art LLMs for Nepali NER, conducting experiments with various prompting techniques to assess their effectiveness. Our results provide insights into the challenges and opportunities of using LLMs for NER in low-resource settings and offer valuable contributions to the advancement of NLP research in languages like Nepali.
Updated: 2025-03-12 20:40:09
Subjects: cs.CL,cs.AI
Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments
We introduce Vision-Language Attention Distillation (Vi-LAD), a novel approach for distilling socially compliant navigation knowledge from a large Vision-Language Model (VLM) into a lightweight transformer model for real-time robotic navigation. Unlike traditional methods that rely on expert demonstrations or human-annotated datasets, Vi-LAD performs knowledge distillation and fine-tuning at the intermediate layer representation level (i.e., attention maps) by leveraging the backbone of a pre-trained vision-action model. These attention maps highlight key navigational regions in a given scene, which serve as implicit guidance for socially aware motion planning. Vi-LAD fine-tunes a transformer-based model using intermediate attention maps extracted from the pre-trained vision-action model, combined with attention-like semantic maps constructed from a large VLM. To achieve this, we introduce a novel attention-level distillation loss that fuses knowledge from both sources, generating augmented attention maps with enhanced social awareness. These refined attention maps are then utilized as a traversability costmap within a socially aware model predictive controller (MPC) for navigation. We validate our approach through real-world experiments on a Husky wheeled robot, demonstrating significant improvements over state-of-the-art (SOTA) navigation methods. Our results show improvements of 14.2% to 50% in success rate, which highlights the effectiveness of Vi-LAD in enabling socially compliant and efficient robot navigation.
Updated: 2025-03-12 20:38:23
Subjects: cs.RO,cs.AI
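The attention-level distillation idea can be sketched as fusing the two teacher maps and penalising the student's deviation from the fused target. The convex fusion and mean-squared-error penalty below are our assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def fused_distillation_loss(student_attn, action_attn, vlm_attn, alpha=0.5):
    """Attention-level distillation sketch: blend the vision-action model's
    attention map with the VLM-derived semantic map, then penalise the
    student's deviation from the fused target."""
    target = alpha * action_attn + (1.0 - alpha) * vlm_attn
    target /= target.sum()              # keep it a valid attention map
    return np.mean((student_attn - target) ** 2)

rng = np.random.default_rng(2)
def rand_map(shape=(8, 8)):
    m = rng.random(shape)
    return m / m.sum()                  # normalised toy attention map

loss = fused_distillation_loss(rand_map(), rand_map(), rand_map())
print(loss >= 0.0)  # True; zero only when the student matches the target
```

The fused target plays the role of the "augmented attention map" in the abstract, which is then converted into a traversability costmap for the MPC.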
Temporal Difference Flows
Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
Updated: 2025-03-12 20:30:07
Categories: cs.LG,cs.AI,stat.ML
Maximum Entropy Heterogeneous-Agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years. However, existing state-of-the-art methods face challenges related to sample complexity, training instability, and the risk of converging to a suboptimal Nash Equilibrium. In this paper, we propose a unified framework for learning stochastic policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective for MARL. Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm. Theoretically, we prove the monotonic improvement and convergence to quantal response equilibrium (QRE) properties of HASAC. Furthermore, we generalize a unified template for MaxEnt algorithmic design named Maximum Entropy Heterogeneous-Agent Mirror Learning (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on six benchmarks: Bi-DexHands, Multi-Agent MuJoCo, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines, exhibiting better sample efficiency, robustness, and sufficient exploration. See our page at https://sites.google.com/view/meharl.
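The MaxEnt objective underlying HASAC augments reward with policy entropy; for discrete actions the soft value is V(s) = alpha * log sum_a exp(Q(s,a)/alpha) and the optimal soft policy is Boltzmann in Q. A hedged single-agent, discrete-action sketch of these two quantities follows (HASAC itself handles heterogeneous agents and continuous actions; this only illustrates the entropy-regularized value and policy):

```python
import math

# Hedged single-agent, discrete-action sketch of the MaxEnt (soft)
# quantities that soft actor-critic methods such as HASAC build on.
# Not the paper's algorithm: just the entropy-regularized value/policy.

ALPHA = 0.5  # temperature: larger alpha = more entropy / exploration

def soft_value(q_values, alpha=ALPHA):
    """Soft value V(s) = alpha * log sum_a exp(Q(s,a) / alpha)."""
    m = max(q_values)  # log-sum-exp shift for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def soft_policy(q_values, alpha=ALPHA):
    """Optimal MaxEnt policy: pi(a|s) = exp((Q(s,a) - V(s)) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

q = [1.0, 2.0, 0.5]
pi = soft_policy(q)
assert abs(sum(pi) - 1.0) < 1e-9   # a valid distribution
assert soft_value(q) >= max(q)     # the soft max upper-bounds the hard max
# As alpha -> 0 the soft value recovers the greedy (hard) maximum.
assert abs(soft_value(q, alpha=1e-3) - max(q)) < 1e-2
```

The Boltzmann form of the policy is also consistent with the abstract's convergence guarantee being to a quantal response equilibrium rather than an exact Nash equilibrium.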
Updated: 2025-03-12 20:29:23
Categories: cs.MA,cs.LG
A practical guide to machine learning interatomic potentials -- Status and future
The rapid development and large body of literature on machine learning interatomic potentials (MLIPs) can make it difficult to know how to proceed for researchers who are not experts but wish to use these tools. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state-of-the-art in MLIPs. This review paper covers a broad range of topics related to MLIPs, including (i) central aspects of how and why MLIPs are enablers of many exciting advancements in molecular modeling, (ii) the main underpinnings of different types of MLIPs, including their basic structure and formalism, (iii) the potentially transformative impact of universal MLIPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLIPs, (iv) a practical guide for estimating and understanding the execution speed of MLIPs, including guidance for users based on hardware availability, type of MLIP used, and prospective simulation size and time, (v) a manual for what MLIP a user should choose for a given application by considering hardware resources, speed requirements, energy and force accuracy requirements, as well as guidance for choosing pre-trained potentials or fitting a new potential from scratch, (vi) discussion around MLIP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training, (vii) summary of some key limitations of present MLIPs and current approaches to mitigate such limitations, including methods of including long-range interactions, handling magnetic systems, and treatment of excited states, and finally (viii) we finish with some more speculative thoughts on what the future holds for the development and application of MLIPs over the next 3-10+ years.
Updated: 2025-03-12 20:24:01
Categories: cond-mat.mtrl-sci,cs.LG
Similarity-Distance-Magnitude Universal Verification
We address the neural network robustness problem by adding Similarity (i.e., correctly predicted depth-matches into training)-awareness and Distance-to-training-distribution-awareness to the existing output Magnitude (i.e., decision-boundary)-awareness of the softmax function. The resulting sdm activation function provides strong signals of the relative epistemic (reducible) predictive uncertainty. We use this novel behavior to further address the complementary HCI problem of mapping the output to human-interpretable summary statistics over relevant partitions of a held-out calibration set. Estimates of prediction-conditional uncertainty are obtained via a parsimonious learned transform over the class-conditional empirical CDFs of the output of a final-layer sdm activation function. For decision-making and as an intrinsic model check, estimates of class-conditional accuracy are obtained by further partitioning the high-probability regions of this calibrated output into class-conditional, region-specific CDFs. The uncertainty estimates from sdm calibration are remarkably robust to test-time distribution shifts and out-of-distribution inputs; incorporate awareness of the effective sample size; provide estimates of uncertainty from the learning and data splitting processes; and are well-suited for selective classification and conditional branching for additional test-time compute based on the predictive uncertainty, as for selective LLM generation, routing, and composition over multiple models and retrieval. Finally, we construct sdm networks, LLMs with uncertainty-aware verification and interpretability-by-exemplar as intrinsic properties. We provide open-source software implementing these results.
Updated: 2025-03-12 20:21:05
Categories: cs.LG,cs.CL
Fine-tuning Vision Language Models with Graph-based Knowledge for Explainable Medical Image Analysis
Accurate staging of Diabetic Retinopathy (DR) is essential for guiding timely interventions and preventing vision loss. However, current staging models are hardly interpretable, and most public datasets contain no clinical reasoning or interpretation beyond image-level labels. In this paper, we present a novel method that integrates graph representation learning with vision-language models (VLMs) to deliver explainable DR diagnosis. Our approach leverages optical coherence tomography angiography (OCTA) images by constructing biologically informed graphs that encode key retinal vascular features such as vessel morphology and spatial connectivity. A graph neural network (GNN) then performs DR staging while integrated gradients highlight critical nodes and edges and their individual features that drive the classification decisions. We collect this graph-based knowledge which attributes the model's prediction to physiological structures and their characteristics. We then transform it into textual descriptions for VLMs. We perform instruction-tuning with these textual descriptions and the corresponding image to train a student VLM. This final agent can classify the disease and explain its decision in a human interpretable way solely based on a single image input. Experimental evaluations on both proprietary and public datasets demonstrate that our method not only improves classification accuracy but also offers more clinically interpretable results. An expert study further demonstrates that our method provides more accurate diagnostic explanations and paves the way for precise localization of pathologies in OCTA images.
Updated: 2025-03-12 20:19:07
Categories: cs.CV,cs.AI
Un-Straightening Generative AI: How Queer Artists Surface and Challenge the Normativity of Generative AI Models
Queer people are often discussed as targets of bias, harm, or discrimination in research on generative AI. However, the specific ways that queer people engage with generative AI, and thus possible uses that support queer people, have yet to be explored. We conducted a workshop study with 13 queer artists, during which we gave participants access to GPT-4 and DALL-E 3 and facilitated group sensemaking activities. We found our participants struggled to use these models due to various normative values embedded in their designs, such as hyper-positivity and anti-sexuality. We describe various strategies our participants developed to overcome these models' limitations and how, nevertheless, our participants found value in these highly-normative technologies. Drawing on queer feminist theory, we discuss implications for the conceptualization of "state-of-the-art" models and consider how FAccT researchers might support queer alternatives.
Updated: 2025-03-12 20:16:38
Categories: cs.HC,cs.AI
Batch List-Decodable Linear Regression via Higher Moments
We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $\alpha \in (0, 1/2)$, an unknown $\alpha$-fraction of the batches are clean and no assumptions are made on the remaining ones. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in $\ell_2$-norm. [DJKS23] gave an efficient algorithm, under natural distributional assumptions, with the following guarantee. Assuming that the batch size $n$ satisfies $n \geq \tilde{\Omega}(\alpha^{-1})$ and the number of batches is $m = \mathrm{poly}(d, n, 1/\alpha)$, their algorithm runs in polynomial time and outputs a list of $O(1/\alpha^2)$ vectors at least one of which is $\tilde{O}(\alpha^{-1/2}/\sqrt{n})$ close to the target regressor. Here we design a new polynomial time algorithm with significantly stronger guarantees under the assumption that the low-degree moments of the covariates distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant $\delta>0$, as long as the batch size is $n \geq \Omega_{\delta}(\alpha^{-\delta})$ and the degree-$\Theta(1/\delta)$ moments of the covariates are SoS certifiably bounded, our algorithm uses $m = \mathrm{poly}((dn)^{1/\delta}, 1/\alpha)$ batches, runs in polynomial-time, and outputs an $O(1/\alpha)$-sized list of vectors one of which is $O(\alpha^{-\delta/2}/\sqrt{n})$ close to the target. That is, our algorithm achieves substantially smaller minimum batch size and final error, while achieving the optimal list size. Our approach uses higher-order moment information by carefully combining the SoS paradigm interleaved with an iterative method and a novel list pruning procedure. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.
Updated: 2025-03-12 20:11:07
Categories: cs.LG,cs.DS,math.ST,stat.ML,stat.TH
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
As we scale to more massive machine learning models, the frequent synchronization demands inherent in data-parallel approaches create significant slowdowns, posing a critical challenge to further scaling. Recent work develops an approach (DiLoCo) that relaxes synchronization demands without compromising model quality. However, these works do not carefully analyze how DiLoCo's behavior changes with model size. In this work, we study the scaling law behavior of DiLoCo when training LLMs under a fixed compute budget. We focus on how algorithmic factors, including number of model replicas, hyperparameters, and token budget affect training in ways that can be accurately predicted via scaling laws. We find that DiLoCo scales both predictably and robustly with model size. When well-tuned, DiLoCo scales better than data-parallel training with model size, and can outperform data-parallel training even at small model sizes. Our results showcase a more general set of benefits of DiLoCo than previously documented, including increased optimal batch sizes, improved downstream generalization with scale, and improved evaluation loss for a fixed token budget.
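The two-level structure DiLoCo relaxes synchronization with can be sketched on a toy problem: each replica takes many local optimizer steps, then the server averages the resulting parameter deltas and applies them through an outer momentum step. This is a hedged simplification (the actual method uses AdamW as the inner optimizer and Nesterov momentum as the outer optimizer on full LLM parameter vectors; plain momentum and scalar parameters are used here for brevity):

```python
# Toy sketch of a DiLoCo-style two-level update (simplified).
# Each replica takes H local SGD steps without synchronizing, then the
# server averages the parameter deltas ("outer gradients") and applies
# them through an outer momentum step -- one communication per round.

REPLICAS, H, ROUNDS = 4, 10, 40
INNER_LR, OUTER_LR, MOMENTUM = 0.1, 1.0, 0.5

def grad(theta, shift):
    # Each replica minimizes (theta - shift)^2 on its own data shard.
    return 2.0 * (theta - shift)

shards = [-1.0, 0.0, 1.0, 2.0]  # per-replica optima; global optimum is 0.5
theta, velocity = 5.0, 0.0      # shared parameters and outer momentum state

for _ in range(ROUNDS):
    deltas = []
    for shift in shards:
        local = theta
        for _ in range(H):               # H local steps, no communication
            local -= INNER_LR * grad(local, shift)
        deltas.append(theta - local)     # this replica's "outer gradient"
    outer_grad = sum(deltas) / REPLICAS  # single communication per round
    velocity = MOMENTUM * velocity + outer_grad
    theta -= OUTER_LR * velocity

# theta approaches the global optimum despite infrequent synchronization
assert abs(theta - 0.5) < 0.01
```

Communicating once every H steps instead of every step is what removes the frequent-synchronization bottleneck the abstract describes.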
Updated: 2025-03-12 20:04:38
Categories: cs.LG,cs.CL,cs.DC
Exploration of Hepatitis B Virus Infection Dynamics through Virology-Informed Neural Network: A Novel Artificial Intelligence Approach
In this work, we introduce Virology-Informed Neural Networks (VINNs), a powerful tool for capturing the intricate dynamics of viral infection when data for some compartments of the model are not available. VINNs, an extension of the widely known Physics-Informed Neural Networks (PINNs), offer an alternative approach to traditional numerical methods for solving systems of differential equations. We apply this VINN technique to a recently proposed hepatitis B virus (HBV) infection dynamics model to predict the transmission of the infection within the liver more accurately. This model consists of four compartments, namely uninfected and infected hepatocytes, rcDNA-containing capsids, and free viruses, along with the consideration of capsid recycling. Leveraging the power of VINNs, we study the impacts of variations in parameter range, experimental noise, data variability, network architecture, and learning rate. In order to demonstrate the robustness and effectiveness of VINNs, we apply this approach to data collected from nine HBV-infected chimpanzees, and observe that VINNs can effectively estimate the model parameters. VINNs reliably capture the dynamics of infection spread and accurately predict their future progression using real-world data. Furthermore, VINNs efficiently identify the most influential parameters in HBV dynamics based solely on experimental data from the capsid component. It is also expected that this framework can be extended beyond viral dynamics, providing a powerful tool for uncovering hidden patterns and complex interactions across various scientific and engineering domains.
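The physics-informed residual idea behind VINNs can be illustrated without the paper's four-compartment HBV model: a candidate solution is scored by how badly it violates the governing ODE at a set of collocation points. The sketch below uses a hypothetical one-compartment clearance equation dV/dt = -c*V (not the paper's system); a VINN adds such a residual, evaluated through the network, to its data-fitting loss.

```python
import math

# Hedged sketch of the physics-informed residual idea behind VINNs/PINNs.
# The ODE here is a hypothetical clearance equation dV/dt = -c*V, not the
# paper's HBV system. The exact solution has residual ~0 at every point,
# while a plausible-looking but wrong candidate does not.

C_RATE = 0.5   # hypothetical clearance rate c
H = 1e-4       # finite-difference step for the derivative

def residual(f, t):
    """|f'(t) + c*f(t)|, with f'(t) from central differences."""
    dfdt = (f(t + H) - f(t - H)) / (2 * H)
    return abs(dfdt + C_RATE * f(t))

true_sol = lambda t: math.exp(-C_RATE * t)  # satisfies the ODE exactly
wrong_sol = lambda t: 1.0 / (1.0 + t)       # decaying shape, wrong physics

ts = [0.0, 0.5, 2.0]  # collocation points
good = sum(residual(true_sol, t) for t in ts) / len(ts)
bad = sum(residual(wrong_sol, t) for t in ts) / len(ts)
assert good < 1e-6 and bad > 0.1
```

In a VINN this residual is minimized jointly with the data loss over the network weights, which is what lets the unobserved compartments be constrained by the model equations rather than by data.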
Updated: 2025-03-12 20:02:31
Categories: q-bio.QM,cs.LG
SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM
Pre-trained segmentation models are a powerful and flexible tool for segmenting images. Recently, this trend has extended to medical imaging. Yet, often these methods only produce a single prediction for a given image, neglecting inherent uncertainty in medical images, due to unclear object boundaries and errors caused by the annotation tool. Multiple Choice Learning is a technique for generating multiple masks, through multiple learned prediction heads. However, this cannot readily be extended to producing more outputs than its initial pre-training hyperparameters, as the sparse, winner-takes-all loss function makes it easy for one prediction head to become overly dominant, thus not guaranteeing the clinical relevancy of each mask produced. We introduce SeqSAM, a sequential, RNN-inspired approach to generating multiple masks, which uses a bipartite matching loss for ensuring the clinical relevancy of each mask, and can produce an arbitrary number of masks. We show notable improvements in quality of each mask produced across two publicly available datasets. Our code is available at https://github.com/BenjaminTowle/SeqSAM.
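The bipartite matching idea at the heart of SeqSAM's loss pairs each predicted mask with the ground-truth mask it best explains before computing per-pair losses, so every prediction is pulled toward some target. Below is a hedged stdlib sketch of that matching step only (brute force over permutations with a 1 - IoU cost; not the paper's implementation, which would typically use the Hungarian algorithm, e.g. SciPy's `linear_sum_assignment`):

```python
from itertools import permutations

# Hedged sketch of bipartite matching for set prediction (not SeqSAM's
# exact loss): pair predictions with targets so total cost is minimal,
# then the loss is the cost of that optimal pairing. Brute force is fine
# for tiny n; production code would use the Hungarian algorithm.

def iou(a, b):
    """Intersection-over-union of two binary masks (flat 0/1 lists)."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 1.0

def matching_loss(preds, targets):
    """Minimum over pairings of the summed (1 - IoU) costs."""
    n = len(preds)
    best = None
    for perm in permutations(range(n)):
        cost = sum(1.0 - iou(preds[i], targets[perm[i]]) for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best

preds = [[1, 1, 0, 0], [0, 0, 1, 1]]
targets = [[0, 0, 1, 1], [1, 1, 0, 0]]
cost, assignment = matching_loss(preds, targets)
# Perfect masks in swapped order: the matching finds the swap, cost is 0.
assert cost == 0.0 and assignment == (1, 0)
```

Because each prediction is matched to its own target, no single head can absorb all the gradient the way it can under a winner-takes-all loss.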
Updated: 2025-03-12 20:01:52
Categories: cs.CV,cs.AI,cs.LG
Bimodal Connection Attention Fusion for Speech Emotion Recognition
Multi-modal emotion recognition is challenging due to the difficulty of extracting features that capture subtle emotional differences. Understanding multi-modal interactions and connections is key to building effective bimodal speech emotion recognition systems. In this work, we propose Bimodal Connection Attention Fusion (BCAF) method, which includes three main modules: the interactive connection network, the bimodal attention network, and the correlative attention network. The interactive connection network uses an encoder-decoder architecture to model modality connections between audio and text while leveraging modality-specific features. The bimodal attention network enhances semantic complementation and exploits intra- and inter-modal interactions. The correlative attention network reduces cross-modal noise and captures correlations between audio and text. Experiments on the MELD and IEMOCAP datasets demonstrate that the proposed BCAF method outperforms existing state-of-the-art baselines.
Updated: 2025-03-12 19:50:21
Categories: cs.SD,cs.AI,cs.CL,cs.MM,eess.AS
Minimal Time Series Transformer
Transformer is the state-of-the-art model for many natural language processing, computer vision, and audio analysis problems. Transformer effectively combines information from the past input and output samples in an auto-regressive manner so that each sample becomes aware of all inputs and outputs. In sequence-to-sequence (Seq2Seq) modeling, the transformer-processed samples become effective in predicting the next output. Time series forecasting is a Seq2Seq problem. The original architecture is defined for discrete input and output sequence tokens, but to adopt it for time series, the model must be adapted for continuous data. This work introduces minimal adaptations to make the original transformer architecture suitable for continuous-value time series data.
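The core adaptation for continuous data can be reduced to one change: replace the discrete token-embedding lookup with a linear projection of the raw values (with a matching linear output head); the attention mechanism itself is unchanged. A hedged pure-Python sketch with hand-set, hypothetical weights (single head, no training, no positional encoding or causal mask):

```python
import math

# Hedged sketch (hand-set weights, no training, single head): for
# continuous-valued series the token-embedding lookup is replaced by a
# linear projection of raw values into d_model dimensions; scaled
# dot-product attention is then applied exactly as in the original model.

D = 2                 # tiny model dimension
W_IN = [0.5, -0.25]   # hypothetical "embedding" weights: scalar -> R^D

def embed(x):
    return [w * x for w in W_IN]

def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = embedded inputs."""
    qkv = [embed(x) for x in seq]
    out = []
    for qi in qkv:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(D) for kj in qkv]
        m = max(scores)                      # softmax with max-shift
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        out.append([sum(wi * vj[d] for wi, vj in zip(w, qkv)) for d in range(D)])
    return out

ctx = self_attention([1.0, 2.0, 3.0])
assert len(ctx) == 3 and len(ctx[0]) == D
# A constant series attends uniformly, so the output is its own embedding.
assert self_attention([2.0, 2.0])[0] == embed(2.0)
```

A real forecasting model would add positional encodings, a causal mask for auto-regressive decoding, and a learned linear head mapping the d_model outputs back to values; all are omitted here.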
Updated: 2025-03-12 19:48:37
Categories: cs.LG
Constrained Language Generation with Discrete Diffusion Models
Constraints are critical in text generation as LLM outputs are often unreliable when it comes to ensuring generated outputs adhere to user defined instruction or general safety guidelines. To address this gap, we present Constrained Discrete Diffusion (CDD), a novel method for enforcing constraints on natural language by integrating discrete diffusion models with differentiable optimization. Unlike conventional text generators, which often rely on post-hoc filtering or model retraining for controllable generation, we propose imposing constraints directly into the discrete diffusion sampling process. We illustrate how this technique can be applied to satisfy a variety of natural language constraints, including (i) toxicity mitigation by preventing harmful content from emerging, (ii) character and sequence level lexical constraints, and (iii) novel molecule sequence generation with specific property adherence. Experimental results show that our constraint-aware procedure achieves high fidelity in meeting these requirements while preserving fluency and semantic coherence, outperforming auto-regressive and existing discrete diffusion approaches.
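CDD imposes constraints through differentiable optimization inside the diffusion sampling process; as a much simpler point of comparison, a hard lexical constraint can be illustrated by projecting each step's token distribution onto the feasible set, i.e. zeroing banned tokens and renormalizing. The sketch below shows only that naive projection (illustrative, with a made-up vocabulary; it is not the paper's method and cannot express the richer constraints CDD handles):

```python
# Hedged illustration of constraint projection during discrete sampling.
# CDD enforces constraints via differentiable optimization; this shows
# only the simplest feasible-set projection: mask banned token ids and
# renormalize the per-position distribution.

def project_to_constraint(probs, banned):
    """Zero the probability of banned token ids, then renormalize."""
    masked = [0.0 if i in banned else p for i, p in enumerate(probs)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("constraint is unsatisfiable at this step")
    return [p / total for p in masked]

vocab = ["the", "cat", "darn", "sat"]  # toy vocabulary
probs = [0.4, 0.3, 0.2, 0.1]           # model's proposal at one position
banned = {2}                           # toxicity constraint: forbid "darn"

safe = project_to_constraint(probs, banned)
assert safe[2] == 0.0
assert abs(sum(safe) - 1.0) < 1e-9
assert max(range(4), key=lambda i: safe[i]) == 0  # "the" still most likely
```

Masking like this handles per-token bans but not sequence-level or property constraints, which is where folding the constraint into the diffusion process itself pays off.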
Updated: 2025-03-12 19:48:12
Categories: cs.CL,cs.LG
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning, yet full parameter fine-tuning is often computationally prohibitive for large models. Parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaptation (LoRA) reduce this cost by introducing low-rank matrices for tuning fewer parameters. While LoRA allows for efficient fine-tuning, it requires significant data for adaptation, making Federated Learning (FL) an appealing solution due to its privacy-preserving collaborative framework. However, combining LoRA with FL introduces two key challenges: the \textbf{Server-Side Aggregation Bias}, where server-side averaging of LoRA matrices diverges from the ideal global update, and the \textbf{Client-Side Initialization Lag}, emphasizing the need for consistent initialization across rounds. Existing approaches address these challenges individually, limiting their effectiveness. We propose LoRA-FAIR, a novel method that tackles both issues by introducing a correction term on the server, enhancing aggregation efficiency and accuracy. LoRA-FAIR maintains computational and communication efficiency, yielding superior performance over state-of-the-art methods. Experimental results on ViT and MLP-Mixer models across large-scale datasets demonstrate that LoRA-FAIR consistently achieves performance improvements in FL settings.
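The server-side aggregation bias can be seen in one line of arithmetic: the ideal global LoRA update averages the products B_i A_i across clients, while a server that averages B_i and A_i separately and then multiplies computes something different. A hedged sketch with scalars standing in for the low-rank matrices (the mismatch is the same in the matrix case):

```python
# Hedged sketch of the server-side aggregation bias LoRA-FAIR corrects.
# Ideal global LoRA update: mean_i(B_i @ A_i). Naive server aggregation:
# mean(B) @ mean(A). These generally differ; scalars stand in for the
# low-rank matrices to keep the arithmetic visible.

clients = [(2.0, 3.0), (4.0, 1.0)]  # (B_i, A_i) per client

ideal = sum(b * a for b, a in clients) / len(clients)  # mean of products
b_avg = sum(b for b, _ in clients) / len(clients)
a_avg = sum(a for _, a in clients) / len(clients)
naive = b_avg * a_avg                                   # product of means

bias = ideal - naive  # the discrepancy a server-side correction must absorb
assert ideal == 5.0 and naive == 6.0 and bias == -1.0
```

LoRA-FAIR's server-side correction term is aimed at exactly this product-of-means versus mean-of-products gap.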
Updated: 2025-03-12 19:43:25
Categories: cs.LG,cs.CV
Designing Graph Convolutional Neural Networks for Discrete Choice with Network Effects
We introduce a novel model architecture that incorporates network effects into discrete choice problems, achieving higher predictive performance than standard discrete choice models while offering greater interpretability than general-purpose flexible model classes. Econometric discrete choice models aid in studying individual decision-making, where agents select the option with the highest reward from a discrete set of alternatives. Intuitively, the utility an individual derives from a particular choice depends on their personal preferences and characteristics, the attributes of the alternative, and the value their peers assign to that alternative or their previous choices. However, most applications ignore peer influence, and models that do consider peer or network effects often lack the flexibility and predictive performance of recently developed approaches to discrete choice, such as deep learning. We propose a novel graph convolutional neural network architecture to model network effects in discrete choices, achieving higher predictive performance than standard discrete choice models while retaining the interpretability necessary for inference--a quality often lacking in general-purpose deep learning architectures. We evaluate our architecture using revealed commuting choice data, extended with travel times and trip costs for each travel mode for work-related trips in New York City, as well as 2016 U.S. election data aggregated by county, to test its performance on datasets with highly imbalanced classes. Given the interpretability of our models, we can estimate relevant economic metrics, such as the value of travel time savings in New York City. Finally, we compare the predictive performance and behavioral insights from our architecture to those derived from traditional discrete choice and general-purpose deep learning models.
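The network-effect mechanism can be sketched with a single hand-set graph convolution layer: each agent's logit for an alternative mixes its own utility with its neighbors' average utility, then choice probabilities come from a softmax as in a conditional logit model. This is a hedged toy (hypothetical peer-weight parameter, fixed weights; the paper's architecture stacks learned layers trained end-to-end):

```python
import math

# Hedged one-layer sketch of a graph convolution for discrete choice:
# an agent's logit for each alternative = own utility + PEER_WEIGHT *
# mean utility of its neighbors, followed by a softmax. Weights are
# hand-set here; the paper learns them end-to-end.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# utilities[agent][alternative]; 3 agents, 2 travel modes
utilities = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # fully connected graph
PEER_WEIGHT = 1.5  # hypothetical strength of the network effect

def choice_probs(agent):
    peer = [
        sum(utilities[j][k] for j in neighbors[agent]) / len(neighbors[agent])
        for k in range(2)
    ]
    logits = [u + PEER_WEIGHT * p for u, p in zip(utilities[agent], peer)]
    return softmax(logits)

p = choice_probs(0)
# Agent 0 privately prefers mode 0, but both peers prefer mode 1; with a
# strong peer weight the network effect tips the choice.
assert p[1] > p[0]
```

With PEER_WEIGHT set to zero this collapses to a standard conditional logit, which is what keeps the architecture interpretable enough to recover quantities like the value of travel time savings.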
Updated: 2025-03-12 19:38:47
Categories: cs.LG,econ.EM,stat.ML
On strategies for risk management and decision making under uncertainty shared across multiple fields
Decision theory recognizes two principal approaches to solving problems under uncertainty: probabilistic models and cognitive heuristics. However, engineers, public planners and decision-makers in other fields seem to employ solution strategies that fall into neither category, such as robust design and contingency planning. In addition, identical strategies appear in several fields and disciplines, pointing to an important shared toolkit. The focus of this paper is to develop a systematic understanding of such strategies and a framework to better employ them in decision making and risk management. The paper finds more than 110 examples of such strategies, and this approach to risk is termed RDOT: Risk-reducing Design and Operations Toolkit. RDOT strategies fall into six broad categories: structural, reactive, formal, adversarial, multi-stage and positive. RDOT strategies provide an efficient response even to radical uncertainty or unknown unknowns that are challenging to address with probabilistic methods. RDOT could be incorporated into decision theory using workflows, multi-objective optimization and multi-attribute utility theory. Overall, RDOT represents an overlooked class of versatile responses to uncertainty. Because RDOT strategies do not require precise estimation or forecasting, they are particularly helpful in decision problems affected by uncertainty and for resource-constrained decision making.
Updated: 2025-03-12 19:38:21
Categories: q-fin.RM,cs.AI,math.OC,60A05, 91B05, 62C05,F.1.3
Learning richness modulates equality reasoning in neural networks
Equality reasoning is ubiquitous and purely abstract: sameness or difference may be evaluated no matter the nature of the underlying objects. As a result, same-different tasks (SD) have been extensively studied as a starting point for understanding abstract reasoning in humans and across animal species. With the rise of neural networks (NN) that exhibit striking apparent proficiency for abstractions, equality reasoning in NNs has also gained interest. Yet despite extensive study, conclusions about equality reasoning vary widely and with little consensus. To clarify the underlying principles in learning SD, we develop a theory of equality reasoning in multi-layer perceptrons (MLP). Following observations in comparative psychology, we propose a spectrum of behavior that ranges from conceptual to perceptual outcomes. Conceptual behavior is characterized by task-specific representations, efficient learning, and insensitivity to spurious perceptual details. Perceptual behavior is characterized by strong sensitivity to spurious perceptual details, accompanied by the need for exhaustive training to learn the task. We develop a mathematical theory to show that an MLP's behavior is driven by learning richness. Rich-regime MLPs exhibit conceptual behavior, whereas lazy-regime MLPs exhibit perceptual behavior. We validate our theoretical findings in vision SD experiments, showing that rich feature learning promotes success by encouraging hallmarks of conceptual behavior. Overall, our work identifies feature learning richness as a key parameter modulating equality reasoning, and suggests that equality reasoning in humans and animals may similarly depend on learning richness in neural circuits.
Updated: 2025-03-12 19:30:51
标题: 学习丰富度调节神经网络中的平等推理
摘要: 平等推理无处不在且纯粹抽象:无论底层对象的性质如何,都可以评估相同或不同。因此,相同-不同任务(SD)作为理解人类和各种动物物种抽象推理的起点被广泛研究。随着在抽象方面表现出惊人能力的神经网络(NN)的兴起,NN中的平等推理也引起了关注。然而,尽管研究广泛,关于平等推理的结论差异很大,缺乏共识。为了阐明学习SD背后的基本原理,我们发展了多层感知机(MLP)中平等推理的理论。根据比较心理学中的观察,我们提出了一个从概念性到知觉性结果的行为谱系。概念性行为的特征是任务特定的表示、高效的学习以及对虚假知觉细节的不敏感;知觉性行为的特征是对虚假知觉细节的强烈敏感,并且需要穷举式训练才能学会任务。我们发展了一个数学理论,表明MLP的行为由学习丰富度驱动:富(rich)区制的MLP表现出概念性行为,而惰性(lazy)区制的MLP表现出知觉性行为。我们在视觉SD实验中验证了理论发现,表明丰富的特征学习通过促进概念性行为的标志特征而带来成功。总体而言,我们的工作将特征学习丰富度确定为调节平等推理的关键参数,并表明人类和动物的平等推理可能同样取决于神经回路中的学习丰富度。
更新时间: 2025-03-12 19:30:51
领域: cs.LG,cs.NE
AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
LLM-powered AI agents are an emerging frontier with tremendous potential to increase human productivity. However, empowering AI agents to take action on their user's behalf in day-to-day tasks involves giving them access to potentially sensitive and private information, which leads to a possible risk of inadvertent privacy leakage when the agent malfunctions. In this work, we propose one way to address that potential risk, by training AI agents to better satisfy the privacy principle of data minimization. For the purposes of this benchmark, by "data minimization" we mean instances where private information is shared only when it is necessary to fulfill a specific task-relevant purpose. We develop a benchmark called AgentDAM to evaluate how well existing and future AI agents can limit processing of potentially private information that we designate "necessary" to fulfill the task. Our benchmark simulates realistic web interaction scenarios and is adaptable to all existing web navigation agents. We use AgentDAM to evaluate how well AI agents built on top of GPT-4, Llama-3 and Claude can limit processing of potentially private information when unnecessary, and show that these agents are often prone to inadvertent use of unnecessary sensitive information. Finally, we propose a prompting-based approach that reduces this inadvertent use.
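The data-minimization principle the benchmark tests can be illustrated with a small, entirely hypothetical sketch (the field names and task registry below are invented for illustration and are not part of AgentDAM): an agent should pass along only the profile fields a task actually requires.

```python
# Hypothetical illustration of the data-minimization principle: share only
# the fields required by the task at hand, never the full profile.
user_profile = {
    "name": "Alex",
    "email": "alex@example.com",
    "ssn": "000-00-0000",            # sensitive, rarely necessary
    "shipping_address": "1 Main St",
}

# Invented task registry mapping each task to its necessary fields.
TASK_REQUIREMENTS = {
    "place_order": {"name", "shipping_address"},
    "newsletter_signup": {"email"},
}

def minimize(profile: dict, task: str) -> dict:
    """Return only the profile fields necessary for the given task."""
    needed = TASK_REQUIREMENTS.get(task, set())
    return {k: v for k, v in profile.items() if k in needed}

print(minimize(user_profile, "place_order"))
```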
Updated: 2025-03-12 19:30:31
标题: AgentDAM:自主网络代理的隐私泄露评估
摘要: 基于LLM的人工智能代理是一个新兴的领域,具有巨大潜力来提高人类生产力。然而,赋予人工智能代理在日常任务中代表用户采取行动的能力涉及给予它们访问潜在敏感和私密信息的权限,这可能导致代理出现故障时意外泄露隐私的风险。在这项工作中,我们提出了一种解决潜在风险的方法,即通过训练人工智能代理更好地满足数据最小化隐私原则。在本次基准测试中,“数据最小化”指的是仅在必要时共享私密信息以实现特定任务相关目的的情况。我们开发了一个名为AgentDAM的基准测试,用于评估现有和未来人工智能代理在限制处理我们指定为“必要”的潜在私密信息时的表现。我们的基准测试模拟了现实的网络交互场景,并适用于所有现有的网络导航代理。我们使用AgentDAM来评估建立在GPT-4、Llama-3和Claude基础上的人工智能代理在不必要时如何限制处理潜在私密信息,并显示这些代理往往容易无意中使用不必要的敏感信息。最后,我们提出了一种基于提示的方法来减少这种情况。
更新时间: 2025-03-12 19:30:31
领域: cs.AI
Real-Time Risky Fault-Chain Search using Time-Varying Graph RNNs
This paper introduces a data-driven graphical framework for the real-time search of risky cascading fault chains (FCs) in power-grids, crucial for enhancing grid resiliency in the face of climate change. As extreme weather events driven by climate change increase, identifying risky FCs becomes crucial for mitigating cascading failures and ensuring grid stability. However, the complexity of the spatio-temporal dependencies among grid components and the exponential growth of the search space with system size pose significant challenges to modeling and risky FC search. To tackle this, we model the search process as a partially observable Markov decision process (POMDP), which is subsequently solved via a time-varying graph recurrent neural network (GRNN). This approach captures the spatial and temporal structure induced by the system's topology and dynamics, while efficiently summarizing the system's history in the GRNN's latent space, enabling scalable and effective identification of risky FCs.
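A generic graph-recurrent update of the kind described can be sketched in NumPy; the shapes, normalization, and cell form below are illustrative assumptions, not the paper's exact architecture. The point is that the hidden state mixes neighbors' states through the (time-varying) adjacency while accumulating the system's history.

```python
import numpy as np

rng = np.random.default_rng(0)

def grnn_step(H, X, A, W_h, W_x):
    """One graph-recurrent update (a generic sketch, not the paper's cell):
    neighbors' hidden states are mixed through the adjacency A, combined
    with current node features X, and squashed through tanh.
    """
    return np.tanh(A @ H @ W_h + X @ W_x)

n_nodes, d_feat, d_hid = 5, 3, 4
A = rng.random((n_nodes, n_nodes))
A /= A.sum(axis=1, keepdims=True)            # row-normalized adjacency
W_h = 0.1 * rng.standard_normal((d_hid, d_hid))
W_x = 0.1 * rng.standard_normal((d_feat, d_hid))

H = np.zeros((n_nodes, d_hid))
for t in range(3):                            # unroll over a short horizon
    X = rng.standard_normal((n_nodes, d_feat))
    H = grnn_step(H, X, A, W_h, W_x)          # latent state summarizes history

print(H.shape)
```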
Updated: 2025-03-12 19:27:07
标题: 基于时变图RNN的实时风险故障链搜索
摘要: 本文介绍了一个基于数据驱动的图形框架,用于实时搜索电力网络中风险级联故障链(FCs),这对增强电网的抗灾能力至关重要。随着气候变化引发的极端天气事件的增加,识别风险级联故障链对减轻级联故障并确保电网稳定至关重要。然而,电网组件之间的时空依赖关系的复杂性以及随着系统规模的增长而指数级增长的搜索空间对建模和风险级联故障链搜索构成重大挑战。为了解决这个问题,我们将搜索过程建模为部分可观察的马尔可夫决策过程(POMDP),随后通过一个时间变化的图形循环神经网络(GRNN)来解决。这种方法捕捉了系统拓扑和动态所引发的空间和时间结构,同时有效地总结了GRNN的潜在空间中的系统历史,从而实现了风险级联故障链的可扩展和有效识别。
更新时间: 2025-03-12 19:27:07
领域: cs.LG,cs.SY,eess.SP,eess.SY
How to Combine Differential Privacy and Continual Learning
The goal of continual learning (CL) is to retain knowledge across tasks, but this conflicts with strict privacy required for sensitive training data that prevents storing or memorising individual samples. This work explores the intersection of CL and differential privacy (DP). We advance the theoretical understanding and introduce methods for combining CL and DP. We formulate and clarify the theory for DP CL focusing on composition over tasks. We introduce different variants of choosing classifiers' output label space, show that choosing the output label space directly based on the task data is not DP, and offer a DP alternative. We propose a method for combining pre-trained models with DP prototype classifiers and parameter-efficient adapters learned under DP to address the trade-offs between privacy and utility in a CL setting. We also demonstrate the effectiveness of our methods for varying degrees of domain shift, for blurry tasks, and with different output label settings.
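The phrase "composition over tasks" refers to the fact that sequential DP training runs spend privacy budget additively in the basic case. A minimal sketch of basic sequential composition (the paper's own accounting may rely on tighter composition theorems):

```python
def basic_composition(epsilons, deltas):
    """Basic sequential composition of (eps, delta)-DP mechanisms:
    running them all costs the sum of the individual budgets."""
    return sum(epsilons), sum(deltas)

# Three continual-learning tasks trained one after another,
# each with its own per-task privacy budget:
eps_total, delta_total = basic_composition([0.5, 0.5, 1.0], [1e-6, 1e-6, 1e-6])
print(eps_total, delta_total)
```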
Updated: 2025-03-12 19:22:07
标题: 如何将差分隐私与持续学习结合起来
摘要: 持续学习(CL)的目标是在任务之间保留知识,但这与敏感训练数据所需的严格隐私保护相冲突,后者禁止存储或记忆单个样本。本研究探讨了持续学习(CL)和差分隐私(DP)的交集。我们推进了对CL和DP结合的理论理解,并引入了结合CL和DP的方法。我们制定和澄清了DP CL理论,重点关注任务间的组合。我们介绍了选择分类器输出标签空间的不同变体,表明直接基于任务数据选择输出标签空间并不满足DP,并提供了一个满足DP的替代方案。我们提出了一种方法,通过将预训练模型与DP原型分类器和在DP下学习的参数高效适配器相结合,来解决在CL环境中隐私和效用之间的权衡。我们还展示了我们的方法在不同程度的领域转移、模糊任务和不同输出标签设置下的有效性。
更新时间: 2025-03-12 19:22:07
领域: cs.LG,cs.CR
Cover Learning for Large-Scale Topology Representation
Classical unsupervised learning methods like clustering and linear dimensionality reduction parametrize large-scale geometry when it is discrete or linear, while more modern methods from manifold learning find low dimensional representation or infer local geometry by constructing a graph on the input data. More recently, topological data analysis popularized the use of simplicial complexes to represent data topology with two main methodologies: topological inference with geometric complexes and large-scale topology visualization with Mapper graphs -- central to these is the nerve construction from topology, which builds a simplicial complex given a cover of a space by subsets. While successful, these have limitations: geometric complexes scale poorly with data size, and Mapper graphs can be hard to tune and only contain low dimensional information. In this paper, we propose to study the problem of learning covers in its own right, and from the perspective of optimization. We describe a method for learning topologically-faithful covers of geometric datasets, and show that the simplicial complexes thus obtained can outperform standard topological inference approaches in terms of size, and Mapper-type algorithms in terms of representation of large-scale topology.
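The nerve construction at the heart of both methodologies is simple to state in code: given a cover, include one simplex for every collection of cover sets with a common point. A small self-contained sketch (brute-force over subsets, so only suitable for tiny covers):

```python
from itertools import combinations

def nerve(cover):
    """Nerve of a cover: one (k-1)-simplex for every k cover sets with a
    common point. `cover` is a list of sets of data-point indices."""
    simplices = []
    for k in range(1, len(cover) + 1):
        for idx in combinations(range(len(cover)), k):
            if set.intersection(*(cover[i] for i in idx)):
                simplices.append(idx)
    return simplices

# A circle-like toy cover: three arcs, each overlapping its two neighbours.
cover = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
print(nerve(cover))
```

The output contains three vertices and three edges but no triangle (the triple intersection is empty), so the nerve recovers the hollow-triangle topology of a circle.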
Updated: 2025-03-12 19:10:20
标题: 大规模拓扑表示的覆盖学习
摘要: 经典的无监督学习方法,如聚类和线性降维,在大规模几何结构是离散或线性时对其进行参数化,而源自流形学习的更现代方法则通过在输入数据上构建图来找到低维表示或推断局部几何形状。最近,拓扑数据分析普及了使用单纯复形来表示数据拓扑,主要有两种方法论:使用几何复形进行拓扑推断和使用Mapper图进行大规模拓扑可视化。这些方法的核心是拓扑学中的神经(nerve)构造:在给定空间的子集覆盖时,它构建出一个单纯复形。虽然这些方法取得了成功,但存在一些限制:几何复形随着数据规模增大而扩展性较差,Mapper图可能很难调参,且仅包含低维信息。在本文中,我们提出将学习覆盖本身作为一个独立的问题来研究,并从优化的角度来看待它。我们描述了一种为几何数据集学习拓扑忠实覆盖的方法,并展示了由此获得的单纯复形在规模方面可以优于标准的拓扑推断方法,而在表示大规模拓扑方面可以优于Mapper类型的算法。
更新时间: 2025-03-12 19:10:20
领域: cs.LG,cs.CG,math.AT,stat.ML
Structure Language Models for Protein Conformation Generation
Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with sampling equilibrium conformations and are computationally expensive. Recently, deep generative models have shown promise in generating protein conformations as a more efficient alternative. However, these methods predominantly rely on the diffusion process within a 3D geometric space, which typically centers around the vicinity of metastable states and is often inefficient in terms of runtime. In this paper, we introduce Structure Language Modeling (SLM) as a novel framework for efficient protein conformation generation. Specifically, the protein structures are first encoded into a compact latent space using a discrete variational auto-encoder, followed by conditional language modeling that effectively captures sequence-specific conformation distributions. This enables a more efficient and interpretable exploration of diverse ensemble modes compared to existing methods. Based on this general framework, we instantiate SLM with various popular LM architectures as well as proposing the ESMDiff, a novel BERT-like structure language model fine-tuned from ESM3 with masked diffusion. We verify our approach in various scenarios, including the equilibrium dynamics of BPTI, conformational change pairs, and intrinsically disordered proteins. SLM provides a highly efficient solution, offering a 20-100x speedup than existing methods in generating diverse conformations, shedding light on promising avenues for future research.
Updated: 2025-03-12 19:06:38
标题: 为蛋白质构象生成构建结构语言模型
摘要: 蛋白质采用多种结构构象来执行其多样的生物功能,了解这些构象对于推动药物发现至关重要。传统的基于物理的模拟方法经常难以采样平衡构象,并且计算上昂贵。最近,深度生成模型作为一种更高效的替代方法,在生成蛋白质构象方面显示出潜力。然而,这些方法主要依赖于三维几何空间内的扩散过程,通常集中在亚稳态附近,并且在运行时间方面效率较低。在本文中,我们引入结构语言建模(SLM)作为一种高效生成蛋白质构象的新框架。具体来说,蛋白质结构首先通过离散变分自动编码器编码为紧凑的潜在空间,然后通过条件语言建模有效地捕获特定序列的构象分布。与现有方法相比,这使得对多样集合模式的探索更高效、更可解释。基于这一通用框架,我们使用各种流行的LM架构实例化SLM,并提出了ESMDiff,一种从ESM3出发、通过掩蔽扩散微调得到的新颖的类BERT结构语言模型。我们在各种场景中验证了我们的方法,包括BPTI的平衡动力学、构象变化对和固有无序蛋白质。SLM提供了一种高效的解决方案,在生成多样构象方面比现有方法快20-100倍,为未来研究开辟了有希望的途径。
更新时间: 2025-03-12 19:06:38
领域: q-bio.BM,cs.LG
Privacy-Preserved Automated Scoring using Federated Learning for Educational Research
Data privacy remains a critical concern in educational research, necessitating Institutional Review Board (IRB) certification and stringent data handling protocols to ensure compliance with ethical standards. Traditional approaches rely on anonymization and controlled data-sharing mechanisms to facilitate research while mitigating privacy risks. However, these methods still involve direct access to raw student data, posing potential vulnerabilities and being time-consuming. This study proposes a federated learning (FL) framework for automatic scoring in educational assessments, eliminating the need to share raw data. Our approach leverages client-side model training, where student responses are processed locally on edge devices, and only optimized model parameters are shared with a central aggregation server. To effectively aggregate heterogeneous model updates, we introduce an adaptive weighted averaging strategy, which dynamically adjusts weight contributions based on client-specific learning characteristics. This method ensures robust model convergence while preserving privacy. We evaluate our framework using assessment data from nine middle schools, comparing the accuracy of federated learning-based scoring models with traditionally trained centralized models. A statistical significance test (paired t-test, $t(8) = 2.29, p = 0.051$) confirms that the accuracy difference between the two approaches is not statistically significant, demonstrating that federated learning achieves comparable performance while safeguarding student data. Furthermore, our method significantly reduces data collection, processing, and deployment overhead, accelerating the adoption of AI-driven educational assessments in a privacy-compliant manner.
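The adaptive weighted averaging step can be sketched generically as a score-weighted parameter average. The scores below stand in for the "client-specific learning characteristics" mentioned in the abstract; the paper's exact weighting rule may differ.

```python
import numpy as np

def adaptive_fedavg(client_params, client_scores):
    """Aggregate client parameter vectors with adaptive weights.

    `client_scores` stand in for client-specific learning characteristics
    (e.g. local sample count or update quality); weights are normalized
    so they sum to one before averaging.
    """
    w = np.asarray(client_scores, dtype=float)
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, client_params))

# Three schools' locally trained parameter vectors; the third counts double.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
scores = [1.0, 1.0, 2.0]
print(adaptive_fedavg(clients, scores))
```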
Updated: 2025-03-12 19:06:25
标题: 隐私保护的教育研究中使用联邦学习进行自动评分
摘要: 数据隐私仍然是教育研究中的一个关键问题,需要机构审查委员会(IRB)的认证和严格的数据处理协议,以确保符合道德标准。传统方法依赖于匿名化和受控数据共享机制,以促进研究同时减少隐私风险。然而,这些方法仍然涉及直接访问原始学生数据,存在潜在的漏洞并且耗时。本研究提出了一个用于教育评估中自动评分的联邦学习(FL)框架,消除了共享原始数据的需求。我们的方法利用客户端模型训练,在边缘设备上本地处理学生响应,只将优化后的模型参数与中央聚合服务器共享。为了有效地聚合异构模型更新,我们引入了一种自适应加权平均策略,根据客户端特定的学习特征动态调整权重贡献。这种方法确保了稳健的模型收敛性同时保护隐私。我们使用来自九所中学的评估数据评估我们的框架,比较基于联邦学习的评分模型与传统训练的集中模型的准确性。统计学显著性检验(配对t检验,$t(8) = 2.29, p = 0.051$)证实了两种方法之间准确性差异不具有统计学意义,表明联邦学习在保护学生数据的同时实现了可比较的性能。此外,我们的方法显著减少了数据收集、处理和部署的开销,加速了以符合隐私要求的方式采用基于人工智能的教育评估。
更新时间: 2025-03-12 19:06:25
领域: cs.LG,cs.AI
CMP: Cooperative Motion Prediction with Multi-Agent Communication
The confluence of the advancement of Autonomous Vehicles (AVs) and the maturity of Vehicle-to-Everything (V2X) communication has enabled the capability of cooperative connected and automated vehicles (CAVs). Building on top of cooperative perception, this paper explores the feasibility and effectiveness of cooperative motion prediction. Our method, CMP, takes LiDAR signals as model input to enhance tracking and prediction capabilities. Unlike previous work that focuses separately on either cooperative perception or motion prediction, our framework, to the best of our knowledge, is the first to address the unified problem where CAVs share information in both perception and prediction modules. Incorporated into our design is the unique capability to tolerate realistic V2X transmission delays, while dealing with bulky perception representations. We also propose a prediction aggregation module, which unifies the predictions obtained by different CAVs and generates the final prediction. Through extensive experiments and ablation studies on the OPV2V and V2V4Real datasets, we demonstrate the effectiveness of our method in cooperative perception, tracking, and motion prediction. In particular, CMP reduces the average prediction error by 12.3% compared with the strongest baseline. Our work marks a significant step forward in the cooperative capabilities of CAVs, showcasing enhanced performance in complex scenarios. More details can be found on the project website: https://cmp-cooperative-prediction.github.io.
Updated: 2025-03-12 19:03:13
标题: CMP:具有多智能体通信的协作运动预测
摘要: 自动驾驶汽车(AVs)的进步与车辆对一切(V2X)通信的成熟相结合,使合作连接和自动化汽车(CAVs)的能力成为可能。本文建立在合作感知的基础上,探讨了合作运动预测的可行性和有效性。我们的方法CMP将LiDAR信号作为模型输入,以增强跟踪和预测能力。与先前专注于合作感知或运动预测的工作不同,据我们所知,我们的框架是第一个解决CAVs在感知和预测模块中共享信息的统一问题的方法。我们的设计中包含了独特的能力,可以容忍现实中的V2X传输延迟,同时处理庞大的感知表示。我们还提出了一个预测聚合模块,统一了不同CAVs得到的预测并生成最终预测。通过对OPV2V和V2V4Real数据集的大量实验和消融研究,我们展示了我们的方法在合作感知、跟踪和运动预测方面的有效性。特别是,与最强基准相比,CMP将平均预测误差降低了12.3%。我们的工作在CAVs的合作能力方面迈出了重要的一步,在复杂场景中展示了增强的性能。更多细节请访问项目网站:https://cmp-cooperative-prediction.github.io。
更新时间: 2025-03-12 19:03:13
领域: cs.RO,cs.AI,cs.CV,cs.LG,cs.MA
ConjointNet: Enhancing Conjoint Analysis for Preference Prediction with Representation Learning
Understanding consumer preferences is essential to product design and predicting market response to these new products. Choice-based conjoint analysis is widely used to model user preferences using their choices in surveys. However, traditional conjoint estimation techniques assume simple linear models. This assumption may lead to limited predictability and inaccurate estimation of product attribute contributions, especially on data that has underlying non-linear relationships. In this work, we employ representation learning to efficiently alleviate this issue. We propose ConjointNet, which is composed of two novel neural architectures, to predict user preferences. We demonstrate that the proposed ConjointNet models outperform traditional conjoint estimate techniques on two preference datasets by over 5%, and offer insights into non-linear feature interactions.
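For contrast, the traditional linear conjoint baseline that ConjointNet improves on amounts to least-squares estimation of additive part-worth utilities. A toy sketch with synthetic dummy-coded data (attribute names and values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row: one product profile as dummy-coded attribute levels
# (e.g. [price_low, price_high, brand_A, brand_B]); the response is a
# preference rating. Classical conjoint fits additive part-worths.
X = rng.integers(0, 2, size=(100, 4)).astype(float)
true_partworths = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ true_partworths + 0.05 * rng.standard_normal(100)

partworths, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(partworths, 2))
```

The linear fit recovers the part-worths here because the data are generated linearly; non-linear attribute interactions, the paper's motivation, are exactly what this baseline cannot capture.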
Updated: 2025-03-12 19:01:59
标题: ConjointNet: 通过表示学习增强偏好预测的联合分析
摘要: 理解消费者偏好对产品设计和预测市场对这些新产品的反应至关重要。基于选择的联合分析被广泛用来通过调查中的选择来建模用户偏好。然而,传统的联合估计技术假设简单的线性模型。这种假设可能导致预测能力有限,并且对产品属性贡献的估计不准确,特别是在具有潜在非线性关系的数据上。在这项工作中,我们采用表示学习来有效地缓解这个问题。我们提出了ConjointNet,它由两种新颖的神经架构组成,用于预测用户偏好。我们证明了所提出的ConjointNet模型在两个偏好数据集上的表现优于传统的联合估计技术超过5%,并提供了有关非线性特征交互作用的见解。
更新时间: 2025-03-12 19:01:59
领域: cs.LG,cs.AI
BiasConnect: Investigating Bias Interactions in Text-to-Image Models
The biases exhibited by Text-to-Image (TTI) models are often treated as if they are independent, but in reality, they may be deeply interrelated. Addressing bias along one dimension, such as ethnicity or age, can inadvertently influence another dimension, like gender, either mitigating or exacerbating existing disparities. Understanding these interdependencies is crucial for designing fairer generative models, yet measuring such effects quantitatively remains a challenge. In this paper, we aim to address these questions by introducing BiasConnect, a novel tool designed to analyze and quantify bias interactions in TTI models. Our approach leverages a counterfactual-based framework to generate pairwise causal graphs that reveals the underlying structure of bias interactions for the given text prompt. Additionally, our method provides empirical estimates that indicate how other bias dimensions shift toward or away from an ideal distribution when a given bias is modified. Our estimates have a strong correlation (+0.69) with the interdependency observations post bias mitigation. We demonstrate the utility of BiasConnect for selecting optimal bias mitigation axes, comparing different TTI models on the dependencies they learn, and understanding the amplification of intersectional societal biases in TTI models.
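One simple way to quantify "shift toward or away from an ideal distribution" is total-variation distance to that ideal. The sketch below uses it as a generic stand-in for the paper's empirical estimates; the numbers are invented.

```python
import numpy as np

def tv_to_ideal(p, ideal=None):
    """Total-variation distance from an observed bias-attribute distribution
    (e.g. gender frequencies in generated images) to an ideal one
    (uniform by default). A generic stand-in, not BiasConnect's estimator.
    """
    p = np.asarray(p, dtype=float)
    ideal = np.full_like(p, 1 / len(p)) if ideal is None else np.asarray(ideal, dtype=float)
    return 0.5 * float(np.abs(p - ideal).sum())

before = [0.8, 0.2]   # gender split before mitigating an ethnicity bias
after = [0.6, 0.4]    # split after mitigation -- the dimensions interact
print(tv_to_ideal(before), tv_to_ideal(after))
```

Comparing the two distances shows whether editing one bias dimension moved another dimension toward or away from its ideal.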
Updated: 2025-03-12 19:01:41
标题: BiasConnect: 调查文本到图像模型中的偏见交互
摘要: Text-to-Image(TTI)模型展现的偏见通常被视为独立的,但实际上它们可能深层次地相互关联。解决一个维度上的偏见,比如种族或年龄,可能会无意中影响另一个维度,比如性别,从而减轻或加剧现有的差距。了解这些相互依赖关系对于设计更公平的生成模型至关重要,然而量化这种效应仍然是一个挑战。在本文中,我们旨在通过引入BiasConnect,一种新颖的工具,来分析和量化TTI模型中的偏见相互作用。我们的方法利用基于反事实的框架生成成对因果图,揭示了给定文本提示的偏见相互作用的潜在结构。此外,我们的方法提供了实证估计,表明当给定偏见被修改时,其他偏见维度向理想分布移动或远离的方式。我们的估计与偏见减轻后的相互依赖观察具有较强的相关性(+0.69)。我们展示了BiasConnect在选择最佳偏见减轻轴、比较不同TTI模型学习的依赖关系以及理解TTI模型中交叉社会偏见的放大方面的实用性。
更新时间: 2025-03-12 19:01:41
领域: cs.CV,cs.CL,cs.LG
Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping
In Amazon robotic warehouses, the destination-to-chute mapping problem is crucial for efficient package sorting. Often, however, this problem is complicated by uncertain and dynamic package induction rates, which can lead to increased package recirculation. To tackle this challenge, we introduce a Distributionally Robust Multi-Agent Reinforcement Learning (DRMARL) framework that learns a destination-to-chute mapping policy that is resilient to adversarial variations in induction rates. Specifically, DRMARL relies on group distributionally robust optimization (DRO) to learn a policy that performs well not only on average but also on each individual subpopulation of induction rates within the group that capture, for example, different seasonality or operation modes of the system. This approach is then combined with a novel contextual bandit-based predictor of the worst-case induction distribution for each state-action pair, significantly reducing the cost of exploration and thereby increasing the learning efficiency and scalability of our framework. Extensive simulations demonstrate that DRMARL achieves robust chute mapping in the presence of varying induction distributions, reducing package recirculation by an average of 80\% in the simulation scenario.
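The group DRO idea is that a policy is judged by its worst subpopulation rather than its average. A minimal numeric sketch (the regimes and loss values are invented for illustration):

```python
def worst_group_loss(losses_by_group):
    """Group DRO objective: a policy is scored by its worst-performing
    subpopulation (e.g. seasonal induction-rate regimes), not the average."""
    return max(sum(g) / len(g) for g in losses_by_group)

# Per-episode losses of one candidate mapping policy under three regimes:
groups = [
    [0.2, 0.3, 0.25],   # low season
    [0.4, 0.5],         # high season
    [0.9, 1.1],         # surge mode -- dominates the robust objective
]
average = sum(l for g in groups for l in g) / sum(len(g) for g in groups)
print(worst_group_loss(groups), round(average, 3))
```

A policy optimized for the average could look good while failing badly in the surge regime; the DRO objective makes that failure the score.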
Updated: 2025-03-12 18:56:25
标题: 分布鲁棒的多智能体强化学习用于动态滑槽映射
摘要: 在亚马逊的机器人仓库中,目标到料槽映射问题对于高效的包裹分拣至关重要。然而,通常情况下,这个问题会受到不确定和动态的包裹投入率的影响,这可能导致包裹再循环增加。为了解决这一挑战,我们引入了一种分布鲁棒的多智能体强化学习(DRMARL)框架,该框架学习一个目标到料槽映射策略,能够抵抗包裹投入率的对抗性变化。具体来说,DRMARL依赖于群体分布鲁棒优化(DRO)来学习一个策略,不仅在平均水平上表现良好,而且在群体内的每个投入率子群体上都表现良好,这些子群体刻画了例如系统的不同季节性或操作模式。然后,这种方法与一种新颖的基于上下文赌博机的预测器相结合,用于预测每个状态-动作对的最坏情况投入分布,显著降低了探索成本,从而提高了我们框架的学习效率和可扩展性。大量模拟表明,DRMARL在存在不同投入分布的情况下实现了鲁棒的料槽映射,将模拟场景中的包裹再循环平均减少了80%。
更新时间: 2025-03-12 18:56:25
领域: cs.LG,cs.RO
User-centric Immersive Communications in 6G: A Data-oriented Framework via Digital Twin
In this article, we present a novel user-centric service provision for immersive communications (IC) in 6G to deal with the uncertainty of individual user behaviors while satisfying unique requirements on the quality of multi-sensory experience. To this end, we propose a data-oriented framework for network resource management, featuring personalized data management that can support network modeling tailored to different user demands. Our framework leverages the digital twin (DT) technique as a key enabler. Particularly, a DT is established for each user, and the data attributes in the DT are customized based on the characteristics of the user. The DT functions, corresponding to various data operations, are customized in the development, evaluation, and update of network models to meet unique user demands. A trace-driven case study demonstrates the effectiveness of our framework in achieving user-centric IC and the significance of personalized data management in 6G.
Updated: 2025-03-12 18:52:20
标题: 6G中以用户为中心的沉浸式通信:通过数字孪生的数据导向框架
摘要: 在这篇文章中,我们提出了一种新颖的以用户为中心的服务提供方式,用于在6G中实现沉浸式通信(IC),以应对个体用户行为的不确定性,同时满足多感官体验质量的独特需求。为此,我们提出了一个面向数据的网络资源管理框架,具有支持针对不同用户需求进行网络建模的个性化数据管理功能。我们的框架利用数字孪生(DT)技术作为关键使能技术。特别地,为每个用户建立一个DT,并根据用户的特征定制DT中的数据属性。与各种数据操作相对应的DT功能在网络模型的开发、评估和更新中被定制,以满足独特的用户需求。一个由真实轨迹驱动的案例研究展示了我们的框架在实现以用户为中心的IC方面的有效性,以及个性化数据管理在6G中的重要性。
更新时间: 2025-03-12 18:52:20
领域: cs.NI,cs.AI
Solving Bayesian inverse problems with diffusion priors and off-policy RL
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.
Updated: 2025-03-12 18:45:22
标题: 使用扩散先验和离线策略强化学习解决贝叶斯逆问题
摘要: 本文介绍了相对轨迹平衡(RTB)的实际应用,这是最近引入的一种离策略强化学习(RL)目标,可以渐近地最优地解决贝叶斯逆问题。我们扩展了原始工作,使用RTB从预训练的无条件先验出发,为视觉和科学领域中具有挑战性的线性和非线性逆问题训练条件扩散模型后验。我们使用这一目标以及离策略回溯探索等技术来改进训练。重要的是,我们的结果表明,现有的免训练扩散后验方法由于固有偏差,难以在潜在空间中进行有效的后验推断。
更新时间: 2025-03-12 18:45:22
领域: cs.LG,cs.AI,stat.ML
Review GIDE -- Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models
Foodborne gastrointestinal (GI) illness is a common cause of ill health in the UK. However, many cases do not interact with the healthcare system, posing significant challenges for traditional surveillance methods. The growth of publicly available online restaurant reviews and advancements in large language models (LLMs) present potential opportunities to extend disease surveillance by identifying public reports of GI illness. In this study, we introduce a novel annotation schema, developed with experts in GI illness, applied to the Yelp Open Dataset of reviews. Our annotations extend beyond binary disease detection, to include detailed extraction of information on symptoms and foods. We evaluate the performance of open-weight LLMs across these three tasks: GI illness detection, symptom extraction, and food extraction. We compare this performance to RoBERTa-based classification models fine-tuned specifically for these tasks. Our results show that using prompt-based approaches, LLMs achieve micro-F1 scores of over 90% for all three of our tasks. Using prompting alone, we achieve micro-F1 scores that exceed those of smaller fine-tuned models. We further demonstrate the robustness of LLMs in GI illness detection across three bias-focused experiments. Our results suggest that publicly available review text and LLMs offer substantial potential for public health surveillance of GI illness by enabling highly effective extraction of key information. While LLMs appear to exhibit minimal bias in processing, the inherent limitations of restaurant review data highlight the need for cautious interpretation of results.
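Micro-F1, the metric reported for all three tasks, pools true positives, false positives, and false negatives across all examples before computing F1 — unlike macro-F1, which averages per-class scores. A small sketch for set-valued extraction (the symptom names are invented examples):

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 for multi-label extraction (e.g. symptoms per
    review): pool TP/FP/FN over all examples, then compute F1 once."""
    tp = fp = fn = 0
    for t, p in zip(true_sets, pred_sets):
        tp += len(t & p)
        fp += len(p - t)
        fn += len(t - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

true_sets = [{"nausea", "vomiting"}, {"diarrhea"}]
pred_sets = [{"nausea"}, {"diarrhea", "fever"}]
print(micro_f1(true_sets, pred_sets))
```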
Updated: 2025-03-12 18:42:43
标题: 评论GIDE -- 利用大型语言模型进行餐厅胃肠疾病检测和提取
摘要: 食源性胃肠道(GI)疾病是英国常见的致病原因之一。然而,许多病例并未与医疗系统互动,给传统监测方法带来了重大挑战。公开可获取的在线餐厅评论和大型语言模型(LLMs)的发展为通过识别公众关于GI疾病的报告来扩展疾病监测提供了潜在机会。在本研究中,我们引入了一种与GI疾病专家共同开发的新颖标注模式,并将其应用于Yelp评论的开放数据集。我们的标注不仅限于二元疾病检测,还包括对症状和食物信息的详细提取。我们评估了开放权重LLMs在这三个任务上的性能:GI疾病检测、症状提取和食物提取。我们将这一性能与专门针对这些任务进行微调的基于RoBERTa的分类模型进行比较。我们的结果表明,使用基于提示的方法,LLMs在所有三个任务上都实现了超过90%的微F1得分。仅使用提示,我们获得的微F1得分便超过了较小的微调模型。我们进一步通过三个聚焦于偏差的实验展示了LLMs在GI疾病检测中的稳健性。我们的结果表明,公开可获取的评论文本和LLMs通过高效提取关键信息,为GI疾病的公共卫生监测提供了重大潜力。虽然LLMs在处理过程中似乎表现出最小的偏差,但餐厅评论数据的固有限制突显了对结果进行谨慎解释的必要性。
更新时间: 2025-03-12 18:42:43
领域: cs.CL,cs.LG,68T50
Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control
Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks. In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.
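A frequency-based noising operator can be sketched by shaping white noise in the rFFT domain before adding it; scaling some bands up or down controls which aspects of the signal the forward process destroys first. This is a generic illustration of the idea, not the paper's operator.

```python
import numpy as np

rng = np.random.default_rng(0)

def frequency_noising(x, band_scale):
    """Add Gaussian noise whose power is shaped per frequency band.

    `band_scale` has one multiplier per rFFT bin of the signal; zeroing a
    band means the forward process leaves that band of x untouched.
    """
    noise = rng.standard_normal(x.shape)
    shaped = np.fft.irfft(np.fft.rfft(noise) * band_scale, n=len(x))
    return x + shaped

x = np.zeros(64)
scale = np.zeros(33)      # a 64-sample signal has 33 rFFT bins
scale[:4] = 1.0           # keep only the lowest bands of the noise
noisy = frequency_noising(x, scale)
print(noisy.shape)
```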
Updated: 2025-03-12 18:40:15
标题: 通过基于频率的噪声控制来塑造扩散模型中的归纳偏差
摘要: 扩散概率模型(DPMs)是强大的生成模型,在许多生成任务中取得了空前的成功。在这项工作中,我们旨在将归纳偏差纳入扩散模型的训练和采样中,以更好地适应数据模型的目标分布。对于拓扑结构化数据,我们设计了一个基于频率的噪声操作符,目的是有意地操纵和设置这些归纳偏差。我们首先展示了对噪声正向过程的适当操作可以使DPMs集中于学习分布的特定方面。我们表明不同的数据集需要不同的归纳偏差,并且适当的基于频率的噪声控制相比标准扩散会导致更高的生成性能。最后,我们展示了在学习过程中忽略特定频率的信息的可能性。我们在图像损坏和恢复任务中展示了这一点,我们训练了一个DPM来恢复严重噪声损坏后的原始目标分布。
更新时间: 2025-03-12 18:40:15
领域: cs.LG,cs.AI
ICMarks: A Robust Watermarking Framework for Integrated Circuit Physical Design IP Protection
Physical design watermarking on contemporary integrated circuit (IC) layout encodes signatures without considering the dense connections and design constraints, which could lead to performance degradation on the watermarked products. This paper presents ICMarks, a quality-preserving and robust watermarking framework for modern IC physical design. ICMarks embeds unique watermark signatures during the physical design's placement stage, thereby authenticating the IC layout ownership. ICMarks's novelty lies in (i) strategically identifying a region of cells to watermark with minimal impact on the layout performance and (ii) a two-level watermarking framework for augmented robustness toward potential removal and forging attacks. Extensive evaluations on benchmarks of different design objectives and sizes validate that ICMarks incurs no wirelength and timing metrics degradation, while successfully proving ownership. Furthermore, we demonstrate ICMarks is robust against two major watermarking attack categories, namely, watermark removal and forging attacks; even if the adversaries have prior knowledge of the watermarking schemes, the signatures cannot be removed without significantly undermining the layout quality.
Updated: 2025-03-12 18:37:44
标题: ICMarks:一种用于集成电路物理设计IP保护的稳健水印框架
摘要: 当代集成电路(IC)布局上的物理设计水印在编码签名时并未考虑密集的连接和设计约束,这可能导致加水印产品的性能下降。本文介绍了ICMarks,这是一个适用于现代IC物理设计的保持质量且稳健的水印框架。ICMarks在物理设计的布局(placement)阶段嵌入唯一的水印签名,从而验证IC版图的所有权。ICMarks的创新之处在于(i)在对版图性能影响最小的情况下,战略性地识别要加水印的单元区域,以及(ii)一个两级水印框架,增强了对潜在的去除和伪造攻击的稳健性。对不同设计目标和规模的基准测试表明,ICMarks不会造成线长和时序指标的下降,同时成功地证明了所有权。此外,我们展示了ICMarks对两大类水印攻击(即水印去除和伪造攻击)的稳健性;即使对手事先了解水印方案,也无法在不显著损害版图质量的情况下去除签名。
更新时间: 2025-03-12 18:37:44
领域: cs.CR,cs.AR
Unveiling Hidden Pivotal Players with GoalNet: A GNN-Based Soccer Player Evaluation System
Soccer analysis tools emphasize metrics such as expected goals, leading to an overrepresentation of attacking players' contributions and overlooking players who facilitate ball control and link attacks. Examples include Rodri from Manchester City and Palhinha who just transferred to Bayern Munich. To address this bias, we aim to identify players with pivotal roles in a soccer team, incorporating both spatial and temporal features. In this work, we introduce a GNN-based framework that assigns individual credit for changes in expected threat (xT), thus capturing overlooked yet vital contributions in soccer. Our pipeline encodes both spatial and temporal features in event-centric graphs, enabling fair attribution of non-scoring actions such as defensive or transitional plays. We incorporate centrality measures into the learned player embeddings, ensuring that ball-retaining defenders and defensive midfielders receive due recognition for their overall impact. Furthermore, we explore diverse GNN variants-including Graph Attention Networks and Transformer-based models-to handle long-range dependencies and evolving match contexts, discussing their relative performance and computational complexity. Experiments on real match data confirm the robustness of our approach in highlighting pivotal roles that traditional attacking metrics typically miss, underscoring the model's utility for more comprehensive soccer analytics.
Updated: 2025-03-12 18:36:55
标题: 揭示隐藏的关键球员:基于GNN的足球球员评估系统GoalNet
摘要: 足球分析工具强调诸如预期进球之类的指标,导致进攻球员的贡献被过分强调,而忽视了促进控球和连接进攻的球员。例如,曼城的罗德里和刚刚转会至拜仁慕尼黑的帕利尼亚。为了解决这一偏见,我们旨在识别在足球队中扮演关键角色的球员,同时结合空间和时间特征。 在这项工作中,我们介绍了一种基于GNN的框架,为预期威胁(xT)变化分配个人信用,从而捕捉被忽视但至关重要的足球贡献。我们的管线在以事件为中心的图中编码了空间和时间特征,实现了对非得分动作(如防守或过渡战术)的公正归因。我们将中心性指标纳入学习到的球员嵌入中,确保保球能力强的后卫和防守型中场得到应有的认可。此外,我们探索了多种GNN变体,包括图注意力网络和基于Transformer的模型,以处理长距离依赖性和不断变化的比赛背景,讨论它们的相对性能和计算复杂性。对真实比赛数据的实验证实了我们方法在突出传统进攻指标通常忽视的关键角色方面的鲁棒性,强调了该模型在更全面的足球分析中的实用性。
更新时间: 2025-03-12 18:36:55
领域: cs.LG,cs.AI,math.OC
CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries
Cancer survivors face unique emotional challenges that impact their quality of life. Mobile diary entries-short text entries recording through their phone about their emotional experiences-provide a promising method for tracking these experiences in real time. Although emotion analysis tools show potential for recognizing emotions from text, current methods lack the contextual understanding necessary to accurately interpret the brief, personal narratives in mobile diaries. We propose CALLM, a context-aware emotion analysis framework that leverages Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), to analyze mobile diary entries from cancer survivors to predict their emotional states. The framework enhances prediction accuracy beyond existing methods by (1) integrating retrieved peer experiences as contextual examples and (2) incorporating individuals' temporal emotional trajectories from their mobile diary entries. We collected a large-scale dataset (N=407) of cancer survivors' mobile ecological momentary assessments (EMAs), which assessed positive and negative affect, desire to regulate emotions, social interaction quality, and availability for interventions, alongside daily mobile diary entries in an open response format regarding what was driving their current emotional experience. Results demonstrate strong performance of CALLM, with balanced accuracies reaching 72.96% for positive and 73.29% for negative affect, and 73.72% for predicting individual's desire to regulate emotions. Post-hoc analysis reveals that leveraging model confidence, encouraging longer diary entries, and incorporating personal ground truth, further enhance predictive outcomes. Our findings support the feasibility of deploying LLM-powered emotion analysis in chronic health populations and suggest promising directions for personalized interventions for cancer survivors.
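Balanced accuracy, the headline metric in the abstract, averages per-class recalls so that the majority class cannot dominate the score. A minimal sketch for a binary label:

```python
def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy for a binary label (e.g. high vs low negative
    affect): the mean of per-class recalls, robust to class imbalance."""
    recalls = []
    for c in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

y_true = [1, 1, 1, 1, 0, 0]          # imbalanced: four positives, two negatives
y_pred = [1, 1, 1, 1, 0, 1]
print(balanced_accuracy(y_true, y_pred))   # plain accuracy would be 5/6
```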
Updated: 2025-03-12 18:36:41
Subjects: cs.CL,cs.AI,cs.HC
Enhancing Adversarial Example Detection Through Model Explanation
Adversarial examples are a major problem for machine learning models, leading to a continuous search for effective defenses. One promising direction is to leverage model explanations to better understand and defend against these attacks. We examined AmI, a method proposed in a NeurIPS 2018 spotlight paper that uses model explanations to detect adversarial examples. Our study shows that while AmI is a promising idea, its performance is too dependent on specific settings (e.g., hyperparameters) and on external factors such as the operating system and the deep learning framework used; these drawbacks limit AmI's practical usage. Our findings highlight the need for more robust defense mechanisms that are effective under various conditions. In addition, we advocate for a comprehensive evaluation framework for defense techniques.
Updated: 2025-03-12 18:34:41
Subjects: cs.CR,cs.CV,cs.LG
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
The most promising recent methods for AI reasoning require either applying variants of reinforcement learning (RL) to trajectories rolled out from the model (even for step-wise rewards) or large quantities of human-annotated trajectory data. Reliance on rolled-out trajectories makes the compute cost and time prohibitively high. In particular, the correctness of a reasoning trajectory can typically only be judged at its completion, leading to sparse rewards in RL or requiring expensive synthetic data generation in expert-iteration-like methods. In this work, we focus on the Automatic Theorem Proving (ATP) task and propose a novel verifier-in-the-loop design: unlike existing approaches that leverage feedback on the entire reasoning trajectory, it employs an automated verifier to give intermediate feedback at each step of the reasoning process. Using Lean as the verifier, we empirically show that step-by-step local verification produces a global improvement in the model's reasoning accuracy and efficiency.
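A minimal sketch of the verifier-in-the-loop idea, with a toy checker standing in for Lean (the function names and the integer toy state space are assumptions for illustration only):

```python
def guided_search(candidates_per_step, verify, max_steps):
    """Greedy proof search with a verifier in the loop: at each step, take
    the first candidate tactic the verifier accepts, instead of scoring
    only the finished trajectory. `verify(state, step)` stands in for
    calling Lean on the partial proof and returns (accepted, new_state)."""
    trajectory, state = [], 0
    for _ in range(max_steps):
        for cand in candidates_per_step(state):
            ok, new_state = verify(state, cand)
            if ok:
                trajectory.append(cand)
                state = new_state
                break
        else:
            return None, trajectory   # every candidate rejected: stuck
        if state == "done":
            break
    return state, trajectory
```

The key point is that rejected candidates are discarded immediately at each step, so reward is dense rather than deferred to the end of the trajectory.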
Updated: 2025-03-12 18:20:47
Subjects: cs.AI,cs.CL,cs.LG,cs.LO
All Your Knowledge Belongs to Us: Stealing Knowledge Graphs via Reasoning APIs
Knowledge graph reasoning (KGR), which answers complex, logical queries over large knowledge graphs (KGs), represents an important artificial intelligence task with a range of applications. Many KGs require extensive domain expertise and engineering effort to build and are hence considered proprietary within organizations and enterprises. Yet, spurred by their commercial and research potential, there is a growing trend to make KGR systems (partially) built upon private KGs publicly available through reasoning APIs. The inherent tension between maintaining the confidentiality of KGs and ensuring the accessibility of KGR systems motivates our study of KG extraction attacks: the adversary aims to "steal" the private segments of the backend KG, leveraging solely black-box access to the KGR API. Specifically, we present KGX, an attack that extracts confidential sub-KGs with high fidelity under limited query budgets. At a high level, KGX progressively and adaptively queries the KGR API and integrates the query responses to reconstruct the private sub-KG. This extraction remains viable even if any query responses related to the private sub-KG are filtered. We validate the efficacy of KGX against both experimental and real-world KGR APIs. Interestingly, we find that typical countermeasures (e.g., injecting noise into query responses) are often ineffective against KGX. Our findings suggest the need for a more principled approach to developing and deploying KGR systems, as well as devising new defenses against KG extraction attacks.
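The progressive, adaptive querying loop described above can be reduced to a toy sketch (the `api_query` interface and the entity-frontier strategy are illustrative assumptions; the actual KGX attack is considerably more sophisticated):

```python
def extract_subkg(api_query, seed_entities, budget):
    """Progressive KG extraction sketch: repeatedly query the reasoning API
    about known entities and fold the answers back into a reconstructed
    edge set. `api_query(entity)` stands in for a black-box KGR API that
    returns (relation, tail) pairs for the queried head entity."""
    known = list(seed_entities)
    edges, used, i = set(), 0, 0
    while used < budget and i < len(known):
        head = known[i]
        i += 1
        for rel, tail in api_query(head):
            edges.add((head, rel, tail))
            if tail not in known:
                known.append(tail)   # adaptively expand the query frontier
        used += 1
    return edges
```

The budget cap models the limited-query setting; each response both adds edges and widens the frontier of entities worth querying next.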
Updated: 2025-03-12 18:18:44
Subjects: cs.CR
A primer on optimal transport for causal inference with observational data
The theory of optimal transportation has developed into a powerful and elegant framework for comparing probability distributions, with wide-ranging applications in all areas of science. The fundamental idea of analyzing probabilities by comparing their underlying state space naturally aligns with the core idea of causal inference, where understanding and quantifying counterfactual states is paramount. Despite this intuitive connection, explicit research at the intersection of optimal transport and causal inference is only beginning to develop. Yet, many foundational models in causal inference have implicitly relied on optimal transport principles for decades, without recognizing the underlying connection. Therefore, the goal of this review is to offer an introduction to the surprisingly deep existing connections between optimal transport and the identification of causal effects with observational data -- where optimal transport is not just a set of potential tools, but actually builds the foundation of model assumptions. As a result, this review is intended to unify the language and notation between different areas of statistics, mathematics, and econometrics, by pointing out these existing connections, and to explore novel problems and directions for future work in both areas derived from this realization.
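In one dimension, the optimal transport plan between two equal-size empirical samples is simply the monotone (sorted) matching, which makes the counterfactual reading concrete: each control outcome is matched to its treated counterpart. A minimal sketch:

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size samples.
    In 1-D the optimal coupling is the sorted (monotone) matching, so the
    distance is the mean absolute gap between order statistics. In the
    causal-inference reading, this matching transports each control
    outcome to a counterfactual treated outcome."""
    assert len(xs) == len(ys), "sketch assumes equal sample sizes"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

For higher-dimensional samples the coupling must be solved as a linear program, but the 1-D case already shows how transport maps encode counterfactual states.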
Updated: 2025-03-12 18:18:00
Subjects: stat.ME,cs.AI,econ.EM
How Feasible is Augmenting Fake Nodes with Learnable Features as a Counter-strategy against Link Stealing Attacks?
Graph Neural Networks (GNNs) are widely used and deployed for graph-based prediction tasks. However, as good as GNNs are for learning graph data, they also come with the risk of privacy leakage. For instance, an attacker can run carefully crafted queries on the GNNs and, from the responses, can infer the existence of an edge between a pair of nodes. This attack, dubbed a "link-stealing" attack, can jeopardize the user's privacy by leaking potentially sensitive information. To protect against this attack, we propose an approach called "$(N)$ode $(A)$ugmentation for $(R)$estricting $(G)$raphs from $(I)$nsinuating their $(S)$tructure" ($NARGIS$) and study its feasibility. $NARGIS$ is focused on reshaping the graph embedding space so that the posterior from the GNN model will still provide utility for the prediction task but will introduce ambiguity for the link-stealing attackers. To this end, $NARGIS$ applies spectral clustering on the given graph to facilitate augmenting it with new nodes that have learned features instead of fixed ones. It utilizes tri-level optimization for learning parameters for the GNN model, the surrogate attacker model, and our defense model (i.e., the learnable node features). We extensively evaluate $NARGIS$ on three benchmark citation datasets over eight knowledge-availability settings for the attackers. We also evaluate the model fidelity and defense performance on influence-based link inference attacks. Through our studies, we have identified $NARGIS$'s key strength: its superior fidelity-privacy trade-off in a significant number of cases. We have also identified the cases in which the model needs improvement, and proposed ways to integrate different schemes to make the model more robust against link-stealing attacks.
Updated: 2025-03-12 18:16:37
Subjects: cs.LG,cs.CR
Exploiting Adjacent Similarity in Multi-Armed Bandit Tasks via Transfer of Reward Samples
We consider a sequential multi-task problem, where each task is modeled as the stochastic multi-armed bandit with K arms. We assume the bandit tasks are adjacently similar in the sense that the difference between the mean rewards of the arms for any two consecutive tasks is bounded by a parameter. We propose two algorithms (one assumes the parameter is known while the other does not) based on UCB to transfer reward samples from preceding tasks to improve the overall regret across all tasks. Our analysis shows that transferring samples reduces the regret as compared to the case of no transfer. We provide empirical results for our algorithms, which show performance improvement over the standard UCB algorithm without transfer and a naive transfer algorithm.
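The transfer idea can be illustrated with a toy UCB implementation that warm-starts its counts with (arm, reward) pairs carried over from the preceding task; this is a sketch of the general mechanism under a Bernoulli-reward assumption, not the paper's exact algorithms:

```python
import math
import random

def ucb_with_transfer(means, horizon, prior_samples=None, seed=0):
    """UCB on a K-armed Bernoulli bandit, optionally warm-started with
    reward samples transferred from the preceding (adjacently similar)
    task. `prior_samples` is a list of (arm, reward) pairs. Returns the
    cumulative pseudo-regret over `horizon` pulls."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0.0] * k
    sums = [0.0] * k
    for arm, r in (prior_samples or []):   # transferred samples
        counts[arm] += 1
        sums[arm] += r
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if 0.0 in counts:                   # pull each unseen arm once
            arm = counts.index(0.0)
        else:
            ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
                   for i in range(k)]
            arm = ucb.index(max(ucb))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]         # pseudo-regret per pull
    return regret
```

With accurate transferred samples, the learner can skip most of the early exploration of clearly suboptimal arms, which is where the regret reduction comes from.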
Updated: 2025-03-12 18:15:36
Subjects: cs.LG,stat.ML
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
This paper presents MoE-Infinity, an efficient MoE inference system designed for personal machines with limited GPU memory capacity. The key idea for MoE-Infinity is that on personal machines, which are often single-user environments, MoE-based LLMs typically operate with a batch size of one. In this setting, MoE models exhibit a high degree of activation sparsity, meaning a small number of experts are frequently reused in generating tokens during the decode phase. Leveraging this idea, we design a sparsity-aware expert cache, which can trace the sparse activation of experts during inference and carefully select the trace that represents the sparsity pattern. By analyzing these selected traces, MoE-Infinity guides the replacement and prefetching of the expert cache, providing 3.1-16.7x per-token latency improvements over numerous state-of-the-art systems, including vLLM, Ollama, DeepSpeed and BrainStorm across various MoE models (DeepSeek and Mixtral) when handling different LLM tasks. MoE-Infinity's source code is publicly available at https://github.com/EfficientMoE/MoE-Infinity
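A toy version of a sparsity-aware expert cache, with frequency-based eviction standing in for the trace-guided replacement policy (class and field names are illustrative; the real system also selects representative traces and prefetches experts ahead of time):

```python
from collections import Counter

class ExpertCache:
    """Sketch of a sparsity-aware expert cache: experts frequently reused
    in the decode trace stay resident on the GPU, and the least-reused
    resident expert is evicted when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = set()     # expert ids currently in GPU memory
        self.trace = Counter()    # activation counts observed so far
        self.misses = 0

    def activate(self, expert_id):
        self.trace[expert_id] += 1
        if expert_id not in self.resident:
            self.misses += 1      # would trigger a host-to-GPU copy
            if len(self.resident) >= self.capacity:
                victim = min(self.resident, key=lambda e: self.trace[e])
                self.resident.discard(victim)
            self.resident.add(expert_id)
```

Because decode-phase activation is highly sparse, a small number of hot experts dominate the trace and are rarely evicted, so most activations become cache hits.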
Updated: 2025-03-12 18:14:21
Subjects: cs.LG,cs.PF
Finding the Muses: Identifying Coresets through Loss Trajectories
Deep learning models achieve state-of-the-art performance across domains but face scalability challenges in real-time or resource-constrained scenarios. To address this, we propose Loss Trajectory Correlation (LTC), a novel metric for coreset selection that identifies critical training samples driving generalization. $LTC$ quantifies the alignment between training sample loss trajectories and validation set loss trajectories, enabling the construction of compact, representative subsets. Unlike traditional methods with computational and storage overheads that are infeasible to scale to large datasets, $LTC$ achieves superior efficiency as it can be computed as a byproduct of training. Our results on CIFAR-100 and ImageNet-1k show that $LTC$ consistently achieves accuracy on par with or surpassing state-of-the-art coreset selection methods, with any differences remaining under 1%. LTC also effectively transfers across various architectures, including ResNet, VGG, DenseNet, and Swin Transformer, with minimal performance degradation (<2%). Additionally, LTC offers insights into training dynamics, such as identifying aligned and conflicting sample behaviors, at a fraction of the computational cost of traditional methods. This framework paves the way for scalable coreset selection and efficient dataset optimization.
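Since LTC is a correlation between loss trajectories, the selection step reduces to a few lines; this sketch assumes per-sample training losses and mean validation losses are already logged per epoch (a byproduct of training, as the abstract notes):

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either
    sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def select_coreset(train_loss_traj, val_loss_traj, k):
    """train_loss_traj: {sample_id: [loss at epoch 0..T]}.
    val_loss_traj: mean validation loss per epoch.
    Returns the k sample ids whose loss trajectories correlate best
    with the validation trajectory (highest LTC)."""
    scores = {i: pearson(traj, val_loss_traj)
              for i, traj in train_loss_traj.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Samples whose losses fall in lockstep with the validation loss are the ones "driving generalization"; samples with flat or anti-correlated trajectories score low and are dropped.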
Updated: 2025-03-12 18:11:16
Subjects: cs.LG,cs.AI
Near-Polynomially Competitive Active Logistic Regression
We address the problem of active logistic regression in the realizable setting. It is well known that active learning can require exponentially fewer label queries compared to passive learning, in some cases using $\log \frac{1}{\epsilon}$ rather than $\mathrm{poly}(1/\epsilon)$ labels to get error $\epsilon$ larger than the optimum. We present the first algorithm that is polynomially competitive with the optimal algorithm on every input instance, up to factors polylogarithmic in the error and domain size. In particular, if any algorithm achieves label complexity polylogarithmic in $\epsilon$, so does ours. Our algorithm is based on efficient sampling and can be extended to learn more general classes of functions. We further support our theoretical results with experiments demonstrating performance gains for logistic regression compared to existing active learning algorithms.
Updated: 2025-03-12 18:09:09
Subjects: cs.LG
Towards Causal Model-Based Policy Optimization
Real-world decision-making problems are often marked by complex, uncertain dynamics that can shift or break under changing conditions. Traditional Model-Based Reinforcement Learning (MBRL) approaches learn predictive models of environment dynamics from queried trajectories and then use these models to simulate rollouts for policy optimization. However, such methods do not account for the underlying causal mechanisms that govern the environment, and thus inadvertently capture spurious correlations, making them sensitive to distributional shifts and limiting their ability to generalize. The same naturally holds for model-free approaches. In this work, we introduce Causal Model-Based Policy Optimization (C-MBPO), a novel framework that integrates causal learning into the MBRL pipeline to achieve more robust, explainable, and generalizable policy learning algorithms. Our approach centers on first inferring a Causal Markov Decision Process (C-MDP) by learning a local Structural Causal Model (SCM) of both the state and reward transition dynamics from trajectories gathered online. C-MDPs differ from classic MDPs in that we can decompose causal dependencies in the environment dynamics via specifying an associated Causal Bayesian Network. C-MDPs allow for targeted interventions and counterfactual reasoning, enabling the agent to distinguish between mere statistical correlations and causal relationships. The learned SCM is then used to simulate counterfactual on-policy transitions and rewards under hypothetical actions (or ``interventions"), thereby guiding policy optimization more effectively. The resulting policy learned by C-MBPO can be shown to be robust to a class of distributional shifts that affect spurious, non-causal relationships in the dynamics. We demonstrate this through some simple experiments involving near and far OOD dynamics drifts.
Updated: 2025-03-12 18:09:02
Subjects: cs.LG
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
This paper presents MoE-Gen, a high-throughput MoE inference system optimized for single-GPU execution. Existing inference systems rely on model-based or continuous batching strategies, originally designed for interactive inference, which result in excessively small batches for MoE's key modules-attention and expert modules-leading to poor throughput. To address this, we introduce module-based batching, which accumulates tokens in host memory and dynamically launches large batches on GPUs to maximize utilization. Additionally, we optimize the choice of batch sizes for each module in an MoE to fully overlap GPU computation and communication, maximizing throughput. Evaluation demonstrates that MoE-Gen achieves 8-31x higher throughput compared to state-of-the-art systems employing model-based batching (FlexGen, MoE-Lightning, DeepSpeed), and offers even greater throughput improvements over continuous batching systems (e.g., vLLM and Ollama) on popular MoE models (DeepSeek and Mixtral) across offline inference tasks. MoE-Gen's source code is publicly available at https://github.com/EfficientMoE/MoE-Gen
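Module-based batching, reduced to its essence: accumulate inputs in host memory and launch one large batch per module call (a simplified sketch; the real system chooses batch sizes per attention/expert module and overlaps GPU compute with transfers):

```python
def module_batched(tokens, batch_size, module):
    """Accumulate tokens and run `module` on large batches, instead of
    the small per-request batches produced by model-based or continuous
    batching. `module` stands in for one GPU kernel launch over a batch."""
    out, buf = [], []
    for t in tokens:
        buf.append(t)
        if len(buf) == batch_size:
            out.extend(module(buf))   # one large launch, high utilization
            buf = []
    if buf:
        out.extend(module(buf))       # flush the remainder
    return out
```

The same ten tokens that would otherwise trigger ten tiny kernel launches are here served by three launches, which is the source of the throughput gain for offline (latency-insensitive) inference.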
Updated: 2025-03-12 18:08:01
Subjects: cs.DC,cs.LG
Revisiting semi-supervised learning in the era of foundation models
Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning. As vision foundation models (VFMs) increasingly serve as the backbone of vision applications, it remains unclear how SSL interacts with these pre-trained models. To address this gap, we develop new SSL benchmark datasets where frozen VFMs underperform and systematically evaluate representative SSL methods. We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data. This motivates us to revisit self-training, a conceptually simple SSL baseline, where we use the supervised PEFT model to pseudo-label unlabeled data for further training. To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels. Empirical results validate the effectiveness of this simple yet powerful approach, providing actionable insights into SSL with VFMs and paving the way for more scalable and practical semi-supervised learning in the era of foundation models.
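The pseudo-label ensembling step can be sketched as a simple agreement filter over the predictions of several PEFT/VFM pairs (majority voting with an agreement threshold is an illustrative choice; the paper's ensembling may differ in detail):

```python
from collections import Counter

def ensemble_pseudo_labels(predictions, min_agreement=2):
    """predictions: list of label lists, one per (PEFT method, VFM
    backbone) pair, aligned over the unlabeled set. A pseudo-label is
    kept only when at least `min_agreement` ensemble members agree,
    filtering out noisy labels before self-training."""
    pseudo = {}
    for i, votes in enumerate(zip(*predictions)):
        label, count = Counter(votes).most_common(1)[0]
        if count >= min_agreement:
            pseudo[i] = label
    return pseudo
```

Unlabeled samples on which the ensemble disagrees are simply left out of the next self-training round rather than trained on with a wrong label.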
Updated: 2025-03-12 18:01:10
Subjects: cs.LG,cs.AI,cs.CV
Have LLMs Made Active Learning Obsolete? Surveying the NLP Community
Supervised learning relies on annotated data, which is expensive to obtain. A longstanding strategy to reduce annotation costs is active learning, an iterative process in which a human annotates only the data instances deemed informative by a model. Large language models (LLMs) have boosted the effectiveness of active learning, but have also improved methods such as few- or zero-shot learning and text synthesis, thereby introducing potential alternatives. This raises the question: has active learning become obsolete? To answer this fully, we must look beyond the literature to practical experiences. We conduct an online survey in the NLP community to collect previously intangible insights on the perceived relevance of data annotation, particularly focusing on active learning, including best practices, obstacles, and expected future developments. Our findings show that annotated data remains a key factor, and active learning continues to be relevant. While the majority of active learning users find it effective, a comparison with a community survey from over a decade ago reveals persistent challenges: setup complexity, estimation of cost reduction, and tooling. We publish an anonymized version of the collected dataset.
Updated: 2025-03-12 18:00:04
Subjects: cs.CL,cs.LG
DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks
Meta-learning represents a strong class of approaches for solving few-shot learning tasks. Nonetheless, recent research suggests that simply pre-training a generic encoder can potentially surpass meta-learning algorithms. In this paper, we first discuss the reasons why meta-learning fails to stand out in these few-shot learning experiments, and hypothesize that it is due to the few-shot learning tasks lacking diversity. We propose DRESS, a task-agnostic Disentangled REpresentation-based Self-Supervised meta-learning approach that enables fast model adaptation on highly diversified few-shot learning tasks. Specifically, DRESS utilizes disentangled representation learning to create self-supervised tasks that can fuel the meta-training process. Furthermore, we also propose a class-partition based metric for quantifying the task diversity directly on the input space. We validate the effectiveness of DRESS through experiments on datasets with multiple factors of variation and varying complexity. The results suggest that DRESS is able to outperform competing methods on the majority of the datasets and task setups. Through this paper, we advocate for a re-examination of proper setups for task adaptation studies, and aim to reignite interest in the potential of meta-learning for solving few-shot learning tasks via disentangled representations.
Updated: 2025-03-12 18:00:00
Subjects: cs.LG,cs.CV
A Geometric Framework for Understanding Memorization in Generative Models
As deep generative models have progressed, recent work has shown them to be capable of memorizing and reproducing training datapoints when deployed. These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization. To better understand this phenomenon, we propose the manifold memorization hypothesis (MMH), a geometric framework which leverages the manifold hypothesis into a clear language in which to reason about memorization. We propose to analyze memorization in terms of the relationship between the dimensionalities of (i) the ground truth data manifold and (ii) the manifold learned by the model. This framework provides a formal standard for "how memorized" a datapoint is and systematically categorizes memorized data into two types: memorization driven by overfitting and memorization driven by the underlying data distribution. By analyzing prior work in the context of the MMH, we explain and unify assorted observations in the literature. We empirically validate the MMH using synthetic data and image datasets up to the scale of Stable Diffusion, developing new tools for detecting and preventing generation of memorized samples in the process.
Updated: 2025-03-12 18:00:00
Subjects: stat.ML,cs.LG
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
As Large Language Models (LLMs) are widely deployed in diverse scenarios, the extent to which they could tacitly spread misinformation emerges as a critical safety concern. Current research primarily evaluates LLMs on explicit false statements, overlooking how misinformation often manifests subtly as unchallenged premises in real-world user interactions. We curated ECHOMIST, the first comprehensive benchmark for implicit misinformation, where the misinformed assumptions are embedded in a user query to LLMs. ECHOMIST is based on rigorous selection criteria and carefully curated data from diverse sources, including real-world human-AI conversations and social media interactions. We also introduce a new evaluation metric to measure whether LLMs can recognize and counter false information rather than amplify users' misconceptions. Through an extensive empirical study on a wide range of LLMs, including GPT-4, Claude, and Llama, we find that current models perform alarmingly poorly on this task, often failing to detect false premises and generating misleading explanations. Our findings underscore the critical need for an increased focus on implicit misinformation in LLM safety research.
Updated: 2025-03-12 17:59:18
Subjects: cs.CL,cs.AI,cs.CY
PRISM: Efficient Long-Range Reasoning With Short-Context LLMs
Long-range tasks demand reasoning over long inputs. Current solutions require large compute budgets, training data, model weight access, or complex task-specific designs. We introduce PRISM, which processes information as a stream of chunks while maintaining a structured in-context memory specified with a typed hierarchical schema. PRISM outperforms baselines on diverse tasks while using contexts at least 4x shorter than those required by long-context models. This approach is token-efficient, producing concise outputs and efficiently leveraging key-value (KV) caches to reduce costs by up to 54% compared to alternative short-context methods. PRISM scales down to tiny chunks (<500 tokens) without increasing encoding costs or sacrificing quality, and generalizes to new tasks with minimal effort by automatically generating schemas from task descriptions.
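The chunk-stream-plus-structured-memory loop can be sketched as follows, with a toy entity tally standing in for the short-context LLM call that revises the typed memory (the schema and update rule here are illustrative assumptions):

```python
def prism_stream(chunks, schema_update, memory=None):
    """Process a long input as a stream of chunks, threading a structured
    in-context memory through every step. `schema_update(memory, chunk)`
    stands in for prompting a short-context LLM with the current memory
    plus one chunk and parsing back the revised memory."""
    memory = memory if memory is not None else {}
    for chunk in chunks:
        memory = schema_update(memory, chunk)
    return memory

def schema_update(memory, chunk):
    # Toy update following a fixed schema {"chunks_seen": int,
    # "entities": {name: count}}: tally capitalized words as "entities".
    mem = {"chunks_seen": memory.get("chunks_seen", 0) + 1,
           "entities": dict(memory.get("entities", {}))}
    for w in chunk.split():
        if w[:1].isupper():
            mem["entities"][w] = mem["entities"].get(w, 0) + 1
    return mem
```

Only the compact memory (not the full history) is carried between steps, which is why the context stays short regardless of input length.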
Updated: 2025-03-12 17:59:18
Subjects: cs.AI
Parsing the Language of Expression: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors
Symbolic regression is essential for deriving interpretable expressions that elucidate complex phenomena by exposing the underlying mathematical and physical relationships in data. In this paper, we present an advanced symbolic regression method that integrates symbol priors from diverse scientific domains - including physics, biology, chemistry, and engineering - into the regression process. By systematically analyzing domain-specific expressions, we derive probability distributions of symbols to guide expression generation. We propose novel tree-structured recurrent neural networks (RNNs) that leverage these symbol priors, enabling domain knowledge to steer the learning process. Additionally, we introduce a hierarchical tree structure for representing expressions, where unary and binary operators are organized to facilitate more efficient learning. To further accelerate training, we compile characteristic expression blocks from each domain and include them in the operator dictionary, providing relevant building blocks. Experimental results demonstrate that leveraging symbol priors significantly enhances the performance of symbolic regression, resulting in faster convergence and higher accuracy.
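Sampling expressions under a domain-specific symbol prior can be illustrated with a toy prefix-expression generator (the symbol table, arities, and prior weights below are illustrative assumptions, and a plain recursive sampler stands in for the tree-structured RNN):

```python
import random

ARITY = {"+": 2, "*": 2, "sin": 1, "exp": 1, "x": 0, "c": 0}

def sample_expression(priors, depth, rng):
    """Grow a prefix expression by drawing each symbol from a
    domain-specific prior distribution `priors` ({symbol: weight}).
    At the depth limit, operators are replaced by the terminal 'x'."""
    symbols, weights = zip(*priors.items())
    sym = rng.choices(symbols, weights=weights, k=1)[0]
    arity = ARITY[sym]
    if depth <= 0 or arity == 0:
        return "x" if arity else sym     # terminal fallback at max depth
    children = [sample_expression(priors, depth - 1, rng) for _ in range(arity)]
    return f"({sym} {' '.join(children)})"
```

Swapping the `priors` dict, e.g. weighting `sin`/`exp` heavily for a physics domain, biases generation toward expressions plausible in that domain without changing the sampler itself.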
Updated: 2025-03-12 17:57:48
Subjects: cs.LG,cs.SC
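The core mechanism, biasing symbol generation with a domain-specific prior, can be sketched as follows. The probabilities below are invented for illustration; the paper derives them by analyzing real domain expressions.

```python
# Hedged sketch of sampling expression symbols from a domain prior. The
# PHYSICS_PRIOR values are made up for illustration, not taken from the paper.
import random

PHYSICS_PRIOR = {"+": 0.25, "*": 0.30, "sin": 0.20, "exp": 0.10, "log": 0.15}

def sample_symbol(prior, rng):
    # Inverse-CDF sampling over the categorical prior.
    r, acc = rng.random(), 0.0
    for sym, p in prior.items():
        acc += p
        if r < acc:
            return sym
    return sym  # guard against floating-point round-off

def sample_symbols(prior, n, seed=0):
    rng = random.Random(seed)
    return [sample_symbol(prior, rng) for _ in range(n)]
```

In the paper this sampling distribution conditions the tree-structured RNN's output layer; here it is shown standalone to make the prior's effect on symbol frequencies concrete.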
Fair Federated Medical Image Classification Against Quality Shift via Inter-Client Progressive State Matching
Despite the potential of federated learning in medical applications, inconsistent imaging quality across institutions, stemming from lower-quality data contributed by a minority of clients, biases federated models toward the more common high-quality images. This raises significant fairness concerns. Existing fair federated learning methods have demonstrated some effectiveness in solving this problem by aligning a single 0th- or 1st-order state of convergence (e.g., training loss or sharpness). However, we argue in this work that fairness based on such a single state is still not an adequate surrogate for fairness during testing, as these single metrics fail to fully capture the convergence characteristics, making them suboptimal for guiding fair learning. To address this limitation, we develop a generalized framework. Specifically, we propose assessing convergence using multiple states, defined as sharpness or perturbed loss computed at varying search distances. Building on this comprehensive assessment, we propose promoting fairness for these states across clients to achieve our ultimate fairness objective. This is accomplished through the proposed method, FedISM+. In FedISM+, the search distance evolves over time, progressively focusing on different states. We then incorporate two components in local training and global aggregation to ensure cross-client fairness for each state. This gradually makes convergence equitable for all states, thereby improving fairness during testing. Our empirical evaluations, performed on the well-known RSNA ICH and ISIC 2019 datasets, demonstrate the superiority of FedISM+ over existing state-of-the-art methods for fair federated learning. The code is available at https://github.com/wnn2000/FFL4MIA.
Updated: 2025-03-12 17:56:28
Subjects: eess.IV,cs.CV,cs.LG
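The "multiple convergence states" notion can be made concrete with a small numerical sketch. Our assumptions: a toy quadratic loss and a one-step SAM-style perturbation `w + rho * grad/||grad||`; the paper's actual states, models, and federated machinery are far richer.

```python
# Sketch of perturbed loss at varying search distances rho (one state per rho).
# rho = 0 recovers the plain training loss; larger rho probes sharpness.
import numpy as np

def loss(w, A, b):
    return 0.5 * w @ A @ w - b @ w

def grad(w, A, b):
    return A @ w - b

def perturbed_loss(w, A, b, rho):
    # Ascend along the normalized gradient by distance rho, then re-evaluate.
    g = grad(w, A, b)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return loss(w + eps, A, b)

def convergence_states(w, A, b, rhos):
    return [perturbed_loss(w, A, b, r) for r in rhos]
```

FedISM+ then asks that this whole profile over `rhos`, not a single scalar, be made equitable across clients.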
Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot
We present Auspex, a threat modeling system built using a specialized collection of generative artificial intelligence-based methods that capture threat modeling tradecraft. This new approach, called tradecraft prompting, centers on encoding the on-the-ground knowledge of threat modelers within the prompts that drive a generative AI-based threat modeling system. Auspex employs tradecraft prompts in two processing stages. The first stage centers on ingesting and processing system architecture information using prompts that encode threat modeling tradecraft knowledge pertaining to system decomposition and description. The second stage centers on chaining the resulting system analysis through a collection of prompts that encode tradecraft knowledge on threat identification, classification, and mitigation. The two-stage process yields a threat matrix for a system that specifies threat scenarios, threat types, information security categorizations, and potential mitigations. Auspex produces formalized threat model output in minutes, relative to the weeks or months a manual process takes. More broadly, the focus on bespoke tradecraft prompting, as opposed to fine-tuning or agent-based add-ons, makes Auspex a lightweight, flexible, modular, and extensible foundational system capable of addressing the complexity, resource, and standardization limitations of both existing manual and automated threat modeling processes. Accordingly, we establish the baseline value of Auspex to threat modelers through an evaluation procedure based on feedback collected from cybersecurity subject matter experts measuring the quality and utility of threat models generated by Auspex on real banking systems. We conclude with a discussion of system performance and plans for enhancements to Auspex.
Updated: 2025-03-12 17:54:18
Subjects: cs.AI,cs.CR
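The two-stage prompt chain has a simple shape, sketched below. The prompt wording and the `call_llm` stub are our assumptions; Auspex's actual tradecraft prompts are the paper's contribution and are not reproduced here.

```python
# Schematic of two-stage tradecraft prompting: stage 1 decomposes the system,
# stage 2 chains that analysis into threat identification and mitigation.
STAGE1_PROMPT = (
    "You are a threat modeler. Decompose and describe this system "
    "architecture:\n{architecture}"
)
STAGE2_PROMPT = (
    "Given this system analysis, identify threats, classify them, and "
    "propose mitigations as a threat matrix:\n{analysis}"
)

def call_llm(prompt):
    # Placeholder for a real generative-AI call.
    return f"[model output for: {prompt[:40]}...]"

def threat_model(architecture):
    analysis = call_llm(STAGE1_PROMPT.format(architecture=architecture))
    matrix = call_llm(STAGE2_PROMPT.format(analysis=analysis))
    return {"analysis": analysis, "threat_matrix": matrix}
```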
DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
In recent years, imitation learning has made progress in the field of robotic manipulation. However, it still faces challenges when addressing complex long-horizon tasks with deformable objects, such as high-dimensional state spaces, complex dynamics, and multimodal action distributions. Traditional imitation learning methods often require a large amount of data and encounter distributional shifts and accumulative errors in these tasks. To address these issues, we propose a data-efficient general learning framework (DeformPAM) based on preference learning and reward-guided action selection. DeformPAM decomposes long-horizon tasks into multiple action primitives, utilizes 3D point cloud inputs and diffusion models to model action distributions, and trains an implicit reward model using human preference data. During the inference phase, the reward model scores multiple candidate actions, selecting the optimal action for execution, thereby reducing the occurrence of anomalous actions and improving task completion quality. Experiments conducted on three challenging real-world long-horizon deformable object manipulation tasks demonstrate the effectiveness of this method. Results show that DeformPAM improves both task completion quality and efficiency compared to baseline methods even with limited data. Code and data will be available at https://deform-pam.robotflow.ai.
Updated: 2025-03-12 17:54:11
Subjects: cs.RO,cs.AI,cs.CV
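The inference-time selection step, score several candidate actions with a reward model and execute the argmax, can be sketched directly. The candidate sampler and the hand-written `reward_model` below are stand-ins; DeformPAM samples candidates from a diffusion policy and scores them with a preference-trained implicit reward.

```python
# Minimal sketch of reward-guided action selection (not the authors' code).
import random

def sample_candidates(rng, n=8, dim=3):
    # Stand-in for sampling n candidate actions from a diffusion policy.
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

def reward_model(action):
    # Stand-in for the learned preference-based reward: prefer small actions.
    return -sum(a * a for a in action)

def select_action(rng, n=8):
    candidates = sample_candidates(rng, n)
    return max(candidates, key=reward_model)  # execute the best-scored action
```

Filtering candidates this way is what suppresses anomalous actions without requiring dense supervision.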
Minimax Optimality of the Probability Flow ODE for Diffusion Models
Score-based diffusion models have become a foundational paradigm for modern generative modeling, demonstrating exceptional capability in generating samples from complex high-dimensional distributions. Despite the dominant adoption of probability flow ODE-based samplers in practice due to their superior sampling efficiency and precision, rigorous statistical guarantees for these methods have remained elusive in the literature. This work develops the first end-to-end theoretical framework for deterministic ODE-based samplers that establishes near-minimax optimal guarantees under mild assumptions on target data distributions. Specifically, focusing on subgaussian distributions with $\beta$-H\"older smooth densities for $\beta\leq 2$, we propose a smooth regularized score estimator that simultaneously controls both the $L^2$ score error and the associated mean Jacobian error. Leveraging this estimator within a refined convergence analysis of the ODE-based sampling process, we demonstrate that the resulting sampler achieves the minimax rate in total variation distance, modulo logarithmic factors. Notably, our theory comprehensively accounts for all sources of error in the sampling process and does not require strong structural conditions such as density lower bounds or Lipschitz/smooth scores on target distributions, thereby covering a broad range of practical data distributions.
Updated: 2025-03-12 17:51:29
Subjects: cs.LG,cs.IT,math.IT,math.ST,stat.ML,stat.TH
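A toy numerical illustration of the object being analyzed (our construction, not from the paper): for a standard Gaussian target under variance-exploding noising, the marginal is $p_\sigma = \mathcal{N}(0, 1+\sigma^2)$ with exact score $-x/(1+\sigma^2)$, and the probability flow ODE $dx/d\sigma = -\sigma\,\nabla\log p_\sigma(x)$ deterministically carries noise to (approximate) target samples.

```python
# Euler integration of the probability flow ODE for a N(0, 1) target.
import math, random

def score(x, sigma):
    # Exact score of p_sigma = N(0, 1 + sigma^2) when the data is N(0, 1).
    return -x / (1.0 + sigma * sigma)

def pf_ode_sample(rng, sigma_max=10.0, steps=200):
    # Draw from the fully noised marginal, then integrate dx/dsigma =
    # -sigma * score(x, sigma) from sigma_max down to 0 with Euler steps.
    x = rng.gauss(0.0, math.sqrt(1.0 + sigma_max ** 2))
    d_sigma = sigma_max / steps
    sigma = sigma_max
    for _ in range(steps):
        x += d_sigma * sigma * score(x, sigma)  # step from sigma to sigma - d_sigma
        sigma -= d_sigma
    return x
```

The paper's question is what happens when the exact `score` above is replaced by an estimate; its smooth regularized estimator controls both the score error and the Jacobian error this deterministic map depends on.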
Cost-Optimal Grouped-Query Attention for Long-Context LLMs
Building effective and efficient Transformer-based large language models (LLMs) has recently become a research focus, requiring maximizing model language capabilities and minimizing training and deployment costs. Existing efforts have primarily described complex relationships among model performance, parameter size, and data size, as well as searched for the optimal compute allocation to train LLMs. However, they overlook the impacts of context length and attention head configuration (the number of query and key-value heads in grouped-query attention) on training and inference. In this paper, we systematically compare models with different parameter sizes, context lengths, and attention head configurations in terms of model performance, computational cost, and memory cost. Then, we extend the existing scaling methods, which are based solely on parameter size and training compute, to guide the construction of cost-optimal LLMs during both training and inference. Our quantitative scaling studies show that, when processing sufficiently long sequences, a larger model with fewer attention heads can achieve a lower loss while incurring lower computational and memory costs. Our findings provide valuable insights for developing practical LLMs, especially in long-context processing scenarios. We will publicly release our code and data.
Updated: 2025-03-12 17:50:42
Subjects: cs.CL,cs.AI,cs.LG
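Why the key-value head count dominates long-context memory cost can be seen with back-of-the-envelope arithmetic: the KV cache scales with the number of KV heads, not query heads. The shapes below are illustrative, not taken from the paper.

```python
# KV-cache size: 2x for keys and values, one entry per layer, per KV head,
# per token, times bytes per parameter (2 for fp16).
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_param=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param

# Multi-head attention (32 KV heads) vs. grouped-query attention (8 KV heads)
# for an illustrative 32-layer model at a 128k-token context.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=128_000)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=128_000)
```

Grouping queries onto a quarter as many KV heads cuts the cache by 4x at identical query-head count, which is the degree of freedom the paper folds into its scaling analysis.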
Manify: A Python Library for Learning Non-Euclidean Representations
We present Manify, an open-source Python library for non-Euclidean representation learning. Leveraging manifold learning techniques, Manify provides tools for learning embeddings in (products of) non-Euclidean spaces, performing classification and regression with data that lives in such spaces, and estimating the curvature of a manifold. Manify aims to advance research and applications in machine learning by offering a comprehensive suite of tools for manifold-based data analysis. Our source code, examples, datasets, results, and documentation are available at https://github.com/pchlenski/manify
Updated: 2025-03-12 17:44:40
Subjects: cs.LG
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and blog post on the project page: https://m-arriola.com/bd3lms/
Updated: 2025-03-12 17:43:40
Subjects: cs.LG,cs.AI
Probabilistic Reasoning with LLMs for k-anonymity Estimation
Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a novel numerical reasoning task under uncertainty, focusing on estimating the k-anonymity of user-generated documents containing privacy-sensitive information. We propose BRANCH, which uses LLMs to factorize a joint probability distribution to estimate the k-value (the size of the population matching the given information) by modeling individual pieces of textual information as random variables. The probability of each factor occurring within a population is estimated using standalone LLMs or retrieval-augmented generation systems, and these probabilities are combined into a final k-value. Our experiments show that this method successfully estimates the correct k-value 67% of the time, an 11% increase compared to GPT-4o chain-of-thought reasoning. Additionally, we leverage LLM uncertainty to develop prediction intervals for k-anonymity, which include the correct value in nearly 92% of cases.
Updated: 2025-03-12 17:41:25
Subjects: cs.CL,cs.LG
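The factorization behind BRANCH reduces to simple arithmetic once the per-attribute frequencies are in hand. The frequencies below are invented for illustration; in the paper they come from LLMs or retrieval-augmented generation, and the independence assumption is a simplification of the full factorization.

```python
# k ≈ population size × product of per-attribute frequencies (independence
# assumed here for illustration).
def estimate_k(population, attribute_probs):
    p_joint = 1.0
    for p in attribute_probs.values():
        p_joint *= p
    return population * p_joint

# Invented illustrative frequencies for a document revealing three attributes.
probs = {"lives in Seattle": 0.002, "is a nurse": 0.01, "owns a kayak": 0.05}
k = estimate_k(population=330_000_000, attribute_probs=probs)
```

Here three individually common-sounding facts already narrow the matching population to a few hundred people, which is why factorized estimation is a useful privacy signal.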
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
Recent advancements in reasoning with large language models (RLLMs), such as OpenAI-O1 and DeepSeek-R1, have demonstrated their impressive capabilities in complex domains like mathematics and coding. A central factor in their success lies in the application of long chain-of-thought (Long CoT) characteristics, which enhance reasoning abilities and enable the solution of intricate problems. However, despite these developments, a comprehensive survey on Long CoT is still lacking, limiting our understanding of its distinctions from traditional short chain-of-thought (Short CoT) and complicating ongoing debates on issues like "overthinking" and "test-time scaling." This survey seeks to fill this gap by offering a unified perspective on Long CoT. (1) We first distinguish Long CoT from Short CoT and introduce a novel taxonomy to categorize current reasoning paradigms. (2) Next, we explore the key characteristics of Long CoT: deep reasoning, extensive exploration, and feasible reflection, which enable models to handle more complex tasks and produce more efficient, coherent outcomes compared to the shallower Short CoT. (3) We then investigate key phenomena such as the emergence of Long CoT with these characteristics, including overthinking, and test-time scaling, offering insights into how these processes manifest in practice. (4) Finally, we identify significant research gaps and highlight promising future directions, including the integration of multi-modal reasoning, efficiency improvements, and enhanced knowledge frameworks. By providing a structured overview, this survey aims to inspire future research and further the development of logical reasoning in artificial intelligence.
Updated: 2025-03-12 17:35:03
Subjects: cs.AI,cs.CL
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$P Parametrization
Despite deep neural networks' powerful representation learning capabilities, theoretical understanding of how networks can simultaneously achieve meaningful feature learning and global convergence remains elusive. Existing approaches like the neural tangent kernel (NTK) are limited because features stay close to their initialization in this parametrization, leaving open questions about feature properties during substantial evolution. In this paper, we investigate the training dynamics of infinitely wide, $L$-layer neural networks using the tensor program (TP) framework. Specifically, we show that, when trained with stochastic gradient descent (SGD) under the Maximal Update parametrization ($\mu$P) and mild conditions on the activation function, SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum. Our analysis leverages both the interactions among features across layers and the properties of Gaussian random variables, providing new insights into deep representation learning. We further validate our theoretical findings through experiments on real-world datasets.
Updated: 2025-03-12 17:33:13
Subjects: cs.LG,cs.AI,math.OC,stat.ML
DAWN-FM: Data-Aware and Noise-Informed Flow Matching for Solving Inverse Problems
Inverse problems, which involve estimating parameters from incomplete or noisy observations, arise in various fields such as medical imaging, geophysics, and signal processing. These problems are often ill-posed, requiring regularization techniques to stabilize the solution. In this work, we employ Flow Matching (FM), a generative framework that uses a deterministic process to map a simple reference distribution, such as a Gaussian, to the target distribution. Our method, DAWN-FM (Data-AWare and Noise-informed Flow Matching), incorporates data and noise embedding, allowing the model to access representations of the measured data explicitly and also account for noise in the observations, making it particularly robust in scenarios where data is noisy or incomplete. By learning a time-dependent velocity field, FM not only provides accurate solutions but also enables uncertainty quantification by generating multiple plausible outcomes. Unlike pre-trained diffusion models, which may struggle in highly ill-posed settings, our approach is trained specifically for each inverse problem and adapts to varying noise levels. We validate the effectiveness and robustness of our method through extensive numerical experiments on tasks such as image deblurring and tomography.
Updated: 2025-03-12 17:30:41
Subjects: eess.IV,cs.AI,cs.CV,cs.LG
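The flow-matching ingredient the abstract builds on admits a compact sketch. This is the generic (not DAWN-FM-specific) conditional recipe: with linear interpolation $x_t = (1-t)x_0 + t x_1$ between reference noise $x_0$ and data $x_1$, the regression target for the velocity field is simply $x_1 - x_0$.

```python
# Generic flow-matching training pair: interpolated state and velocity target.
import numpy as np

def fm_training_pair(x0, x1, t):
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight-line path
    v_target = x1 - x0              # constant velocity along that path
    return x_t, v_target

def fm_loss(v_pred, v_target):
    # Mean-squared regression of the predicted velocity onto the target.
    return float(np.mean((v_pred - v_target) ** 2))
```

DAWN-FM's addition is to condition the learned velocity field on embeddings of the measured data and the noise level; the pair construction above is unchanged.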
Strategyproof Reinforcement Learning from Human Feedback
We study Reinforcement Learning from Human Feedback (RLHF), where multiple individuals with diverse preferences provide feedback strategically to sway the final policy in their favor. We show that existing RLHF methods are not strategyproof, which can result in learning a substantially misaligned policy even when only one out of $k$ individuals reports their preferences strategically. In turn, we also find that any strategyproof RLHF algorithm must perform $k$-times worse than the optimal policy, highlighting an inherent trade-off between incentive alignment and policy alignment. We then propose a pessimistic median algorithm that, under appropriate coverage assumptions, is approximately strategyproof and converges to the optimal policy as the number of individuals and samples increases.
Updated: 2025-03-12 17:25:52
Subjects: cs.LG
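The intuition behind a median-style aggregator can be shown with a toy illustration (our construction, on scalar reports rather than the paper's full RLHF setting): one manipulator can move a mean arbitrarily, but moves a median only within the range of the honest reports.

```python
# Mean vs. median aggregation of scalar preference reports.
def aggregate_mean(reports):
    return sum(reports) / len(reports)

def aggregate_median(reports):
    s = sorted(reports)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

honest = [1.0, 1.2, 0.9, 1.1]
manipulated = honest + [100.0]   # one strategic outlier report
```

The paper's pessimistic median algorithm builds on this robustness while quantifying the unavoidable cost: any strategyproof scheme loses a factor of $k$ against the optimal policy.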
The R2D2 Deep Neural Network Series for Scalable Non-Cartesian Magnetic Resonance Imaging
We introduce the R2D2 Deep Neural Network (DNN) series paradigm for fast and scalable image reconstruction from highly-accelerated non-Cartesian k-space acquisitions in Magnetic Resonance Imaging (MRI). While unrolled DNN architectures provide a robust image formation approach via data-consistency layers, embedding non-uniform fast Fourier transform operators in a DNN can become impractical to train at large scale, e.g., in 2D MRI with a large number of coils, or for higher-dimensional imaging. Plug-and-play approaches that alternate a learned denoiser blind to the measurement setting with a data-consistency step are not affected by this limitation, but their highly iterative nature implies slow reconstruction. To address this scalability challenge, we leverage the R2D2 paradigm that was recently introduced to enable ultra-fast reconstruction for large-scale Fourier imaging in radio astronomy. R2D2's reconstruction is formed as a series of residual images iteratively estimated as outputs of DNN modules taking the previous iteration's data residual as input. The method can be interpreted as a learned version of the Matching Pursuit algorithm. A series of R2D2 DNN modules were sequentially trained in a supervised manner on the fastMRI dataset and validated for 2D multi-coil MRI in simulation and on real data, targeting highly under-sampled radial k-space sampling. Results suggest that a series with only a few DNNs achieves superior reconstruction quality over its unrolled incarnation R2D2-Net (whose training is also much less scalable), and over the state-of-the-art diffusion-based "Decomposed Diffusion Sampler" approach (also characterised by a slower reconstruction process).
Updated: 2025-03-12 17:24:47
Subjects: eess.IV,cs.CV,cs.LG,eess.SP
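The series structure itself is simple to sketch. In this schematic (our assumptions), each trained DNN module is replaced by a gradient-like step `A.T @ residual`; the paper's point is that a short series of learned modules far outperforms such hand-crafted updates.

```python
# Schematic R2D2 iteration: each stage adds a residual image estimated from
# the current data residual y - A x.
import numpy as np

def r2d2_series(y, A, n_modules=10, step=0.1):
    x = np.zeros(A.shape[1])
    for _ in range(n_modules):
        data_residual = y - A @ x                    # input to the next "module"
        residual_image = step * A.T @ data_residual  # stand-in for a DNN output
        x = x + residual_image                       # accumulate the series
    return x
```

With learned modules, a handful of stages suffices, which is what makes the approach fast at inference relative to highly iterative plug-and-play schemes.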
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Text-to-image diffusion models have achieved remarkable success in generating high-quality content from text prompts. However, their reliance on publicly available data and the growing trend of data sharing for fine-tuning make these models particularly vulnerable to data poisoning attacks. In this work, we introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models to generate images containing specific brand logos or symbols without any text triggers. We find that when certain visual patterns appear repeatedly in the training data, the model learns to reproduce them naturally in its outputs, even without prompt mentions. Leveraging this, we develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected. Models trained on this poisoned dataset generate images containing logos without degrading image quality or text alignment. We experimentally validate our silent branding attack across two realistic settings on large-scale high-quality image datasets and style personalization datasets, achieving high success rates even without a specific text trigger. Human evaluation and quantitative metrics including logo detection show that our method can stealthily embed logos.
Updated: 2025-03-12 17:21:57
Subjects: cs.CV,cs.AI,cs.CR
Generative AI Policies under the Microscope: How CS Conferences Are Navigating the New Frontier in Scholarly Writing
As the use of Generative AI (Gen-AI) in scholarly writing and peer reviews continues to rise, it is essential for the computing field to establish and adopt clear Gen-AI policies. This study examines the landscape of Gen-AI policies across 64 major Computer Science conferences and offers recommendations for promoting more effective and responsible use of Gen-AI in the field.
Updated: 2025-03-12 17:10:33
Subjects: cs.CY,cs.AI,cs.LG
A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network
Predicting the emergence of multiple chronic conditions (MCC) is crucial for early intervention and personalized healthcare, as MCC significantly impacts patient outcomes and healthcare costs. Graph neural networks (GNNs) are effective methods for modeling complex graph data, such as those found in MCC. However, a significant challenge with GNNs is their reliance on an existing graph structure, which is not readily available for MCC. To address this challenge, we propose a novel generative framework for GNNs that constructs a representative underlying graph structure by utilizing the distribution of the data to enhance predictive analytics for MCC. Our framework employs a graph variational autoencoder (GVAE) to capture the complex relationships in patient data. This allows for a comprehensive understanding of individual health trajectories and facilitates the creation of diverse patient stochastic similarity graphs while preserving the original feature set. These variations of patient stochastic similarity graphs, generated from the GVAE decoder, are then processed by a GNN using a novel Laplacian regularization technique to refine the graph structure over time and improve the prediction accuracy of MCC. A contextual bandit is designed to evaluate the stochastically generated graphs and iteratively identify the best-performing graph for the GNN model until convergence. We validate the performance of the proposed contextual bandit algorithm against $\varepsilon$-greedy and multi-armed bandit algorithms on a large cohort (n = 1,592) of patients with MCC. These advancements highlight the potential of the proposed approach to transform predictive healthcare analytics, enabling a more personalized and proactive approach to MCC management.
Updated: 2025-03-12 17:08:05
Subjects: cs.LG
Grounding Video Models to Actions through Goal Conditioned Exploration
Large video models, pretrained on massive amounts of Internet video, provide a rich source of physical knowledge about the dynamics and motions of objects and tasks. However, video models are not grounded in the embodiment of an agent, and do not describe how to actuate the world to reach the visual states depicted in a video. To tackle this problem, current methods use a separate vision-based inverse dynamics model trained on embodiment-specific data to map image states to actions. Gathering data to train such a model is often expensive and challenging, and this model is limited to visual settings similar to the ones in which data are available. In this paper, we investigate how to directly ground video models to continuous actions through self-exploration in the embodied environment, using generated video states as visual goals for exploration. We propose a framework that uses trajectory-level action generation in combination with video guidance to enable an agent to solve complex tasks without any external supervision, e.g., rewards, action labels, or segmentation masks. We validate the proposed approach on 8 tasks in Libero, 6 tasks in MetaWorld, 4 tasks in Calvin, and 12 tasks in iThor Visual Navigation. We show that our approach matches or even surpasses multiple behavior cloning baselines trained on expert demonstrations, without requiring any action annotations.
Updated: 2025-03-12 17:03:25
Subjects: cs.RO,cs.AI,cs.CV,cs.LG
Fair Play in the Fast Lane: Integrating Sportsmanship into Autonomous Racing Systems
Autonomous racing has gained significant attention as a platform for high-speed decision-making and motion control. While existing methods primarily focus on trajectory planning and overtaking strategies, the role of sportsmanship in ensuring fair competition remains largely unexplored. In human racing, rules such as the one-motion rule and the enough-space rule prevent dangerous and unsportsmanlike behavior. However, autonomous racing systems often lack mechanisms to enforce these principles, potentially leading to unsafe maneuvers. This paper introduces a bi-level game-theoretic framework to integrate sportsmanship (SPS) into versus racing. At the high level, we model racing intentions using a Stackelberg game, where Monte Carlo Tree Search (MCTS) is employed to derive optimal strategies. At the low level, vehicle interactions are formulated as a Generalized Nash Equilibrium Problem (GNEP), ensuring that all agents follow sportsmanship constraints while optimizing their trajectories. Simulation results demonstrate the effectiveness of the proposed approach in enforcing sportsmanship rules while maintaining competitive performance. We analyze different scenarios where attackers and defenders adhere to or disregard sportsmanship rules and show how knowledge of these constraints influences strategic decision-making. This work highlights the importance of balancing competition and fairness in autonomous racing and provides a foundation for developing ethical and safe AI-driven racing systems.
Updated: 2025-03-12 17:02:38
Subjects: cs.AI,cs.GT,cs.RO,cs.SY,eess.SY
The Value of Goal Commitment in Planning
In this paper, we revisit the concept of goal commitment from early planners in the presence of current forward-chaining heuristic planners. We present a compilation that extends the original planning task with commit actions that enforce the persistence of specific goals once achieved, thereby committing to them in the search sub-tree. This approach imposes a specific goal achievement order in parts of the search tree, potentially introducing dead-end states. This can reduce search effort if the goal achievement order is correct. Otherwise, the search algorithm can expand nodes in the open list where goals do not persist. Experimental results demonstrate that the reformulated tasks suit state-of-the-art agile planners, enabling them to find better solutions.
Updated: 2025-03-12 17:00:37
Subjects: cs.AI
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
The stability of language model pre-training and its effects on downstream performance are still understudied. Prior work shows that the training process can yield significantly different results in response to slight variations in initial conditions, e.g., the random seed. Crucially, the research community still lacks sufficient resources and tools to systematically investigate pre-training stability, particularly for decoder-only language models. We introduce the PolyPythias, a set of 45 new training runs for the Pythia model suite: 9 new seeds across 5 model sizes, from 14M to 410M parameters, resulting in about 7k new checkpoints that we release. Using these new 45 training runs, in addition to the 5 already available, we study the effects of different initial conditions determined by the seed -- i.e., parameters' initialisation and data order -- on (i) downstream performance, (ii) learned linguistic representations, and (iii) emergence of training phases. In addition to common scaling behaviours, our analyses generally reveal highly consistent training dynamics across both model sizes and initial conditions. Further, the new seeds for each model allow us to identify outlier training runs and delineate their characteristics. Our findings show the potential of using these methods to predict training stability.
Updated: 2025-03-12 16:59:30
Subjects: cs.CL,cs.LG
Blockchain-Enabled Management Framework for Federated Coalition Networks
In a globalized and interconnected world, interoperability has become a key concept for advancing tactical scenarios. Federated Coalition Networks (FCN) enable cooperation between entities from multiple nations while allowing each to maintain control over their systems. However, this interoperability necessitates the sharing of increasing amounts of information between different tactical assets, raising the need for higher security measures. Emerging technologies like blockchain drive a revolution in secure communications, paving the way for new tactical scenarios. In this work, we propose a blockchain-based framework to enhance the resilience and security of the management of these networks. We offer a guide to FCN design, illustrated by a use case and key functions applied to a proposed architecture, to help a broad audience understand military networks in international missions. We validate the framework by evaluating its effectiveness and performance in information encryption.
Updated: 2025-03-12 16:59:23
Subjects: cs.CR,cs.SY,eess.SY
Neural Network-Based Change Point Detection for Large-Scale Time-Evolving Data
The paper studies the problem of detecting and locating change points in multivariate time-evolving data. The problem has a long history in statistics and signal processing, and various algorithms have been developed primarily for simple parametric models. In this work, we focus on modeling the data through feed-forward neural networks and develop a detection strategy based on the following two-step procedure. In the first step, the neural network is trained over a prespecified window of the data, and its test error function is calibrated over another prespecified window. Then, the test error function is used over a moving window to identify the change point. Once a change point is detected, the procedure involving these two steps is repeated until all change points are identified. The proposed strategy yields consistent estimates for both the number and the locations of the change points under temporal dependence of the data-generating process. The effectiveness of the proposed strategy is illustrated on synthetic data sets, which offer insights into how to select the algorithm's tuning parameters in practice, and on real data sets. Finally, we note that although the detection strategy is general and can work with different neural network architectures, the theoretical guarantees provided are specific to feed-forward neural architectures.
Updated: 2025-03-12 16:58:52
Subjects: stat.ML,cs.LG,stat.AP,stat.CO,stat.ME
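The two-step procedure described in the abstract above lends itself to a compact sketch. The toy example below substitutes a least-squares autoregressive predictor for the paper's feed-forward network; the window sizes, the calibration threshold, and the synthetic mean-shift series are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic univariate series with a mean shift at t = 300.
n, cp = 600, 300
x = rng.normal(0.0, 1.0, n)
x[cp:] += 4.0

LAG = 5

def design(series):
    # Lagged design matrix with an intercept column.
    X = np.column_stack([series[i:len(series) - LAG + i] for i in range(LAG)])
    return np.column_stack([X, np.ones(len(X))]), series[LAG:]

def fit_predictor(train):
    # Least-squares AR(LAG) model, standing in for the feed-forward network.
    X, y = design(train)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def window_error(series, w):
    X, y = design(series)
    return np.mean((y - X @ w) ** 2)

# Step 1: train on one prespecified window, calibrate on another.
w = fit_predictor(x[:100])
calib = [window_error(x[s:s + 50], w) for s in range(100, 151, 10)]
thresh = 2.0 * max(calib)

# Step 2: slide a window and flag the first exceedance as a change point.
detected = None
for s in range(200, n - 50):
    if window_error(x[s:s + 50], w) > thresh:
        detected = s
        break

print(detected)  # a window start shortly after the true change enters the window
```

A full implementation would then retrain past the detected point and repeat the two steps until no further change points are flagged, as the abstract describes.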
Optimisation of the Accelerator Control by Reinforcement Learning: A Simulation-Based Approach
Optimizing accelerator control is a critical challenge in experimental particle physics, requiring significant manual effort and resource expenditure. Traditional tuning methods are often time-consuming and reliant on expert input, highlighting the need for more efficient approaches. This study aims to create a simulation-based framework integrated with Reinforcement Learning (RL) to address these challenges. Using \texttt{Elegant} as the simulation backend, we developed a Python wrapper that simplifies the interaction between RL algorithms and accelerator simulations, enabling seamless input management, simulation execution, and output analysis. The proposed RL framework acts as a co-pilot for physicists, offering intelligent suggestions to enhance beamline performance, reduce tuning time, and improve operational efficiency. As a proof of concept, we demonstrate the application of our RL approach to an accelerator control problem and highlight the improvements in efficiency and performance achieved through our methodology. We discuss how the integration of simulation tools with a Python-based RL framework provides a powerful resource for the accelerator physics community, showcasing the potential of machine learning in optimizing complex physical systems.
Updated: 2025-03-12 16:57:52
Subjects: physics.acc-ph,cs.LG
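The Python-wrapper idea in the abstract can be sketched as a gym-style environment. Everything below is hypothetical: the class name, the toy quadratic "beamline response" (which stands in for writing an Elegant input file, launching the simulation, and parsing beam statistics from its output), and the greedy random tuner used in place of an RL agent:

```python
import numpy as np

class ToyBeamlineEnv:
    """Gym-style sketch of the proposed wrapper. In the real framework,
    step() would run the Elegant backend; here a toy quadratic response
    stands in so the sketch runs. All names and numbers are hypothetical."""

    def __init__(self, seed=0):
        self.target = np.array([0.3, -0.1])   # "ideal" magnet settings
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(2)

    def reset(self):
        self.state = self.rng.uniform(-1.0, 1.0, size=2)
        return self.state.copy()

    def step(self, action):
        self.state = np.clip(self.state + action, -1.0, 1.0)
        reward = -float(np.sum((self.state - self.target) ** 2))  # beam quality
        return self.state.copy(), reward, False, {}

# Greedy random tuner as a stand-in for the RL agent.
env = ToyBeamlineEnv()
state, best = env.reset(), -np.inf
for _ in range(500):
    nxt, r, _, _ = env.step(env.rng.normal(scale=0.1, size=2))
    if r >= best:
        state, best = nxt, r
    else:
        env.state = state            # undo moves that degrade beam quality
print(round(best, 3))  # approaches 0 as settings near the target
```

The value of the wrapper design is that the agent code above never needs to know how the backend simulation is invoked, which is what makes swapping in different RL algorithms straightforward.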
Differentially Private Equilibrium Finding in Polymatrix Games
We study equilibrium finding in polymatrix games under differential privacy constraints. To start, we show that high accuracy and asymptotically vanishing differential privacy budget (as the number of players goes to infinity) cannot be achieved simultaneously under either of the two settings: (i) we seek to establish equilibrium approximation guarantees in terms of Euclidean distance to the equilibrium set, and (ii) the adversary has access to all communication channels. Then, assuming the adversary has access to a constant number of communication channels, we develop a novel distributed algorithm that recovers strategies with a simultaneously vanishing Nash gap (in expected utility, also referred to as exploitability) and privacy budget as the number of players increases.
Updated: 2025-03-12 16:54:23
Subjects: cs.GT,cs.AI,cs.CR,cs.LG
GenHPE: Generative Counterfactuals for 3D Human Pose Estimation with Radio Frequency Signals
Human pose estimation (HPE) detects the positions of human body joints for various applications. Compared to using cameras, HPE using radio frequency (RF) signals is non-intrusive and more robust to adverse conditions, exploiting the signal variations caused by human interference. However, existing studies focus on single-domain HPE confined by domain-specific confounders, which cannot generalize to new domains and result in diminished HPE performance. Specifically, the signal variations caused by different human body parts are entangled, containing subject-specific confounders. RF signals are also intertwined with environmental noise, involving environment-specific confounders. In this paper, we propose GenHPE, a 3D HPE approach that generates counterfactual RF signals to eliminate domain-specific confounders. GenHPE trains generative models conditioned on human skeleton labels, learning how human body parts and confounders interfere with RF signals. We manipulate skeleton labels (i.e., removing body parts) as counterfactual conditions for generative models to synthesize counterfactual RF signals. The differences between counterfactual signals approximately eliminate domain-specific confounders and regularize an encoder-decoder model to learn domain-independent representations. Such representations help GenHPE generalize to new subjects/environments for cross-domain 3D HPE. We evaluate GenHPE on three public datasets from WiFi, ultra-wideband, and millimeter wave. Experimental results show that GenHPE outperforms state-of-the-art methods and reduces estimation errors by up to 52.2mm for cross-subject HPE and 10.6mm for cross-environment HPE.
Updated: 2025-03-12 16:53:58
Subjects: cs.CV,cs.AI,cs.MM,eess.SP
Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging
Although Vision Transformers (ViTs) have recently demonstrated superior performance in medical imaging problems, they face explainability issues similar to previous architectures such as convolutional neural networks. Recent research efforts suggest that attention maps, which are part of decision-making process of ViTs can potentially address the explainability issue by identifying regions influencing predictions, especially in models pretrained with self-supervised learning. In this work, we compare the visual explanations of attention maps to other commonly used methods for medical imaging problems. To do so, we employ four distinct medical imaging datasets that involve the identification of (1) colonic polyps, (2) breast tumors, (3) esophageal inflammation, and (4) bone fractures and hardware implants. Through large-scale experiments on the aforementioned datasets using various supervised and self-supervised pretrained ViTs, we find that although attention maps show promise under certain conditions and generally surpass GradCAM in explainability, they are outperformed by transformer-specific interpretability methods. Our findings indicate that the efficacy of attention maps as a method of interpretability is context-dependent and may be limited as they do not consistently provide the comprehensive insights required for robust medical decision-making.
Updated: 2025-03-12 16:52:52
Subjects: cs.CV,cs.AI
Large Language Models for Multi-Facility Location Mechanism Design
Designing strategyproof mechanisms for multi-facility location that optimize social costs based on agent preferences has been challenging due to the extensive domain knowledge required and poor worst-case guarantees. Recently, deep learning models have been proposed as alternatives. However, these models require some domain knowledge and extensive hyperparameter tuning, and they lack interpretability, which is crucial in practice when transparency of the learned mechanisms is mandatory. In this paper, we introduce a novel approach, named LLMMech, that addresses these limitations by incorporating large language models (LLMs) into an evolutionary framework for generating interpretable, hyperparameter-free, empirically strategyproof, and nearly optimal mechanisms. Our experimental results, evaluated on various problem settings where the social cost is arbitrarily weighted across agents and the agent preferences may not be uniformly distributed, demonstrate that the LLM-generated mechanisms generally outperform existing handcrafted baselines and deep learning models. Furthermore, the mechanisms exhibit impressive generalizability to out-of-distribution agent preferences and to larger instances with more agents.
Updated: 2025-03-12 16:49:56
Subjects: cs.LG
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Sparse autoencoders (SAEs) are a popular technique for interpreting language model activations, and there is extensive recent work on improving SAE effectiveness. However, most prior work evaluates progress using unsupervised proxy metrics with unclear practical relevance. We introduce SAEBench, a comprehensive evaluation suite that measures SAE performance across seven diverse metrics, spanning interpretability, feature disentanglement and practical applications like unlearning. To enable systematic comparison, we open-source a suite of over 200 SAEs across eight recently proposed SAE architectures and training algorithms. Our evaluation reveals that gains on proxy metrics do not reliably translate to better practical performance. For instance, while Matryoshka SAEs slightly underperform on existing proxy metrics, they substantially outperform other architectures on feature disentanglement metrics; moreover, this advantage grows with SAE scale. By providing a standardized framework for measuring progress in SAE development, SAEBench enables researchers to study scaling trends and make nuanced comparisons between different SAE architectures and training methodologies. Our interactive interface enables researchers to flexibly visualize relationships between metrics across hundreds of open-source SAEs at: https://saebench.xyz
Updated: 2025-03-12 16:49:02
Subjects: cs.LG,cs.CL
Multi-Task Reinforcement Learning Enables Parameter Scaling
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
Updated: 2025-03-12 16:43:00
Subjects: cs.LG,cs.AI
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Recent advances in Vision-Language-Action models (VLAs) have expanded the capabilities of embodied intelligence. However, significant challenges remain in real-time decision-making in complex 3D environments, which demand second-level responses, high-resolution perception, and tactical reasoning under dynamic conditions. To advance the field, we introduce CombatVLA, an efficient VLA model optimized for combat tasks in 3D action role-playing games(ARPGs). Specifically, our CombatVLA is a 3B model trained on video-action pairs collected by an action tracker, where the data is formatted as action-of-thought (AoT) sequences. Thereafter, CombatVLA seamlessly integrates into an action execution framework, allowing efficient inference through our truncated AoT strategy. Experimental results demonstrate that CombatVLA not only outperforms all existing models on the combat understanding benchmark but also achieves a 50-fold acceleration in game combat. Moreover, it has a higher task success rate than human players. We will open-source all resources, including the action tracker, dataset, benchmark, model weights, training code, and the implementation of the framework at https://combatvla.github.io/.
Updated: 2025-03-12 16:42:26
Subjects: cs.CV,cs.AI
PairVDN - Pair-wise Decomposed Value Functions
Extending deep Q-learning to cooperative multi-agent settings is challenging due to the exponential growth of the joint action space, the non-stationary environment, and the credit assignment problem. Value decomposition allows deep Q-learning to be applied at the joint agent level, at the cost of reduced expressivity. Building on past work in this direction, our paper proposes PairVDN, a novel method for decomposing the value function into a collection of pair-wise, rather than per-agent, functions, improving expressivity at the cost of requiring a more complex (but still efficient) dynamic programming maximisation algorithm. Our method enables the representation of value functions which cannot be expressed as a monotonic combination of per-agent functions, unlike past approaches such as VDN and QMIX. We implement a novel many-agent cooperative environment, Box Jump, and demonstrate improved performance over these baselines in this setting. We open-source our code and environment at https://github.com/zzbuzzard/PairVDN.
Updated: 2025-03-12 16:38:22
Subjects: cs.AI
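The two key ingredients named in the abstract above — pair-wise value functions plus an efficient dynamic-programming maximisation — can be illustrated on the simplest pair structure, a chain of agents. The paper's actual pair graph and algorithm may differ; this is a generic Viterbi-style sketch:

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_actions = 5, 4

# One value table per consecutive pair of agents: pair_q[i][a_i, a_{i+1}].
pair_q = [rng.normal(size=(n_actions, n_actions)) for _ in range(n_agents - 1)]

def joint_value(actions):
    return sum(pair_q[i][actions[i], actions[i + 1]] for i in range(n_agents - 1))

def dp_argmax():
    """Maximise the pair-wise decomposed value over joint actions in O(n |A|^2)."""
    best = np.zeros(n_actions)  # best[a] = max value so far with current agent playing a
    back = []
    for q in pair_q:
        scores = best[:, None] + q           # indexed by (a_i, a_{i+1})
        back.append(scores.argmax(axis=0))   # best a_i for each a_{i+1}
        best = scores.max(axis=0)
    actions = [int(best.argmax())]
    for bp in reversed(back):                # backtrack to recover the joint action
        actions.append(int(bp[actions[-1]]))
    return actions[::-1]

a_star = dp_argmax()
brute = max(product(range(n_actions), repeat=n_agents), key=joint_value)  # |A|^n check
print(a_star, round(joint_value(a_star), 6))
```

The brute-force check confirms the dynamic program reaches the same maximum value while costing O(n·|A|²) instead of O(|A|ⁿ), which is the efficiency trade-off the abstract alludes to.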
Discovering new robust local search algorithms with neuro-evolution
This paper explores a novel approach aimed at overcoming existing challenges in the realm of local search algorithms. Our aim is to improve the decision process that takes place within a local search algorithm so as to make the best possible transitions in the neighborhood at each iteration. To improve this process, we propose to use a neural network that has the same input information as conventional local search algorithms. In this paper, which is an extension of the work presented at EvoCOP2024, we investigate different ways of representing this information so as to make the algorithm as efficient as possible, but also robust to monotonic transformations of the problem objective function. To assess the efficiency of this approach, we develop an experimental setup centered around NK landscape problems, offering the flexibility to adjust problem size and ruggedness. This approach offers a promising avenue for the emergence of new local search algorithms and the improvement of their problem-solving capabilities for black-box problems. The latest version of this article is published in the journal SN Computer Science (Springer).
Updated: 2025-03-12 16:37:23
Subjects: cs.NE,cs.AI
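One standard way to obtain the robustness to monotonic transformations mentioned in the abstract is to feed the network normalised ranks of objective values rather than the raw values. This is only an illustrative guess at the kind of representation investigated, not necessarily the authors' exact encoding:

```python
import numpy as np

def rank_features(current_fitness, neighbour_fitnesses):
    """Encode a local-search state by normalised ranks of the observed
    objective values; ranks are unchanged by any strictly increasing
    transformation of the objective."""
    values = np.concatenate([[current_fitness], neighbour_fitnesses])
    ranks = values.argsort().argsort()       # rank of each value (0 = smallest)
    return ranks / (len(values) - 1)         # normalise to [0, 1]

f, neigh = 3.2, np.array([1.0, 4.5, 2.7, 9.9])
feats = rank_features(f, neigh)

# A strictly increasing (monotonic) transformation of the objective.
g = lambda v: np.exp(v) + 5.0
feats_g = rank_features(g(f), g(neigh))
print(feats)  # identical to feats_g
```

Because the features only depend on the ordering of the values, a network trained on them makes the same neighbourhood decision whether it sees f or g(f), which is exactly the invariance the abstract asks for.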
SciFi-Benchmark: How Would AI-Powered Robots Behave in Science Fiction Literature?
Given the recent rate of progress in artificial intelligence (AI) and robotics, a tantalizing question is emerging: would robots controlled by emerging AI systems be strongly aligned with human values? In this work, we propose a scalable way to probe this question by generating a benchmark spanning the key moments in 824 major pieces of science fiction literature (movies, tv, novels and scientific books) where an agent (AI or robot) made critical decisions (good or bad). We use a LLM's recollection of each key moment to generate questions in similar situations, the decisions made by the agent, and alternative decisions it could have made (good or bad). We then measure an approximation of how well models align with human values on a set of human-voted answers. We also generate rules that can be automatically improved via amendment process in order to generate the first Sci-Fi inspired constitutions for promoting ethical behavior in AIs and robots in the real world. Our first finding is that modern LLMs paired with constitutions turn out to be well-aligned with human values (95.8%), contrary to unsettling decisions typically made in SciFi (only 21.2% alignment). Secondly, we find that generated constitutions substantially increase alignment compared to the base model (79.4% to 95.8%), and show resilience to an adversarial prompt setting (23.3% to 92.3%). Additionally, we find that those constitutions are among the top performers on the ASIMOV Benchmark which is derived from real-world images and hospital injury reports. Sci-Fi-inspired constitutions are thus highly aligned and applicable in real-world situations. We release SciFi-Benchmark: a large-scale dataset to advance robot ethics and safety research. It comprises 9,056 questions and 53,384 answers, in addition to a smaller human-labeled evaluation set. Data is available at https://scifi-benchmark.github.io
Updated: 2025-03-12 16:35:51
Subjects: cs.CL,cs.AI,cs.CY,cs.HC,cs.RO
Algebraic Evaluation Theorems
Majority voting (MV) is the prototypical ``wisdom of the crowd'' algorithm. Theorems considering when MV is optimal for group decisions date back to Condorcet's 1785 jury \emph{decision} theorem. The same error independence assumption underlying the theorem can be used to prove a jury \emph{evaluation} theorem that does purely algebraic evaluation (AE) of juror performance based on a batch of their decisions. Three or more binary jurors are enough to obtain the only two possible statistics of their correctness on a test they took. AE is superior to MV in three ways. First, its empirical assumptions are looser and can handle jurors less than 50\% accurate in making decisions. Second, it has point-like precision in evaluating them given its assumption of error independence. This precision enables a multi-accuracy approach that has higher labeling accuracy than MV and comes with empirical uncertainty bounds. And, third, it is self-alarming about the failure of its error independence assumption. Experiments using demographic data from the American Community Survey confirm the practical utility of AE over MV. Two implications of the theorem for AI safety are discussed - a principled way to terminate infinite monitoring chains (who grades the graders?) and the super-alignment problem (how do we evaluate agents doing tasks we do not understand?).
Updated: 2025-03-12 16:31:39
Subjects: cs.AI,cs.LG,I.2.6
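Under the error-independence assumption highlighted in the abstract (and, for this sketch, a balanced binary test with all jurors better than chance), pairwise agreement rates alone determine juror accuracies. The triad construction below is a classic algebraic-evaluation identity, not necessarily the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
truth = rng.integers(0, 2, n)        # balanced binary ground truth
accs = [0.85, 0.70, 0.60]

# Three jurors with independent errors: correct with probability accs[i].
votes = np.array([np.where(rng.random(n) < a, truth, 1 - truth) for a in accs])

def algebraic_accuracies(votes):
    """Recover juror accuracies from pairwise agreement alone.
    With balanced prevalence and independent errors, the agreement rate
    A_ij satisfies 2*A_ij - 1 = c_i * c_j, where c_i = 2*acc_i - 1."""
    g = lambda i, j: 2.0 * np.mean(votes[i] == votes[j]) - 1.0
    c0 = np.sqrt(g(0, 1) * g(0, 2) / g(1, 2))
    c1 = np.sqrt(g(0, 1) * g(1, 2) / g(0, 2))
    c2 = np.sqrt(g(0, 2) * g(1, 2) / g(0, 1))
    return [(1.0 + c) / 2.0 for c in (c0, c1, c2)]

est = algebraic_accuracies(votes)
print([round(e, 2) for e in est])  # close to accs, with no access to `truth`
```

Note the contrast with majority voting: MV would simply output the plurality label, whereas this purely algebraic evaluation grades each juror from the batch of decisions, as the abstract describes.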
The Interaction Layer: An Exploration for Co-Designing User-LLM Interactions in Parental Wellbeing Support Systems
Parenting brings emotional and physical challenges, from balancing work, childcare, and finances to coping with exhaustion and limited personal time. Yet, one in three parents never seek support. AI systems potentially offer stigma-free, accessible, and affordable solutions. However, user adoption often fails due to issues with explainability and reliability. To see if these issues could be solved using a co-design approach, we developed and tested NurtureBot, a wellbeing support assistant for new parents. 32 parents co-designed the system through the Asynchronous Remote Communities method, identifying the key challenge as achieving a "successful chat." As part of co-design, parents role-played as NurtureBot, rewriting its dialogues to improve user understanding, control, and outcomes. The refined prototype, featuring an Interaction Layer, was evaluated by 32 initial and 46 new parents, showing improved user experience and usability, with a final CUQ score of 91.3/100, demonstrating successful interaction patterns. Our process revealed useful interaction design lessons for effective AI parenting support.
Updated: 2025-03-12 16:29:28
Subjects: cs.HC,cs.AI
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Multiple-choice benchmarks, consisting of various prompts and choices, are among the most widely used methods to assess a language model's natural language understanding capability. Given a specific prompt, we typically compute $P(Choice|Prompt)$ to evaluate how likely a language model is to generate the correct choice compared to incorrect ones. However, we observe that performance measured using this approach reflects not only the model's comprehension of the prompt but also its inherent biases for certain choices regardless of the prompt. This issue makes it challenging to accurately measure a model's natural language understanding, as models may select the answer without fully understanding the prompt. To address this limitation, we propose a novel metric called ANPMI, which normalizes Pointwise Mutual Information (PMI) by $-\log P(Choice)$. ANPMI provides a more accurate assessment of the model's natural language understanding by ensuring that it is challenging to answer a question without properly understanding the prompt.
Updated: 2025-03-12 16:27:59
Subjects: cs.CL,cs.AI
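The metric itself reduces to a one-line computation on log-probabilities. A minimal sketch of ANPMI as stated in the abstract (variable names are illustrative; the paper may apply additional aggregation across choices):

```python
import math

def anpmi(logp_choice_given_prompt, logp_choice):
    """ANPMI = PMI / (-log P(choice)), where
    PMI(prompt, choice) = log P(choice|prompt) - log P(choice)."""
    pmi = logp_choice_given_prompt - logp_choice
    return pmi / (-logp_choice)

# A choice whose probability is unchanged by the prompt scores 0; a choice the
# prompt makes (almost) certain scores (almost) 1, regardless of its prior.
score_uninformed = anpmi(math.log(0.25), math.log(0.25))
score_certain = anpmi(math.log(1.0 - 1e-12), math.log(0.25))
```

The normalization by $-\log P(Choice)$ is what removes the prompt-independent bias: a model that merely prefers a choice a priori gains nothing, since the numerator is zero unless the prompt actually raises the choice's probability.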
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Retrieval augmentation and tool-use training approaches where a search engine is treated as a tool lack complex multi-turn retrieval flexibility or require large-scale supervised data. Prompting advanced LLMs with reasoning capabilities during inference to use search engines is not optimal, since the LLM does not learn how to optimally interact with the search engine. This paper introduces Search-R1, an extension of the DeepSeek-R1 model where the LLM learns -- solely through reinforcement learning (RL) -- to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM rollouts with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B) over SOTA baselines. This paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at https://github.com/PeterGriffinJin/Search-R1.
Updated: 2025-03-12 16:26:39
Subjects: cs.CL,cs.AI,cs.IR
RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment
Internet of Things (IoT) platforms with trigger-action capability allow event conditions to trigger actions in IoT devices autonomously by creating a chain of interactions. Adversaries exploit this chain of interactions to maliciously inject fake event conditions into IoT hubs, triggering unauthorized actions on target IoT devices to implement remote injection attacks. Existing defense mechanisms focus mainly on the verification of event transactions using physical event fingerprints to enforce the security policies to block unsafe event transactions. These approaches are designed to provide offline defense against injection attacks. The state-of-the-art online defense mechanisms offer real-time defense, but heavy reliance on inferring attack impacts on the IoT network limits the generalization capability of these approaches. In this paper, we propose a platform-independent multi-agent online defense system, namely RESTRAIN, to counter remote injection attacks at runtime. RESTRAIN allows the defense agent to profile attack actions at runtime and leverages reinforcement learning to optimize a defense policy that complies with the security requirements of the IoT network. The experimental results show that the defense agent effectively takes real-time defense actions against complex and dynamic remote injection attacks and maximizes the security gain with minimal computational overhead.
Updated: 2025-03-12 16:23:14
Subjects: cs.CR,cs.AI
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
It was empirically observed in Entezari et al. (2021) that when accounting for the permutation invariance of neural networks, there is likely no loss barrier along the linear interpolation between two SGD solutions -- a phenomenon known as linear mode connectivity (LMC) modulo permutation. This phenomenon has sparked significant attention due to both its theoretical interest and practical relevance in applications such as model merging. In this paper, we provide a fine-grained analysis of this phenomenon for two-layer ReLU networks under a teacher-student setup. We show that as the student network width $m$ increases, the LMC loss barrier modulo permutation exhibits a double descent behavior. Particularly, when $m$ is sufficiently large, the barrier decreases to zero at a rate $O(m^{-1/2})$. Notably, this rate does not suffer from the curse of dimensionality and demonstrates how substantial permutation can reduce the LMC loss barrier. Moreover, we observe a sharp transition in the sparsity of GD/SGD solutions when increasing the learning rate and investigate how this sparsity preference affects the LMC loss barrier modulo permutation. Experiments on both synthetic and MNIST datasets corroborate our theoretical predictions and reveal a similar trend for more complex network architectures.
Updated: 2025-03-12 16:22:51
Subjects: stat.ML,cs.LG
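For intuition, the (unpermuted) LMC loss barrier between two parameter vectors can be sketched as the maximum rise of the loss along the linear interpolation path above the line joining the endpoint losses. The modulo-permutation variant studied above would first align one network's hidden units to the other's, which this toy sketch omits:

```python
import numpy as np

def loss_barrier(theta_a, theta_b, loss_fn, n_points=101):
    """Maximum rise of the loss along the linear path between two parameter
    vectors, measured against the line joining the endpoint losses."""
    alphas = np.linspace(0.0, 1.0, n_points)
    path = np.array([loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas])
    endpoint_line = (1 - alphas) * path[0] + alphas * path[-1]
    return float(np.max(path - endpoint_line))

# Two global minima of a "ring" loss: the straight path between them crosses
# a high-loss region, giving a large barrier...
ring_loss = lambda w: float((np.sum(w ** 2) - 1.0) ** 2)
barrier_ring = loss_barrier(np.array([1.0, 0.0]), np.array([-1.0, 0.0]), ring_loss)

# ...whereas a convex loss has zero barrier along every segment.
convex_loss = lambda w: float(np.sum(w ** 2))
barrier_convex = loss_barrier(np.array([1.0, -1.0]), np.array([-1.0, 1.0]), convex_loss)
```

The paper's $O(m^{-1/2})$ rate describes how this barrier (after the best permutation of units) shrinks as the student width $m$ grows.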
Reinforcement Learning is all You Need
Inspired by the success of DeepSeek R1 in reasoning via reinforcement learning without human feedback, we train a 3B language model using the Countdown Game with pure reinforcement learning. Our model outperforms baselines on four of five benchmarks, demonstrating improved generalization beyond its training data. Notably, response length does not correlate with reasoning quality, and while "aha moments" emerge, they do not always yield correct answers. These findings highlight the potential of RL-only training for reasoning enhancement and suggest future work on refining reward structures to bridge emergent insights with accuracy.
Updated: 2025-03-12 16:22:28
Subjects: cs.LG,cs.CL
Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts
We introduce a novel contextual embedding model med-gte-hybrid that was derived from the gte-large sentence transformer to extract information from unstructured clinical narratives. Our model tuning strategy for med-gte-hybrid combines contrastive learning and a denoising autoencoder. To evaluate the performance of med-gte-hybrid, we investigate several clinical prediction tasks in large patient cohorts extracted from the MIMIC-IV dataset, including Chronic Kidney Disease (CKD) patient prognosis, estimated glomerular filtration rate (eGFR) prediction, and patient mortality prediction. Furthermore, we demonstrate that the med-gte-hybrid model improves patient stratification, clustering, and text retrieval, thus outperforming current state-of-the-art models on the Massive Text Embedding Benchmark (MTEB). While some of our evaluations focus on CKD, our hybrid tuning of sentence transformers could be transferred to other medical domains and has the potential to improve clinical decision-making and personalised treatment pathways in various healthcare applications.
Updated: 2025-03-12 16:17:01
Subjects: cs.CL,cs.AI
Harnessing Causality in Reinforcement Learning With Bagged Decision Times
We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. All actions within a bag jointly impact a single reward, observed at the end of the bag. For example, in mobile health, multiple activity suggestions in a day collectively affect a user's daily commitment to being active. Our goal is to develop an online RL algorithm to maximize the discounted sum of the bag-specific rewards. To handle non-Markovian transitions within a bag, we utilize an expert-provided causal directed acyclic graph (DAG). Based on the DAG, we construct states as a dynamical Bayesian sufficient statistic of the observed history, which results in Markov state transitions within and across bags. We then formulate this problem as a periodic Markov decision process (MDP) that allows non-stationarity within a period. An online RL algorithm based on Bellman equations for stationary MDPs is generalized to handle periodic MDPs. We show that our constructed state achieves the maximal optimal value function among all state constructions for a periodic MDP. Finally, we evaluate the proposed method on testbed variants built from real data in a mobile health clinical trial.
Updated: 2025-03-12 16:15:10
Subjects: cs.LG,stat.ML
Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework
The Mixture-of-Experts (MoE) model has succeeded in deep learning (DL). However, its complex architecture and advantages over dense models in image classification remain unclear. In previous studies, MoE performance has often been affected by noise and outliers in the input space. Some approaches incorporate input clustering for training MoE models, but most clustering algorithms lack access to labeled data, limiting their effectiveness. This paper introduces the Double-stage Feature-level Clustering and Pseudo-labeling-based Mixture of Experts (DFCP-MoE) framework, which consists of input feature extraction, feature-level clustering, and a computationally efficient pseudo-labeling strategy. This approach reduces the impact of noise and outliers while leveraging a small subset of labeled data to label a large portion of unlabeled inputs. We propose a conditional end-to-end joint training method that improves expert specialization by training the MoE model on well-labeled, clustered inputs. Unlike traditional MoE and dense models, the DFCP-MoE framework effectively captures input space diversity, leading to competitive inference results. We validate our approach on three benchmark datasets for multi-class classification tasks.
Updated: 2025-03-12 16:13:50
Subjects: cs.LG,cs.AI,cs.CV,cs.LO
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking -- enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.
Updated: 2025-03-12 16:05:31
Subjects: cs.AI,cs.CL,cs.LG,cs.MA
Multiscale Stochastic Gradient Descent: Efficiently Training Convolutional Neural Networks
Stochastic Gradient Descent (SGD) is the foundation of modern deep learning optimization but becomes increasingly inefficient when training convolutional neural networks (CNNs) on high-resolution data. This paper introduces Multiscale Stochastic Gradient Descent (Multiscale-SGD), a novel optimization approach that exploits coarse-to-fine training strategies to estimate the gradient at a fraction of the cost, improving the computational efficiency of SGD type methods while preserving model accuracy. We derive theoretical criteria for Multiscale-SGD to be effective, and show that while standard convolutions can be used, they can be suboptimal for noisy data. This leads us to introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs) that ensure consistent gradient behavior across resolutions, making them well-suited for multiscale training. Through extensive empirical validation, we demonstrate that in practice, (i) our Multiscale-SGD approach can be used to train various architectures for a variety of tasks, and (ii) when the noise is not significant, standard convolutions benefit from our multiscale training framework. Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.
Updated: 2025-03-12 16:05:08
Subjects: cs.LG
MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions
Large vision-language models (VLMs) face challenges in achieving robust, transferable reasoning abilities due to reliance on labor-intensive manual instruction datasets or computationally expensive self-supervised methods. To address these issues, we introduce MindGYM, a framework that enhances VLMs through synthetic self-challenging questions, consisting of three stages: (1) Seed Single-Hop Question Synthesis, generating cognitive questions across textual (e.g., logical deduction) and multimodal contexts (e.g., diagram-based queries) spanning eight semantic areas like ethical analysis; (2) Challenging Multi-Hop Question Synthesis, combining seed questions via diverse principles like bridging, visual-textual alignment, to create multi-step problems demanding deeper reasoning; and (3) Thinking-Induced Curriculum Fine-Tuning, a structured pipeline that progressively trains the model from scaffolded reasoning to standalone inference. By leveraging the model's self-synthesis capability, MindGYM achieves high data efficiency (e.g., +16% gains on MathVision-Mini with only 400 samples), computational efficiency (reducing both training and inference costs), and robust generalization across tasks. Extensive evaluations on seven benchmarks demonstrate superior performance over strong baselines, with notable improvements (+15.77% win rates) in reasoning depth and breadth validated via GPT-based scoring. MindGYM underscores the viability of self-challenging for refining VLM capabilities while minimizing human intervention and resource demands. Code and data are released to advance multimodal reasoning research.
Updated: 2025-03-12 16:03:03
Subjects: cs.CV,cs.AI,cs.CL
Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
Healthcare relies on multiple types of data, such as medical images, genetic information, and clinical records, to improve diagnosis and treatment. However, missing data is a common challenge due to privacy restrictions, cost, and technical issues, making many existing multi-modal models unreliable. To address this, we propose a new multimodal model called Mixture of Experts, Symmetric Aligning, and Reconstruction (MoSARe), a deep learning framework that handles incomplete multimodal data while maintaining high accuracy. MoSARe integrates expert selection, cross-modal attention, and contrastive learning to improve feature representation and decision-making. Our results show that MoSARe outperforms existing models in situations when the data is complete. Furthermore, it provides reliable predictions even when some data are missing. This makes it especially useful in real-world healthcare settings, including resource-limited environments. Our code is publicly available at https://github.com/NazaninMn/MoSARe.
Updated: 2025-03-12 16:03:00
Subjects: cs.LG,cs.CV
Federated Smoothing ADMM for Localization
This paper addresses the challenge of localization in federated settings, which are characterized by distributed data, non-convexity, and non-smoothness. To tackle the scalability and outlier issues inherent in such environments, we propose a robust algorithm that employs an $\ell_1$-norm formulation within a novel federated ADMM framework. This approach addresses the problem by integrating an iterative smooth approximation for the total variation consensus term and employing a Moreau envelope approximation for the convex function that appears in a subtracted form. This transformation ensures that the problem is smooth and weakly convex in each iteration, which results in enhanced computational efficiency and improved estimation accuracy. The proposed algorithm supports asynchronous updates and multiple client updates per iteration, which ensures its adaptability to real-world federated systems. To validate the reliability of the proposed algorithm, we show that the method converges to a stationary point, and numerical simulations highlight its superior performance in convergence speed and outlier resilience compared to existing state-of-the-art localization methods.
Updated: 2025-03-12 16:01:34
Subjects: cs.LG
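As background for the smoothing step, the Moreau envelope of the absolute value, $M_\lambda(x) = \min_y |y| + \frac{1}{2\lambda}(x-y)^2$, has a closed form (the Huber function) obtained via the soft-thresholding proximal operator. A minimal numeric sketch; the paper applies this machinery inside a federated ADMM, which is not reproduced here:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * |.| (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_l1(x, lam):
    """Moreau envelope of f(y) = |y|: M(x) = min_y |y| + (x - y)^2 / (2*lam),
    evaluated at the proximal point; this equals the Huber function."""
    y = soft_threshold(x, lam)
    return np.abs(y) + (x - y) ** 2 / (2.0 * lam)

# The kink of |x| at 0 is replaced by a quadratic of width lam, making the
# surrogate smooth while agreeing with |x| - lam/2 outside [-lam, lam].
x = np.linspace(-2.0, 2.0, 5)
env = moreau_envelope_l1(x, lam=0.5)
```

Replacing the nonsmooth term by its envelope is what makes each ADMM subproblem smooth and weakly convex, which is the property the convergence analysis above relies on.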
Independence Tests for Language Models
We consider the following problem: given the weights of two models, can we test whether they were trained independently -- i.e., from independent random initializations? We consider two settings: constrained and unconstrained. In the constrained setting, we make assumptions about model architecture and training and propose a family of statistical tests that yield exact p-values with respect to the null hypothesis that the models are trained from independent random initializations. These p-values are valid regardless of the composition of either model's training data; we compute them by simulating exchangeable copies of each model under our assumptions and comparing various similarity measures of weights and activations between the original two models versus these copies. We report the p-values from these tests on pairs of 21 open-weight models (210 total pairs) and correctly identify all pairs of non-independent models. Our tests remain effective even if one model was fine-tuned for many tokens. In the unconstrained setting, where we make no assumptions about training procedures, can change model architecture, and allow for adversarial evasion attacks, the previous tests no longer work. Instead, we propose a new test which matches hidden activations between two models, and which is robust to adversarial transformations and to changes in model architecture. The test can also do localized testing: identifying specific non-independent components of models. Though we no longer obtain exact p-values from this, empirically we find it behaves as one and reliably identifies non-independent models. Notably, we can use the test to identify specific parts of one model that are derived from another (e.g., how Llama 3.1-8B was pruned to initialize Llama 3.2-3B, or shared layers between Mistral-7B and StripedHyena-7B), and it is even robust to retraining individual layers of either model from scratch.
Updated: 2025-03-12 15:58:01
Subjects: cs.LG,cs.CL
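The constrained-setting idea of comparing observed weight similarity against a null distribution can be illustrated with a generic permutation test on synthetic "weights." This is a toy stand-in, not the paper's construction of exchangeable model copies or its activation-matching test:

```python
import numpy as np

def permutation_pvalue(w_a, w_b, n_perms=999, seed=0):
    """p-value for the null that w_b is unrelated to w_a, comparing the
    observed |correlation| against correlations with shuffled copies of w_b."""
    rng = np.random.default_rng(seed)
    obs = abs(np.corrcoef(w_a, w_b)[0, 1])
    null = [abs(np.corrcoef(w_a, rng.permutation(w_b))[0, 1])
            for _ in range(n_perms)]
    return (1 + sum(n >= obs for n in null)) / (n_perms + 1)  # add-one smoothed

rng = np.random.default_rng(1)
base = rng.normal(size=2000)                   # "weights" of a reference model
independent = rng.normal(size=2000)            # independently initialized model
derived = base + 0.1 * rng.normal(size=2000)   # lightly fine-tuned copy

p_independent = permutation_pvalue(base, independent)
p_derived = permutation_pvalue(base, derived)
```

The derived pair yields the smallest attainable p-value, while the independent pair does not; the paper's tests achieve the analogous separation on real open-weight model pairs.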
Representation Retrieval Learning for Heterogeneous Data Integration
In the era of big data, large-scale, multi-modal datasets are increasingly ubiquitous, offering unprecedented opportunities for predictive modeling and scientific discovery. However, these datasets often exhibit complex heterogeneity, such as covariate shift, posterior drift, and missing modalities, that can hinder the accuracy of existing prediction algorithms. To address these challenges, we propose a novel Representation Retrieval ($R^2$) framework, which integrates a representation learning module (the representer) with a sparsity-induced machine learning model (the learner). Moreover, we introduce the notion of "integrativeness" for representers, characterized by the effective data sources used in learning representers, and propose a Selective Integration Penalty (SIP) to explicitly improve the property. Theoretically, we demonstrate that the $R^2$ framework relaxes the conventional full-sharing assumption in multi-task learning, allowing for partially shared structures, and that SIP can improve the convergence rate of the excess risk bound. Extensive simulation studies validate the empirical performance of our framework, and applications to two real-world datasets further confirm its superiority over existing approaches.
Updated: 2025-03-12 15:54:37
Subjects: cs.LG,stat.ME
Learning Cascade Ranking as One Network
Cascade Ranking is a prevalent architecture in large-scale top-k selection systems like recommendation and advertising platforms. Traditional training methods focus on single-stage optimization, neglecting interactions between stages. Recent advances such as RankFlow and FS-LTR have introduced interaction-aware training paradigms but still struggle to 1) align training objectives with the goal of the entire cascade ranking (i.e., end-to-end recall) and 2) learn effective collaboration patterns for different stages. To address these challenges, we propose LCRON, which introduces a novel surrogate loss function derived from the lower bound probability that ground truth items are selected by cascade ranking, ensuring alignment with the overall objective of the system. According to the properties of the derived bound, we further design an auxiliary loss for each stage to drive the reduction of this bound, leading to a more robust and effective top-k selection. LCRON enables end-to-end training of the entire cascade ranking system as a unified network. Experimental results demonstrate that LCRON achieves significant improvement over existing methods on public benchmarks and industrial applications, addressing key limitations in cascade ranking training and significantly enhancing system performance.
Updated: 2025-03-12 15:52:51
Subjects: cs.IR,cs.LG
Computation-Aware Kalman Filtering and Smoothing
Kalman filtering and smoothing are the foundational mechanisms for efficient inference in Gauss-Markov models. However, their time and memory complexities scale prohibitively with the size of the state space. This is particularly problematic in spatiotemporal regression problems, where the state dimension scales with the number of spatial observations. Existing approximate frameworks leverage low-rank approximations of the covariance matrix. But since they do not model the error introduced by the computational approximation, their predictive uncertainty estimates can be overly optimistic. In this work, we propose a probabilistic numerical method for inference in high-dimensional Gauss-Markov models which mitigates these scaling issues. Our matrix-free iterative algorithm leverages GPU acceleration and crucially enables a tunable trade-off between computational cost and predictive uncertainty. Finally, we demonstrate the scalability of our method on a large-scale climate dataset.
Updated: 2025-03-12 15:51:20
Subjects: cs.LG,cs.NA,math.NA,stat.ML
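For readers new to the underlying recursion, a scalar Kalman filter for a random-walk state makes the predict/update structure concrete (and shows where the cost explodes in high dimensions: `p` and `k` become full covariance and gain matrices). A textbook sketch, not the paper's matrix-free, computation-aware method:

```python
def kalman_filter_1d(observations, x0, p0, q, r):
    """Scalar Kalman filter for a random-walk state:
    x_t = x_{t-1} + w_t (process variance q), y_t = x_t + v_t (obs variance r)."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: the random walk adds process noise to the state variance.
        p = p + q
        # Update: blend prediction and observation via the Kalman gain.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Noisy observations of a constant level near 1.0.
ys = [1.1, 0.9, 1.05, 0.98, 1.02]
est = kalman_filter_1d(ys, x0=0.0, p0=1.0, q=0.01, r=0.1)
```

In the spatiotemporal setting described above, each scalar operation becomes an $n \times n$ matrix operation in the state dimension $n$, which is exactly the scaling the paper's iterative, GPU-friendly approximation targets.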
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Chain-of-Thought (CoT) significantly enhances the performance of large language models (LLMs) across a wide range of tasks, and prior research shows that CoT can theoretically increase expressiveness. However, there is limited mechanistic understanding of the algorithms that Transformer+CoT can learn. In this work, we (1) evaluate the state tracking capabilities of Transformer+CoT and its variants, confirming the effectiveness of CoT. (2) Next, we identify the circuit, a subset of model components, responsible for tracking the world state, finding that late-layer MLP neurons play a key role. We propose two metrics, compression and distinction, and show that the neuron sets for each state achieve nearly 100% accuracy, providing evidence of an implicit finite state automaton (FSA) embedded within the model. (3) Additionally, we explore three realistic settings: skipping intermediate steps, introducing data noise, and testing length generalization. Our results demonstrate that Transformer+CoT learns robust algorithms (FSA), highlighting its resilience in challenging scenarios.
Updated: 2025-03-12 15:47:08
Subjects: cs.CL,cs.LG
A Novel Approach for Intrinsic Dimension Estimation
Real-life data typically have a complex, non-linear structure. These non-linearities, together with a large number of features, can cause problems such as the empty-space phenomenon and the well-known curse of dimensionality. Finding a nearly optimal representation of the dataset in a lower-dimensional space (i.e., dimensionality reduction) offers a practical mechanism for improving the success of machine learning tasks. However, estimating the data dimension required for a nearly optimal representation (the intrinsic dimension) can be very costly, particularly when dealing with big data. We propose a highly efficient and robust intrinsic dimension estimation approach that relies only on matrix-vector products. An experimental study is also conducted to compare the performance of the proposed method with state-of-the-art approaches.
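To illustrate what relying only on matrix-vector products buys, here is a minimal matrix-free power iteration that estimates the dominant covariance eigenvalue without materializing the matrix outside of a matvec routine; the operator and its eigenvalues are made up for the sketch, and this is not the paper's estimator itself:

```python
# Matrix-free sketch: estimating the dominant eigenvalue of a covariance
# operator via power iteration, touching the matrix only through matvec calls.

import math, random

def matvec(v):
    # Hypothetical 3x3 covariance with eigenvalues 4, 1, and 0.01
    # (diagonal here for clarity; any linear operator works).
    c = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.01]]
    return [sum(c[i][j] * v[j] for j in range(3)) for i in range(3)]

def power_iteration(matvec, n, iters=100, seed=0):
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient approximates the top eigenvalue.
    return sum(vi * wi for vi, wi in zip(v, matvec(v)))

lam = power_iteration(matvec, 3)
print(round(lam, 2))  # close to the dominant eigenvalue, 4.0
```

Deflating the recovered eigenvector and repeating yields the spectral decay profile from which a dimension estimate can be read off.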
Updated: 2025-03-12 15:42:39
Subjects: cs.LG,stat.ML
Learning Spatially Adaptive $\ell_1$-Norms Weights for Convolutional Synthesis Regularization
We propose an unrolled algorithm approach for learning spatially adaptive parameter maps in the framework of convolutional synthesis-based $\ell_1$ regularization. More precisely, we consider a family of pre-trained convolutional filters and estimate deeply parametrized spatially varying parameters applied to the sparse feature maps by means of unrolling a FISTA algorithm to solve the underlying sparse estimation problem. The proposed approach is evaluated for image reconstruction of low-field MRI and compared to spatially adaptive and non-adaptive analysis-type procedures relying on Total Variation regularization and to a well-established model-based deep learning approach. We show that the proposed approach produces visually and quantitatively comparable results with the latter approaches and at the same time remains highly interpretable. In particular, the inferred parameter maps quantify the local contribution of each filter in the reconstruction, which provides valuable insight into the algorithm mechanism and could potentially be used to discard unsuited filters.
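The FISTA iteration that the proposed approach unrolls can be sketched for a tiny dense problem; the per-coordinate weights lam[j] play the role of the learned spatially varying parameter maps (the problem data, Lipschitz bound, and step count are illustrative, not from the paper):

```python
# A minimal FISTA sketch for l1-regularized least squares,
# min_x 0.5*||Ax - y||^2 + sum_j lam[j]*|x_j|, with per-coordinate weights.

import math

def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)

def fista(A, y, lam, steps=200, L=None):
    n = len(A[0])
    if L is None:  # crude Lipschitz bound via the squared Frobenius norm
        L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    z, t = x[:], 1.0
    for _ in range(steps):
        # Gradient of the smooth part at the extrapolated point z.
        r = [sum(A[i][j] * z[j] for j in range(n)) - y[i] for i in range(len(y))]
        g = [sum(A[i][j] * r[i] for i in range(len(y))) for j in range(n)]
        # Proximal step with per-coordinate (spatially varying) weights.
        x_new = [soft_threshold(z[j] - g[j] / L, lam[j] / L) for j in range(n)]
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [x_new[j] + (t - 1.0) / t_new * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x

A = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.2]
x = fista(A, y, lam=[0.1, 0.5])  # the heavier penalty zeroes the weak coefficient
print(x)
```

Unrolling fixes the number of such iterations and makes the lam maps trainable parameters.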
Updated: 2025-03-12 15:38:11
Subjects: cs.LG,cs.CV,math.OC
Neural reservoir control of a soft bio-hybrid arm
A long-standing engineering problem, the control of soft robots is difficult because of their highly non-linear, heterogeneous, anisotropic, and distributed nature. Here, bridging engineering and biology, a neural reservoir is employed for the dynamic control of a bio-hybrid model arm made of multiple muscle-tendon groups enveloping an elastic spine. We show how the use of reservoirs facilitates simultaneous control and self-modeling across a set of challenging tasks, outperforming classic neural network approaches. Further, by implementing a spiking reservoir on neuromorphic hardware, energy efficiency is achieved, with nearly two orders of magnitude improvement relative to standard CPUs, with implications for the on-board control of untethered, small-scale soft robots.
Updated: 2025-03-12 15:31:33
Subjects: cs.RO,cs.LG,cs.NE
Mixture of Experts based Multi-task Supervised Learning from Crowds
Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and negatively impacting the quality of truth inference. This paper proposes a new paradigm of multi-task supervised learning from crowds, which eliminates the need to model items' ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning from Crowds (MMLC). Two truth inference strategies are proposed within MMLC. The first strategy, named MMLC-owf, utilizes clustering methods in the worker spectral space to identify the projection vector of the oracle worker. The labels generated based on this vector are then taken as the inferred truth. The second strategy, called MMLC-df, employs the MMLC model to fill in the crowdsourced data, which can enhance the effectiveness of existing truth inference methods. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and that MMLC-df enhances the quality of existing truth inference methods.
Updated: 2025-03-12 15:25:11
Subjects: cs.AI,cs.LG
CommonPower: A Framework for Safe Data-Driven Smart Grid Control
The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). To validate their effectiveness, RL algorithms have to be evaluated across multiple case studies. Case study design is an arduous task requiring the consideration of many aspects, among them the influence of available forecasts and the level of decentralization in the control structure. Furthermore, vanilla RL controllers cannot themselves ensure the satisfaction of system constraints, which makes devising a safeguarding mechanism a necessary task for every case study before deploying the system. To address these shortcomings, we introduce the Python tool CommonPower, the first general framework for the modeling and simulation of power system management tailored towards machine learning. Its modular architecture enables users to focus on specific elements without having to implement a simulation environment. Another unique contribution of CommonPower is the automatic synthesis of model predictive controllers and safeguards. Beyond offering a unified interface for single-agent RL, multi-agent RL, and optimal control, CommonPower includes a training pipeline for machine-learning-based forecasters as well as a flexible mechanism for incorporating feedback of safeguards into the learning updates of RL controllers.
Updated: 2025-03-12 15:23:13
Subjects: eess.SY,cs.LG,cs.SY
Automatic Association of Quality Requirements and Quantifiable Metrics for Cloud Security Certification
The European Cybersecurity Certification Scheme for Cloud Services (EUCS) is one of the first cybersecurity schemes in Europe, defined by the European Union Agency for Cybersecurity (ENISA). It aims to encourage cloud providers to strengthen their cybersecurity policies in order to receive an official seal of approval from European authorities. EUCS defines a set of security requirements that the cloud provider must meet, in whole or in part, in order to achieve the security certification. The requirements are written in natural language and cover every aspect of security in the cloud environment, from logging access to protecting the system with anti-malware tools to training staff. Operationally, each requirement is associated with one or more evaluable metrics. For example, a requirement to monitor access attempts to a service will have associated metrics that take into account the number of accesses, the number of access attempts, who is accessing, and what resources are being used. Partners in the European project Medina, which ended in October 2023, defined 163 metrics and manually mapped them to 70 EUCS requirements. Manual mapping is intuitively a long and costly process in terms of human resources. This paper proposes an approach based on Sentence Transformers to automatically associate requirements and metrics. In terms of correctness of associations, the proposed method achieves a Normalized Discounted Cumulative Gain of 0.640, improving a previous experiment by 0.146 points.
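The Normalized Discounted Cumulative Gain used to score the requirement-metric associations can be computed in a few lines; the relevance list below is invented for illustration and does not come from the Medina data:

```python
# Illustrative NDCG computation for a ranked list of candidate metrics
# returned for one requirement (1 = correct association, 0 = incorrect).

import math

def dcg(rels):
    # Gains discounted by log2 of the (1-indexed) rank position.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(ranked_rels):
    ideal = sorted(ranked_rels, reverse=True)
    return dcg(ranked_rels) / dcg(ideal) if any(ranked_rels) else 0.0

# Hypothetical relevance of the top-4 metrics a model ranked for one requirement.
score = ndcg([1, 0, 1, 0])
print(round(score, 3))  # -> 0.92: one correct hit was ranked too low
```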
Updated: 2025-03-12 15:06:45
Subjects: cs.CR
A Strategy for Label Alignment in Deep Neural Networks
A recent study demonstrated the successful application of the label alignment property for unsupervised domain adaptation in a linear regression setting. Instead of regularizing representation learning to be domain invariant, the study proposed regularizing the linear regression model to align with the top singular vectors of the data matrix from the target domain. In this work, we expand upon this idea and generalize it to deep learning, deriving an alternative formulation of the original adaptation algorithm that exploits label alignment and is suitable for deep neural networks. We also perform experiments demonstrating that our approach achieves performance comparable to mainstream unsupervised domain adaptation methods while converging more stably. All experiments and implementations in our work can be found at the following codebase: https://github.com/xuanrui-work/DeepLabelAlignment.
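The core idea — penalizing the part of the weight vector that falls outside the span of the target domain's top singular vectors — can be sketched directly. The toy 2D basis below is assumed to be precomputed from target-domain data and is purely illustrative:

```python
# Sketch of a label alignment regularizer: the squared norm of the component
# of a linear model's weights orthogonal to the top singular subspace.

def project(w, basis):
    """Project w onto the span of an orthonormal basis (list of vectors)."""
    out = [0.0] * len(w)
    for b in basis:
        c = sum(wi * bi for wi, bi in zip(w, b))
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def alignment_penalty(w, basis):
    """Penalty is zero iff w lies entirely in the top singular subspace."""
    p = project(w, basis)
    return sum((wi - pi) ** 2 for wi, pi in zip(w, p))

top_singular_vectors = [[1.0, 0.0]]   # assumed precomputed from target data
pen = alignment_penalty([0.8, 0.6], top_singular_vectors)
print(pen)  # only the misaligned component 0.6 is penalized: 0.36
```

Adding this penalty to the training loss pulls the model toward the target domain's dominant directions without constraining the representation itself.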
Updated: 2025-03-12 15:04:03
Subjects: cs.LG
Hierarchical Neuro-Symbolic Decision Transformer
We present a hierarchical neuro-symbolic control framework that couples classical symbolic planning with transformer-based policies to address complex, long-horizon decision-making tasks. At the high level, a symbolic planner constructs an interpretable sequence of operators based on logical propositions, ensuring systematic adherence to global constraints and goals. At the low level, each symbolic operator is translated into a sub-goal token that conditions a decision transformer to generate a fine-grained sequence of actions in uncertain, high-dimensional environments. We provide theoretical analysis showing how approximation errors from both the symbolic planner and the neural execution layer accumulate. Empirical evaluations in grid-worlds with multiple keys, locked doors, and item-collection tasks show that our hierarchical approach outperforms a purely end-to-end neural approach in success rates and policy efficiency.
Updated: 2025-03-12 15:02:50
Subjects: cs.AI,cs.LG,cs.SY,eess.SY
Towards Hardware Supported Domain Generalization in DNN-Based Edge Computing Devices for Health Monitoring
Deep neural network (DNN) models have shown remarkable success in many real-world scenarios, such as object detection and classification. Unfortunately, these models are not yet widely adopted in health monitoring due to exceptionally high requirements for model robustness and deployment in highly resource-constrained devices. In particular, the acquisition of biosignals, such as electrocardiogram (ECG), is subject to large variations between training and deployment, necessitating domain generalization (DG) for robust classification quality across sensors and patients. The continuous monitoring of ECG also requires the execution of DNN models in convenient wearable devices, which is achieved by specialized ECG accelerators with small form factor and ultra-low power consumption. However, combining DG capabilities with ECG accelerators remains a challenge. This article provides a comprehensive overview of ECG accelerators and DG methods and discusses the implication of the combination of both domains, such that multi-domain ECG monitoring is enabled with emerging algorithm-hardware co-optimized systems. Within this context, an approach based on correction layers is proposed to deploy DG capabilities on the edge. Here, the DNN fine-tuning for unknown domains is limited to a single layer, while the remaining DNN model remains unmodified. Thus, computational complexity (CC) for DG is reduced with minimal memory overhead compared to conventional fine-tuning of the whole DNN model. The DNN model-dependent CC is reduced by more than 2.5x compared to DNN fine-tuning at an average increase of F1 score by more than 20% on the generalized target domain. In summary, this article provides a novel perspective on robust DNN classification on the edge for health monitoring applications.
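The correction-layer idea — keeping the backbone frozen and adapting a single layer to an unseen domain — can be sketched with a 1D toy model; the data, domain offset, and learning rate are illustrative, and the real method operates on DNN layers rather than scalars:

```python
# Sketch of correction-layer adaptation: the pre-trained backbone is fixed,
# and only a single linear correction layer (w, b) is fit to the new domain.

def backbone(x):
    return 2.0 * x          # frozen pre-trained feature extractor (never updated)

w, b = 1.0, 0.0             # the only trainable parameters (correction layer)

# Hypothetical target-domain data with a systematic offset (y = 2x + 0.5)
# that the correction layer must absorb.
data = [(1.0, 2.5), (2.0, 4.5), (3.0, 6.5)]

lr = 0.01
for _ in range(5000):
    gw = gb = 0.0
    for x, y in data:
        err = (w * backbone(x) + b) - y
        gw += 2 * err * backbone(x) / len(data)
        gb += 2 * err / len(data)
    w, b = w - lr * gw, b - lr * gb

print(round(w, 2), round(b, 2))  # converges near w=1.0, b=0.5
```

Because only one layer's gradients are computed and stored, the computational and memory cost of adaptation stays far below full fine-tuning, which is the trade-off quantified in the article.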
Updated: 2025-03-12 15:02:39
Subjects: cs.LG
SO(3)-Equivariant Neural Networks for Learning Vector Fields on Spheres
Analyzing vector fields on the sphere, such as wind speed and direction on Earth, is a difficult task. Models should respect both the rotational symmetries of the sphere and the inherent symmetries of the vector fields. In this paper, we introduce a deep learning architecture that respects both symmetry types using novel techniques based on group convolutions in the 3-dimensional rotation group. This architecture is suitable for scalar and vector fields on the sphere as they can be described as equivariant signals on the 3-dimensional rotation group. Experiments show that our architecture achieves lower prediction and reconstruction error when tested on rotated data compared to both standard CNNs and spherical CNNs.
Updated: 2025-03-12 15:00:32
Subjects: cs.LG
How Well Does Your Tabular Generator Learn the Structure of Tabular Data?
Heterogeneous tabular data poses unique challenges in generative modelling due to its fundamentally different underlying data structure compared to homogeneous modalities, such as images and text. Although previous research has sought to adapt the successes of generative modelling in homogeneous modalities to the tabular domain, defining an effective generator for tabular data remains an open problem. One major reason is that the evaluation criteria inherited from other modalities often fail to adequately assess whether tabular generative models effectively capture or utilise the unique structural information encoded in tabular data. In this paper, we carefully examine the limitations of the prevailing evaluation framework and introduce $\textbf{TabStruct}$, a novel evaluation benchmark that positions structural fidelity as a core evaluation dimension. Specifically, TabStruct evaluates the alignment of causal structures in real and synthetic data, providing a direct measure of how effectively tabular generative models learn the structure of tabular data. Through extensive experiments using generators from eight categories on seven datasets with expert-validated causal graphical structures, we show that structural fidelity offers a task-independent, domain-agnostic evaluation dimension. Our findings highlight the importance of tabular data structure and offer practical guidance for developing more effective and robust tabular generative models. Code is available at https://github.com/SilenceX12138/TabStruct.
Updated: 2025-03-12 14:54:58
Subjects: cs.LG
Training Foundation Models as Data Compression: On Information, Model Weights and Copyright Law
The training process of foundation models, as with other classes of deep learning systems, is based on minimizing the reconstruction error over a training set. For this reason, they are susceptible to the memorization and subsequent reproduction of training samples. In this paper, we introduce a training-as-compressing perspective, wherein the model's weights embody a compressed representation of the training data. From a copyright standpoint, this point of view implies that the weights can be considered a reproduction or, more likely, a derivative work of a potentially protected set of works. We investigate the technical and legal challenges that emerge from this framing of the copyright of outputs generated by foundation models, including their implications for practitioners and researchers. We demonstrate that adopting an information-centric approach to the problem presents a promising pathway for tackling these emerging complex legal issues.
Updated: 2025-03-12 14:54:13
Subjects: cs.CY,cs.AI,cs.LG
Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning
We present a novel method for Deep Reinforcement Learning (DRL), incorporating the convex property of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase performance of the agents, as well as increase robustness over the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.
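Soft-enforced convexity can be illustrated as a penalty on violations of the midpoint inequality V((b1+b2)/2) <= (V(b1)+V(b2))/2 over sampled belief pairs; the toy value functions below stand in for the critic network and are not the paper's implementation:

```python
# Sketch of a soft convexity penalty over the belief space: zero whenever the
# midpoint convexity inequality holds, positive (and squared) when violated.

def convexity_penalty(V, belief_pairs):
    penalty = 0.0
    for b1, b2 in belief_pairs:
        mid = [(x + y) / 2.0 for x, y in zip(b1, b2)]
        gap = V(mid) - 0.5 * (V(b1) + V(b2))
        penalty += max(gap, 0.0) ** 2   # only convexity violations contribute
    return penalty

convex_V = lambda b: sum(x * x for x in b)       # convex: no violation
concave_V = lambda b: -sum(x * x for x in b)     # concave: midpoint inequality fails
pairs = [([0.2, 0.8], [0.6, 0.4])]
print(convexity_penalty(convex_V, pairs), convexity_penalty(concave_V, pairs) > 0)
```

Added to the critic loss with a weight, such a term nudges the learned value function toward the convex shape the POMDP theory guarantees, without hard architectural constraints.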
Updated: 2025-03-12 14:53:07
Subjects: cs.LG
Online Language Splatting
To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications.
Updated: 2025-03-12 14:49:24
Subjects: cs.AI,cs.CV,cs.RO
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models
Text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images but also raise people's concerns about generating harmful or misleading content. While extensive approaches have been proposed to erase unwanted concepts without requiring retraining from scratch, they inadvertently degrade performance on normal generation tasks. In this work, we propose Interpret then Deactivate (ItD), a novel framework to enable precise concept removal in T2I diffusion models while preserving overall performance. ItD first employs a sparse autoencoder (SAE) to interpret each concept as a combination of multiple features. By permanently deactivating the specific features associated with target concepts, we repurpose SAE as a zero-shot classifier that identifies whether the input prompt includes target concepts, allowing selective concept erasure in diffusion models. Moreover, we demonstrate that ItD can be easily extended to erase multiple concepts without requiring further training. Comprehensive experiments across celebrity identities, artistic styles, and explicit content demonstrate ItD's effectiveness in eliminating targeted concepts without interfering with normal concept generation. Additionally, ItD is also robust against adversarial prompts designed to circumvent content filters. Code is available at: https://github.com/NANSirun/Interpret-then-deactivate.
Updated: 2025-03-12 14:46:40
Subjects: cs.CV,cs.AI,cs.CR
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment
Vision-Language Models (VLMs) based on Mixture-of-Experts (MoE) architectures have emerged as a pivotal paradigm in multimodal understanding, offering a powerful framework for integrating visual and linguistic information. However, the increasing complexity and diversity of tasks present significant challenges in coordinating load balancing across heterogeneous visual experts, where optimizing one specialist's performance often compromises others' capabilities. To address task heterogeneity and expert load imbalance, we propose Astrea, a novel multi-expert collaborative VLM architecture based on progressive pre-alignment. Astrea introduces three key innovations: 1) A heterogeneous expert coordination mechanism that integrates four specialized models (detection, segmentation, classification, captioning) into a comprehensive expert matrix covering essential visual comprehension elements; 2) A dynamic knowledge fusion strategy featuring progressive pre-alignment to harmonize experts within the VLM latent space through contrastive learning, complemented by probabilistically activated stochastic residual connections to preserve knowledge continuity; 3) An enhanced optimization framework utilizing momentum contrastive learning for long-range dependency modeling and adaptive weight allocators for real-time expert contribution calibration. Extensive evaluations across 12 benchmark tasks spanning VQA, image captioning, and cross-modal retrieval demonstrate Astrea's superiority over state-of-the-art models, achieving an average performance gain of +4.7\%. This study provides the first empirical demonstration that progressive pre-alignment strategies enable VLMs to overcome task heterogeneity limitations, establishing new methodological foundations for developing general-purpose multimodal agents.
Updated: 2025-03-12 14:44:52
Subjects: cs.CV,cs.AI
Ext2Gen: Alignment through Unified Extraction and Generation for Robust Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) enhances LLMs by integrating external knowledge, but generation remains fragile due to the uncertain placement of relevant chunks and retrieval-induced information overload, leading to hallucinations. We propose Ext2Gen, a novel extract-then-generate model that enhances RAG robustness by first extracting query-relevant sentences before generating answers. To optimize this model, we employ preference alignment through pairwise feedback learning, enabling the model to generate robust answers regardless of variations in retrieval results. Extensive experiments demonstrate that Ext2Gen effectively identifies query-relevant sentences with high precision and recall, leading to highly reliable answers. Furthermore, deploying our model in a RAG environment reveals that it not only boosts the performance of the base LLM but also synergizes with advanced retrieval strategies like query expansion. The model is available at https://huggingface.co/DISLab/Ext2Gen-8B-R2.
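The extract-then-generate split can be illustrated with a toy relevance scorer standing in for the trained extraction model; the word-overlap heuristic, threshold, and sentences below are illustrative, not Ext2Gen's actual mechanism:

```python
# Toy extract-then-generate front end: score each retrieved sentence against
# the query, keep only the relevant ones, and hand those to the generator.

def extract(query, sentences, threshold=0.2):
    q = set(query.lower().split())
    keep = []
    for s in sentences:
        words = set(s.lower().split())
        overlap = len(q & words) / max(len(q), 1)  # crude stand-in scorer
        if overlap >= threshold:
            keep.append(s)
    return keep

chunks = [
    "The Eiffel Tower is located in Paris.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]
relevant = extract("Where is the Eiffel Tower located", chunks)
print(relevant)  # the potassium sentence is filtered out before generation
```

By conditioning generation only on the kept sentences, the downstream model is insulated from the chunk-placement and information-overload effects the abstract describes.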
Updated: 2025-03-12 14:42:18
Subjects: cs.CL,cs.AI
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models
Cross-lingual transfer enables vision-language models (VLMs) to perform vision tasks in various languages with training data only in one language. Current approaches rely on large pre-trained multilingual language models. However, they face the curse of multilinguality, sacrificing downstream task performance for multilingual capabilities, struggling with lexical ambiguities, and falling behind recent advances. In this work, we study the scaling laws of systematic generalization with monolingual VLMs for multilingual tasks, focusing on the impact of model size and seen training samples. We propose Florenz, a monolingual encoder-decoder VLM with 0.4B to 11.2B parameters combining the pre-trained VLM Florence-2 and the large language model Gemma-2. Florenz is trained with varying compute budgets on a synthetic dataset that features intentionally incomplete language coverage for image captioning, thus, testing generalization from the fully covered translation task. We show that not only does indirectly learning unseen task-language pairs adhere to a scaling law, but also that with our data generation pipeline and the proposed Florenz model family, image captioning abilities can emerge in a specific language even when only data for the translation task is available. Fine-tuning on a mix of downstream datasets yields competitive performance and demonstrates promising scaling trends in multimodal machine translation (Multi30K, CoMMuTE), lexical disambiguation (CoMMuTE), and image captioning (Multi30K, XM3600, COCO Karpathy).
Updated: 2025-03-12 14:41:10
标题: 佛罗伦萨:视觉语言模型中系统概括的尺度定律
Subjects: cs.CL,cs.LG
摘要: 跨语言转移使视觉-语言模型(VLM)能够在各种语言中执行视觉任务,而训练数据仅限于一种语言。当前的方法依赖于大型预训练的多语言语言模型。然而,它们面临多语言的诅咒,为了多语言能力而牺牲下游任务的性能,与词汇模糊性作斗争,并落后于最新进展。在这项工作中,我们研究了单语VLM在多语言任务中系统泛化的规模律,重点关注了模型大小和已见训练样本的影响。我们提出了Florenz,一个单语编码器-解码器VLM,具有0.4B到11.2B个参数,结合了预训练的VLM Florence-2和大型语言模型Gemma-2。Florenz在合成数据集上进行训练,该数据集特意覆盖不完整的语言用于图像字幕,从而测试从完全覆盖的翻译任务中的泛化能力。我们展示了间接学习未见任务-语言对符合规模律,而且通过我们的数据生成流水线和提出的Florenz模型系列,即使只有翻译任务的数据可用,图像字幕能力也可以在特定语言中出现。在下游数据集的混合上微调产生了竞争性性能,并展示了在多模式机器翻译(Multi30K,CoMMuTE)、词汇消岐(CoMMuTE)和图像字幕(Multi30K,XM3600,COCO Karpathy)中的有希望的规模化趋势。
更新时间: 2025-03-12 14:41:10
Categories: cs.CL,cs.LG
SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers
Scalable Vector Graphics (SVG) is an essential XML-based format for versatile graphics, offering resolution independence and scalability. Unlike raster images, SVGs use geometric shapes and support interactivity, animation, and manipulation via CSS and JavaScript. Current SVG generation methods face challenges related to high computational costs and complexity. In contrast, human designers use component-based tools for efficient SVG creation. Inspired by this, SVGBuilder introduces a component-based, autoregressive model for generating high-quality colored SVGs from textual input. It significantly reduces computational overhead and improves efficiency compared to traditional methods. Our model generates SVGs up to 604 times faster than optimization-based approaches. To address the limitations of existing SVG datasets and support our research, we introduce ColorSVG-100K, the first large-scale dataset of colored SVGs, comprising 100,000 graphics. This dataset fills the gap in color information for SVG generation models and enhances diversity in model training. Evaluation against state-of-the-art models demonstrates SVGBuilder's superior performance in practical applications, highlighting its efficiency and quality in generating complex SVG graphics.
Updated: 2025-03-12 14:34:11
Categories: cs.CV,cs.AI,cs.GR
A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP
Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linear function approximation (LFA) for policy evaluation. We derive finite-sample bounds that hold (i) in the mean-squared sense and (ii) with high probability under tail iterate averaging, both with and without regularization. Our bounds exhibit an exponentially decaying dependence on the initial error and a convergence rate of $O(1/t)$ after $t$ iterations. Moreover, for the regularized TD variant, our bound holds for a universal step size. Next, we integrate a Simultaneous Perturbation Stochastic Approximation (SPSA)-based actor update with an LFA critic and establish an $O(n^{-1/4})$ convergence guarantee, where $n$ denotes the iterations of the SPSA-based actor-critic algorithm. These results establish finite-sample theoretical guarantees for risk-sensitive actor-critic methods in reinforcement learning, with a focus on variance as a risk measure.
Updated: 2025-03-12 14:32:31
Categories: cs.LG,cs.AI
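The policy-evaluation building block described above, TD learning with linear function approximation (LFA) and tail iterate averaging, can be sketched on a toy Markov reward process. This is an illustrative sketch of plain TD(0), not the paper's mean-variance algorithm; the two-state chain, one-hot features, and step-size exponent are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state Markov reward process with one-hot ("tabular") features.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, -1.0])    # per-state rewards
gamma = 0.9
phi = np.eye(2)              # feature matrix: phi[s] is the feature of state s

theta = np.zeros(2)
iterates = []
s = 0
T = 20000
for t in range(1, T + 1):
    s_next = rng.choice(2, p=P[s])
    # TD(0) update for the linear value estimate V(s) = phi[s] @ theta
    td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta = theta + (1.0 / t**0.75) * td_error * phi[s]  # decaying step size
    iterates.append(theta.copy())
    s = s_next

# Tail iterate averaging: average only the second half of the iterates.
theta_avg = np.mean(iterates[T // 2:], axis=0)

# Closed-form value function for comparison: V = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print(theta_avg.round(2), V_true.round(2))
```

The averaged tail iterate tracks the closed-form value function; the paper's contribution is finite-sample bounds on how fast this happens, including a variance-aware critic and an SPSA-based actor on top of it.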
PromptMap: An Alternative Interaction Style for AI-Based Image Generation
Recent technological advances popularized the use of image generation among the general public. Crafting effective prompts can, however, be difficult for novice users. To tackle this challenge, we developed PromptMap, a new interaction style for text-to-image AI that allows users to freely explore a vast collection of synthetic prompts through a map-like view with semantic zoom. PromptMap groups images visually by their semantic similarity, allowing users to discover relevant examples. We evaluated PromptMap in a between-subject online study ($n=60$) and a qualitative within-subject study ($n=12$). We found that PromptMap supported users in crafting prompts by providing them with examples. We also demonstrated the feasibility of using LLMs to create vast example collections. Our work contributes a new interaction style that supports users unfamiliar with prompting in achieving a satisfactory image output.
Updated: 2025-03-12 14:31:50
Categories: cs.HC,cs.AI
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection
Identifying vulnerabilities in source code is crucial, especially in critical software components. Existing methods such as static analysis, dynamic analysis, formal verification, and recently Large Language Models are widely used to detect security flaws. This paper introduces CASTLE (CWE Automated Security Testing and Low-Level Evaluation), a benchmarking framework for evaluating the vulnerability detection capabilities of different methods. We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs. We propose the CASTLE Score, a novel evaluation metric to ensure fair comparison. Our results reveal key differences: ESBMC (a formal verification tool) minimizes false positives but struggles with vulnerabilities beyond model checking, such as weak cryptography or SQL injection. Static analyzers suffer from high false positives, increasing manual validation efforts for developers. LLMs perform exceptionally well in the CASTLE dataset when identifying vulnerabilities in small code snippets. However, their accuracy declines, and hallucinations increase as the code size grows. These results suggest that LLMs could play a pivotal role in future security solutions, particularly within code completion frameworks, where they can provide real-time guidance to prevent vulnerabilities. The dataset is accessible at https://github.com/CASTLE-Benchmark.
Updated: 2025-03-12 14:30:05
Categories: cs.CR,cs.AI,cs.SE
Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation
Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited. Text PLMs cannot process single-cell RNA sequencing data, while cell PLMs lack the ability to handle free text, restricting their use in multimodal tasks. Existing efforts to bridge these modalities often suffer from information loss or inadequate single-modal pre-training, leading to suboptimal performances. To address these challenges, we propose Single-Cell MultiModal Generative Pre-trained Transformer (scMMGPT), a unified PLM for joint cell and text modeling. scMMGPT effectively integrates the state-of-the-art cell and text PLMs, facilitating cross-modal knowledge sharing for improved performance. To bridge the text-cell modality gap, scMMGPT leverages dedicated cross-modal projectors, and undergoes extensive pre-training on 27 million cells -- the largest dataset for multimodal cell-text PLMs to date. This large-scale pre-training enables scMMGPT to excel in joint cell-text tasks, achieving an 84\% relative improvement of textual discrepancy for cell description generation, 20.5\% higher accuracy for cell type annotation, and 4\% improvement in $k$-NN accuracy for text-conditioned pseudo-cell generation, outperforming baselines.
Updated: 2025-03-12 14:26:16
Categories: cs.LG,cs.AI
Measuring memorization in language models via probabilistic extraction
Large language models (LLMs) are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this issue: split a training example into a prefix and suffix, then prompt the LLM with the prefix, and deem the example extractable if the LLM generates the matching suffix using greedy sampling. This definition yields a yes-or-no determination of whether extraction was successful with respect to a single query. Though efficient to compute, we show that this definition is unreliable because it does not account for non-determinism present in more realistic (non-greedy) sampling schemes, for which LLMs produce a range of outputs for the same prompt. We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. We evaluate our probabilistic measure across different models, sampling schemes, and training-data repetitions, and find that this measure provides more nuanced information about extraction risk compared to traditional discoverable extraction.
Updated: 2025-03-12 14:25:10
Categories: cs.LG
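The gap between the greedy yes/no definition and the probabilistic one can be illustrated with a toy stand-in for an LLM. The suffix distribution, prefix, and target below are invented; a real measurement would sample full generations from the model at a non-zero temperature:

```python
import random

random.seed(0)

# Toy stand-in for an LLM: given a prefix, sample a suffix from a fixed
# distribution (a real model would decode token by token at temperature > 0).
SUFFIX_DIST = {"the quick brown fox": 0.35,
               "a quick brown fox": 0.40,
               "the slow brown fox": 0.25}

def sample_suffix(prefix: str) -> str:
    r = random.random()
    acc = 0.0
    for suffix, p in SUFFIX_DIST.items():
        acc += p
        if r < acc:
            return suffix
    return suffix  # guard against floating-point rounding

def greedy_suffix(prefix: str) -> str:
    # Greedy decoding always picks the single most likely continuation.
    return max(SUFFIX_DIST, key=SUFFIX_DIST.get)

target = "the quick brown fox"

# Classic discoverable extraction: one greedy query, yes/no answer.
greedy_extractable = greedy_suffix("prefix") == target

# Probabilistic discoverable extraction: estimate P(model emits target)
# from n independent sampled queries.
n = 5000
hits = sum(sample_suffix("prefix") == target for _ in range(n))
p_extract = hits / n
print(greedy_extractable, round(p_extract, 2))
```

Here the greedy definition declares the target not extractable, while sampling reveals it is emitted roughly a third of the time, exactly the kind of risk the probabilistic measure surfaces.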
Provable Imbalanced Point Clustering
We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$-centers to a set of points in $\mathbb{R}^d$, for any $d,k\geq 1$. To this end, we utilize \emph{coresets}, which, in the context of the paper, are essentially weighted sets of points in $\mathbb{R}^d$ that approximate the fitting loss for every model in a given set, up to a multiplicative factor of $1\pm\varepsilon$. We provide experiments (Section 3 and Section E in the appendix) that show the empirical contribution of our suggested methods for real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which combines clustering algorithms to yield better performance than any one of them separately.
Updated: 2025-03-12 14:18:23
Categories: cs.LG
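The coreset idea, a small weighted point set whose clustering cost tracks the full set's, can be illustrated with a naive uniform sample. This is a sketch only: the paper's provable constructions use importance sampling to guarantee the $1\pm\varepsilon$ bound for every candidate model, which a plain uniform sample does not; the imbalanced two-cluster data below are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def kcenter_cost(points, weights, centers):
    # Weighted k-means-style fitting loss: sum_i w_i * min_c ||x_i - c||^2
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return float((weights * d.min(axis=1)).sum())

# Full data set: imbalanced clusters (10000 vs 200 points).
big = rng.normal([0.0, 0.0], 0.5, size=(10000, 2))
small = rng.normal([5.0, 5.0], 0.5, size=(200, 2))
X = np.vstack([big, small])
w_full = np.ones(len(X))

# Naive "coreset": uniform sample of m points, each weighted n/m.
m = 1000
idx = rng.choice(len(X), size=m, replace=False)
C, w_core = X[idx], np.full(m, len(X) / m)

centers = np.array([[0.0, 0.0], [5.0, 5.0]])
full_cost = kcenter_cost(X, w_full, centers)
core_cost = kcenter_cost(C, w_core, centers)
print(full_cost, core_cost)  # coreset cost approximates the full cost
```

For one fixed set of centers the weighted sample already estimates the loss well; the hard part, which the paper addresses, is making the guarantee hold uniformly over all candidate center sets despite the imbalance.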
Efficient dynamic modal load reconstruction using physics-informed Gaussian processes based on frequency-sparse Fourier basis functions
Knowledge of the force time history of a structure is essential to assess its behaviour, ensure safety and maintain reliability. However, direct measurement of external forces is often challenging due to sensor limitations, unknown force characteristics, or inaccessible load points. This paper presents an efficient dynamic load reconstruction method using physics-informed Gaussian processes (GP) based on frequency-sparse Fourier basis functions. The GP's covariance matrices are built using the description of the system dynamics, and the model is trained using structural response measurements. This provides support and interpretability to the machine learning model, in contrast to purely data-driven methods. In addition, the model filters out irrelevant components in the Fourier basis function by leveraging the sparsity of structural responses in the frequency domain, thereby reducing computational complexity during optimization. The trained model for structural responses is then integrated with the differential equation for a harmonic oscillator, creating a probabilistic dynamic load model that predicts load patterns without requiring force data during training. The model's effectiveness is validated through two case studies: a numerical model of a wind-excited 76-story building and an experiment using a physical scale model of the Lilleb{\ae}lt Bridge in Denmark, excited by a servo motor. For both cases, validation of the reconstructed forces is provided using comparison metrics for several signal properties. The developed model holds potential for applications in structural health monitoring, damage prognosis, and load model validation.
Updated: 2025-03-12 14:16:27
Categories: cs.LG
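For a fixed sparse Fourier basis, a GP whose covariance is built from those basis functions reduces to Bayesian linear regression in that basis, which is what makes training on response measurements cheap. A minimal sketch follows; the frequencies, noise level, and synthetic "response" are invented, and the paper additionally encodes the system dynamics in the covariance, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sparse Fourier basis: keep only a few frequencies assumed to dominate
# the structural response (the frequency-sparsity idea).
freqs = np.array([1.0, 3.0])  # Hz, hypothetical dominant modes

def fourier_features(t):
    # Basis [sin(2 pi f t), cos(2 pi f t)] for each retained frequency.
    cols = []
    for f in freqs:
        cols += [np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)]
    return np.stack(cols, axis=1)

# Synthetic "measured response": two harmonics plus sensor noise.
t_train = np.linspace(0.0, 2.0, 200)
y_train = (1.5 * np.sin(2 * np.pi * 1.0 * t_train)
           + 0.5 * np.cos(2 * np.pi * 3.0 * t_train)
           + 0.05 * rng.normal(size=t_train.size))

# Bayesian linear regression in the Fourier basis == a degenerate GP with
# covariance k(t, t') = phi(t) @ S0 @ phi(t')^T, cheap to optimize.
Phi = fourier_features(t_train)
noise_var, prior_var = 0.05**2, 1.0
A = Phi.T @ Phi / noise_var + np.eye(Phi.shape[1]) / prior_var
mean_w = np.linalg.solve(A, Phi.T @ y_train / noise_var)

t_test = np.linspace(0.0, 2.0, 50)
y_pred = fourier_features(t_test) @ mean_w
print(mean_w.round(2))  # recovers the two active harmonic amplitudes
```

The posterior mean recovers the amplitudes of the two active harmonics and drives the inactive coefficients toward zero, which is the filtering effect the abstract describes.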
Mitigating Membership Inference Vulnerability in Personalized Federated Learning
Federated Learning (FL) has emerged as a promising paradigm for collaborative model training without the need to share clients' personal data, thereby preserving privacy. However, the non-IID nature of the clients' data introduces major challenges for FL, highlighting the importance of personalized federated learning (PFL) methods. In PFL, models are trained to cater to specific feature distributions present in the population data. A notable method for PFL is the Iterative Federated Clustering Algorithm (IFCA), which mitigates the concerns associated with the non-IID-ness by grouping clients with similar data distributions. While it has been shown that IFCA enhances both accuracy and fairness, its strategy of dividing the population into smaller clusters increases vulnerability to Membership Inference Attacks (MIA), particularly among minorities with limited training samples. In this paper, we introduce IFCA-MIR, an improved version of IFCA that integrates MIA risk assessment into the clustering process. Allowing clients to select clusters based on both model performance and MIA vulnerability, IFCA-MIR achieves an improved performance with respect to accuracy, fairness, and privacy. We demonstrate that IFCA-MIR significantly reduces MIA risk while maintaining comparable model accuracy and fairness as the original IFCA.
Updated: 2025-03-12 14:10:35
Categories: cs.LG,cs.CR
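The clustering step that IFCA builds on can be sketched for linear-regression clients. This shows only plain IFCA's loss-based cluster selection; IFCA-MIR would additionally weigh each cluster's estimated MIA vulnerability in the selection rule, and the data distributions and models below are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

d = 3
# Two ground-truth data distributions (clusters) over clients.
true_w = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
# Current cluster models (imagine these came from earlier IFCA rounds).
models = [w + 0.1 * rng.normal(size=d) for w in true_w]

def client_data(cluster_id, n=50):
    X = rng.normal(size=(n, d))
    y = X @ true_w[cluster_id] + 0.1 * rng.normal(size=n)
    return X, y

def ifca_assign(X, y, models):
    # IFCA's core step: a client evaluates every cluster model on its own
    # data and joins the cluster whose model fits best (lowest loss).
    losses = [np.mean((X @ w - y) ** 2) for w in models]
    return int(np.argmin(losses))

clients = [client_data(i % 2) for i in range(8)]
assignments = [ifca_assign(X, y, models) for X, y in clients]
print(assignments)  # -> [0, 1, 0, 1, 0, 1, 0, 1]
```

Each client recovers its true cluster from loss alone; the privacy problem the paper targets is that clients in small clusters then train on few samples, which raises MIA risk, so IFCA-MIR lets the selection trade loss against that risk.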
Benefits of Learning Rate Annealing for Tuning-Robustness in Stochastic Optimization
The learning rate in stochastic gradient methods is a critical hyperparameter that is notoriously costly to tune via standard grid search, especially for training modern large-scale models with billions of parameters. We identify a theoretical advantage of learning rate annealing schemes that decay the learning rate to zero at a polynomial rate, such as the widely-used cosine schedule, by demonstrating their increased robustness to initial parameter misspecification due to a coarse grid search. We present an analysis in a stochastic convex optimization setup demonstrating that the convergence rate of stochastic gradient descent with annealed schedules depends sublinearly on the multiplicative misspecification factor $\rho$ (i.e., the grid resolution), achieving a rate of $O(\rho^{1/(2p+1)}/\sqrt{T})$ where $p$ is the degree of polynomial decay and $T$ is the number of steps, in contrast to the $O(\rho/\sqrt{T})$ rate that arises with fixed stepsizes and exhibits a linear dependence on $\rho$. Experiments confirm the increased robustness compared to tuning with a fixed stepsize, that has significant implications for the computational overhead of hyperparameter search in practical training scenarios.
Updated: 2025-03-12 14:06:34
Categories: cs.LG,math.OC,stat.ML
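The robustness claim can be probed with one-dimensional SGD on $f(x)=\tfrac12 x^2$ with noisy gradients. This is an illustrative sketch, not the paper's analysis; the horizon, noise level, and coarse step-size grid are invented:

```python
import random

def sgd(lr_fn, T=2000, x0=5.0, sigma=1.0):
    # SGD on f(x) = 0.5 x^2 with noisy gradient g_t = x_t + N(0, sigma^2).
    x = x0
    for t in range(T):
        x -= lr_fn(t) * (x + random.gauss(0.0, sigma))
    return abs(x)

def poly_schedule(eta0, T, p=1.0):
    # Anneal the step size to zero at a polynomial rate over the horizon;
    # p = 1 is linear decay (the cosine schedule decays comparably).
    return lambda t: eta0 * (1.0 - t / T) ** p

def avg_final_error(make_lr, runs=20, T=2000):
    # Average the final error over several seeds to smooth out noise.
    total = 0.0
    for seed in range(runs):
        random.seed(seed)
        total += sgd(make_lr(T), T=T)
    return total / runs

# A coarse grid over eta0 plays the role of the misspecification factor rho:
# the annealed schedule degrades far more gracefully at the bad grid points.
for eta0 in [0.02, 0.2, 1.0]:
    fixed = avg_final_error(lambda T, e=eta0: (lambda t: e))
    annealed = avg_final_error(lambda T, e=eta0: poly_schedule(e, T))
    print(f"eta0={eta0}: fixed={fixed:.3f}  annealed={annealed:.3f}")
```

With a fixed step size the final error scales with the (possibly too large) step, matching the linear dependence on $\rho$; the annealed schedule shrinks the late-iteration noise and stays accurate even at the worst grid point, matching the sublinear dependence.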
Dual Test-time Training for Out-of-distribution Recommender System
Deep learning has been widely applied in recommender systems and has recently achieved revolutionary progress. However, most existing learning-based methods assume that the user and item distributions remain unchanged between the training phase and the test phase. In real-world scenarios, the distribution of user and item features can naturally shift, potentially resulting in a substantial decrease in recommendation performance. This phenomenon can be formulated as an Out-Of-Distribution (OOD) recommendation problem. To address this challenge, we propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR. In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model, allowing the model to specially adapt to the shifting user and item features. To be specific, we propose a self-distillation task and a contrastive task to assist the model learning both the user's invariant interest preferences and the variant user/item characteristics during the test-time phase, thus facilitating a smooth adaptation to the shifting features. Furthermore, we provide theoretical analysis to support the rationale behind our dual test-time training framework. To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy. We conduct experiments on three datasets with various backbones. Comprehensive experimental results have demonstrated the effectiveness of DT3OR compared to other state-of-the-art baselines.
Updated: 2025-03-12 14:06:24
Categories: cs.IR,cs.LG
Probabilistic Language-Image Pre-Training
Vision-language models (VLMs) embed aligned image-text pairs into a joint space but often rely on deterministic embeddings, assuming a one-to-one correspondence between images and texts. This oversimplifies real-world relationships, which are inherently many-to-many, with multiple captions describing a single image and vice versa. We introduce Probabilistic Language-Image Pre-training (ProLIP), the first probabilistic VLM pre-trained on a billion-scale image-text dataset using only probabilistic objectives, achieving a strong zero-shot capability (e.g., 74.6% ImageNet zero-shot accuracy with ViT-B/16). ProLIP efficiently estimates uncertainty by an "uncertainty token" without extra parameters. We also introduce a novel inclusion loss that enforces distributional inclusion relationships between image-text pairs and between original and masked inputs. Experiments demonstrate that, by leveraging uncertainty estimates, ProLIP benefits downstream tasks and aligns with intuitive notions of uncertainty, e.g., shorter texts being more uncertain and more general inputs including specific ones. Utilizing text uncertainties, we further improve ImageNet accuracy from 74.6% to 75.8% (under a few-shot setting), supporting the practical advantages of our probabilistic approach. The code is available at https://github.com/naver-ai/prolip
Updated: 2025-03-12 14:03:31
Categories: cs.CV,cs.LG
A Survey on Spoken Italian Datasets and Corpora
Spoken language datasets are vital for advancing linguistic research, Natural Language Processing, and speech technology. However, resources dedicated to Italian, a linguistically rich and diverse Romance language, remain underexplored compared to major languages like English or Mandarin. This survey provides a comprehensive analysis of 66 spoken Italian datasets, highlighting their characteristics, methodologies, and applications. The datasets are categorized by speech type, source and context, and demographic and linguistic features, with a focus on their utility in fields such as Automatic Speech Recognition, emotion detection, and education. Challenges related to dataset scarcity, representativeness, and accessibility are discussed alongside recommendations for enhancing dataset creation and utilization. The full dataset inventory is publicly accessible via GitHub and archived on Zenodo, serving as a valuable resource for researchers and developers. By addressing current gaps and proposing future directions, this work aims to support the advancement of Italian speech technologies and linguistic research.
Updated: 2025-03-12 13:59:29
Categories: cs.CL,cs.AI,cs.DL,A.1; I.2.7; J.5
AI-based Framework for Robust Model-Based Connector Mating in Robotic Wire Harness Installation
Despite the widespread adoption of industrial robots in automotive assembly, wire harness installation remains a largely manual process, as it requires precise and flexible manipulation. To address this challenge, we design a novel AI-based framework that automates cable connector mating by integrating force control with deep visuotactile learning. Our system optimizes search-and-insertion strategies using first-order optimization over a multimodal transformer architecture trained on visual, tactile, and proprioceptive data. Additionally, we design a novel automated data collection and optimization pipeline that minimizes the need for machine learning expertise. The framework optimizes robot programs that run natively on standard industrial controllers, permitting human experts to audit and certify them. Experimental validations on a center console assembly task demonstrate significant improvements in cycle times and robustness compared to conventional robot programming approaches. Videos are available under https://claudius-kienle.github.io/AppMuTT.
Updated: 2025-03-12 13:59:26
Categories: cs.RO,cs.AI,cs.CE,cs.LG,68T40,I.2; J.2
Multi-Agent Image Restoration
Image restoration (IR) is challenging due to the complexity of real-world degradations. While many specialized and all-in-one IR models have been developed, they fail to effectively handle complex, mixed degradations. Recent agentic methods RestoreAgent and AgenticIR leverage intelligent, autonomous workflows to alleviate this issue, yet they suffer from suboptimal results and inefficiency due to their resource-intensive finetunings, and ineffective searches and tool execution trials for satisfactory outputs. In this paper, we propose MAIR, a novel Multi-Agent approach for complex IR problems. We introduce a real-world degradation prior, categorizing degradations into three types: (1) scene, (2) imaging, and (3) compression, which are observed to occur sequentially in real world, and reverse them in the opposite order. Built upon this three-stage restoration framework, MAIR emulates a team of collaborative human specialists, including a "scheduler" for overall planning and multiple "experts" dedicated to specific degradations. This design minimizes search space and trial efforts, improving image quality while reducing inference costs. In addition, a registry mechanism is introduced to enable easy integration of new tools. Experiments on both synthetic and real-world datasets show that proposed MAIR achieves competitive performance and improved efficiency over the previous agentic IR system. Code and models will be made available.
Updated: 2025-03-12 13:53:57
Categories: cs.CV,cs.AI
Object-Centric World Model for Language-Guided Manipulation
A world model is essential for an agent to predict the future and plan in domains such as autonomous driving and robotics. To achieve this, recent advancements have focused on video generation, which has gained significant attention due to the impressive success of diffusion models. However, these models require substantial computational resources. To address these challenges, we propose a world model leveraging object-centric representation space using slot attention, guided by language instructions. Our model perceives the current state as an object-centric representation and predicts future states in this representation space conditioned on natural language instructions. This approach results in a more compact and computationally efficient model compared to diffusion-based generative alternatives. Furthermore, it flexibly predicts future states based on language instructions, and offers a significant advantage in manipulation tasks where object recognition is crucial. In this paper, we demonstrate that our latent predictive world model surpasses generative world models in visuo-linguo-motor control tasks, achieving superior sample and computation efficiency. We also investigate the generalization performance of the proposed method and explore various strategies for predicting actions using object-centric representations.
Updated: 2025-03-12 13:52:50
Categories: cs.AI,cs.CV,cs.RO
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
Transformers, particularly Vision Transformers (ViTs), have achieved state-of-the-art performance in large-scale image classification. However, they often require large amounts of data and can exhibit biases that limit their robustness and generalizability. This paper introduces ForAug, a novel data augmentation scheme that addresses these challenges and explicitly includes inductive biases, which commonly are part of the neural network architecture, into the training data. ForAug is constructed by using pretrained foundation models to separate and recombine foreground objects with different backgrounds, enabling fine-grained control over image composition during training. It thus increases the data diversity and effective number of training samples. We demonstrate that training on ForNet, the application of ForAug to ImageNet, significantly improves the accuracy of ViTs and other architectures by up to 4.5 percentage points (p.p.) on ImageNet and 7.3 p.p. on downstream tasks. Importantly, ForAug enables novel ways of analyzing model behavior and quantifying biases. Namely, we introduce metrics for background robustness, foreground focus, center bias, and size bias and show that training on ForNet substantially reduces these biases compared to training on ImageNet. In summary, ForAug provides a valuable tool for analyzing and mitigating biases, enabling the development of more robust and reliable computer vision models. Our code and dataset are publicly available at https://github.com/tobna/ForAug.
Updated: 2025-03-12 13:49:45
Categories: cs.CV,cs.AI,cs.LG,68T45,I.2.10; I.2.6; I.4.6
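The core augmentation operation, separating a foreground with a segmentation mask and pasting it onto a new background at a controlled position, can be sketched in a few lines. This is a toy illustration with invented 8x8 arrays; ForAug obtains real foregrounds and masks from pretrained foundation models:

```python
import numpy as np

rng = np.random.default_rng(4)

def recombine(fg, mask, bg, pos=(0, 0)):
    # Paste a masked foreground onto a background at position (row, col).
    # Varying `pos` and the background gives the fine-grained control over
    # image composition used to probe center bias and background robustness.
    out = bg.copy()
    h, w = fg.shape[:2]
    y, x = pos
    m = mask[..., None].astype(float)
    region = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = (m * fg + (1 - m) * region).astype(out.dtype)
    return out

# Toy 8x8 "images": a bright 3x3 foreground square with a full mask.
fg = np.full((3, 3, 3), 255, dtype=np.uint8)
mask = np.ones((3, 3), dtype=bool)
bg = rng.integers(0, 100, size=(8, 8, 3), dtype=np.uint8)

corner = recombine(fg, mask, bg, pos=(0, 0))   # object in the corner
center = recombine(fg, mask, bg, pos=(3, 3))   # object near the center
print(corner[0, 0], center[4, 4])  # both show the foreground: [255 255 255] [255 255 255]
```

Generating many placements and backgrounds per foreground is what multiplies the effective number of training samples and makes the bias metrics (center bias, background robustness) directly measurable.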
Precoder Learning by Leveraging Unitary Equivariance Property
Incorporating mathematical properties of a wireless policy to be learned into the design of deep neural networks (DNNs) is effective for enhancing learning efficiency. The multi-user precoding policy in multi-antenna systems, i.e., the mapping from the channel matrix to the precoding matrix, possesses a permutation equivariance property, which has been harnessed to design the parameter-sharing structure of the weight matrices of DNNs. In this paper, we study a stronger property than permutation equivariance, namely unitary equivariance, for precoder learning. We first show that a DNN with unitary equivariance designed by further introducing parameter sharing into a permutation equivariant DNN is unable to learn the optimal precoder. We proceed to develop a novel non-linear weighting process satisfying unitary equivariance and then construct a joint unitary and permutation equivariant DNN. Simulation results demonstrate that the proposed DNN not only outperforms existing learning methods in learning performance and generalizability but also reduces training complexity.
Updated: 2025-03-12 13:48:34
Subjects: eess.SP,cs.LG,cs.SY,eess.SY,math.GR
Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training
3D Gaussian Splatting (3DGS) has demonstrated impressive performance in synthesizing novel views after training on a given set of viewpoints. However, its rendering quality deteriorates when the synthesized view deviates significantly from the training views. This decline occurs due to (1) the model's difficulty in generalizing to out-of-distribution scenarios and (2) challenges in interpolating fine details caused by substantial resolution changes and occlusions. A notable case of this limitation is close-up view generation--producing views that are significantly closer to the object than those in the training set. To tackle this issue, we propose a novel approach for close-up view generation by progressively training the 3DGS model with self-generated data. Our solution is based on three key ideas. First, we leverage the See3D model, a recently introduced 3D-aware generative model, to enhance the details of rendered views. Second, we propose a strategy to progressively expand the ``trust regions'' of the 3DGS model and update a set of reference views for See3D. Finally, we introduce a fine-tuning strategy to carefully update the 3DGS model with training data generated from the above schemes. We further define metrics for close-up view evaluation to facilitate better research on this problem. By conducting evaluations on specifically selected scenarios for close-up views, our proposed approach demonstrates a clear advantage over competitive solutions.
Updated: 2025-03-12 13:44:00
Subjects: cs.CV,cs.AI
Adjusted Count Quantification Learning on Graphs
Quantification learning is the task of predicting the label distribution of a set of instances. We study this problem in the context of graph-structured data, where the instances are vertices. Previously, this problem has only been addressed via node clustering methods. In this paper, we extend the popular Adjusted Classify & Count (ACC) method to graphs. We show that the prior probability shift assumption upon which ACC relies is often not fulfilled and propose two novel graph quantification techniques: Structural importance sampling (SIS) makes ACC applicable in graph domains with covariate shift. Neighborhood-aware ACC improves quantification in the presence of non-homophilic edges. We show the effectiveness of our techniques on multiple graph quantification tasks.
Updated: 2025-03-12 13:42:13
Subjects: cs.LG
Symbolic Approximations to Ricci-flat Metrics Via Extrinsic Symmetries of Calabi-Yau Hypersurfaces
Ever since Yau's non-constructive existence proof of Ricci-flat metrics on Calabi-Yau manifolds, finding their explicit construction has remained a major obstacle to the development of both string theory and algebraic geometry. Recent computational approaches employ machine learning to create novel neural representations for approximating these metrics, offering high accuracy but limited interpretability. In this paper, we analyse machine learning approximations to flat metrics of Fermat Calabi-Yau n-folds and some of their one-parameter deformations in three dimensions in order to discover their new properties. We formalise cases in which the flat metric has more symmetries than the underlying manifold, and prove that these symmetries imply that the flat metric admits a surprisingly compact representation for certain choices of complex structure moduli. We show that such symmetries uniquely determine the flat metric on certain loci, for which we present an analytic form. We also incorporate our theoretical results into neural networks to reduce Ricci curvature for multiple Calabi-Yau manifolds compared to previous machine learning approaches. We conclude by distilling the ML models to obtain, for the first time, closed-form expressions for Kähler metrics with near-zero scalar curvature.
Updated: 2025-03-12 13:38:59
Subjects: hep-th,cs.LG,math.AG,math.DG
Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic
In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resources while delivering large data packets within hard-latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise in EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse delayed packet dropout feedback (rewards) in XR. To overcome these challenges, this paper models the EEPS in XR as a dynamic parameter-constrained Markov decision process (DP-CMDP) with a varying transition function linked to the non-stationary data traffic and solves it by a proposed context-aware constrained reinforcement learning (CACRL) algorithm, which consists of a context inference (CI) module and a CRL module. The CI module trains an encoder and multiple potential networks to characterize the current transition function and reshape the packet dropout rewards according to the context, transforming the original DP-CMDP into a general CMDP with immediate dense rewards. The CRL module employs a policy network to make EEPS decisions under this CMDP and optimizes the policy using a constrained stochastic successive convex approximation (CSSCA) method, which is better suited for non-convex stochastic constraints. Finally, theoretical analyses provide deep insights into the CACRL algorithm, while extensive simulations demonstrate that it outperforms advanced baselines in both power conservation and satisfying packet dropout constraints.
Updated: 2025-03-12 13:37:19
Subjects: eess.SY,cs.ET,cs.LG,cs.SY
Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems
Cyber-Physical Systems (CPS) often leverage Reinforcement Learning (RL) techniques to adapt dynamically to changing environments and optimize performance. However, it is challenging to construct safety cases for RL components. We therefore propose SAFE-RL (Safety and Accountability Framework for Evaluating Reinforcement Learning) to support the development, validation, and safe deployment of RL-based CPS. We adopt a design science approach to construct the framework and demonstrate its use in three RL applications in small Uncrewed Aerial Systems (sUAS).
Updated: 2025-03-12 13:33:07
Subjects: cs.SE,cs.LG
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model's internal representations. As a result, it may offer more effective, interpretable, data-efficient, and flexible control over models' behavior. We present the first comprehensive survey of RepE for LLMs, reviewing the rapidly growing literature to address key questions: What RepE methods exist and how do they differ? For what concepts and problems has RepE been applied? What are the strengths and weaknesses of RepE compared to other methods? To answer these, we propose a unified framework describing RepE as a pipeline comprising representation identification, operationalization, and control. We posit that while RepE methods offer significant potential, challenges remain, including managing multiple concepts, ensuring reliability, and preserving models' performance. Towards improving RepE, we identify opportunities for experimental and methodological improvements and construct a guide for best practices.
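The identification/operationalization/control pipeline the survey describes can be sketched with the most common RepE recipe: take a concept direction as the difference of mean activations between contrastive prompt sets, then steer by adding a scaled copy of that direction to a hidden state. The toy activation vectors below are invented; this is a sketch of the pattern, not any specific method from the survey.

```python
# Hedged RepE sketch: (1) identify a concept direction from contrastive
# activations, (2) control by adding alpha * direction to a hidden state.

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def concept_direction(pos_acts, neg_acts):
    mp, mn = mean_vec(pos_acts), mean_vec(neg_acts)
    return [p - q for p, q in zip(mp, mn)]

def steer(hidden, direction, alpha):
    return [h + alpha * d for h, d in zip(hidden, direction)]

pos = [[1.0, 0.0], [1.0, 0.2]]   # toy activations: "concept present"
neg = [[0.0, 0.0], [0.0, 0.2]]   # toy activations: "concept absent"
d = concept_direction(pos, neg)
print(d)                          # [1.0, 0.0]
print(steer([0.5, 0.5], d, 2.0))  # [2.5, 0.5]
```

The survey's open challenges (handling multiple concepts, reliability) correspond to the cases where such single-direction arithmetic interferes with other behaviors of the model.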
Updated: 2025-03-12 13:31:36
Subjects: cs.LG,cs.CL
Revisiting Agnostic Boosting
Boosting is a key method in statistical learning, allowing for converting weak learners into strong ones. While well studied in the realizable case, the statistical properties of weak-to-strong learning remain less understood in the agnostic setting, where there are no assumptions on the distribution of the labels. In this work, we propose a new agnostic boosting algorithm with substantially improved sample complexity compared to prior works under very general assumptions. Our approach is based on a reduction to the realizable case, followed by a margin-based filtering step to select high-quality hypotheses. We conjecture that the error rate achieved by our proposed method is optimal up to logarithmic factors.
Updated: 2025-03-12 13:29:33
Subjects: cs.LG
Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs
Recommender systems (RecSys) are widely used across various modern digital platforms and have garnered significant attention. Traditional recommender systems usually focus only on fixed and simple recommendation scenarios, making it difficult to generalize to new and unseen recommendation tasks in an interactive paradigm. Recently, the advancement of large language models (LLMs) has revolutionized the foundational architecture of RecSys, driving their evolution into more intelligent and interactive personalized recommendation assistants. However, most existing studies rely on fixed task-specific prompt templates to generate recommendations and evaluate the performance of personalized assistants, which limits the comprehensive assessments of their capabilities. This is because commonly used datasets lack high-quality textual user queries that reflect real-world recommendation scenarios, making them unsuitable for evaluating LLM-based personalized recommendation assistants. To address this gap, we introduce RecBench+, a new dataset benchmark designed to assess LLMs' ability to handle intricate user recommendation needs in the era of LLMs. RecBench+ encompasses a diverse set of queries that span both hard conditions and soft preferences, with varying difficulty levels. We evaluated commonly used LLMs on RecBench+ and uncovered the following findings: 1) LLMs demonstrate preliminary abilities to act as recommendation assistants, 2) LLMs are better at handling queries with explicitly stated conditions, while facing challenges with queries that require reasoning or contain misleading information. Our dataset has been released at https://github.com/jiani-huang/RecBench.git.
Updated: 2025-03-12 13:28:23
Subjects: cs.IR,cs.AI
Faithful and Privacy-Preserving Implementation of Average Consensus
We propose a protocol based on mechanism design theory and encrypted control to solve average consensus problems among rational and strategic agents while preserving their privacy. The proposed protocol provides a mechanism that incentivizes the agents to faithfully implement the intended behavior specified in the protocol. Furthermore, the protocol runs over encrypted data using homomorphic encryption and secret sharing to protect the privacy of agents. We also analyze the security of the proposed protocol using a simulation paradigm in secure multi-party computation. The proposed protocol demonstrates that mechanism design and encrypted control can complement each other to achieve security under rational adversaries.
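The privacy ingredient in the protocol, additive secret sharing, is easy to illustrate: each agent splits its value into random shares that sum to the value, so the network can compute the exact average while no single party sees another agent's input. Real protocols of this kind operate over finite fields and combine with homomorphic encryption; the plain-integer version below is a toy sketch, not the paper's construction.

```python
# Toy additive secret sharing for privacy-preserving averaging.
import random

def share(value, n_shares):
    shares = [random.randint(-100, 100) for _ in range(n_shares - 1)]
    shares.append(value - sum(shares))  # shares sum exactly to value
    return shares

agent_values = [10, 20, 60]
all_shares = [share(v, 3) for v in agent_values]
# Each party sums one share from every agent; no partial sum reveals an
# individual input, yet together they reconstruct the exact total.
partial = [sum(s[i] for s in all_shares) for i in range(3)]
average = sum(partial) / len(agent_values)
print(average)  # 30.0
```

The mechanism-design half of the paper addresses the orthogonal question of why rational agents would report their true values into such a protocol in the first place.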
Updated: 2025-03-12 13:28:22
Subjects: eess.SY,cs.CR,cs.SY
Pig behavior dataset and Spatial-temporal perception and enhancement networks based on the attention mechanism for pig behavior recognition
The recognition of pig behavior plays a crucial role in smart farming and welfare assurance for pigs. Currently, in the field of pig behavior recognition, the lack of publicly available behavioral datasets not only limits the development of innovative algorithms but also hampers model robustness and algorithm optimization. This paper proposes a dataset containing 13 pig behaviors that significantly impact welfare. Based on this dataset, this paper proposes a spatial-temporal perception and enhancement network based on the attention mechanism to model the spatiotemporal features of pig behaviors and their associated interaction areas in video data. The network is composed of a spatiotemporal perception network and a spatiotemporal feature enhancement network. The spatiotemporal perception network is responsible for establishing connections between the pigs and the key regions of their behaviors in the video data. The spatiotemporal feature enhancement network further strengthens the important spatial features of individual pigs and captures the long-term dependencies of the spatiotemporal features of individual behaviors by remodeling these connections, thereby enhancing the model's perception of spatiotemporal changes in pig behaviors. Experimental results demonstrate that on the dataset established in this paper, our proposed model achieves a mAP score of 75.92%, an 8.17% improvement over the best-performing traditional model. This study not only improves the accuracy and generalizability of individual pig behavior recognition but also provides new technological tools for modern smart farming. The dataset and related code will be made publicly available alongside this paper.
Updated: 2025-03-12 13:27:29
Subjects: cs.CV,cs.AI
Quantum Computing and Cybersecurity Education: A Novel Curriculum for Enhancing Graduate STEM Learning
Quantum computing is an emerging paradigm with the potential to transform numerous application areas by addressing problems considered intractable in the classical domain. However, its integration into cyberspace introduces significant security and privacy challenges. The exponential rise in cyber attacks, further complicated by quantum capabilities, poses serious risks to financial systems and national security. The scope of quantum threats extends beyond traditional software, operating system, and network vulnerabilities, necessitating a shift in cybersecurity education. Traditional cybersecurity education, often reliant on didactic methods, lacks the hands-on, student-centered learning experiences necessary to prepare students for these evolving challenges. There is an urgent need for curricula that address both classical and quantum security threats through experiential learning. In this work, we present the design and evaluation of EE 597: Introduction to Hardware Security, a graduate-level course integrating hands-on quantum security learning with classical security concepts through simulations and cloud-based quantum hardware. Unlike conventional courses focused on quantum threats to cryptographic systems, EE 597 explores security challenges specific to quantum computing itself. We employ a mixed-methods evaluation using pre- and post-surveys to assess student learning outcomes and engagement. Results indicate significant improvements in students' understanding of quantum and hardware security, with strong positive feedback on course structure and remote instruction (mean scores: 3.33 to 3.83 on a 4-point scale).
Updated: 2025-03-12 13:26:54
Subjects: cs.CR,cs.ET,quant-ph
CryptoX : Compositional Reasoning Evaluation of Large Language Models
The compositional reasoning capacity has long been regarded as critical to the generalization and intelligence emergence of large language models (LLMs). However, despite numerous reasoning-related benchmarks, the compositional reasoning capacity of LLMs is rarely studied or quantified in the existing benchmarks. In this paper, we introduce CryptoX, an evaluation framework that, for the first time, combines existing benchmarks with cryptography to quantify the compositional reasoning capacity of LLMs. Building upon CryptoX, we construct CryptoBench, which integrates these principles into several benchmarks for systematic evaluation. We conduct detailed experiments on widely used open-source and closed-source LLMs using CryptoBench, revealing a huge gap between open-source and closed-source LLMs. We further conduct thorough mechanistic interpretability experiments to reveal the inner mechanism of LLMs' compositional reasoning, involving subproblem decomposition, subproblem inference, and summarizing subproblem conclusions. Through analysis based on CryptoBench, we highlight the value of independently studying compositional reasoning and emphasize the need to enhance the compositional reasoning capabilities of LLMs.
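The compositional idea can be illustrated by wrapping an ordinary benchmark item in a cryptographic transform, so a model must first decode and then solve. The Caesar cipher and sample question below are stand-ins chosen for this sketch; the abstract does not specify which ciphers CryptoX uses.

```python
# Illustrative CryptoX-style composition: encode a benchmark question
# with a shift cipher; solving it requires decode + reason.

def caesar(text, shift):
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # digits and punctuation pass through
    return ''.join(out)

question = "What is 2 + 2?"
encoded = caesar(question, 3)
print(encoded)                          # "Zkdw lv 2 + 2?"
assert caesar(encoded, -3) == question  # decoding recovers the task
```

Scoring a model on the encoded variant versus the plain one is what isolates the compositional step from the underlying benchmark skill.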
Updated: 2025-03-12 13:17:27
Subjects: cs.CR,cs.AI
Revisiting Medical Image Retrieval via Knowledge Consolidation
As artificial intelligence and digital medicine increasingly permeate healthcare systems, robust governance frameworks are essential to ensure ethical, secure, and effective implementation. In this context, medical image retrieval becomes a critical component of clinical data management, playing a vital role in decision-making and safeguarding patient information. Existing methods usually learn hash functions using bottleneck features, which fail to produce representative hash codes from blended embeddings. Although contrastive hashing has shown superior performance, current approaches often treat image retrieval as a classification task, using category labels to create positive/negative pairs. Moreover, many methods fail to address the out-of-distribution (OOD) issue when models encounter external OOD queries or adversarial attacks. In this work, we propose a novel method to consolidate knowledge of hierarchical features and optimisation functions. We formulate the knowledge consolidation by introducing Depth-aware Representation Fusion (DaRF) and Structure-aware Contrastive Hashing (SCH). DaRF adaptively integrates shallow and deep representations into blended features, and SCH incorporates image fingerprints to enhance the adaptability of positive/negative pairings. These blended features further facilitate OOD detection and content-based recommendation, contributing to a secure AI-driven healthcare environment. Moreover, we present a content-guided ranking to improve the robustness and reproducibility of retrieval results. Our comprehensive assessments demonstrate that the proposed method could effectively recognise OOD samples and significantly outperform existing approaches in medical image retrieval (p<0.05). In particular, our method achieves a 5.6-38.9% improvement in mean Average Precision on the anatomical radiology dataset.
Updated: 2025-03-12 13:16:42
Subjects: cs.CV,cs.AI
Magnetic Field Data Calibration with Transformer Model Using Physical Constraints: A Scalable Method for Satellite Missions, Illustrated by Tianwen-1
This study introduces a novel approach that integrates the magnetic field data correction from the Tianwen-1 Mars mission with a neural network architecture constrained by physical principles derived from Maxwell's equations. By employing a Transformer-based model capable of efficiently handling sequential data, the method corrects measurement anomalies caused by satellite dynamics, instrument interference, and environmental noise. As a result, it significantly improves both the accuracy and the physical consistency of the calibrated data. Compared to traditional methods that require long data segments and manual intervention, often taking weeks or even months to complete, this new approach can finish calibration in just minutes to hours, and predictions are made within seconds. This innovation not only accelerates the process of space weather modeling and planetary magnetospheric studies but also provides a robust framework for future planetary exploration and solar wind interaction research.
Updated: 2025-03-12 13:15:56
Subjects: physics.space-ph,astro-ph.EP,astro-ph.IM,cs.LG
In Context Learning and Reasoning for Symbolic Regression with Large Language Models
Large Language Models (LLMs) are transformer-based machine learning models that have shown remarkable performance in tasks for which they were not explicitly trained. Here, we explore the potential of LLMs to perform symbolic regression -- a machine-learning method for finding simple and accurate equations from datasets. We prompt GPT-4 to suggest expressions from data, which are then optimized and evaluated using external Python tools. These results are fed back to GPT-4, which proposes improved expressions while optimizing for complexity and loss. Using chain-of-thought prompting, we instruct GPT-4 to analyze the data, prior expressions, and the scientific context (expressed in natural language) for each problem before generating new expressions. We evaluated the workflow in rediscovery of five well-known scientific equations from experimental data, and on an additional dataset without a known equation. GPT-4 successfully rediscovered all five equations, and in general, performed better when prompted to use a scratchpad and consider scientific context. We demonstrate how strategic prompting improves the model's performance and how the natural language interface simplifies integrating theory with data. We also observe how theory can sometimes offset noisy data and, in other cases, data can make up for poor context. Although this approach does not outperform established SR programs where target equations are more complex, LLMs can nonetheless iterate toward improved solutions while following instructions and incorporating scientific context in natural language.
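The external scoring step of this propose-and-refine loop can be sketched compactly: candidate expressions (plain Python strings standing in for LLM proposals) are evaluated on the data and ranked by loss plus a complexity penalty, and the winner would be fed back for refinement. The candidate set, data, and penalty weight below are invented for illustration.

```python
# Sketch of the score-and-select step in LLM-driven symbolic regression:
# rank candidate expressions by MSE plus a simple complexity penalty.

def score(expr, xs, ys, complexity_weight=0.01):
    mse = sum((eval(expr, {"x": x}) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + complexity_weight * len(expr)  # shorter is better, ties aside

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]                 # hidden ground truth: y = x^2
candidates = ["x", "2*x", "x*x", "x*x*x"]  # stand-ins for LLM proposals
best = min(candidates, key=lambda e: score(e, xs, ys))
print(best)  # x*x
```

In the full workflow the best expression, its score, and the scientific context go back into the prompt, closing the chain-of-thought refinement loop the abstract describes.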
Updated: 2025-03-12 13:14:22
Subjects: cs.CL,cs.AI
Refining Filter Global Feature Weighting for Fully-Unsupervised Clustering
In the context of unsupervised learning, effective clustering plays a vital role in revealing patterns and insights from unlabeled data. However, the success of clustering algorithms often depends on the relevance and contribution of features, which can differ between various datasets. This paper explores feature weighting for clustering and presents new weighting strategies, including methods based on SHAP (SHapley Additive exPlanations), a technique commonly used for providing explainability in various supervised machine learning tasks. By taking advantage of SHAP values in a way other than just to gain explainability, we use them to weight features and ultimately improve the clustering process itself in unsupervised scenarios. Our empirical evaluations across five benchmark datasets and clustering methods demonstrate that feature weighting based on SHAP can enhance unsupervised clustering quality, achieving up to a 22.69% improvement over other weighting methods (from 0.586 to 0.719 in terms of the Adjusted Rand Index). Additionally, these situations where the weighted data boosts the results are highlighted and thoroughly explored, offering insight for practical applications.
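The effect of filter-style feature weighting on clustering can be shown with a weighted distance: scaling each feature by an importance weight (in the paper, derived from SHAP values; here the weights are simply made up) changes which points end up as nearest neighbours.

```python
# Toy demonstration of feature weighting under Euclidean distance.

def weighted_dist(a, b, w):
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)) ** 0.5

p = [0.0, 0.0]
q = [1.0, 0.0]             # differs from p only on feature 0
r = [0.0, 2.0]             # differs from p only on feature 1
uniform = [1.0, 1.0]
downweight_f1 = [1.0, 0.1]  # feature 1 deemed irrelevant

# Unweighted, q is the nearer neighbour of p; down-weighting feature 1
# flips the neighbour structure so r becomes nearer.
assert weighted_dist(p, q, uniform) < weighted_dist(p, r, uniform)
assert weighted_dist(p, r, downweight_f1) < weighted_dist(p, q, downweight_f1)
```

Any distance-based clusterer (k-means, hierarchical clustering) inherits this change, which is the mechanism behind the Adjusted Rand Index gains the abstract reports.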
Updated: 2025-03-12 13:14:09
Subjects: cs.LG,cs.AI
Membership Inference Attacks fueled by Few-Shot Learning to detect privacy leakage tackling data integrity
Deep learning models have an intrinsic privacy issue as they memorize parts of their training data, creating a privacy leakage. Membership Inference Attacks (MIA) exploit it to obtain confidential information about the data used for training, aiming to steal information. They can be repurposed as a measurement of data integrity by inferring whether it was used to train a machine learning model. While state-of-the-art attacks achieve significant privacy leakage, their requirements are rarely feasible in practice, hindering their role as practical tools to assess the magnitude of the privacy risk. Moreover, the most appropriate evaluation metric of MIA, the True Positive Rate at a low False Positive Rate, lacks interpretability. We claim that the incorporation of Few-Shot Learning techniques into the MIA field and a proper qualitative and quantitative privacy evaluation measure should deal with these issues. In this context, our proposal is twofold. We propose a Few-Shot learning based MIA, coined as the FeS-MIA model, which eases the evaluation of the privacy breach of a deep learning model by significantly reducing the number of resources required for the purpose. Furthermore, we propose an interpretable quantitative and qualitative measure of privacy, referred to as the Log-MIA measure. Jointly, these proposals provide new tools to assess the privacy leakage and to ease the evaluation of the training data integrity of deep learning models, that is, to analyze the privacy breach of a deep learning model. Experiments carried out with MIA over image classification and language modeling tasks, and their comparison to the state of the art, show that our proposals excel at reporting the privacy leakage of a deep learning model with little extra information.
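The simplest baseline underlying this line of work, and the metric the abstract criticizes, can both be sketched directly: predict "member" when a sample's loss falls below a threshold, then report the TPR achieved at a fixed low FPR. The loss values below are synthetic stand-ins (members tend to have lower loss because they were memorised).

```python
# Threshold-based loss MIA and TPR-at-low-FPR evaluation (toy data).

member_losses = [0.01, 0.02, 0.05, 0.60]
nonmember_losses = [0.30, 0.80, 0.90, 1.20]

def tpr_at_fpr(members, nonmembers, max_fpr):
    # Sweep candidate thresholds; keep the best TPR whose FPR stays
    # within the budget. "Member" is predicted when loss <= threshold.
    best_tpr = 0.0
    for t in sorted(members + nonmembers):
        fpr = sum(l <= t for l in nonmembers) / len(nonmembers)
        if fpr <= max_fpr:
            tpr = sum(l <= t for l in members) / len(members)
            best_tpr = max(best_tpr, tpr)
    return best_tpr

print(tpr_at_fpr(member_losses, nonmember_losses, max_fpr=0.0))  # 0.75
```

A single number like 0.75 at FPR 0 is exactly the kind of score the paper argues is hard to interpret on its own, motivating the Log-MIA measure.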
Updated: 2025-03-12 13:09:43
领域: cs.CR,cs.AI
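The TPR-at-low-FPR metric discussed in the entry above can be made concrete with a minimal score-threshold membership attack. This is a hedged illustration only: the scores, the loss-based scoring rule, and the FPR budget below are assumptions for the example, not the FeS-MIA model itself.

```python
def tpr_at_low_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """TPR of a score-threshold attack at the most permissive threshold
    whose false-positive rate stays within max_fpr.
    Higher score = 'more likely a training member'."""
    thresholds = sorted(set(member_scores + nonmember_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr

# Members tend to have lower loss, so negative loss works as a score
# (illustrative numbers, well separated).
members = [-0.1, -0.2, -0.05, -0.3]
nonmembers = [-1.2, -0.9, -1.5, -0.8]
print(tpr_at_low_fpr(members, nonmembers, max_fpr=0.25))
```

Reporting the attack at a strict FPR budget is what makes the metric meaningful for privacy auditing, but as the paper notes, the raw number is hard to interpret, which motivates their Log-MIA measure.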
Towards Graph Foundation Models: A Transferability Perspective
In recent years, Graph Foundation Models (GFMs) have gained significant attention for their potential to generalize across diverse graph domains and tasks. Some works focus on Domain-Specific GFMs, which are designed to address a variety of tasks within a specific domain, while others aim to create General-Purpose GFMs that extend the capabilities of domain-specific models to multiple domains. Regardless of the type, transferability is crucial for applying GFMs across different domains and tasks. However, achieving strong transferability is a major challenge due to the structural, feature, and distributional variations in graph data. To date, there has been no systematic research examining and analyzing GFMs from the perspective of transferability. To bridge this gap, we present the first comprehensive taxonomy that categorizes and analyzes existing GFMs through the lens of transferability, structuring GFMs around their application scope (domain-specific vs. general-purpose) and their approaches to knowledge acquisition and transfer. We provide a structured perspective on current progress and identify potential pathways for advancing GFM generalization across diverse graph datasets and tasks. We aim to shed light on the current landscape of GFMs and inspire future research directions in GFM development.
Updated: 2025-03-12 13:04:05
标题: 走向图基础模型:一个可转移性的视角
摘要: 近年来,图形基础模型(GFMs)因其潜力在不同图形领域和任务之间进行泛化而受到重视。一些研究专注于领域特定的GFMs,旨在解决特定领域内的各种任务,而其他一些研究致力于创建通用型GFMs,将领域特定模型的能力拓展到多个领域。无论是哪种类型,传递性对于将GFMs应用到不同领域和任务中至关重要。然而,由于图形数据的结构、特征和分布变化,实现强大的传递性是一个主要挑战。迄今为止,还没有系统的研究从传递性的角度审视和分析GFMs。为了弥补这一差距,我们提出了第一个全面的分类法,通过传递性的视角对现有的GFMs进行分类和分析,将GFMs围绕其应用范围(领域特定vs.通用型)和知识获取与传递的方法进行结构化。我们为当前进展提供了一个结构化的视角,并确定了推动GFMs在不同图形数据集和任务中泛化的潜在途径。我们的目标是揭示GFMs的当前格局,并激发GFMs发展中的未来研究方向。
更新时间: 2025-03-12 13:04:05
领域: cs.LG
On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions
Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high-quality subset helps reduce training costs and enhance model quality. Submodularity, a discrete analogue of convexity, is commonly used for solving such subset selection problems. However, existing algorithms for optimizing submodular functions are sequential, and prior distributed methods require at least one central machine that can fit the target subset in DRAM. At billion-datapoint scale, even the subset may not fit on a single machine, and the sequential algorithms are prohibitively slow. In this paper, we relax the requirement of having a central machine for the target subset by proposing a novel distributed bounding algorithm with provable approximation guarantees. The algorithm iteratively bounds the minimum and maximum utility values to select high-quality points and discard the unimportant ones. When bounding does not find the complete subset, we use a multi-round, partition-based distributed greedy algorithm to identify the remaining subset. We discuss how to implement these algorithms in a distributed data processing framework and empirically analyze different configurations. We find high-quality subsets on CIFAR-100 and ImageNet with marginal or no loss in quality compared to centralized methods, and scale to a dataset with 13 billion points.
Updated: 2025-03-12 13:02:27
标题: 关于利用成对次模函数进行分布式大于内存子集选择的研究
摘要: 现代数据集涵盖数十亿个样本,因此无法对所有可用数据进行训练。选择高质量的子集有助于减少训练成本并提高模型质量。子模性,离散凸性的类比,通常用于解决这种子集选择问题。然而,现有的优化子模性函数的算法是顺序的,先前的分布式方法需要至少一个中央机器来适应DRAM中的目标子集。在数十亿数据点的规模下,甚至子集也可能无法适应单个机器,顺序算法速度过慢。在本文中,我们通过提出一种具有可证估计保证的新型分布式边界算法,放宽了对于目标子集需要中央机器的要求。该算法通过迭代限定最小和最大效用值来选择高质量点并丢弃不重要的点。当限定未找到完整子集时,我们使用多轮、基于分区的分布式贪心算法来识别剩余子集。我们讨论如何在分布式数据处理框架中实现这些算法,并通过实证分析不同配置。我们在CIFAR-100和ImageNet上找到了高质量子集,与集中方法相比,质量几乎没有损失,且扩展到具有130亿数据点的数据集。
更新时间: 2025-03-12 13:02:27
领域: cs.LG,cs.AI,cs.CV,cs.DC,math.OC
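The pairwise submodular objective underlying the entry above can be sketched with the classic facility-location function and its sequential greedy maximizer — the baseline the paper's distributed bounding algorithm is designed to replace at scale. The similarity matrix below is a toy assumption, not data from the paper.

```python
def facility_location_gain(selected, candidate, sim):
    """Marginal gain of adding `candidate` under the facility-location
    objective f(S) = sum_j max_{i in S} sim[i][j] (pairwise submodular)."""
    gain = 0.0
    for j in range(len(sim)):
        current = max((sim[i][j] for i in selected), default=0.0)
        gain += max(sim[candidate][j] - current, 0.0)
    return gain

def greedy_subset(sim, k):
    """Sequential greedy: pick the point with the largest marginal gain k times."""
    selected = []
    remaining = list(range(len(sim)))
    for _ in range(k):
        best = max(remaining,
                   key=lambda c: facility_location_gain(selected, c, sim))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy 4-point similarity matrix: points 0/1 form one group, 2/3 another.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
print(greedy_subset(sim, 2))
```

Greedy picks one representative per group, which is exactly the diminishing-returns behavior that makes submodular functions attractive for subset selection; the sequential dependence between picks is also what makes distribution hard.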
RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports
Standardization of clinical reports is crucial for improving the quality of healthcare and facilitating data integration. The lack of unified standards, including format, terminology, and style, is a great challenge in clinical fundus diagnostic reports, which increases the difficulty for large language models (LLMs) to understand the data. To address this, we construct a bilingual standard terminology, containing fundus clinical terms and commonly used descriptions in clinical diagnosis. Then, we establish two models, RetSTA-7B-Zero and RetSTA-7B. RetSTA-7B-Zero, fine-tuned on an augmented dataset simulating clinical scenarios, demonstrates powerful standardization behavior. However, its coverage of the wider range of diseases is limited. To further enhance standardization performance, we build RetSTA-7B, which integrates a substantial amount of standardized data generated by RetSTA-7B-Zero along with corresponding English data, covering diverse complex clinical scenarios and achieving report-level standardization for the first time. Experimental results demonstrate that RetSTA-7B outperforms other compared LLMs in the bilingual standardization task, validating its superior performance and generalizability. The checkpoints are available at https://github.com/AB-Story/RetSTA-7B.
Updated: 2025-03-12 13:00:57
标题: RetSTA:一种基于LLM的方法,用于标准化临床眼底图像报告
摘要: 临床报告的标准化对于改善卫生保健质量和促进数据整合至关重要。统一标准的缺乏,包括格式、术语和风格,在临床眼底诊断报告中是一个巨大的挑战,这增加了大型语言模型(LLMs)理解数据的难度。为了解决这个问题,我们构建了一个双语标准术语,包含眼底临床术语和临床诊断中常用的描述。然后,我们建立了两个模型,RetSTA-7B-Zero和RetSTA-7B。RetSTA-7B-Zero在模拟临床场景的扩充数据集上微调,展现了强大的标准化行为。然而,它面临着覆盖更广泛疾病范围的挑战。为了进一步提高标准化性能,我们构建了RetSTA-7B,它集成了由RetSTA-7B-Zero生成的大量标准化数据以及相应的英文数据,涵盖了多样化的复杂临床场景,首次实现了报告级别的标准化。实验结果表明,RetSTA-7B在双语标准化任务中胜过其他比较的LLMs,验证了其优越的性能和通用性。检查点可在https://github.com/AB-Story/RetSTA-7B找到。
更新时间: 2025-03-12 13:00:57
领域: cs.CL,cs.AI
Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach
As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies (data, model, sequence, and pipeline) have been successfully implemented for popular neural networks on mainstream hardware, optimizing the distributed deployment schedule requires extensive expertise and manual effort. Furthermore, while existing frameworks handle simple chain-like structures well, they struggle with complex non-linear architectures. Mixture-of-experts and multi-modal models feature intricate MIMO and branch-rich topologies that require fine-grained operator-level parallelization beyond the capabilities of existing frameworks. We propose formulating parallelism planning as a scheduling optimization problem using mixed-integer programming, and develop a bi-level solution framework balancing optimality with computational efficiency, automatically generating effective distributed plans that capture both the heterogeneous structure of modern neural networks and the underlying hardware constraints. In experiments comparing against expert-designed strategies like DeepSeek's DualPipe, our framework achieves comparable or superior performance, reducing computational bubbles by half under the same memory constraints. The framework's versatility extends beyond throughput optimization to incorporate hardware utilization maximization, memory capacity constraints, and other considerations or potential strategies. These capabilities position our solution as both a valuable research tool for exploring optimal parallelization strategies and a practical industrial solution for large-scale AI deployment.
Updated: 2025-03-12 13:00:29
标题: 分布式深度学习的自动操作员级并行规划——混合整数规划方法
摘要: 随着人工智能社区进入拥有数十亿参数的大型模型时代,分布式训练和推断变得至关重要。虽然各种并行策略(数据、模型、序列和流水线)已成功应用于主流硬件上的流行神经网络,但优化分布式部署调度需要大量专业知识和人工努力。此外,现有框架虽然能较好地处理简单的链式结构,但在面对复杂的非线性架构时仍然存在困难。混合专家模型和多模态模型具有复杂的多输入多输出和多分支拓扑结构,需要超出现有框架能力的细粒度算子级并行化。我们提出将并行规划表述为一个基于混合整数规划的调度优化问题,并开发了一个在最优性与计算效率之间取得平衡的双层求解框架,能够自动生成有效的分布式方案,同时兼顾现代神经网络的异构结构和底层硬件约束。在与DeepSeek的DualPipe等专家设计策略进行比较的实验中,我们的框架取得了相当或更优的性能,在相同的内存约束下将计算气泡减少了一半。该框架的多功能性不仅限于吞吐量优化,还可以纳入硬件利用率最大化、内存容量约束以及其他考虑因素或潜在策略。这些能力使我们的解决方案既成为探索最佳并行化策略的有价值的研究工具,又成为大规模人工智能部署的实用工业解决方案。
更新时间: 2025-03-12 13:00:29
领域: cs.LG,cs.AI,cs.DC,cs.DM
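The scheduling objective in the entry above can be illustrated with a deliberately tiny stand-in: exhaustively assigning operators to devices to minimize the makespan (the busiest device's load). The per-operator costs are hypothetical, and a real MIP solver replaces this enumeration at scale — this sketch only shows the shape of the objective, not the paper's bi-level method.

```python
from itertools import product

def best_assignment(op_costs, num_devices):
    """Exhaustively assign each operator to one device, minimizing the
    makespan. A brute-force stand-in for the MIP search described above."""
    best_plan, best_makespan = None, float("inf")
    for plan in product(range(num_devices), repeat=len(op_costs)):
        loads = [0.0] * num_devices
        for op, dev in enumerate(plan):
            loads[dev] += op_costs[op]
        if max(loads) < best_makespan:
            best_plan, best_makespan = plan, max(loads)
    return best_plan, best_makespan

costs = [4.0, 3.0, 2.0, 2.0]   # hypothetical per-operator compute times
plan, makespan = best_assignment(costs, 2)
print(plan, makespan)
```

The search space grows as (devices)^(operators), which is why the paper resorts to mixed-integer programming with a bi-level decomposition rather than enumeration.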
Energy Dissipation Preserving Physics Informed Neural Network for Allen-Cahn Equations
This paper investigates a numerical solution of Allen-Cahn equation with constant and degenerate mobility, with polynomial and logarithmic energy functionals, with deterministic and random initial functions, and with advective term in one, two, and three spatial dimensions, based on the physics-informed neural network (PINN). To improve the learning capacity of the PINN, we incorporate the energy dissipation property of the Allen-Cahn equation as a penalty term into the loss function of the network. To facilitate the learning process of random initials, we employ a continuous analogue of the initial random condition by utilizing the Fourier series expansion. Adaptive methods from traditional numerical analysis are also integrated to enhance the effectiveness of the proposed PINN. Numerical results indicate a consistent decrease in the discrete energy, while also revealing phenomena such as phase separation and metastability.
Updated: 2025-03-12 12:50:29
标题: 保持能量耗散的物理信息神经网络用于Allen-Cahn方程
摘要: 本文研究了具有常数和退化迁移率的Allen-Cahn方程的数值解,具有多项式和对数能量泛函,具有确定性和随机初始函数,以及在一维、二维和三维空间维度中带有对流项,基于物理信息神经网络(PINN)。为了提高PINN的学习能力,我们将Allen-Cahn方程的能量耗散特性作为惩罚项纳入网络的损失函数中。为了促进随机初始条件的学习过程,我们利用傅立叶级数展开的连续模拟初始随机条件。传统数值分析中的自适应方法也被整合进来,以增强所提出的PINN的有效性。数值结果表明离散能量持续减少,同时还揭示了相分离和亚稳现象等现象。
更新时间: 2025-03-12 12:50:29
领域: math.NA,cs.LG,cs.NA,physics.comp-ph
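The energy-dissipation penalty described in the entry above can be sketched for the 1D Allen-Cahn equation with a polynomial (Ginzburg-Landau) energy. This is a hedged illustration: the grid spacing, interface parameter, and snapshot values are assumptions, and a real PINN would compute this penalty on network outputs inside an autodiff framework rather than on plain lists.

```python
def discrete_energy(u, h=0.1, eps=0.1):
    """Discrete Ginzburg-Landau energy for 1D Allen-Cahn:
    E(u) = sum_i h * ( eps^2/2 * |du/dx|^2 + (u^2 - 1)^2 / 4 )."""
    grad = sum(((u[i + 1] - u[i]) / h) ** 2 for i in range(len(u) - 1))
    potential = sum((ui ** 2 - 1) ** 2 / 4 for ui in u)
    return h * (0.5 * eps ** 2 * grad + potential)

def dissipation_penalty(u_prev, u_next, h=0.1, eps=0.1):
    """Hinge penalty: positive only if energy *increased* between snapshots,
    so adding it to the loss enforces the dissipation property."""
    return max(0.0, discrete_energy(u_next, h, eps) - discrete_energy(u_prev, h, eps))

u0 = [0.9, 0.2, -0.2, -0.9]   # earlier snapshot near an interface
u1 = [1.0, 0.5, -0.5, -1.0]   # later snapshot, relaxed toward the phases
print(dissipation_penalty(u0, u1), dissipation_penalty(u1, u0))
```

Because the penalty is one-sided, it leaves physically valid (energy-decreasing) trajectories untouched and only pushes back on violations, mirroring how the paper folds dissipation into the PINN loss.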
MOAT: Evaluating LMMs for Capability Integration and Instruction Grounding
Large multimodal models (LMMs) have demonstrated significant potential as generalists in vision-language (VL) tasks. However, there remains a significant gap between state-of-the-art LMMs and human performance when it comes to complex tasks that require a combination of fundamental VL capabilities, as well as tasks involving the grounding of complex instructions. To thoroughly investigate the human-LMM gap and its underlying causes, we propose MOAT, a diverse benchmark of complex real-world VL tasks that are challenging for LMMs. Specifically, the tasks in MOAT require LMMs to engage in generalist problem solving by integrating fundamental VL capabilities such as reading text, counting, understanding spatial relations, and grounding textual and visual instructions. All these abilities fit into a taxonomy proposed by us that contains 10 fundamental VL capabilities, enabling MOAT to provide a fine-grained view of LMMs' strengths and weaknesses. Moreover, MOAT is the first benchmark to explicitly evaluate LMMs' ability to ground complex text and visual instructions, which is essential to many real-world applications. We evaluate over 20 proprietary and open source LMMs, as well as humans, on MOAT, and find that humans achieve 82.7% accuracy while the best-performing LMM (OpenAI o1) achieves only 38.8%. To guide future model development, we analyze common trends in our results and discuss the underlying causes of the observed performance gaps between LMMs and humans, focusing on which VL capability forms the bottleneck in complex tasks, whether test time scaling improves performance on MOAT, and how tiling harms LMMs' capability to count. Code and data are available at https://cambrian-yzt.github.io/MOAT.
Updated: 2025-03-12 12:49:31
标题: MOAT:评估用于能力整合和指导基础的LMMs
摘要: 大型多模态模型(LMMs)在视觉语言(VL)任务中展示了重要的潜力作为通才。然而,在涉及需要基本VL能力组合的复杂任务以及涉及复杂指令基础的任务方面,最先进的LMMs与人类表现之间仍存在显著差距。为了彻底调查人类-LMM差距及其潜在原因,我们提出了MOAT,一个具有挑战性的复杂现实世界VL任务的多样化基准。具体来说,MOAT中的任务要求LMMs通过整合基本VL能力,如阅读文本、计数、理解空间关系、基于文本和视觉指令等,从事通才问题求解。所有这些能力符合我们提出的包含10种基本VL能力的分类法,使MOAT能够提供对LMMs的优势和劣势的细致视图。此外,MOAT是第一个明确评估LMMs地面复杂文本和视觉指令能力的基准,这对许多实际应用至关重要。我们在MOAT上评估了20多个专有和开源的LMMs以及人类,并发现人类的准确率达到了82.7%,而表现最佳的LMM(OpenAI o1)仅达到了38.8%。为了指导未来的模型开发,我们分析了结果中的共同趋势,并讨论了LMMs和人类之间观察到的性能差距的潜在原因,重点关注在复杂任务中哪种VL能力形成瓶颈,测试时间缩放是否提高了MOAT上的性能,以及平铺如何损害LMMs的计数能力。代码和数据可在https://cambrian-yzt.github.io/MOAT 上获取。
更新时间: 2025-03-12 12:49:31
领域: cs.CL,cs.AI,cs.CV
Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Large Language Models (LLMs) are increasingly employed as automated evaluators to assess the safety of generated content, yet their reliability in this role remains uncertain. This study evaluates a diverse set of 11 LLM judge models across critical safety domains, examining three key aspects: self-consistency in repeated judging tasks, alignment with human judgments, and susceptibility to input artifacts such as apologetic or verbose phrasing. Our findings reveal that biases in LLM judges can significantly distort the final verdict on which content source is safer, undermining the validity of comparative evaluations. Notably, apologetic language artifacts alone can skew evaluator preferences by up to 98\%. Contrary to expectations, larger models do not consistently exhibit greater robustness, while smaller models sometimes show higher resistance to specific artifacts. To mitigate LLM evaluator robustness issues, we investigate jury-based evaluations aggregating decisions from multiple models. Although this approach both improves robustness and enhances alignment to human judgements, artifact sensitivity persists even with the best jury configurations. These results highlight the urgent need for diversified, artifact-resistant methodologies to ensure reliable safety assessments.
Updated: 2025-03-12 12:49:02
标题: 更安全还是更幸运?LLMs作为安全评估器对人为因素不具有鲁棒性
摘要: 大型语言模型(LLMs)越来越被用作自动生成内容安全性评估的自动评估器,然而它们在这一角色中的可靠性仍然不确定。本研究评估了一个多样化的11个LLM评判模型集合,跨越关键的安全领域,考察了三个关键方面:在重复评判任务中的自我一致性,与人类判断的一致性,以及对输入文本中道歉或冗长措辞等工件的敏感性。我们的研究结果表明,LLM评判中的偏见可以显著扭曲对哪个内容来源更安全的最终判决,削弱了比较评估的有效性。值得注意的是,仅凭道歉性语言工件就可以将评估者偏好扭曲高达98\%。与预期相反,更大的模型并不一致地表现出更大的稳健性,而较小的模型有时展现出对特定工件更高的抵抗力。为了减轻LLM评估器的稳健性问题,我们调查了基于陪审团的评估方法,汇总来自多个模型的决策。虽然这种方法既提高了稳健性又增强了与人类判断的一致性,但即使在最佳的陪审团配置下,对工件的敏感性仍然存在。这些结果突显了对多样化、抗工件方法的迫切需求,以确保可靠的安全评估。
更新时间: 2025-03-12 12:49:02
领域: cs.CL,cs.AI
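The jury-based evaluation and self-consistency checks described in the entry above reduce to simple aggregation over verdicts. A minimal sketch follows; the judge names, verdict labels ('A'/'B' for which output is safer), and counts are illustrative assumptions, not the paper's setup.

```python
from collections import Counter

def jury_verdict(judge_verdicts):
    """Majority vote across multiple judge models' safety verdicts."""
    return Counter(judge_verdicts).most_common(1)[0][0]

def self_consistency(repeated_verdicts):
    """Fraction of repeated judgments by one judge that agree with its
    modal verdict -- a simple consistency score for a single judge."""
    mode = Counter(repeated_verdicts).most_common(1)[0][0]
    return sum(v == mode for v in repeated_verdicts) / len(repeated_verdicts)

# One hypothetical judge is swayed by an apologetic-phrasing artifact,
# but the three-judge jury still lands on the majority verdict.
print(jury_verdict(["A", "A", "B"]))
print(self_consistency(["A", "A", "A", "B"]))
```

As the paper observes, such juries improve robustness on average but do not eliminate artifact sensitivity: if an artifact sways most jurors, the majority vote inherits the bias.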
A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning
Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. Proximal policy optimization (PPO) is the most popular method for policy optimization. While effective in terms of performance, PPO is highly sensitive to hyper-parameters and involves substantial computational overhead. REINFORCE, on the other hand, mitigates some computational complexities such as high memory overhead and sensitive hyper-parameter tuning, but has suboptimal performance due to high variance and sample inefficiency. While the variance of REINFORCE can be reduced by sampling multiple actions per input prompt and using a baseline correction term, it still suffers from sample inefficiency. To address these challenges, we systematically analyze the efficiency-effectiveness trade-off between REINFORCE and PPO, and propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. LOOP combines variance reduction techniques from REINFORCE, such as sampling multiple actions per input prompt and a baseline correction term, with the robustness and sample efficiency of PPO via clipping and importance sampling. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives, and achieves a better balance between computational efficiency and performance.
Updated: 2025-03-12 12:43:07
标题: 一种简单有效的强化学习方法,用于文本到图像扩散微调
摘要: 基于强化学习(RL)的微调已经成为将扩散模型与黑盒目标对齐的强大方法。近端策略优化(PPO)是策略优化的最受欢迎的方法。虽然在性能方面有效,但PPO对超参数非常敏感,并且涉及大量计算开销。另一方面,REINFORCE可以减轻一些计算复杂性,比如高内存开销和敏感的超参数调整,但由于高方差和样本效率低而表现不佳。虽然通过对每个输入提示进行多次抽样和使用基准校正项可以减少REINFORCE的方差,但它仍然存在样本效率低的问题。为了解决这些挑战,我们系统地分析了REINFORCE和PPO之间的效率-有效性权衡,并提出了一种新颖的RL扩散微调方法leave-one-out PPO(LOOP)。LOOP将REINFORCE的方差减少技术(如对每个输入提示进行多次抽样和基准校正项)与PPO的稳健性和样本效率相结合,通过剪切和重要性抽样。我们的结果表明,LOOP有效地改进了各种黑盒目标上的扩散模型,并在计算效率和性能之间取得了更好的平衡。
更新时间: 2025-03-12 12:43:07
领域: cs.LG,cs.AI,cs.CV
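The two ingredients the entry above combines — a leave-one-out baseline over multiple samples per prompt, and a PPO-style clipped surrogate — can each be sketched in a few lines. The reward values and clipping range below are illustrative assumptions; a real implementation operates on log-probabilities of diffusion denoising steps, not scalars.

```python
def loo_advantages(rewards):
    """Leave-one-out baseline: each sample's advantage is its reward minus
    the mean reward of the *other* samples drawn for the same prompt.
    Requires at least two samples per prompt."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one sample, where `ratio` is the
    importance weight between the current and the sampling policy."""
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped * advantage)

rewards = [1.0, 0.0, 0.5, 0.5]   # 4 images sampled for one prompt
print(loo_advantages(rewards))
print(clipped_objective(2.0, 1.0))
```

Note that the leave-one-out advantages sum to zero across the group, which is what removes the shared baseline variance; clipping then keeps updates conservative when the importance ratio drifts.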
Mixture of Experts for Node Classification
Nodes in real-world graphs exhibit diverse patterns in numerous aspects, such as degree and homophily. However, most existing node predictors fail to capture a wide range of node patterns or to make predictions based on distinct node patterns, resulting in unsatisfactory classification performance. In this paper, we reveal that different node predictors are good at handling nodes with specific patterns, and that uniformly applying a single node predictor can lead to suboptimal results. To bridge this gap, we propose a mixture-of-experts framework, MoE-NP, for node classification. Specifically, MoE-NP combines a mixture of node predictors and strategically selects models based on node patterns. Experimental results on a range of real-world datasets demonstrate significant performance improvements from MoE-NP.
Updated: 2025-03-12 12:33:46
标题: 专家混合用于节点分类
摘要: 在现实世界的图中,节点在许多方面表现出不同的模式,比如度和同质性。然而,大多数现有的节点预测器无法捕捉各种节点模式,也无法基于不同的节点模式进行预测,导致分类性能不佳。在本文中,我们发现不同的节点预测器擅长处理具有特定模式的节点,仅应用一个节点预测器可能导致次优结果。为了弥补这一差距,我们提出了一个节点分类的专家混合框架MoE-NP。具体而言,MoE-NP结合了一组节点预测器,并根据节点模式来策略性地选择模型。来自一系列现实世界数据集的实验结果显示,MoE-NP带来了显著的性能提升。
更新时间: 2025-03-12 12:33:46
领域: cs.SI,cs.AI
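The pattern-conditioned expert selection in the entry above can be sketched as a soft gate over predictors. Everything concrete here — the two experts, the gate that keys on (normalized) node degree, and the scores — is a hypothetical stand-in for MoE-NP's learned components.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_predict(node_features, experts, gate):
    """Mix each expert's class scores with gate weights conditioned on
    node patterns (e.g. degree, homophily)."""
    weights = softmax(gate(node_features))
    num_classes = len(experts[0](node_features))
    mixed = [0.0] * num_classes
    for w, expert in zip(weights, experts):
        for c, score in enumerate(expert(node_features)):
            mixed[c] += w * score
    return mixed

# Two hypothetical experts: one tuned for high-degree nodes, one for low.
high_deg_expert = lambda f: [0.9, 0.1]
low_deg_expert = lambda f: [0.2, 0.8]
gate = lambda f: [f["degree"], -f["degree"]]   # toy gate on normalized degree

print(moe_predict({"degree": 2.0}, [high_deg_expert, low_deg_expert], gate))
```

For a high-degree node the gate routes almost all weight to the high-degree expert, which is the intuition behind selecting predictors per node pattern rather than applying one predictor uniformly.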
Online multidimensional dictionary learning
Dictionary learning is a widely used technique in signal processing and machine learning that aims to represent data as a linear combination of a few elements from an overcomplete dictionary. In this work, we propose a generalization of the dictionary learning technique using the t-product framework, enabling efficient handling of multidimensional tensor data. We address the dictionary learning problem through online methods suitable for tensor structures. To effectively address the sparsity problem, we utilize an accelerated Iterative Shrinkage-Thresholding Algorithm (ISTA) enhanced with an extrapolation technique known as Anderson acceleration. This approach significantly improves signal reconstruction results. Extensive experiments prove that our proposed method outperforms existing acceleration techniques, particularly in applications such as data completion. These results suggest that our approach can be highly beneficial for large-scale tensor data analysis in various domains.
Updated: 2025-03-12 12:31:29
标题: 在线多维字典学习
摘要: 字典学习是信号处理和机器学习中广泛使用的技术,旨在将数据表示为来自过完备字典的少数元素的线性组合。在这项工作中,我们提出了一种利用t-乘法框架的字典学习技术的泛化方法,实现对多维张量数据的高效处理。我们通过适用于张量结构的在线方法来解决字典学习问题。为了有效解决稀疏性问题,我们利用一种加速的迭代收缩阈值算法(ISTA),并加入一种称为Anderson加速的外推技术。这种方法显著改善了信号重建结果。大量实验证明,我们提出的方法优于现有的加速技术,特别是在数据完成等应用中。这些结果表明,我们的方法可以在各个领域的大规模张量数据分析中带来极大的益处。
更新时间: 2025-03-12 12:31:29
领域: math.NA,cs.LG,cs.NA,15A69, 15A72, 15A83
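The sparse-coding step in the entry above rests on ISTA, whose core is the soft-thresholding proximal operator. A minimal matrix-form sketch follows (plain ISTA, without the Anderson-accelerated extrapolation the paper adds, and without the t-product tensor structure); the problem data and step size are toy assumptions.

```python
def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def ista(A, b, lam=0.1, step=None, iters=200):
    """ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 on dense lists."""
    m, n = len(A), len(A[0])
    if step is None:
        # crude safe step: 1 / (squared Frobenius norm) lower-bounds 1/L
        step = 1.0 / sum(A[i][j] ** 2 for i in range(m) for j in range(n))
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x

A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 0.05]
print(ista(A, b, lam=0.1))
```

With an identity dictionary the fixed point is simply soft-thresholded b, so the small coefficient is zeroed out exactly — the sparsifying behavior the dictionary-learning objective relies on. Anderson acceleration extrapolates over past iterates to reach such fixed points in far fewer iterations.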
NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model
Effective Human-Robot Interaction (HRI) is crucial for future service robots in aging societies. Existing solutions are biased toward only well-trained objects, creating a gap when dealing with new objects. Currently, HRI systems using predefined gestures or language tokens for pretrained objects pose challenges for all individuals, especially elderly ones. These challenges include difficulties in recalling commands, memorizing hand gestures, and learning new names. This paper introduces NVP-HRI, an intuitive multi-modal HRI paradigm that combines voice commands and deictic posture. NVP-HRI utilizes the Segment Anything Model (SAM) to analyze visual cues and depth data, enabling precise structural object representation. Through a pre-trained SAM network, NVP-HRI allows interaction with new objects via zero-shot prediction, even without prior knowledge. NVP-HRI also integrates with a large language model (LLM) for multimodal commands, coordinating them with object selection and scene distribution in real time for collision-free trajectory solutions. We also regulate the action sequence with the essential control syntax to reduce LLM hallucination risks. The evaluation of diverse real-world tasks using a Universal Robot showcased up to 59.2\% efficiency improvement over traditional gesture control, as illustrated in the video https://youtu.be/EbC7al2wiAc. Our code and design will be openly available at https://github.com/laiyuzhi/NVP-HRI.git.
Updated: 2025-03-12 12:30:18
标题: NVP-HRI:基于大语言模型的零样本自然语音和姿势人机交互
摘要: 有效的人机交互(HRI)对于未来服务机器人在老龄化社会中至关重要。现有的解决方案偏向于只针对经过良好训练的对象,当处理新对象时会出现差距。目前,使用预定义手势或语言令牌的HRI系统针对经过预先训练的对象存在挑战,尤其是对于老年人。这些挑战包括难以回忆命令、记住手势和学习新名称。本文介绍了NVP-HRI,这是一种直观的多模式HRI范式,结合了语音命令和示指姿势。NVP-HRI利用分割任何模型(SAM)分析视觉线索和深度数据,实现精确的结构化物体表示。通过预先训练的SAM网络,NVP-HRI允许通过零射预测与新对象互动,即使没有先前知识。NVP-HRI还与大型语言模型(LLM)集成,用于多模式命令,在实时协调对象选择和场景分布以实现无碰撞轨迹解决方案。我们还通过基本控制语法调节动作序列,以减少LLM幻觉风险。通过使用通用机器人进行多样化的真实世界任务评估,NVP-HRI显示出相对传统手势控制高达59.2\%的效率提升,如视频https://youtu.be/EbC7al2wiAc所示。我们的代码和设计将在https://github.com/laiyuzhi/NVP-HRI.git上开放获取。
更新时间: 2025-03-12 12:30:18
领域: cs.RO,cs.AI
CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data
The integration of large language models (LLMs) into cyber security applications presents significant opportunities, such as enhancing threat analysis and malware detection, but can also introduce critical risks and safety concerns, including personal data leakage and automated generation of new malware. To address these challenges, we developed CyberLLMInstruct, a dataset of 54,928 instruction-response pairs spanning cyber security tasks such as malware analysis, phishing simulations, and zero-day vulnerabilities. The dataset was constructed through a multi-stage process. This involved sourcing data from multiple resources, filtering and structuring it into instruction-response pairs, and aligning it with real-world scenarios to enhance its applicability. Seven open-source LLMs were chosen to test the usefulness of CyberLLMInstruct: Phi 3 Mini 3.8B, Mistral 7B, Qwen 2.5 7B, Llama 3 8B, Llama 3.1 8B, Gemma 2 9B, and Llama 2 70B. In our primary example, we rigorously assess the safety of fine-tuned models using the OWASP top 10 framework, finding that fine-tuning reduces safety resilience across all tested LLMs and every adversarial attack (e.g., the security score of Llama 3.1 8B against prompt injection drops from 0.95 to 0.15). In our second example, we show that these same fine-tuned models can also achieve up to 92.50 percent accuracy on the CyberMetric benchmark. These findings highlight a trade-off between performance and safety, showing the importance of adversarial testing and further research into fine-tuning methodologies that can mitigate safety risks while still improving performance across diverse datasets and domains. All scripts required to reproduce the dataset, along with examples and relevant resources for replicating our results, will be made available upon the paper's acceptance.
Updated: 2025-03-12 12:29:27
标题: 《CyberLLMInstruct:一个用于使用网络安全数据分析精调LLM安全性的新数据集》
摘要: 将大型语言模型(LLM)集成到网络安全应用程序中提供了重大机遇,如增强威胁分析和恶意软件检测,但也可能引入关键风险和安全问题,包括个人数据泄露和自动生成新恶意软件。为解决这些挑战,我们开发了CyberLLMInstruct,一个包含54,928个指令-响应对的数据集,涵盖了网络安全任务,如恶意软件分析、钓鱼模拟和零日漏洞。该数据集通过多阶段过程构建而成,包括从多个资源获取数据,对其进行过滤和结构化成指令-响应对,并将其与真实场景对齐以增强适用性。我们选择了七个开源LLM来测试CyberLLMInstruct的实用性:Phi 3 Mini 3.8B、Mistral 7B、Qwen 2.5 7B、Llama 3 8B、Llama 3.1 8B、Gemma 2 9B和Llama 2 70B。在我们的主要示例中,我们使用OWASP十大框架严格评估了微调模型的安全性,发现微调会降低所有测试的LLM的安全弹性以及每个对抗性攻击(例如,Llama 3.1 8B对于提示注入的安全评分从0.95降至0.15)。在我们的第二个示例中,我们展示这些相同的微调模型在CyberMetric基准测试上也能达到高达92.50%的准确率。这些发现突显了性能和安全之间的权衡,显示了对抗性测试的重要性,以及进一步研究微调方法论的重要性,这些方法可以在改进各种数据集和领域的性能的同时减轻安全风险。在论文被接受后,将提供用于再现数据集的所有脚本,以及用于复制我们结果的示例和相关资源。
更新时间: 2025-03-12 12:29:27
领域: cs.CR,cs.AI
SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction
Dynamic and static components in scenes often exhibit distinct properties, yet most 4D reconstruction methods treat them indiscriminately, leading to suboptimal performance in both cases. This work introduces SDD-4DGS, the first framework for static-dynamic decoupled 4D scene reconstruction based on Gaussian Splatting. Our approach is built upon a novel probabilistic dynamic perception coefficient that is naturally integrated into the Gaussian reconstruction pipeline, enabling adaptive separation of static and dynamic components. With carefully designed implementation strategies to realize this theoretical framework, our method effectively facilitates explicit learning of motion patterns for dynamic elements while maintaining geometric stability for static structures. Extensive experiments on five benchmark datasets demonstrate that SDD-4DGS consistently outperforms state-of-the-art methods in reconstruction fidelity, with enhanced detail restoration for static structures and precise modeling of dynamic motions. The code will be released.
Updated: 2025-03-12 12:25:58
标题: SDD-4DGS: 用于4D场景重建的高斯飞溅中的静态动态感知解耦
摘要: 场景中的动态和静态组件通常表现出不同的特性,然而大多数4D重建方法都会将它们混为一谈,导致在两种情况下性能都不佳。本文介绍了基于高斯喷涂的静态动态解耦4D场景重建的第一个框架SDD-4DGS。我们的方法建立在一种新颖的概率动态感知系数上,它自然地集成到高斯重建流程中,实现了静态和动态组件的自适应分离。通过精心设计的实现策略来实现这一理论框架,我们的方法有效地促进了对动态元素运动模式的明确学习,同时保持了静态结构的几何稳定性。对五个基准数据集进行的广泛实验表明,SDD-4DGS在重建保真度方面始终优于最先进的方法,对静态结构的细节恢复和动态运动的精确建模也有所提升。代码将被发布。
更新时间: 2025-03-12 12:25:58
领域: cs.CV,cs.AI
Group-robust Machine Unlearning
Machine unlearning is an emerging paradigm to remove the influence of specific training data (i.e., the forget set) from a model while preserving its knowledge of the rest of the data (i.e., the retain set). Previous approaches assume the forget data to be uniformly distributed from all training datapoints. However, if the data to unlearn is dominant in one group, we empirically show that performance for this group degrades, leading to fairness issues. This work tackles the overlooked problem of non-uniformly distributed forget sets, which we call group-robust machine unlearning, by presenting a simple, effective strategy that mitigates the performance loss in dominant groups via sample distribution reweighting. Moreover, we present MIU (Mutual Information-aware Machine Unlearning), the first approach for group robustness in approximate machine unlearning. MIU minimizes the mutual information between model features and group information, achieving unlearning while reducing performance degradation in the dominant group of the forget set. Additionally, MIU exploits sample distribution reweighting and mutual information calibration with the original model to preserve group robustness. We conduct experiments on three datasets and show that MIU outperforms standard methods, achieving unlearning without compromising model robustness. Source code available at https://github.com/tdemin16/group-robust_machine_unlearning.
Updated: 2025-03-12 12:24:05
标题: 团体鲁棒的机器遗忘
摘要: 机器遗忘是一种新兴的范式,可以从模型中删除特定训练数据(即忘记集)的影响,同时保留其对其余数据(即保留集)的知识。先前的方法假设要遗忘的数据在所有训练数据点中均匀分布。然而,如果要遗忘的数据在某一组中占主导地位,我们经验证明该组的性能会下降,导致公平性问题。本文解决了被忽视的非均匀分布的遗忘集问题,我们称之为组鲁棒机器遗忘,通过提出一种简单有效的策略,通过样本分布重新加权减轻主导组中的性能损失。此外,我们提出了MIU(相互信息感知机器遗忘),这是近似机器遗忘中的第一种组鲁棒性方法。MIU最小化模型特征和组信息之间的互信息,实现遗忘同时减少忘记集中主导组的性能下降。此外,MIU利用样本分布重新加权和互信息校准与原始模型以保持组鲁棒性。我们在三个数据集上进行实验,并展示MIU优于标准方法,实现遗忘而不损害模型的鲁棒性。源代码可在https://github.com/tdemin16/group-robust_machine_unlearning找到。
更新时间: 2025-03-12 12:24:05
领域: cs.LG,cs.AI
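The sample-distribution reweighting that the entry above uses against dominant forget-set groups can be sketched with standard inverse-frequency weights. The group labels below are illustrative; MIU's mutual-information term is not shown.

```python
def reweight_forget_samples(group_labels):
    """Inverse-frequency weights so no group dominates the unlearning
    objective: weight_g = N / (num_groups * count_g)."""
    counts = {}
    for g in group_labels:
        counts[g] = counts.get(g, 0) + 1
    n, k = len(group_labels), len(counts)
    return [n / (k * counts[g]) for g in group_labels]

# Forget set dominated by group 'a' (3 of 4 samples): minority group 'b'
# gets upweighted so the dominant group's loss terms do not swamp it.
weights = reweight_forget_samples(["a", "a", "a", "b"])
print(weights)
```

The weights preserve the total loss mass (they sum to N) while equalizing each group's aggregate contribution, which is what mitigates the per-group performance degradation described above.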
Energy Optimized Piecewise Polynomial Approximation Utilizing Modern Machine Learning Optimizers
This work explores an extension of ML-optimized piecewise polynomial approximation by incorporating energy optimization as an additional objective. Traditional closed-form solutions enable continuity and approximation targets but lack flexibility in accommodating complex optimization goals. By leveraging modern gradient descent optimizers within TensorFlow, we introduce a framework that minimizes total curvature in cam profiles, leading to smoother motion and reduced energy consumption on input data that is poorly served by approximation and continuity objectives alone. Experimental results confirm the effectiveness of this approach, demonstrating its potential to improve efficiency in scenarios where input data is noisy or suboptimal for conventional methods.
Updated: 2025-03-12 12:23:15
标题: 能源优化的分段多项式逼近利用现代机器学习优化器
摘要: 这项工作探讨了通过将能量优化作为额外目标,扩展ML优化的分段多项式逼近。传统的封闭形式解决方案可以实现连续性和逼近目标,但缺乏适应复杂优化目标的灵活性。通过利用TensorFlow中的现代梯度下降优化器,我们引入了一个框架,该框架最小化凸轮廓中的总曲率,从而使运动更加平滑,同时减少输入数据的能量消耗,这些数据对于单独逼近和连续性优化并不理想。实验结果证实了这种方法的有效性,展示了它在输入数据嘈杂或不够优化的情况下改善效率的潜力。
更新时间: 2025-03-12 12:23:15
领域: cs.LG
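The multi-objective idea in the entry above — fitting a polynomial while penalizing total curvature — can be sketched with hand-written gradient descent on a single quadratic segment. This is a hedged toy stand-in for the paper's TensorFlow-optimizer setup: one segment instead of a piecewise spline, illustrative data, and a squared-second-derivative penalty as the "energy" term.

```python
def fit_quadratic(xs, ys, lam=0.0, lr=0.01, steps=5000):
    """Gradient descent on MSE + lam * squared curvature, where the
    curvature of y = c0 + c1*x + c2*x^2 is y'' = 2*c2."""
    c0 = c1 = c2 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            err = (c0 + c1 * x + c2 * x * x) - y
            g0 += 2 * err / n
            g1 += 2 * err * x / n
            g2 += 2 * err * x * x / n
        g2 += lam * 8 * c2          # d/dc2 of lam * (2*c2)^2
        c0, c1, c2 = c0 - lr * g0, c1 - lr * g1, c2 - lr * g2
    return c0, c1, c2

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1.0, 0.25, 0.0, 0.25, 1.0]   # exactly y = x^2
_, _, curv_free = fit_quadratic(xs, ys, lam=0.0)
_, _, curv_pen = fit_quadratic(xs, ys, lam=1.0)
print(curv_free, curv_pen)
```

Without the penalty the fit recovers the true curvature; with it, the optimizer trades approximation error for a flatter (lower-energy) profile — the same tension the paper balances across a full cam profile.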
Heuristic-Based Address Clustering in Cardano Blockchain
Blockchain technology has recently gained widespread popularity as a practical method of storing immutable data while preserving the privacy of users by anonymizing their real identities. This anonymization approach, however, significantly complicates the analysis of blockchain data. To address this problem, heuristic-based clustering algorithms as an effective way of linking all addresses controlled by the same entity have been presented in the literature. In this paper, considering the particular features of the Extended Unspent Transaction Outputs accounting model introduced by the Cardano blockchain, two new clustering heuristics are proposed for clustering the Cardano payment addresses. Applying these heuristics and employing the UnionFind algorithm, we efficiently cluster all the addresses that have appeared on the Cardano blockchain from September 2017 to January 2023, where each cluster represents a distinct entity. The results show that each medium-sized entity in the Cardano network owns and controls 9.67 payment addresses on average. The results also confirm that a power law distribution is fitted to the distribution of entity sizes recognized using our proposed heuristics.
Updated: 2025-03-12 12:22:26
标题: 基于启发式的地址聚类在Cardano区块链中
摘要: 区块链技术最近因其作为一种存储不可变数据的实际方法而广受欢迎,同时通过匿名化用户的真实身份来保护用户的隐私。然而,这种匿名化方法显著复杂化了对区块链数据的分析。为了解决这个问题,文献中提出了基于启发式的聚类算法作为将同一实体控制的所有地址联系起来的有效方式。在本文中,考虑到卡尔达诺区块链引入的扩展未花费交易输出会计模型的特定特征,提出了两种新的聚类启发式算法用于聚类卡尔达诺支付地址。应用这些启发式算法并利用UnionFind算法,我们有效地聚类了自2017年9月至2023年1月出现在卡尔达诺区块链上的所有地址,其中每个聚类代表一个不同的实体。结果显示,卡尔达诺网络中每个中等规模的实体平均拥有和控制9.67个支付地址。结果还证实,使用我们提出的启发式算法识别出的实体大小分布适合幂律分布。
更新时间: 2025-03-12 12:22:26
领域: cs.CR
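The UnionFind-driven clustering in the entry above can be sketched end to end. For illustration this uses the classic common-input-ownership heuristic (all input addresses of one transaction are assumed to belong to the same entity) rather than the two Cardano-specific heuristics proposed in the paper, and the addresses are made up.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster_addresses(transactions):
    """Cluster addresses with the common-input-ownership heuristic:
    all input addresses of one transaction map to the same entity."""
    uf = UnionFind()
    for tx_inputs in transactions:
        uf.find(tx_inputs[0])          # register even single-input txs
        for addr in tx_inputs[1:]:
            uf.union(tx_inputs[0], addr)
    clusters = {}
    for addr in uf.parent:
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())

txs = [["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]]
print(cluster_addresses(txs))
```

Transitive merging through shared addresses (addr2 links the first two transactions) is what lets UnionFind grow entity clusters efficiently across millions of payment addresses.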
A Survey on Enhancing Causal Reasoning Ability of Large Language Models
Large language models (LLMs) have recently shown remarkable performance in language tasks and beyond. However, due to their limited inherent causal reasoning ability, LLMs still face challenges in handling tasks that require robust causal reasoning, such as health-care and economic analysis. As a result, a growing body of research has focused on enhancing the causal reasoning ability of LLMs. Despite this booming research, there is no survey that thoroughly reviews the challenges, progress, and future directions in this area. To bridge this significant gap, we systematically review the literature on how to strengthen LLMs' causal reasoning ability in this paper. We start with the background and motivations of this topic, followed by a summary of the key challenges in this area. Thereafter, we propose a novel taxonomy to systematically categorise existing methods, together with detailed comparisons within and between classes of methods. Furthermore, we summarise existing benchmarks and evaluation metrics for assessing LLMs' causal reasoning ability. Finally, we outline future research directions for this emerging field, offering insights and inspiration to researchers and practitioners in the area.
Updated: 2025-03-12 12:20:31
标题: 《关于提高大型语言模型因果推理能力的调查》
摘要: 大型语言模型(LLMs)最近在语言任务和其他领域表现出卓越的性能。然而,由于它们有限的固有因果推理能力,LLMs在处理需要强大因果推理能力的任务,如医疗保健和经济分析方面仍面临挑战。因此,越来越多的研究致力于增强LLMs的因果推理能力。尽管研究蓬勃发展,但缺乏对这一领域中的挑战、进展和未来方向进行全面回顾的调查。为了填补这一重要差距,我们在本文系统地回顾了关于如何增强LLMs因果推理能力的文献。我们从本主题的背景和动机介绍开始,然后总结该领域的主要挑战。此后,我们提出了一种新颖的分类法,以系统地对现有方法进行分类,并进行详细比较。此外,我们总结了用于评估LLMs因果推理能力的现有基准和评估指标。最后,我们概述了这一新兴领域的未来研究方向,为这一领域的研究人员和从业者提供见解和启发。
更新时间: 2025-03-12 12:20:31
领域: cs.CL,cs.AI
Towards Robust Model Evolution with Algorithmic Recourse
Algorithmic Recourse is a way for users to modify their attributes to align with a model's expectations, thereby improving their outcomes after receiving unfavorable decisions. In real-world scenarios, users often need to strategically adjust their attributes to compete for limited resources. However, such strategic behavior induces users to "game" algorithms, causing model collapse due to distribution shifts. These shifts arise from user competition, resource constraints, and adaptive user responses. While prior research on Algorithmic Recourse has explored its effects on both systems and users, the impact of resource constraints and competition over time remains underexplored. In this work, we develop a general framework to model user strategic behaviors and their interactions with decision-making systems under resource constraints and competitive dynamics. Through theoretical analysis and empirical evaluation, we identify three key phenomena that arise consistently in both synthetic and real-world datasets: escalating decision boundaries, non-robust model predictions, and inequitable recourse actions. Finally, we discuss the broader social implications of these findings and present two algorithmic strategies aimed at mitigating these challenges.
Updated: 2025-03-12 12:17:34
Domains: cs.LG,cs.AI,68T42,I.2.11
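As a minimal illustration of the recourse primitive the paper builds on (not the paper's framework itself), the closed-form minimal-L2 recourse for a hypothetical linear scorer looks like this:

```python
import numpy as np

def linear_recourse(x, w, b, margin=1e-6):
    """Minimal L2-norm change moving x to the positive side of w.x + b = 0.

    A textbook recourse for a linear scorer: project x onto the decision
    boundary along w and step just past it.
    """
    score = w @ x + b
    if score >= 0:
        return x.copy()  # already approved, no recourse needed
    # Move along w by enough to zero the score, plus a small margin.
    step = (-score + margin) / (w @ w)
    return x + step * w

# A toy applicant below the boundary of score = x0 + x1 - 2.
w, b = np.array([1.0, 1.0]), -2.0
x = np.array([0.5, 0.5])
x_new = linear_recourse(x, w, b)
print(w @ x_new + b >= 0)  # True once recourse is applied
```

When many users apply such recourse simultaneously under capacity limits, the boundary itself moves, which is exactly the escalation dynamic the paper studies.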
Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG
Ultra High Resolution (UHR) remote sensing imagery (RSI) (e.g. 100,000 $\times$ 100,000 pixels or more) poses a significant challenge for current Remote Sensing Multimodal Large Language Models (RSMLLMs). If the UHR image is resized to the standard input size, the extensive spatial and contextual information that UHR images contain will be neglected. Otherwise, the original size of these images often exceeds the token limits of standard RSMLLMs, making it difficult to process the entire image and capture the long-range dependencies needed to answer queries that draw on the abundant visual context. In this paper, we introduce ImageRAG for RS, a training-free framework to address the complexities of analyzing UHR remote sensing imagery. By transforming the UHR remote sensing image analysis task into an image long-context selection task, we design an innovative image contextual retrieval mechanism based on the Retrieval-Augmented Generation (RAG) technique, denoted as ImageRAG. ImageRAG's core innovation lies in its ability to selectively retrieve and focus on the most relevant portions of the UHR image as visual contexts that pertain to a given query. A fast path and a slow path are proposed in this framework to handle this task efficiently and effectively. ImageRAG allows RSMLLMs to manage extensive context and spatial information from UHR RSI, ensuring the analysis is both accurate and efficient.
Updated: 2025-03-12 12:16:13
Domains: cs.CV,cs.AI
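The retrieval step at the heart of this approach can be sketched under the simplifying assumption that tile and query embeddings have already been produced by some encoder; the 2-D vectors below are toy stand-ins, not the paper's embeddings:

```python
import numpy as np

def retrieve_tiles(tile_embs, query_emb, k=3):
    """Rank image tiles by cosine similarity to a query embedding.

    tile_embs: (n_tiles, d) array of per-tile embeddings;
    query_emb: (d,) array. Returns top-k tile indices, most similar first.
    """
    t = tile_embs / np.linalg.norm(tile_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = t @ q
    return np.argsort(-sims)[:k]

# Toy example: 5 tiles in a 2-D embedding space; the query points "up".
tiles = np.array([[1, 0], [0.5, 0.5], [0, 1], [-1, 0], [0.7, 0.1]], float)
query = np.array([0.0, 1.0])
print(retrieve_tiles(tiles, query, k=2))  # [2 1]
```

Only the selected tiles are then passed to the RSMLLM as visual context, keeping the token budget bounded regardless of the original image size.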
DAVE: Diagnostic benchmark for Audio Visual Evaluation
Audio-visual understanding is a rapidly evolving field that seeks to integrate and interpret information from both auditory and visual modalities. Despite recent advances in multi-modal learning, existing benchmarks often suffer from strong visual bias -- where answers can be inferred from visual data alone -- and provide only aggregate scores that conflate multiple sources of error. This makes it difficult to determine whether models struggle with visual understanding, audio interpretation, or audio-visual alignment. In this work, we introduce DAVE (Diagnostic Audio Visual Evaluation), a novel benchmark dataset designed to systematically evaluate audio-visual models across controlled challenges. DAVE alleviates existing limitations by (i) ensuring both modalities are necessary to answer correctly and (ii) decoupling evaluation into atomic subcategories. Our detailed analysis of state-of-the-art models reveals specific failure modes and provides targeted insights for improvement. By offering this standardized diagnostic framework, we aim to facilitate more robust development of audio-visual models. The dataset is released: https://github.com/gorjanradevski/dave
Updated: 2025-03-12 12:12:46
Domains: cs.CV,cs.AI,cs.LG
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
When interacting with objects, humans effectively reason about which regions of objects are viable for an intended action, i.e., the affordance regions of the object. They can also account for subtle differences in object regions based on the task to be performed and whether one or two hands need to be used. However, current vision-based affordance prediction methods often reduce the problem to naive object part segmentation. In this work, we propose a framework for extracting affordance data from human activity video datasets. Our extracted 2HANDS dataset contains precise object affordance region segmentations and affordance class-labels as narrations of the activity performed. The data also accounts for bimanual actions, i.e., two hands co-ordinating and interacting with one or more objects. We present a VLM-based affordance prediction model, 2HandedAfforder, trained on the dataset and demonstrate superior performance over baselines in affordance region segmentation for various activities. Finally, we show that our predicted affordance regions are actionable, i.e., can be used by an agent performing a task, through demonstration in robotic manipulation scenarios.
Updated: 2025-03-12 12:12:07
Domains: cs.CV,cs.LG,cs.RO
RaceTEE: A Practical Privacy-Preserving Off-Chain Smart Contract Execution Architecture
Decentralized on-chain smart contracts enable trustless collaboration, yet their inherent data transparency and execution overhead hinder widespread adoption. Existing cryptographic approaches incur high computational costs and lack generality. Meanwhile, prior TEE-based solutions suffer from practical limitations, such as the inability to support inter-contract interactions, reliance on unbreakable TEEs, and compromised usability. We introduce RaceTEE, a practical and privacy-preserving off-chain execution architecture for smart contracts that leverages Trusted Execution Environments (TEEs). RaceTEE decouples transaction ordering (on-chain) from execution (off-chain), with computations performed competitively in TEEs, ensuring confidentiality and minimizing overhead. It further enhances practicality through three key improvements: supporting secure inter-contract interactions, providing a key rotation scheme that enforces forward and backward secrecy even in the event of TEE breaches, and enabling full compatibility with existing blockchains without altering the user interaction model. To validate its feasibility, we prototype RaceTEE using Intel SGX and Ethereum, demonstrating its applicability across various use cases and evaluating its performance.
Updated: 2025-03-12 12:10:02
Domains: cs.CR
Diffusion Models as Cartoonists: The Curious Case of High Density Regions
We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than usual samplers. Our empirical findings reveal the existence of significantly higher likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost.
Updated: 2025-03-12 12:08:55
Domains: cs.CV,cs.LG
ShuffleGate: An Efficient and Self-Polarizing Feature Selection Method for Large-Scale Deep Models in Industry
Deep models in industrial applications rely on thousands of features for accurate predictions, such as deep recommendation systems. While new features are introduced to capture evolving user behavior, outdated or redundant features often remain, significantly increasing storage and computational costs. To address this issue, feature selection methods are widely adopted to identify and remove less important features. However, existing approaches face two major challenges: (1) they often require complex Hyperparameter (Hp) tuning, making them difficult to employ in practice, and (2) they fail to produce well-separated feature importance scores, which complicates straightforward feature removal. Moreover, the impact of removing unimportant features can only be evaluated through retraining the model, a time-consuming and resource-intensive process that severely hinders efficient feature selection. To solve these challenges, we propose a novel feature selection approach, Shuffle-Gate. In particular, it shuffles all feature values across instances simultaneously and uses a gating mechanism that allows the model to dynamically learn the weights for combining the original and shuffled inputs. Notably, it can generate well-separated feature importance scores and estimate the performance without retraining the model, while introducing only a single Hp. Experiments on four public datasets show that our approach outperforms state-of-the-art methods in selecting the top half of the feature set for model retraining. Moreover, it has been successfully integrated into the daily iteration of Bilibili's search models across various scenarios, where it significantly reduces feature set size and computational resource usage, while maintaining comparable performance.
Updated: 2025-03-12 12:05:03
Domains: cs.LG
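The core shuffle-plus-gate mechanism can be sketched in a few lines (a hand-rolled illustration with fixed gate values; in the actual method the gates are learned jointly with the model, and near-zero gates mark removable features):

```python
import numpy as np

def shuffle_gate(X, gates, rng):
    """Mix each feature with a copy shuffled across instances.

    gates[j] in [0, 1] weights the original value of feature j;
    1 - gates[j] weights the shuffled (information-destroyed) copy.
    """
    shuffled = np.empty_like(X)
    for j in range(X.shape[1]):
        shuffled[:, j] = rng.permutation(X[:, j])
    return gates * X + (1.0 - gates) * shuffled

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
# Gate for feature 0 fully open, feature 2 fully closed.
mixed = shuffle_gate(X, np.array([1.0, 0.5, 0.0]), rng)
print(np.allclose(mixed[:, 0], X[:, 0]))  # True: open gate passes through
```

Because shuffling destroys a feature's relationship to the labels without changing its marginal distribution, the learned gate value directly reflects how much the model relies on that feature.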
Terrier: A Deep Learning Repeat Classifier
Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Existing tools often struggle to classify divergent taxa due to biases in reference libraries, limiting our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on RepBase, which includes over 100,000 repeat families -- four times more than Dfam -- Terrier maps 97.1% of RepBase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice and fruit flies), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian and flatworm genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.
Updated: 2025-03-12 12:03:26
Domains: q-bio.GN,cs.LG,I.2
Adaptive political surveys and GPT-4: Tackling the cold start problem with simulated user interactions
Adaptive questionnaires dynamically select the next question for a survey participant based on their previous answers. Due to digitalisation, they have become a viable alternative to traditional surveys in application areas such as political science. One limitation, however, is their dependency on data to train the model for question selection. Often, such training data (i.e., user interactions) are unavailable a priori. To address this problem, we (i) test whether Large Language Models (LLM) can accurately generate such interaction data and (ii) explore if these synthetic data can be used to pre-train the statistical model of an adaptive political survey. To evaluate this approach, we utilise existing data from the Swiss Voting Advice Application (VAA) Smartvote in two ways: First, we compare the distribution of LLM-generated synthetic data to the real distribution to assess its similarity. Second, we compare the performance of an adaptive questionnaire that is randomly initialised with one pre-trained on synthetic data to assess their suitability for training. We benchmark these results against an "oracle" questionnaire with perfect prior knowledge. We find that an off-the-shelf LLM (GPT-4) accurately generates answers to the Smartvote questionnaire from the perspective of different Swiss parties. Furthermore, we demonstrate that initialising the statistical model with synthetic data can (i) significantly reduce the error in predicting user responses and (ii) increase the candidate recommendation accuracy of the VAA. Our work emphasises the considerable potential of LLMs to create training data to improve the data collection process in adaptive questionnaires in LLM-affine areas such as political surveys.
Updated: 2025-03-12 12:02:36
Domains: cs.LG,cs.AI
Steering No-Regret Agents in MFGs under Model Uncertainty
Incentive design is a popular framework for guiding agents' learning dynamics towards desired outcomes by providing additional payments beyond intrinsic rewards. However, most existing works focus on a finite, small set of agents or assume complete knowledge of the game, limiting their applicability to real-world scenarios involving large populations and model uncertainty. To address this gap, we study the design of steering rewards in Mean-Field Games (MFGs) with density-independent transitions, where both the transition dynamics and intrinsic reward functions are unknown. This setting presents non-trivial challenges, as the mediator must incentivize the agents to explore for its model learning under uncertainty, while simultaneously steering them to converge to desired behaviors without incurring excessive incentive payments. Assuming agents exhibit no(-adaptive) regret behaviors, we contribute novel optimistic exploration algorithms. Theoretically, we establish sub-linear regret guarantees for the cumulative gaps between the agents' behaviors and the desired ones. In terms of the steering cost, we demonstrate that our total incentive payments incur only sub-linear excess, competing with a baseline steering strategy that stabilizes the target policy as an equilibrium. Our work presents an effective framework for steering agents' behaviors in large-population systems under uncertainty.
Updated: 2025-03-12 12:02:02
Domains: cs.LG,cs.AI,cs.MA,stat.ML
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data, and manual mask annotations can be costly and labor-intensive to acquire. This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities, particularly on annotation-scarce MRI. Concretely, our contributions are threefold: (i) we introduce MRGen-DB, a large-scale radiology image-text dataset comprising extensive samples with rich metadata, including modality labels, attributes, regions, and organs information, with a subset having pixelwise mask annotations; (ii) we present MRGen, a diffusion-based data engine for controllable medical image synthesis, conditioned on text prompts and segmentation masks. MRGen can generate realistic images for diverse MRI modalities lacking mask annotations, facilitating segmentation training in low-source domains; (iii) extensive experiments across multiple modalities demonstrate that MRGen significantly improves segmentation performance on unannotated modalities by providing high-quality synthetic data. We believe that our method bridges a critical gap in medical image analysis, extending segmentation capabilities to scenarios that are challenging to acquire manual annotations.
Updated: 2025-03-12 11:59:46
Domains: cs.CV,cs.AI
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant models on document understanding (DU) tasks that are integral within larger task pipelines. We carefully selected KD strategies (response-based, feature-based) for distilling knowledge to and from backbones with different architectures (ResNet, ViT, DiT) and capacities (base, small, tiny). We study what affects the teacher-student knowledge gap and find that some methods (tuned vanilla KD, MSE, SimKD with an apt projector) can consistently outperform supervised student training. Furthermore, we design downstream task setups to evaluate covariate shift and the robustness of distilled DLA models on zero-shot layout-aware document visual question answering (DocVQA). DLA-KD experiments result in a large mAP knowledge gap, which unpredictably translates to downstream robustness, accentuating the need to further explore how to efficiently obtain more semantic document layout awareness.
Updated: 2025-03-12 11:58:36
Domains: cs.CV,cs.AI,cs.LG
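The response-based strategy tuned above is, in its vanilla form, a temperature-softened KL divergence between teacher and student logits (the standard Hinton-style formulation); a minimal numpy sketch, assuming raw logits are available:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Response-based distillation loss: T^2 * KL(teacher_T || student_T)."""
    p = softmax(teacher_logits, T)  # softened teacher distribution
    q = softmax(student_logits, T)  # softened student distribution
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, -1.0]
aligned = [4.0, 1.0, -1.0]   # identical logits -> zero loss
shifted = [1.0, 4.0, -1.0]
print(kd_loss(aligned, teacher), kd_loss(shifted, teacher) > 0)
```

In practice this term is mixed with the supervised cross-entropy on ground-truth labels, with the temperature and mixing weight among the hyperparameters the paper's "tuned vanilla KD" adjusts.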
Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive Bayes
Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our convex relaxation bound becomes tight as the marginal contribution of additional features decreases, using a priori duality-gap bounds derived from the Shapley-Folkman theorem. We show how to produce primal solutions satisfying these bounds. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, $l_1$-penalized logistic regression and LASSO, while being orders of magnitude faster.
Updated: 2025-03-12 11:57:25
Domains: cs.LG,stat.ML
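A rough sketch of why binary data admits an exact solution: the naive Bayes log-likelihood separates across features, so each feature can be scored independently by the likelihood gain of class-specific over class-shared Bernoulli rates, and the top-k gains give the selection. This is an illustrative reading of the idea, not the authors' exact formulation:

```python
import numpy as np

def bernoulli_ll(k, n):
    """Log-likelihood of k successes in n Bernoulli trials at the MLE rate."""
    out = 0.0
    if 0 < k:
        out += k * np.log(k / n)
    if k < n:
        out += (n - k) * np.log(1 - k / n)
    return out

def nb_feature_gains(X, y):
    """Per-feature likelihood gain of class-specific vs shared rates.

    Because the objective separates across binary features, ranking by
    this gain yields the combinatorial optimum for top-k selection.
    """
    X, y = np.asarray(X), np.asarray(y)
    gains = []
    for j in range(X.shape[1]):
        shared = bernoulli_ll(X[:, j].sum(), len(y))
        specific = sum(bernoulli_ll(X[y == c, j].sum(), (y == c).sum())
                       for c in np.unique(y))
        gains.append(specific - shared)
    return np.array(gains)

# Feature 0 perfectly tracks the class; feature 1 is pure noise.
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])
gains = nb_feature_gains(X, y)
print(gains[0] > gains[1])  # True: informative feature scores higher
```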
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and best-effort (BE) jobs. Existing inference systems employ iteration-level first-come-first-served scheduling, causing head-of-line blocking when BE jobs delay LS jobs. We introduce QLLM, a novel inference system designed for Mixture of Experts (MoE) models, featuring a fine-grained, priority-aware preemptive scheduler. QLLM enables expert-level preemption, deferring BE job execution while minimizing LS time-to-first-token (TTFT). Our approach removes iteration-level scheduling constraints, enabling the scheduler to preempt jobs at any layer based on priority. Evaluations on an Nvidia A100 GPU show that QLLM significantly improves performance. It reduces LS TTFT by an average of $65.5\times$ and meets the SLO at up to $7$ requests/sec, whereas the baseline fails to do so under the tested workload. Additionally, it cuts LS turnaround time by up to $12.8\times$ without impacting throughput. QLLM is modular, extensible, and seamlessly integrates with Hugging Face MoE models.
Updated: 2025-03-12 11:56:01
Domains: cs.LG,cs.DC
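The scheduling idea can be illustrated with a toy unit-step simulator (a sketch only: real expert-level preemption happens between MoE layer computations rather than on an abstract time grid, and the job names and numbers here are invented):

```python
import heapq

def preemptive_schedule(jobs):
    """Simulate unit-step preemptive scheduling by priority.

    jobs: list of (name, priority, arrival, work); lower priority value
    runs first (0 = latency-sensitive, 1 = best-effort). Returns each
    job's completion time. Preemption is checked at every unit step,
    loosely analogous to preempting between expert computations
    instead of only at iteration boundaries.
    """
    t, done, ready = 0, {}, []
    pending = sorted(jobs, key=lambda j: j[2])
    remaining = {name: work for name, _, _, work in jobs}
    while len(done) < len(jobs):
        while pending and pending[0][2] <= t:
            name, prio, _, _ = pending.pop(0)
            heapq.heappush(ready, (prio, name))
        if not ready:
            t = pending[0][2]  # idle until the next arrival
            continue
        prio, name = ready[0]  # highest-priority ready job runs this step
        remaining[name] -= 1
        t += 1
        if remaining[name] == 0:
            heapq.heappop(ready)
            done[name] = t
    return done

# A long best-effort job arrives first; a latency-sensitive job preempts it.
print(preemptive_schedule([("BE", 1, 0, 5), ("LS", 0, 2, 2)]))
```

Under first-come-first-served the LS job would wait for all five BE units and finish at time 7; with preemption it finishes at time 4, which is the head-of-line-blocking effect the system targets.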
Detecting and Preventing Data Poisoning Attacks on AI Models
This paper investigates the critical issue of data poisoning attacks on AI models, a growing concern in the ever-evolving landscape of artificial intelligence and cybersecurity. As advanced technology systems become increasingly prevalent across various sectors, the need for robust defence mechanisms against adversarial attacks becomes paramount. The study aims to develop and evaluate novel techniques for detecting and preventing data poisoning attacks, focusing on both theoretical frameworks and practical applications. Through a comprehensive literature review, experimental validation using the CIFAR-10 and Insurance Claims datasets, and the development of innovative algorithms, this paper seeks to enhance the resilience of AI models against malicious data manipulation. The study explores various methods, including anomaly detection, robust optimization strategies, and ensemble learning, to identify and mitigate the effects of poisoned data during model training. Experimental results indicate that data poisoning significantly degrades model performance, reducing classification accuracy by up to 27% in image recognition tasks (CIFAR-10) and 22% in fraud detection models (Insurance Claims dataset). The proposed defence mechanisms, including statistical anomaly detection and adversarial training, successfully mitigated poisoning effects, improving model robustness and restoring accuracy levels by an average of 15-20%. The findings further demonstrate that ensemble learning techniques provide an additional layer of resilience, reducing false positives and false negatives caused by adversarial data injections.
Updated: 2025-03-12 11:55:01
Domains: cs.CR,eess.IV
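One of the defences named above, statistical anomaly detection, can be sketched as a simple z-score filter over training rows (an illustrative baseline under synthetic data, not the paper's exact algorithm or datasets):

```python
import numpy as np

def filter_outliers(X, z_thresh=3.0):
    """Flag training rows whose features deviate strongly from the batch.

    Z-scores each feature against batch statistics and drops rows where
    any feature exceeds the threshold. Returns (kept rows, flag mask).
    """
    mu = X.mean(axis=0)
    sd = X.std(axis=0) + 1e-12
    z = np.abs((X - mu) / sd)
    keep = (z < z_thresh).all(axis=1)
    return X[keep], ~keep

rng = np.random.default_rng(0)
clean = rng.normal(0, 1, size=(200, 3))
poison = np.full((5, 3), 8.0)  # implausible injected training points
X = np.vstack([clean, poison])
filtered, flagged = filter_outliers(X)
print(flagged[-5:].all())  # the injected rows are flagged
```

Real poisoning attacks are typically subtler than this toy injection, which is why the paper layers robust optimization, adversarial training, and ensembling on top of such statistical filters.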
Distributional Counterfactual Explanations With Optimal Transport
Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models by identifying alternative inputs that lead to different outcomes. However, existing CE approaches, including group and global methods, focus predominantly on specific input modifications, lacking the ability to capture nuanced distributional characteristics that influence model outcomes across the entire input-output spectrum. This paper proposes distributional counterfactual explanation (DCE), shifting focus to the distributional properties of observed and counterfactual data, thus providing broader insights. DCE is particularly beneficial for stakeholders making strategic decisions based on statistical data analysis, as it makes the statistical distribution of the counterfactual resemble that of the factual while aligning model outputs with a target distribution, something that existing CE methods cannot fully achieve. We leverage optimal transport (OT) to formulate a chance-constrained optimization problem, deriving a counterfactual distribution aligned with its factual counterpart, supported by statistical confidence. The efficacy of this approach is demonstrated through experiments, highlighting its potential to provide deeper insights into decision-making models.
Updated: 2025-03-12 11:53:06
Domains: cs.AI,stat.ML
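A small sketch of the OT machinery in the one-dimensional case, where the optimal transport plan between two equal-size empirical distributions is just the sorted matching (the paper's chance-constrained formulation is far more general; the sample values below are invented):

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches samples in sorted
    order, so the distance is the mean absolute gap after sorting.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return float(np.mean(np.abs(a - b)))

factual = np.array([0.1, 0.2, 0.4, 0.7])  # observed model outputs
target = np.array([0.4, 0.5, 0.7, 1.0])   # desired output distribution
print(np.isclose(wasserstein_1d(factual, target), 0.3))  # uniform +0.3 shift
```

Minimizing such a transport cost between counterfactual and factual distributions, subject to the model outputs matching the target, is the distributional analogue of the per-instance proximity constraint in classical CE.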
LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics
Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Recent studies have revealed that Large Language Models (LLMs), with their powerful in-contextual modeling capabilities, hold significant potential for TSF. However, existing LLM-based methods usually perform suboptimally because they neglect the inherent characteristics of time series data. Unlike the textual data used in LLM pre-training, the time series data is semantically sparse and comprises distinctive temporal patterns. To address this problem, we propose LLM-PS to empower the LLM for TSF by learning the fundamental patterns and meaningful semantics from time series data. Our LLM-PS incorporates a new multi-scale convolutional neural network adept at capturing both short-term fluctuations and long-term trends within the time series. Meanwhile, we introduce a time-to-text module for extracting valuable semantics across continuous time intervals rather than isolated time points. By integrating these patterns and semantics, LLM-PS effectively models temporal dependencies, enabling a deep comprehension of time series and delivering accurate forecasts. Intensive experimental results demonstrate that LLM-PS achieves state-of-the-art performance in both short- and long-term forecasting tasks, as well as in few- and zero-shot settings.
Updated: 2025-03-12 11:45:11
Domains: cs.LG,cs.CL
SQLCritic: Correcting Text-to-SQL Generation via Clause-wise Critic
Recent advancements in Text-to-SQL systems have improved the conversion of natural language queries into SQL, but challenges remain in ensuring accuracy and reliability. While self-correction techniques refine outputs, they often introduce new errors. Existing methods focused on execution feedback mainly address syntax issues, leaving semantic errors -- where the query's logic fails to align with the user's intent -- largely unaddressed. We propose a novel approach combining structured execution feedback with a trained critic agent that provides detailed, interpretable critiques. This method effectively identifies and corrects both syntactic and semantic errors, enhancing accuracy and interpretability. Experimental results show significant improvements on two major Text-to-SQL benchmarks, Spider and BIRD, demonstrating the effectiveness of our approach.
Updated: 2025-03-12 11:41:45
Categories: cs.AI,cs.CL
Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
The inference process of modern large language models (LLMs) demands prohibitive computational resources, rendering them infeasible for deployment on consumer-grade devices. To address this limitation, recent studies propose distributed LLM inference frameworks, which employ split learning principles to enable collaborative LLM inference on resource-constrained hardware. However, distributing LLM layers across participants requires the transmission of intermediate outputs, which may introduce privacy risks to the original input prompts - a critical issue that has yet to be thoroughly explored in the literature. In this paper, we rigorously examine the privacy vulnerabilities of distributed LLM inference frameworks by designing and evaluating three prompt inference attacks aimed at reconstructing input prompts from intermediate LLM outputs. These attacks are developed under various query and data constraints to reflect diverse real-world LLM service scenarios. Specifically, the first attack assumes an unlimited query budget and access to an auxiliary dataset sharing the same distribution as the target prompts. The second attack also leverages unlimited queries but uses an auxiliary dataset with a distribution differing from the target prompts. The third attack operates under the most restrictive scenario, with limited query budgets and no auxiliary dataset available. We evaluate these attacks on a range of LLMs, including state-of-the-art models such as Llama-3.2 and Phi-3.5, as well as widely-used models like GPT-2 and BERT for comparative analysis. Our experiments show that the first two attacks achieve reconstruction accuracies exceeding 90%, while the third achieves accuracies typically above 50%, even under stringent constraints. These findings highlight privacy risks in distributed LLM inference frameworks, issuing a strong alert on their deployment in real-world applications.
Updated: 2025-03-12 11:36:29
Categories: cs.CR
Unmask It! AI-Generated Product Review Detection in Dravidian Languages
The rise of Generative AI has led to a surge in AI-generated reviews, often posing a serious threat to the credibility of online platforms. Reviews serve as the primary source of information about products and services. Authentic reviews play a vital role in consumer decision-making. The presence of fabricated content misleads consumers, undermines trust and facilitates potential fraud in digital marketplaces. This study focuses on detecting AI-generated product reviews in Tamil and Malayalam, two low-resource languages where research in this domain is relatively under-explored. We worked on a range of approaches - from traditional machine learning methods to advanced transformer-based models such as Indic-BERT, IndicSBERT, MuRIL, XLM-RoBERTa and MalayalamBERT. Our findings highlight the effectiveness of leveraging the state-of-the-art transformers in accurately identifying AI-generated content, demonstrating the potential in enhancing the detection of fake reviews in low-resource language settings.
Updated: 2025-03-12 11:35:04
Categories: cs.CL,cs.AI,cs.LG
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, there is still a lack of an effective framework for the overall assessment of AI4Science, particularly from a holistic perspective on data quality and model capability. Therefore, in this study, we propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both scientific data and LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions: Quality, FAIRness, Explainability, and Compliance, which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for both Earth and Life Sciences, making a novel and original contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values) spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 20 representative open-source and closed-source LLMs. All the results are publicly available and can be accessed online at www.scihorizon.cn/en.
Updated: 2025-03-12 11:34:41
Categories: cs.LG,cs.CL,cs.DL,cs.IR
AI-native Memory 2.0: Second Me
Human interaction with the external world fundamentally involves the exchange of personal memory, whether with other individuals, websites, applications, or, in the future, AI agents. A significant portion of this interaction is redundant, requiring users to repeatedly provide the same information across different contexts. Existing solutions, such as browser-stored credentials, autofill mechanisms, and unified authentication systems, have aimed to mitigate this redundancy by serving as intermediaries that store and retrieve commonly used user data. The advent of large language models (LLMs) presents an opportunity to redefine memory management through an AI-native paradigm: SECOND ME. SECOND ME acts as an intelligent, persistent memory offload system that retains, organizes, and dynamically utilizes user-specific knowledge. By serving as an intermediary in user interactions, it can autonomously generate context-aware responses, prefill required information, and facilitate seamless communication with external systems, significantly reducing cognitive load and interaction friction. Unlike traditional memory storage solutions, SECOND ME extends beyond static data retention by leveraging LLM-based memory parameterization. This enables structured organization, contextual reasoning, and adaptive knowledge retrieval, facilitating a more systematic and intelligent approach to memory management. As AI-driven personal agents like SECOND ME become increasingly integrated into digital ecosystems, SECOND ME further represents a critical step toward augmenting human-world interaction with persistent, contextually aware, and self-optimizing memory systems. We have open-sourced the fully localizable deployment system at GitHub: https://github.com/Mindverse/Second-Me.
Updated: 2025-03-12 11:31:31
Categories: cs.AI,cs.CL,cs.HC
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
With the rapid development of diffusion models in image generation, the demand for more powerful and flexible controllable frameworks is increasing. Although existing methods can guide generation beyond text prompts, the challenge of effectively combining multiple conditional inputs while maintaining consistency with all of them remains unsolved. To address this, we introduce UniCombine, a DiT-based multi-conditional controllable generative framework capable of handling any combination of conditions, including but not limited to text prompts, spatial maps, and subject images. Specifically, we introduce a novel Conditional MMDiT Attention mechanism and incorporate a trainable LoRA module to build both the training-free and training-based versions. Additionally, we propose a new pipeline to construct SubjectSpatial200K, the first dataset designed for multi-conditional generative tasks covering both the subject-driven and spatially-aligned conditions. Extensive experimental results on multi-conditional generation demonstrate the outstanding universality and powerful capability of our approach with state-of-the-art performance.
Updated: 2025-03-12 11:22:47
Categories: cs.CV,cs.AI
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Open-Vocabulary object detectors can recognize a wide range of categories using simple textual prompts. However, improving their ability to detect rare classes or specialize in certain domains remains a challenge. While most recent methods rely on a single set of model weights for adaptation, we take a different approach by using modular deep learning. We introduce DitHub, a framework designed to create and manage a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub organizes expert modules like branches that can be fetched and merged as needed. This modular approach enables a detailed study of how adaptation modules combine, making it the first method to explore this aspect in Object Detection. Our approach achieves state-of-the-art performance on the ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to evaluate how well models adapt when previously seen classes reappear. For more details, visit our project page: https://aimagelab.github.io/DitHub/
Updated: 2025-03-12 11:15:34
Categories: cs.CV,cs.LG
Rule-Guided Reinforcement Learning Policy Evaluation and Improvement
We consider the challenging problem of using domain knowledge to improve deep reinforcement learning policies. To this end, we propose LEGIBLE, a novel approach, following a multi-step process, which starts by mining rules from a deep RL policy, constituting a partially symbolic representation. These rules describe which decisions the RL policy makes and which it avoids making. In the second step, we generalize the mined rules using domain knowledge expressed as metamorphic relations. We adapt these relations from software testing to RL to specify expected changes of actions in response to changes in observations. The third step is evaluating generalized rules to determine which generalizations improve performance when enforced. These improvements show weaknesses in the policy, where it has not learned the general rules and thus can be improved by rule guidance. LEGIBLE supported by metamorphic relations provides a principled way of expressing and enforcing domain knowledge about RL environments. We show the efficacy of our approach by demonstrating that it effectively finds weaknesses, accompanied by explanations of these weaknesses, in eleven RL environments and by showcasing that guiding policy execution with rules improves performance w.r.t. gained reward.
Updated: 2025-03-12 11:13:08
Categories: cs.LG,cs.SE
Single-Qudit Quantum Neural Networks for Multiclass Classification
This paper proposes a single-qudit quantum neural network for multiclass classification, exploiting the enhanced representational capacity of high-dimensional qudit states. Our design employs a $d$-dimensional unitary operator, where $d$ corresponds to the number of classes, constructed using the Cayley transform of a skew-symmetric matrix, to efficiently encode and process class information. This architecture enables a direct mapping between class labels and quantum measurement outcomes, reducing circuit depth and computational overhead. To optimize network parameters, we introduce a hybrid training approach that combines an extended activation function -- derived from a truncated multivariable Taylor series expansion -- with support vector machine optimization for weight determination. We evaluate our model on the MNIST and EMNIST datasets, demonstrating competitive accuracy while maintaining a compact single-qudit quantum circuit. Our findings highlight the potential of qudit-based QNNs as scalable alternatives to classical deep learning models, particularly for multiclass classification. However, practical implementation remains constrained by current quantum hardware limitations. This research advances quantum machine learning by demonstrating the feasibility of higher-dimensional quantum systems for efficient learning tasks.
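The Cayley-transform construction mentioned above is standard linear algebra and can be sketched in a few lines (a toy numpy illustration with real matrices, not the authors' implementation; the dimension and random parameterization are assumptions for demonstration):

```python
import numpy as np

def cayley_unitary(A):
    """Map a real skew-symmetric matrix A (A^T = -A) to an orthogonal
    matrix via the Cayley transform U = (I - A)(I + A)^{-1}.
    I + A is always invertible here because A's eigenvalues are purely
    imaginary."""
    I = np.eye(A.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M - M.T              # make it skew-symmetric
U = cayley_unitary(A)

# U is orthogonal, so it preserves inner products: U U^T = I
assert np.allclose(U @ U.T, np.eye(4), atol=1e-10)
```

In the complex case the same formula maps skew-Hermitian matrices to unitaries, which is what a $d$-dimensional class-encoding operator would use.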
Updated: 2025-03-12 11:12:05
Categories: quant-ph,cs.AI,cs.LG
Neural Normalized Cut: A Differential and Generalizable Approach for Spectral Clustering
Spectral clustering, as a popular tool for data clustering, requires an eigen-decomposition step on a given affinity to obtain the spectral embedding. Nevertheless, such a step suffers from the lack of generalizability and scalability. Moreover, the obtained spectral embeddings can hardly provide a good approximation to the ground-truth partition and thus a k-means step is adopted to quantize the embedding. In this paper, we propose a simple yet effective scalable and generalizable approach, called Neural Normalized Cut (NeuNcut), to learn the clustering membership for spectral clustering directly. In NeuNcut, we properly reparameterize the unknown cluster membership via a neural network, and train the neural network via stochastic gradient descent with a properly relaxed normalized cut loss. As a result, our NeuNcut enjoys a desired generalization ability to directly infer clustering membership for out-of-sample unseen data and hence brings us an efficient way to handle clustering task with ultra large-scale data. We conduct extensive experiments on both synthetic data and benchmark datasets and experimental results validate the effectiveness and the superiority of our approach. Our code is available at: https://github.com/hewei98/NeuNcut.
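A relaxed normalized-cut objective on soft cluster assignments, of the kind NeuNcut trains against, can be sketched as follows (the paper's exact relaxation may differ; this is one common form, with the toy affinity matrix chosen for illustration):

```python
import numpy as np

def soft_ncut_loss(P, W):
    """Relaxed normalized-cut loss for soft assignments.
    P: (n, K) row-stochastic soft memberships; W: (n, n) affinity.
    Loss = K - sum_k (P_k^T W P_k) / (P_k^T d), which is minimized when
    clusters are internally dense and mutually disconnected."""
    d = W.sum(axis=1)                              # node degrees
    assoc = np.einsum('ik,ij,jk->k', P, W, P)      # P_k^T W P_k per cluster
    vol = P.T @ d                                  # P_k^T d per cluster
    return P.shape[1] - np.sum(assoc / np.maximum(vol, 1e-12))

# Two clear blocks: nodes {0,1} and {2,3}
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
bad  = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
assert soft_ncut_loss(good, W) < soft_ncut_loss(bad, W)
```

Producing `P` from a neural network (e.g. a softmax output head) and minimizing this loss with SGD gives the generalization property described: the trained network can assign out-of-sample points directly, with no eigen-decomposition or k-means step.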
Updated: 2025-03-12 11:00:16
Categories: cs.LG
A Deep Reinforcement Learning Approach to Automated Stock Trading, using xLSTM Networks
Traditional Long Short-Term Memory (LSTM) networks are effective for handling sequential data but have limitations such as gradient vanishing and difficulty in capturing long-term dependencies, which can impact their performance in dynamic and risky environments like stock trading. To address these limitations, this study explores the usage of the newly introduced Extended Long Short-Term Memory (xLSTM) network in combination with a deep reinforcement learning (DRL) approach for automated stock trading. Our proposed method utilizes xLSTM networks in both actor and critic components, enabling effective handling of time series data and dynamic market environments. Proximal Policy Optimization (PPO), with its ability to balance exploration and exploitation, is employed to optimize the trading strategy. Experiments were conducted using financial data from major tech companies over a comprehensive timeline, demonstrating that the xLSTM-based model outperforms LSTM-based methods in key trading evaluation metrics, including cumulative return, average profitability per trade, maximum earning rate, maximum pullback, and Sharpe ratio. These findings highlight the potential of xLSTM for enhancing DRL-based stock trading systems.
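The PPO objective referenced above is standard and its core clipped surrogate can be sketched in numpy (illustrative values only, not the paper's training code; in practice the log-probabilities come from the xLSTM actor network):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (returned as a loss to minimize).
    Clipping the probability ratio to [1-eps, 1+eps] keeps each policy
    update conservative, which is how PPO balances exploration and
    exploitation during training."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # elementwise min is pessimistic: it caps gains for positive
    # advantages and does not shrink penalties for negative ones
    return -np.mean(np.minimum(unclipped, clipped))

adv = np.array([1.0, -1.0])
old = np.log(np.array([0.5, 0.5]))
new = np.log(np.array([0.9, 0.1]))     # ratios 1.8 and 0.2, outside the clip range
loss = ppo_clip_loss(new, old, adv)
# first term is clipped at 1.2; second takes the pessimistic -0.8
assert abs(loss - (-0.2)) < 1e-9
```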
Updated: 2025-03-12 10:56:03
Categories: cs.CE,cs.LG,q-fin.TR
DeepInnovation AI: A Global Dataset Mapping the AI innovation and technology Transfer from Academic Research to Industrial Patents
In the rapidly evolving field of artificial intelligence (AI), mapping innovation patterns and understanding effective technology transfer from academic research to practical applications are essential for economic growth. However, existing data infrastructures face three major limitations: fragmentation, incomplete coverage, and insufficient evaluative capacity. This paper introduces DeepInnovationAI, the first comprehensive global dataset designed to bridge the gap between academic papers and industrial patents and to document AI innovation trajectories. The dataset comprises three structured files. DeepPatentAI.csv contains 2,356,204 patent records with 8 field-specific attributes, and DeepDiveAI.csv encompasses 3,511,929 academic publications with 13 metadata fields; these two files employ large language models, multilingual text analysis, and dual-layer BERT classifiers to accurately identify AI-related content, and utilize hypergraph analysis methods to create robust innovation metrics. DeepCosineAI.csv applies semantic vector proximity analysis to present approximately one hundred million computed paper-patent similarity pairs, enhancing understanding of how theoretical advancements translate into commercial technologies. This enables researchers, policymakers, and industry leaders to anticipate trends and identify emerging areas for collaboration. With its extensive temporal and geographical scope, DeepInnovationAI supports detailed analysis of technological development patterns and international competition dynamics, providing a robust foundation for modeling AI innovation dynamics and technology transfer processes.
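The paper-patent similarity pairs described above are presumably cosine similarities between embedding vectors; a minimal sketch of that computation (the embeddings below are toy placeholders, not the dataset's actual vectors):

```python
import numpy as np

def cosine_pairs(paper_vecs, patent_vecs):
    """Cosine similarity between every paper embedding and every patent
    embedding. Rows of the result index papers, columns index patents;
    normalizing first reduces the pairwise computation to one matmul."""
    A = paper_vecs / np.linalg.norm(paper_vecs, axis=1, keepdims=True)
    B = patent_vecs / np.linalg.norm(patent_vecs, axis=1, keepdims=True)
    return A @ B.T

papers = np.array([[1.0, 0.0], [0.0, 1.0]])
patents = np.array([[2.0, 0.0]])           # same direction as paper 0
S = cosine_pairs(papers, patents)
assert np.allclose(S, [[1.0], [0.0]])
```

At the scale quoted (millions of documents, ~10^8 retained pairs), such a computation is typically blocked and thresholded rather than materialized densely.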
Updated: 2025-03-12 10:56:02
Categories: cs.DB,cs.AI,cs.DL
Large-scale Regional Traffic Signal Control Based on Single-Agent Reinforcement Learning
In the context of global urbanization and motorization, traffic congestion has become a significant issue, severely affecting the quality of life, environment, and economy. This paper puts forward a single-agent reinforcement learning (RL)-based regional traffic signal control (TSC) model. Different from multi-agent systems, this model can coordinate traffic signals across a large area, with the goals of alleviating regional traffic congestion and minimizing the total travel time. The TSC environment is precisely defined through specific state space, action space, and reward functions. The state space consists of the current congestion state, which is represented by the queue lengths of each link, and the current signal phase scheme of intersections. The action space is designed to select an intersection first and then adjust its phase split. Two reward functions are meticulously crafted. One focuses on alleviating congestion and the other aims to minimize the total travel time while considering the congestion level. The experiments are carried out with the SUMO traffic simulation software. The performance of the TSC model is evaluated by comparing it with a base case where no signal-timing adjustments are made. The results show that the model can effectively control congestion. For example, the queuing length is significantly reduced in the scenarios tested. Moreover, when the reward is set to both alleviate congestion and minimize the total travel time, the average travel time is remarkably decreased, which indicates that the model can effectively improve traffic conditions. This research provides a new approach for large-scale regional traffic signal control and offers valuable insights for future urban traffic management.
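The two reward functions described above could take a form along these lines (a hedged sketch: the exact reward shapes and the weighting factor `alpha` are assumptions for illustration, not the paper's definitions):

```python
def congestion_reward(queue_lengths, travel_times=None, alpha=0.1):
    """Illustrative rewards for the two cases described in the abstract.
    Without travel times: reward only congestion relief (negative total
    queue length over all links). With travel times: additionally
    penalize total travel time, weighted by alpha (an assumed
    hyperparameter balancing the two objectives)."""
    reward = -float(sum(queue_lengths))
    if travel_times is not None:
        reward -= alpha * sum(travel_times)
    return reward

# congestion-only reward
assert congestion_reward([5, 3, 2]) == -10.0
# combined reward: -10 - 0.1 * 300 = -40
assert abs(congestion_reward([5, 3, 2], travel_times=[100, 200]) + 40.0) < 1e-9
```

In the single-agent setting, one such scalar reward is computed for the whole region after each action (pick an intersection, adjust its phase split), rather than per intersection.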
Updated: 2025-03-12 10:51:29
Categories: cs.LG,cs.SY,eess.SY
HELM: Hierarchical Encoding for mRNA Language Modeling
Messenger RNA (mRNA) plays a crucial role in protein synthesis, with its codon structure directly impacting biological properties. While Language Models (LMs) have shown promise in analyzing biological sequences, existing approaches fail to account for the hierarchical nature of mRNA's codon structure. We introduce Hierarchical Encoding for mRNA Language Modeling (HELM), a novel pre-training strategy that incorporates codon-level hierarchical structure into language model training. HELM modulates the loss function based on codon synonymity, aligning the model's learning process with the biological reality of mRNA sequences. We evaluate HELM on diverse mRNA datasets and tasks, demonstrating that HELM outperforms standard language model pre-training as well as existing foundation model baselines on seven diverse downstream property prediction tasks and an antibody region annotation task by around 8% on average. Additionally, HELM enhances the generative capabilities of the language model, producing diverse mRNA sequences that better align with the underlying true data distribution compared to non-hierarchical baselines.
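One way to picture the synonymity-based loss modulation is a cross-entropy that discounts probability mass the model places on codons synonymous with the target (a toy sketch only: HELM's actual loss is not specified in this abstract, and the three-codon table and discount `alpha` are assumptions; the real genetic code has 64 codons):

```python
import numpy as np

# Hypothetical mini codon table: CUU and CUC both encode Leucine
SYNONYMS = {"CUU": {"CUU", "CUC"}, "CUC": {"CUU", "CUC"}, "AAA": {"AAA"}}

def synonym_aware_nll(pred_probs, target, vocab, alpha=0.5):
    """Negative log-likelihood that partially credits (by alpha) mass
    placed on codons synonymous with the target, so biologically
    equivalent predictions are penalized less than truly wrong ones."""
    syn = SYNONYMS[target] - {target}
    p_target = pred_probs[vocab.index(target)]
    p_syn = sum(pred_probs[vocab.index(c)] for c in syn)
    return -np.log(p_target + alpha * p_syn + 1e-12)

vocab = ["CUU", "CUC", "AAA"]
confuses_synonym = np.array([0.4, 0.5, 0.1])   # extra mass on a synonym
confuses_other   = np.array([0.4, 0.1, 0.5])   # extra mass on a wrong codon
assert synonym_aware_nll(confuses_synonym, "CUU", vocab) < \
       synonym_aware_nll(confuses_other, "CUU", vocab)
```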
Updated: 2025-03-12 10:51:14
Categories: cs.LG,cs.CE
Locally Differentially Private Online Federated Learning With Correlated Noise
We introduce a locally differentially private (LDP) algorithm for online federated learning that employs temporally correlated noise to improve utility while preserving privacy. To address challenges posed by the correlated noise and local updates with streaming non-IID data, we develop a perturbed iterate analysis that controls the impact of the noise on the utility. Moreover, we demonstrate how the drift errors from local updates can be effectively managed for several classes of nonconvex loss functions. Subject to an $(\epsilon,\delta)$-LDP budget, we establish a dynamic regret bound that quantifies the impact of key parameters and the intensity of changes in the dynamic environment on the learning performance. Numerical experiments confirm the efficacy of the proposed algorithm.
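A simple way to generate the temporally correlated noise described above is an AR(1) recursion (a sketch under assumptions: the paper's correlation scheme and privacy calibration may differ, and `rho`/`sigma` here are illustrative, not tied to an $(\epsilon,\delta)$ budget):

```python
import numpy as np

def correlated_gaussian_noise(T, dim, rho=0.9, sigma=1.0, seed=0):
    """Temporally correlated Gaussian noise via an AR(1) recursion:
    n_t = rho * n_{t-1} + sqrt(1 - rho^2) * w_t,  w_t ~ N(0, sigma^2 I).
    The sqrt(1 - rho^2) scaling keeps each n_t's marginal variance at
    sigma^2, so only the temporal correlation changes, not the per-step
    noise level added to the local updates."""
    rng = np.random.default_rng(seed)
    noise = np.zeros((T, dim))
    noise[0] = rng.normal(scale=sigma, size=dim)
    for t in range(1, T):
        w = rng.normal(scale=sigma, size=dim)
        noise[t] = rho * noise[t - 1] + np.sqrt(1 - rho**2) * w
    return noise

n = correlated_gaussian_noise(T=20000, dim=1, rho=0.9).ravel()
lag1 = np.corrcoef(n[:-1], n[1:])[0, 1]   # empirical lag-1 autocorrelation
assert 0.85 < lag1 < 0.95                 # close to rho, as expected
```

Each client would add such a sequence to its streamed gradient updates; the perturbed iterate analysis in the paper is what bounds the utility impact of the correlated terms.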
Updated: 2025-03-12 10:46:46
Categories: cs.LG,cs.DC,stat.ML
SCOPE-DTI: Semi-Inductive Dataset Construction and Framework Optimization for Practical Usability Enhancement in Deep Learning-Based Drug Target Interaction Prediction
Deep learning-based drug-target interaction (DTI) prediction methods have demonstrated strong performance; however, real-world applicability remains constrained by limited data diversity and modeling complexity. To address these challenges, we propose SCOPE-DTI, a unified framework combining a large-scale, balanced semi-inductive human DTI dataset with advanced deep learning modeling. Constructed from 13 public repositories, the SCOPE dataset expands data volume by up to 100-fold compared to common benchmarks such as the Human dataset. The SCOPE model integrates three-dimensional protein and compound representations, graph neural networks, and bilinear attention mechanisms to effectively capture cross domain interaction patterns, significantly outperforming state-of-the-art methods across various DTI prediction tasks. Additionally, SCOPE-DTI provides a user-friendly interface and database. We further validate its effectiveness by experimentally identifying anticancer targets of Ginsenoside Rh1. By offering comprehensive data, advanced modeling, and accessible tools, SCOPE-DTI accelerates drug discovery research.
Updated: 2025-03-12 10:46:25
Categories: cs.LG,cs.AI,q-bio.QM
Considering Length Diversity in Retrieval-Augmented Summarization
This study investigates retrieval-augmented summarization by specifically examining the impact of exemplar summary lengths under length constraints, an aspect not covered by previous work. We propose a Diverse Length-aware Maximal Marginal Relevance (DL-MMR) algorithm to better control summary lengths. This algorithm combines query relevance with diverse target lengths in retrieval-augmented summarization. Unlike previous methods that necessitate exhaustive exemplar relevance comparisons using MMR, DL-MMR also considers the exemplar target length and avoids comparing exemplars to each other, thereby reducing computational cost and conserving memory during the construction of an exemplar pool. Experimental results showed the effectiveness of DL-MMR, which considers length diversity, compared to the original MMR algorithm. DL-MMR additionally reduced memory use by a factor of 781,513 and computational cost by a factor of 500,092, while maintaining the same level of informativeness.
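One possible reading of the selection rule is a greedy procedure whose diversity term looks only at the lengths of exemplars already chosen, never at exemplar-to-exemplar similarity (a hedged sketch: the scoring function, the length-redundancy term, and `lam` below are assumptions for illustration, not the paper's exact DL-MMR formulation):

```python
import numpy as np

def dl_mmr_select(query_sims, lengths, k, lam=0.7):
    """Greedy exemplar selection combining query relevance with length
    diversity. Unlike classic MMR, candidates are never compared to each
    other: the redundancy term depends only on the lengths already
    selected, which keeps exemplar-pool construction cheap in both time
    and memory."""
    selected, chosen_lengths = [], []
    candidates = set(range(len(query_sims)))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in candidates:
            if chosen_lengths:
                # penalize candidates whose length duplicates a chosen one
                redundancy = max(1.0 / (1.0 + abs(lengths[i] - L))
                                 for L in chosen_lengths)
            else:
                redundancy = 0.0
            score = lam * query_sims[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        chosen_lengths.append(lengths[best])
        candidates.remove(best)
    return selected

sims = [0.9, 0.89, 0.5]
lengths = [50, 50, 120]        # the two most relevant exemplars share a length
picked = dl_mmr_select(sims, lengths, k=2)
assert picked == [0, 2]        # length diversity beats raw relevance
```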
Updated: 2025-03-12 10:43:33
Categories: cs.CL,cs.AI,I.2.7
What is the relation between Slow Feature Analysis and the Successor Representation?
Slow feature analysis (SFA) is an unsupervised method for extracting representations from time series data. The successor representation (SR) is a method for representing states in a Markov decision process (MDP) based on transition statistics. While SFA and SR stem from distinct areas of machine learning, they share important properties, both in terms of their mathematics and the types of information they are sensitive to. This work studies their connection along these two axes. In particular, both SFA and SR are explored analytically, and in the setting of a one-hot encoded MDP, a formal equivalence is demonstrated in terms of the grid-like representations that occur as solutions/eigenvectors. Moreover, it is shown that the columns of the matrices involved in SFA contain place-like representations, which are formally distinct from place-cell models that have already been defined using SFA.
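The SR side of this comparison has a standard closed form under a fixed policy, $M = (I - \gamma P)^{-1}$, which is easy to illustrate (the 3-state chain MDP below is a toy example, not one from the paper):

```python
import numpy as np

# Successor representation of a fixed policy: M = (I - gamma * P)^{-1},
# where P is the policy's state-transition matrix.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])   # a chain ending in an absorbing state
gamma = 0.9
M = np.linalg.inv(np.eye(3) - gamma * P)

# Row i is the expected discounted future occupancy of each state when
# starting from state i. From state 0: itself once, state 1 after one
# step (0.9), then the absorbing state forever (0.81 / (1 - 0.9) = 8.1).
assert np.allclose(M[0], [1.0, 0.9, 8.1])
```

Under the one-hot encoding discussed in the paper, the grid-like solutions of SFA and the eigenvectors of matrices like $M$ are where the formal equivalence is shown.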
Updated: 2025-03-12 10:41:49
Categories: cs.LG
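As background for the formal equivalence discussed above, the successor representation under a fixed policy has the standard closed form M = Σ_t γ^t T^t = (I − γT)^(−1); the grid-like solutions the paper refers to arise as eigenvectors of such matrices. The sketch below computes the SR for a one-hot encoded ring-world MDP; the ring world and the value of γ are illustrative choices, not taken from the paper.

```python
import numpy as np

def successor_representation(T, gamma=0.9):
    """SR under a fixed policy: M = sum_t gamma^t T^t = (I - gamma*T)^-1."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# Ring-world MDP with a uniform random walk (one-hot state encoding).
n = 8
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

M = successor_representation(T)
# For a row-stochastic T, every SR row sums to 1/(1-gamma);
# eigenvectors of M coincide with those of T, which on a ring are
# sinusoidal -- the discrete analogue of the grid-like solutions.
```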
In-Context Defense in Computer Agents: An Empirical Study
Computer agents powered by vision-language models (VLMs) have significantly advanced human-computer interaction, enabling users to perform complex tasks through natural language instructions. However, these agents are vulnerable to context deception attacks, an emerging threat where adversaries embed misleading content into the agent's operational environment, such as a pop-up window containing deceptive instructions. Existing defenses, such as instructing agents to ignore deceptive elements, have proven largely ineffective. As the first systematic study on protecting computer agents, we introduce in-context defense, leveraging in-context learning and chain-of-thought (CoT) reasoning to counter such attacks. Our approach involves augmenting the agent's context with a small set of carefully curated exemplars containing both malicious environments and corresponding defensive responses. These exemplars guide the agent to first perform explicit defensive reasoning before action planning, reducing susceptibility to deceptive attacks. Experiments demonstrate the effectiveness of our method, reducing attack success rates by 91.2% on pop-up window attacks and by 74.6% on average on environment injection attacks, while achieving 100% successful defenses against distracting advertisements. Our findings highlight that (1) defensive reasoning must precede action planning for optimal performance, and (2) a minimal number of exemplars (fewer than three) is sufficient to induce an agent's defensive behavior.
Updated: 2025-03-12 10:38:15
Categories: cs.AI
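The exemplar-augmentation step can be pictured as simple prompt construction: curated (malicious environment, defensive response) pairs are prepended so the agent performs defensive reasoning before planning. This is only a toy illustration; the real agent context, exemplar wording, and field names are not specified in the abstract and are assumed here.

```python
def build_defensive_prompt(current_env, exemplars):
    """Prepend curated (malicious environment, defensive response) exemplars
    so the agent reasons explicitly about deception before action planning.
    The 'Environment:'/'Defensive reasoning:' format is an illustrative
    assumption, not the paper's template."""
    parts = []
    for env, defense in exemplars:
        parts.append(f"Environment: {env}\nDefensive reasoning: {defense}")
    # The agent completes the final defensive-reasoning slot before acting.
    parts.append(f"Environment: {current_env}\nDefensive reasoning:")
    return "\n\n".join(parts)
```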
Structural Entropy Guided Unsupervised Graph Out-Of-Distribution Detection
With the emergence of huge amounts of unlabeled data, unsupervised out-of-distribution (OOD) detection is vital for ensuring the reliability of graph neural networks (GNNs) by identifying OOD samples from in-distribution (ID) ones during testing, where encountering novel or unknown data is inevitable. Existing methods often suffer from compromised performance due to redundant information in graph structures, which impairs their ability to effectively differentiate between ID and OOD data. To address this challenge, we propose SEGO, an unsupervised framework that integrates structural entropy into OOD detection for graph classification. Specifically, within the architecture of contrastive learning, SEGO introduces an anchor view in the form of a coding tree obtained by minimizing structural entropy. The obtained coding tree effectively removes redundant information from graphs while preserving essential structural information, enabling the capture of distinct graph patterns between ID and OOD samples. Furthermore, we present a multi-grained contrastive learning scheme at local, global, and tree levels using triplet views, where coding trees with essential information serve as the anchor view. Extensive experiments on real-world datasets validate the effectiveness of SEGO, demonstrating superior performance over state-of-the-art baselines in OOD detection. Specifically, our method achieves the best performance on 9 out of 10 dataset pairs, with an average improvement of 3.7% on OOD detection datasets, significantly surpassing the best competitor by 10.8% on the FreeSolv/ToxCast dataset pair.
Updated: 2025-03-12 10:24:40
Categories: cs.LG
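The coding-tree construction that SEGO minimizes is involved, but as a reference point the simplest (one-dimensional) structural entropy of a graph, which coding trees generalize to higher dimensions, can be computed directly: H1(G) = −Σ_v (d_v / 2m) log2(d_v / 2m). The formula is standard structural information theory; the example graph below is an illustrative choice, not from the paper.

```python
import numpy as np

def one_dim_structural_entropy(adj):
    """One-dimensional structural entropy of an undirected graph:
    H1(G) = -sum_v (d_v / vol(G)) * log2(d_v / vol(G)),
    where d_v is the degree of v and vol(G) = sum of all degrees (= 2m)."""
    deg = adj.sum(axis=1)
    vol = deg.sum()
    p = deg / vol
    p = p[p > 0]  # skip isolated vertices (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```

On a degree-regular graph with n vertices, every stationary probability is 1/n, so H1 reduces to log2(n) — e.g. 2 bits for the complete graph K4.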
Physical knowledge improves prediction of EM Fields
We propose a 3D U-Net model to predict the spatial distribution of electromagnetic fields inside a radio-frequency (RF) coil with a subject present, using the phase, amplitude, and position of the coils, along with the density, permittivity, and conductivity of the surrounding medium as inputs. To improve accuracy, we introduce a physics-augmented variant, U-Net Phys, which incorporates Gauss's law of magnetism into the loss function using finite differences. We train our models on electromagnetic field simulations from CST Studio Suite for an eight-channel dipole array RF coil at 7T MRI. Experimental results show that U-Net Phys significantly outperforms the standard U-Net, particularly in predicting fields within the subject, demonstrating the advantage of integrating physical constraints into deep learning-based field prediction.
Updated: 2025-03-12 10:22:57
Categories: cs.LG
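The abstract states that Gauss's law of magnetism (∇·B = 0) enters the loss via finite differences. A minimal sketch of such a penalty term follows, assuming a predicted field tensor of shape (3, X, Y, Z) on a uniform grid; the shape convention and the central-difference stencil are assumptions, since the paper's exact discretization is not given in the abstract.

```python
import numpy as np

def gauss_law_penalty(B, h=1.0):
    """Finite-difference penalty on div B = 0 for a field B of shape (3, X, Y, Z).
    Central differences on the interior grid; returns the mean squared divergence,
    which could be added to a standard regression loss as a physics term."""
    dBx = (B[0, 2:, 1:-1, 1:-1] - B[0, :-2, 1:-1, 1:-1]) / (2 * h)
    dBy = (B[1, 1:-1, 2:, 1:-1] - B[1, 1:-1, :-2, 1:-1]) / (2 * h)
    dBz = (B[2, 1:-1, 1:-1, 2:] - B[2, 1:-1, 1:-1, :-2]) / (2 * h)
    div = dBx + dBy + dBz
    return float(np.mean(div ** 2))
```

A divergence-free test field such as B = (−y, x, 0) incurs zero penalty, while B = (x, y, z) has ∇·B = 3 everywhere and is penalized accordingly.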
Audio-Visual Deepfake Detection With Local Temporal Inconsistencies
This paper proposes an audio-visual deepfake detection approach that aims to capture fine-grained temporal inconsistencies between audio and visual modalities. To achieve this, both architectural and data synthesis strategies are introduced. From an architectural perspective, a temporal distance map, coupled with an attention mechanism, is designed to capture these inconsistencies while minimizing the impact of irrelevant temporal subsequences. Moreover, we explore novel pseudo-fake generation techniques to synthesize local inconsistencies. Our approach is evaluated against state-of-the-art methods using the DFDC and FakeAVCeleb datasets, demonstrating its effectiveness in detecting audio-visual deepfakes.
Updated: 2025-03-12 10:22:54
Categories: cs.CV,cs.CR,cs.MM,cs.SD,eess.AS
SSD: A State-based Stealthy Backdoor Attack For Navigation System in UAV Route Planning
Unmanned aerial vehicles (UAVs) are increasingly employed to perform high-risk tasks that require minimal human intervention. However, UAVs face escalating cybersecurity threats, particularly from GNSS spoofing attacks. While previous studies have extensively investigated the impacts of GNSS spoofing on UAVs, few have focused on its effects on specific tasks. Moreover, the influence of UAV motion states on the assessment of network security risks is often overlooked. To address these gaps, we first provide a detailed evaluation of how motion states affect the effectiveness of network attacks. We demonstrate that nonlinear motion states not only enhance the effectiveness of position spoofing in GNSS spoofing attacks but also reduce the probability of speed-related attack detection. Building upon this, we propose a state-triggered backdoor attack method (SSD) to deceive GNSS systems and assess its risk to trajectory planning tasks. Extensive validation of SSD's effectiveness and stealthiness is conducted. Experimental results show that, with appropriately tuned hyperparameters, SSD significantly increases positioning errors and the risk of task failure, while maintaining 100% stealth across three state-of-the-art detectors.
Updated: 2025-03-12 10:19:58
Categories: cs.CR
Towards Regulatory-Confirmed Adaptive Clinical Trials: Machine Learning Opportunities and Solutions
Randomized Controlled Trials (RCTs) are the gold standard for evaluating the effect of new medical treatments. Treatments must pass stringent regulatory conditions in order to be approved for widespread use, yet even after the regulatory barriers are crossed, real-world challenges might arise: Who should get the treatment? What is its true clinical utility? Are there discrepancies in the treatment effectiveness across diverse and under-served populations? We introduce two new objectives for future clinical trials that integrate regulatory constraints and treatment policy value for both the entire population and under-served populations, thus answering some of the questions above in advance. Designed to meet these objectives, we formulate Randomize First Augment Next (RFAN), a new framework for designing Phase III clinical trials. Our framework consists of a standard randomized component followed by an adaptive one, jointly meant to efficiently and safely acquire and assign patients into treatment arms during the trial. Then, we propose strategies for implementing RFAN based on causal, deep Bayesian active learning. Finally, we empirically evaluate the performance of our framework using synthetic and real-world semi-synthetic datasets.
Updated: 2025-03-12 10:17:54
Categories: stat.ML,cs.LG
LREF: A Novel LLM-based Relevance Framework for E-commerce
Query and product relevance prediction is a critical component for ensuring a smooth user experience in e-commerce search. Traditional studies mainly focus on BERT-based models to assess the semantic relevance between queries and products. However, the discriminative paradigm and limited knowledge capacity of these approaches restrict their ability to comprehend the relevance between queries and products fully. With the rapid advancement of Large Language Models (LLMs), recent research has begun to explore their application to industrial search systems, as LLMs provide extensive world knowledge and flexible optimization for reasoning processes. Nonetheless, directly leveraging LLMs for relevance prediction tasks introduces new challenges, including a high demand for data quality, the necessity for meticulous optimization of reasoning processes, and an optimistic bias that can result in over-recall. To overcome the above problems, this paper proposes a novel framework called the LLM-based RElevance Framework (LREF) aimed at enhancing e-commerce search relevance. The framework comprises three main stages: supervised fine-tuning (SFT) with Data Selection, Multiple Chain of Thought (Multi-CoT) tuning, and Direct Preference Optimization (DPO) for de-biasing. We evaluate the performance of the framework through a series of offline experiments on large-scale real-world datasets, as well as online A/B testing. The results indicate significant improvements in both offline and online metrics. Ultimately, the model was deployed in a well-known e-commerce application, yielding substantial commercial benefits.
Updated: 2025-03-12 10:10:30
Categories: cs.IR,cs.AI
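LREF's de-biasing stage uses Direct Preference Optimization. The abstract does not give its exact objective, but the standard per-pair DPO loss (Rafailov et al.) is −log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]); a sketch follows, where β and the example log-probabilities are illustrative, and the mapping of "chosen" to the less over-recalling relevance judgment is our assumption about how LREF would apply it.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard per-pair DPO loss: -log sigmoid(beta * implicit reward margin).
    Inputs are log-probs of the chosen/rejected responses under the policy
    being tuned and under the frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference, the margin is zero and the loss is ln 2; as the policy increasingly prefers the chosen response relative to the reference, the loss falls toward zero.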
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
Detecting text generated by large language models (LLMs) is of great recent interest. With zero-shot methods like DetectGPT, detection capabilities have reached impressive levels. However, the reliability of existing detectors in real-world applications remains underexplored. In this study, we present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperform in this task. We collected human-written datasets from domains where LLMs are particularly prone to misuse. Using popular LLMs, we generated data that better aligns with real-world applications. Unlike previous studies, we employed heuristic rules to create adversarial LLM-generated text, simulating various prompt usages, human revisions such as word substitutions, and writing noise such as spelling mistakes. Our development of DetectRL reveals the strengths and limitations of current SOTA detectors. More importantly, we analyzed the potential impact of writing styles, model types, attack methods, text lengths, and real-world human writing factors on different types of detectors. We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios, evolving with advanced attack methods, thus providing more stressful evaluation to drive the development of more efficient detectors. Data and code are publicly available at: https://github.com/NLP2CT/DetectRL.
Updated: 2025-03-12 10:08:22
Categories: cs.CL,cs.AI
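The heuristic attacks described above — word substitutions and spelling noise — can be pictured with a toy perturber. DetectRL's actual rules are more elaborate; the substitution map, typo mechanism, and rates below are illustrative assumptions only.

```python
import random

def perturb(text, sub_map=None, typo_rate=0.1, seed=0):
    """Toy adversarial perturbation: synonym-style word substitutions plus
    character-level spelling noise (adjacent-character swaps). The map and
    rates are illustrative, not DetectRL's actual heuristics."""
    rng = random.Random(seed)
    sub_map = sub_map or {"utilize": "use", "therefore": "so"}
    words = []
    for w in text.split():
        w = sub_map.get(w.lower(), w)          # human-revision-style substitution
        if rng.random() < typo_rate and len(w) > 3:
            i = rng.randrange(len(w) - 1)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]  # swap adjacent characters
        words.append(w)
    return " ".join(words)
```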
A unifying framework for generalised Bayesian online learning in non-stationary environments
We propose a unifying framework for methods that perform probabilistic online learning in non-stationary environments. We call the framework BONE, which stands for generalised (B)ayesian (O)nline learning in (N)on-stationary (E)nvironments. BONE provides a common structure to tackle a variety of problems, including online continual learning, prequential forecasting, and contextual bandits. The framework requires specifying three modelling choices: (i) a model for measurements (e.g., a neural network), (ii) an auxiliary process to model non-stationarity (e.g., the time since the last changepoint), and (iii) a conditional prior over model parameters (e.g., a multivariate Gaussian). The framework also requires two algorithmic choices, which we use to carry out approximate inference under this framework: (i) an algorithm to estimate beliefs (posterior distribution) about the model parameters given the auxiliary variable, and (ii) an algorithm to estimate beliefs about the auxiliary variable. We show how the modularity of our framework allows for many existing methods to be reinterpreted as instances of BONE, and it allows us to propose new methods. We compare experimentally existing methods with our proposed new method on several datasets, providing insights into the situations that make each method more suitable for a specific task. We provide a Jax open source library to facilitate the adoption of this framework.
Updated: 2025-03-12 10:05:37
Categories: stat.ML,cs.LG
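BONE's example auxiliary process, "the time since the last changepoint", corresponds to the run-length variable of Bayesian online changepoint detection (Adams & MacKay), which is one instance of the framework's second algorithmic choice. One filtering step over that auxiliary variable can be sketched as follows; the constant hazard rate and the function interface are assumptions, since BONE leaves these as pluggable choices.

```python
import numpy as np

def runlength_update(p_run, pred_lik, hazard=0.05):
    """One step of a run-length filter. p_run[r] is the current posterior over
    run length r (time since last changepoint); pred_lik[r] is the predictive
    likelihood of the new observation under run length r. Returns the updated,
    normalized posterior with a new r=0 entry for 'changepoint just occurred'."""
    growth = p_run * pred_lik * (1 - hazard)   # run length r grows to r+1
    cp = np.sum(p_run * pred_lik * hazard)     # mass collapsing to r = 0
    new = np.concatenate(([cp], growth))
    return new / new.sum()
```

Starting from a single run length with unit likelihood, one update splits the mass into the hazard probability of a fresh changepoint versus continued growth.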
Evaluating the Generalizability of LLMs in Automated Program Repair
LLM-based automated program repair methods have attracted significant attention for their state-of-the-art performance. However, they were primarily evaluated on a few well known datasets like Defects4J, raising questions about their effectiveness on new datasets. In this study, we evaluate 11 top-performing LLMs on DEFECTS4J-TRANS, a new dataset derived from transforming Defects4J while maintaining the original semantics. Results from experiments on both Defects4J and DEFECTS4J-TRANS show that all studied LLMs have limited generalizability in APR tasks, with the average number of correct and plausible patches decreasing by 49.48% and 42.90%, respectively, on DEFECTS4J-TRANS. Further investigation into incorporating additional repair-relevant information in repair prompts reveals that, although this information significantly enhances the LLMs' capabilities (increasing the number of correct and plausible patches by up to 136.67% and 121.82%, respectively), performance still falls short of their original results. This indicates that prompt engineering alone is insufficient to substantially enhance LLMs' repair capabilities. Based on our study, we also offer several recommendations for future research.
Updated: 2025-03-12 10:03:58
Categories: cs.SE,cs.AI
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space
Advanced end-to-end autonomous driving systems predict other vehicles' motions and plan the ego vehicle's trajectory. The world model that can foresee the outcome of the trajectory has been used to evaluate the end-to-end autonomous driving system. However, existing world models predominantly emphasize the trajectory of the ego vehicle and leave other vehicles uncontrollable. This limitation hinders their ability to realistically simulate the interaction between the ego vehicle and the driving scenario. In addition, it remains a challenge to match multiple trajectories with each vehicle in the video to control the video generation. To address the above issues, a driving World Model named EOT-WM is proposed in this paper, unifying Ego-Other vehicle Trajectories in videos. Specifically, we first project ego and other vehicle trajectories in the BEV space into the image coordinate to match each trajectory with its corresponding vehicle in the video. Then, trajectory videos are encoded by the Spatial-Temporal Variational Auto Encoder to align with driving video latents spatially and temporally in the unified visual space. A trajectory-injected diffusion Transformer is further designed to denoise the noisy video latents for video generation with the guidance of ego-other vehicle trajectories. In addition, we propose a metric based on control latent similarity to evaluate the controllability of trajectories. Extensive experiments are conducted on the nuScenes dataset, and the proposed model outperforms the state-of-the-art method by 30% in FID and 55% in FVD. The model can also predict unseen driving scenes with self-produced trajectories.
Updated: 2025-03-12 10:02:18
Categories: cs.CV,cs.AI
Why LLMs Cannot Think and How to Fix It
This paper elucidates that current state-of-the-art Large Language Models (LLMs) are fundamentally incapable of making decisions or developing "thoughts" within the feature space due to their architectural constraints. We establish a definition of "thought" that encompasses traditional understandings of that term and adapt it for application to LLMs. We demonstrate that the architectural design and language modeling training methodology of contemporary LLMs inherently preclude them from engaging in genuine thought processes. Our primary focus is on this theoretical realization rather than practical insights derived from experimental data. Finally, we propose solutions to enable thought processes within the feature space and discuss the broader implications of these architectural modifications.
Updated: 2025-03-12 10:00:09
Categories: cs.LG,cs.CL
From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development -- An Opinion Paper
The introduction of transformer architecture was a turning point in Natural Language Processing (NLP). Models based on the transformer architecture such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformer (GPT) have gained widespread popularity in various applications such as software development and education. The availability of Large Language Models (LLMs) such as ChatGPT and Bard to the general public has showcased the tremendous potential of these models and encouraged their integration into various domains such as software development for tasks such as code generation, debugging, and documentation generation. In this study, opinions from 11 experts regarding their experience with LLMs for software development have been gathered and analysed to draw insights that can guide successful and responsible integration. The overall opinion of the experts is positive, with the experts identifying advantages such as increase in productivity and reduced coding time. Potential concerns and challenges such as risk of over-dependence and ethical considerations have also been highlighted.
Updated: 2025-03-12 09:57:19
Categories: cs.AI
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
Traditional methods for eliciting people's opinions face a trade-off between depth and scale: structured surveys enable large-scale data collection but limit respondents' ability to voice their opinions in their own words, while conversational interviews provide deeper insights but are resource-intensive. This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews. Our goal is to assess the performance of AI Conversational Interviewing and to identify opportunities for improvement in a controlled environment. We conducted a small-scale, in-depth study with university students who were randomly assigned to a conversational interview by either AI or human interviewers, both employing identical questionnaires on political topics. Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy. The findings indicate the viability of AI Conversational Interviewing in producing quality data comparable to traditional methods, with the added benefit of scalability. We publish our data and materials for re-use and present specific recommendations for effective implementation.
Updated: 2025-03-12 09:55:22
Categories: cs.HC,cs.AI,cs.CL
Robust Asymmetric Heterogeneous Federated Learning with Corrupted Clients
This paper studies a challenging robust federated learning task with model heterogeneous and data corrupted clients, where the clients have different local model structures. Data corruption is unavoidable due to factors such as random noise, compression artifacts, or environmental conditions in real-world deployment, drastically crippling the entire federated system. To address these issues, this paper introduces a novel Robust Asymmetric Heterogeneous Federated Learning (RAHFL) framework. We propose a Diversity-enhanced supervised Contrastive Learning technique to enhance the resilience and adaptability of local models on various data corruption patterns. Its basic idea is to utilize complex augmented samples obtained by the mixed-data augmentation strategy for supervised contrastive learning, thereby enhancing the ability of the model to learn robust and diverse feature representations. Furthermore, we design an Asymmetric Heterogeneous Federated Learning strategy to resist corrupt feedback from external clients. The strategy allows clients to perform selective one-way learning during collaborative learning phase, enabling clients to refrain from incorporating lower-quality information from less robust or underperforming collaborators. Extensive experimental results demonstrate the effectiveness and robustness of our approach in diverse, challenging federated learning environments. Our code and models are public available at https://github.com/FangXiuwen/RAHFL.
Updated: 2025-03-12 09:52:04
Categories: cs.LG,cs.AI,cs.CV
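The abstract's "mixed-data augmentation strategy" is not fully specified; mixup-style blending is one common way to obtain the complex augmented samples that feed supervised contrastive learning, sketched below. The Beta-distribution parameter and random pairing scheme are illustrative assumptions, not RAHFL's exact procedure.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.4, seed=0):
    """Blend each sample with a random partner: x' = lam*x + (1-lam)*x[perm].
    Returns the mixed inputs plus both label sets and the mixing weight, so a
    supervised contrastive (or classification) loss can weight both labels."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)       # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))     # random partner for each sample
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam
```

Because each mixed row is a convex combination of two originals, feature statistics such as row sums of one-hot inputs are preserved, which makes the augmentation easy to sanity-check.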
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
Advancements in image segmentation play an integral role within the broad scope of Deep Learning-based Computer Vision. Furthermore, their widespread applicability in critical real-world tasks has resulted in challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling the expression of model ignorance (epistemic uncertainty) or data ambiguity (aleatoric uncertainty) to prevent uninformed decision-making. Due to the rapid adoption of Convolutional Neural Network (CNN)-based segmentation models in high-stake applications, a substantial body of research has been published on this very topic, causing its swift expansion into a distinct field. This work provides a comprehensive overview of probabilistic segmentation, by discussing fundamental concepts of uncertainty quantification, governing advancements in the field as well as the application to various tasks. Moreover, literature on both types of uncertainties traces back to four key applications: (1) to quantify statistical inconsistencies in the annotation process due to ambiguous images, (2) correlating prediction error with uncertainty, (3) expanding the model hypothesis space for better generalization, and (4) Active Learning. An extensive discussion follows that includes an overview of utilized datasets for each of the applications and evaluation of the available methods. We also highlight challenges related to architectures, uncertainty quantification methods, standardization and benchmarking, and finally end with recommendations for future work such as methods based on single forward passes and models that appropriately leverage volumetric data.
Updated: 2025-03-12 09:51:17
Categories: cs.CV,cs.AI,cs.LG,eess.IV,stat.ML
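The epistemic/aleatoric split surveyed above is commonly computed from stochastic forward passes (e.g. MC dropout): predictive entropy decomposes into expected entropy (aleatoric) plus mutual information (epistemic). A per-pixel sketch of this standard decomposition follows; the sample count and class count are illustrative.

```python
import numpy as np

def uncertainty_decomposition(probs):
    """probs: (T, C) softmax samples for one pixel from T stochastic passes.
    Predictive entropy (total) = expected entropy (aleatoric)
                               + mutual information (epistemic)."""
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    epistemic = total - aleatoric
    return total, aleatoric, epistemic
```

When all passes agree on the same soft prediction, the epistemic term vanishes and any remaining uncertainty is aleatoric; when passes confidently disagree, the uncertainty is almost entirely epistemic.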
MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics
This work presents the MarineGym, a high-performance reinforcement learning (RL) platform specifically designed for underwater robotics. It aims to address the limitations of existing underwater simulation environments in terms of RL compatibility, training efficiency, and standardized benchmarking. MarineGym integrates a proposed GPU-accelerated hydrodynamic plugin based on Isaac Sim, achieving a rollout speed of 250,000 frames per second on a single NVIDIA RTX 3060 GPU. It also provides five models of unmanned underwater vehicles (UUVs), multiple propulsion systems, and a set of predefined tasks covering core underwater control challenges. Additionally, the DR toolkit allows flexible adjustments of simulation and task parameters during training to improve Sim2Real transfer. Further benchmark experiments demonstrate that MarineGym improves training efficiency over existing platforms and supports robust policy adaptation under various perturbations. We expect this platform could drive further advancements in RL research for underwater robotics. For more details about MarineGym and its applications, please visit our project page: https://marine-gym.com/.
Updated: 2025-03-12 09:47:58
Domains: cs.RO,cs.LG
Time-EAPCR: A Deep Learning-Based Novel Approach for Anomaly Detection Applied to the Environmental Field
As human activities intensify, environmental systems such as aquatic ecosystems and water treatment systems face increasingly complex pressures, impacting ecological balance, public health, and sustainable development, making intelligent anomaly monitoring essential. However, traditional monitoring methods suffer from delayed responses, insufficient data processing capabilities, and weak generalisation, making them unsuitable for complex environmental monitoring needs. In recent years, machine learning has been widely applied to anomaly detection, but the multi-dimensional features and spatiotemporal dynamics of environmental ecological data, especially the long-term dependencies and strong variability in the time dimension, limit the effectiveness of traditional methods. Deep learning, with its ability to automatically learn features, captures complex nonlinear relationships, improving detection performance. However, its application in environmental monitoring is still in its early stages and requires further exploration. This paper introduces a new deep learning method, Time-EAPCR (Time-Embedding-Attention-Permutated CNN-Residual), and applies it to environmental science. The method uncovers feature correlations, captures temporal evolution patterns, and enables precise anomaly detection in environmental systems. We validated Time-EAPCR's high accuracy and robustness across four publicly available environmental datasets. Experimental results show that the method efficiently handles multi-source data, improves detection accuracy, and excels across various scenarios with strong adaptability and generalisation. Additionally, a real-world river monitoring dataset confirmed the feasibility of its deployment, providing reliable technical support for environmental monitoring.
Updated: 2025-03-12 09:44:15
Domains: cs.LG
Bags of Projected Nearest Neighbours: Competitors to Random Forests?
In this paper we introduce a simple and intuitive adaptive k nearest neighbours classifier, and explore its utility within the context of bootstrap aggregating ("bagging"). The approach is based on finding discriminant subspaces which are computationally efficient to compute, and are motivated by enhancing the discrimination of classes through nearest neighbour classifiers. This adaptiveness promotes diversity of the individual classifiers fit across different bootstrap samples, and so further leverages the variance reducing effect of bagging. Extensive experimental results are presented documenting the strong performance of the proposed approach in comparison with Random Forest classifiers, as well as other nearest neighbours based ensembles from the literature, plus other relevant benchmarks. Code to implement the proposed approach is available in the form of an R package from https://github.com/DavidHofmeyr/BOPNN.
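The bagged projected nearest-neighbour idea can be illustrated with a minimal NumPy sketch (a toy for binary 0/1 labels, not the authors' R implementation; the crude mean-difference projection merely stands in for the paper's computationally efficient discriminant subspaces):

```python
import numpy as np

def fit_bagged_projected_nn(X, y, n_estimators=25, dim=2, seed=0):
    """Fit an ensemble of nearest-neighbour classifiers, each on its own
    bootstrap sample and its own low-dimensional projection."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ensemble = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)          # bootstrap resample
        Xb, yb = X[idx], y[idx]
        # Crude "discriminant" subspace: a random projection whose first
        # axis is nudged toward the between-class mean difference.
        P = rng.normal(size=(d, dim))
        P[:, 0] += Xb[yb == 1].mean(axis=0) - Xb[yb == 0].mean(axis=0)
        P /= np.linalg.norm(P, axis=0)
        ensemble.append((P, Xb @ P, yb))
    return ensemble

def predict(ensemble, X, k=3):
    """Majority vote over the per-replicate k-NN decisions."""
    votes = []
    for P, Zb, yb in ensemble:
        Z = X @ P
        d2 = ((Z[:, None, :] - Zb[None, :, :]) ** 2).sum(axis=-1)
        nn = np.argsort(d2, axis=1)[:, :k]        # k nearest in subspace
        votes.append(yb[nn].mean(axis=1) > 0.5)
    return (np.mean(votes, axis=0) > 0.5).astype(int)
```

Because every bootstrap replicate fits its own projection, the individual classifiers disagree more across replicates, which is the diversity that bagging converts into variance reduction.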
Updated: 2025-03-12 09:44:12
Domains: cs.LG,stat.ML
GENEOnet: Statistical analysis supporting explainability and trustworthiness
Group Equivariant Non-Expansive Operators (GENEOs) have emerged as mathematical tools for constructing networks for Machine Learning and Artificial Intelligence. Recent findings suggest that such models fall within the domain of eXplainable Artificial Intelligence (XAI) due to their inherent interpretability. In this study, we aim to verify this claim for GENEOnet, a GENEO network developed for an application in computational biochemistry, by employing various statistical analyses and experiments. These experiments first allow us to perform a sensitivity analysis on GENEOnet's parameters to test their significance. Subsequently, we show that GENEOnet exhibits a significantly higher proportion of equivariance compared to other methods. Lastly, we demonstrate that GENEOnet is, on average, robust to perturbations arising from molecular dynamics. These results collectively serve as proof of the explainability, trustworthiness, and robustness of GENEOnet, and confirm the beneficial use of GENEOs in the context of Trustworthy Artificial Intelligence.
Updated: 2025-03-12 09:43:48
Domains: cs.LG,cs.AI,q-bio.BM,stat.AP
Enhancing elusive clues in knowledge learning by contrasting attention of language models
Causal language models acquire a vast amount of knowledge from general text corpora during pretraining, but the efficiency of knowledge learning is known to be unsatisfactory, especially when learning from knowledge-dense and small-sized corpora. The deficiency can come from long-distance dependencies, which are hard for language models to capture, and from overfitting to co-occurrence patterns and distracting clues in the training text. To address these issues, this paper proposes a method to enhance knowledge learning during language model pretraining by enhancing elusive but important clues in text discovered by the language models themselves. We found that larger language models pay more attention to non-obvious but important clues, which are often overlooked by smaller language models. Therefore, we can identify these clues by contrasting the attention weights of large and small language models. We use the identified clues as a guide to perform token-dropout data augmentation on the training text, and observed a significant boost in both small and large models' performance in fact memorization. This shows that the behavior contrast between more- and less-performant language models contains important clues for knowledge learning, and that it can be "amplified" for a straightforward improvement in knowledge learning efficiency.
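The two steps just described, contrasting attention weights to find clue tokens and then protecting those tokens during token-dropout augmentation, can be sketched as follows (illustrative only; the attention vectors are assumed to be per-token weights already aggregated across heads and layers, and the threshold is an arbitrary choice):

```python
import numpy as np

def contrast_clues(attn_large, attn_small, top_frac=0.4):
    """Indices of tokens the large model attends to much more than the
    small one: candidates for 'elusive but important' clues."""
    gap = np.asarray(attn_large) - np.asarray(attn_small)
    k = max(1, int(len(gap) * top_frac))
    return set(np.argsort(gap)[-k:])              # largest attention gaps

def token_dropout(tokens, clue_idx, p_drop=0.3, seed=0):
    """Augment text by randomly dropping non-clue tokens while always
    keeping the identified clue tokens."""
    rng = np.random.default_rng(seed)
    return [t for i, t in enumerate(tokens)
            if i in clue_idx or rng.random() > p_drop]
```

The augmentation biases the training signal toward the clue tokens without deleting them, which is what lets the smaller model's learning be guided by the larger model's attention pattern.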
Updated: 2025-03-12 09:42:19
Domains: cs.AI
Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey
Spatio-Temporal (ST) data science, which includes sensing, managing, and mining large-scale data across space and time, is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Traditional deep learning approaches have significantly advanced this field, particularly in the stage of ST data mining. However, these models remain task-specific and often require extensive labeled data. Inspired by the success of Foundation Models (FM), especially large language models, researchers have begun exploring the concept of Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. Unlike prior architectures, STFMs empower the entire workflow of ST data science, ranging from data sensing, management, to mining, thereby offering a more holistic and scalable approach. Despite rapid progress, a systematic study of STFMs for ST data science remains lacking. This survey aims to provide a comprehensive review of STFMs, categorizing existing methodologies and identifying key research directions to advance ST general intelligence.
Updated: 2025-03-12 09:42:18
Domains: cs.DB,cs.LG
Addressing pitfalls in implicit unobserved confounding synthesis using explicit block hierarchical ancestral sampling
Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding by modifying the off-diagonal entries of the idiosyncratic covariance matrix while preserving positive definiteness. Within this approach, state-of-the-art protocols have two distinct issues that hinder unbiased sampling from the complete space of causal models: first, the use of diagonally dominant constructions, which restrict the spectrum of partial correlation matrices; and second, the restriction of possible graphical structures when sampling bidirected edges, unnecessarily ruling out valid causal models. To address these limitations, we propose an improved explicit modeling approach for unobserved confounding, leveraging block-hierarchical ancestral generation of ground-truth causal graphs. An algorithm for converting the ground-truth DAG into an ancestral graph is provided so that the outputs of causal discovery algorithms can be compared against it. We prove that our approach fully covers the space of causal models, including those generated by the implicit parameterization, thus enabling more robust evaluation of methods for causal discovery and inference.
Updated: 2025-03-12 09:38:40
Domains: stat.ML,cs.LG,math.ST,stat.TH
EVOKE: Elevating Chest X-ray Report Generation via Multi-View Contrastive Learning and Patient-Specific Knowledge
Radiology reports are crucial for planning treatment strategies and facilitating effective doctor-patient communication. However, the manual creation of these reports places a significant burden on radiologists. While automatic radiology report generation presents a promising solution, existing methods often rely on single-view radiographs, which constrain diagnostic accuracy. To address this challenge, we propose \textbf{EVOKE}, a novel chest X-ray report generation framework that incorporates multi-view contrastive learning and patient-specific knowledge. Specifically, we introduce a multi-view contrastive learning method that enhances visual representation by aligning multi-view radiographs with their corresponding report. After that, we present a knowledge-guided report generation module that integrates available patient-specific indications (e.g., symptom descriptions) to trigger the production of accurate and coherent radiology reports. To support research in multi-view report generation, we construct Multi-view CXR and Two-view CXR datasets using publicly available sources. Our proposed EVOKE surpasses recent state-of-the-art methods across multiple datasets, achieving a 2.9\% F\textsubscript{1} RadGraph improvement on MIMIC-CXR, a 7.3\% BLEU-1 improvement on MIMIC-ABN, a 3.1\% BLEU-4 improvement on Multi-view CXR, and an 8.2\% F\textsubscript{1,mic-14} CheXbert improvement on Two-view CXR.
Updated: 2025-03-12 09:38:02
Domains: cs.CV,cs.AI
Differential Privacy Personalized Federated Learning Based on Dynamically Sparsified Client Updates
Personalized federated learning is extensively utilized in scenarios characterized by data heterogeneity, facilitating more efficient and automated local training on data-owning terminals. This includes the automated selection of high-performance model parameters for upload, thereby enhancing the overall training process. However, it entails significant risks of privacy leakage. Existing studies have attempted to mitigate these risks by utilizing differential privacy. Nevertheless, these studies present two major limitations: (1) The integration of differential privacy into personalized federated learning lacks sufficient personalization, leading to the introduction of excessive noise into the model. (2) It fails to adequately control the spatial scope of model update information, resulting in a suboptimal balance between data privacy and model effectiveness in differential privacy federated learning. In this paper, we propose a differentially private personalized federated learning approach that employs dynamically sparsified client updates through reparameterization and adaptive norm (DP-pFedDSU). Reparameterization training effectively selects personalized client update information, thereby reducing the quantity of updates. This approach minimizes the introduction of noise to the greatest extent possible. Additionally, dynamic adaptive norm refers to controlling the norm space of model updates during the training process, mitigating the negative impact of clipping on the update information. These strategies substantially enhance the effective integration of differential privacy and personalized federated learning. Experimental results on EMNIST, CIFAR-10, and CIFAR-100 demonstrate that our proposed scheme achieves superior performance and is well-suited for more complex personalized federated learning scenarios.
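A minimal sketch of the sparsify-clip-noise pipeline for one client update (an illustration, not the DP-pFedDSU algorithm: top-k magnitude selection stands in for the reparameterized selection, and a fixed bound stands in for the adaptive norm; all parameter defaults are assumptions):

```python
import numpy as np

def private_sparse_update(delta, keep_frac=0.1, clip_norm=1.0,
                          noise_mult=0.5, seed=0):
    """One client's private update (delta is the flattened model update):
    keep only the largest-magnitude coordinates, clip to a norm bound,
    then add Gaussian noise on the kept coordinates only."""
    rng = np.random.default_rng(seed)
    k = max(1, int(delta.size * keep_frac))
    mask = np.zeros(delta.size, dtype=bool)
    mask[np.argsort(np.abs(delta))[-k:]] = True   # dynamic sparsification
    sparse = np.where(mask, delta, 0.0)
    norm = np.linalg.norm(sparse)
    clipped = sparse * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + mask * rng.normal(0.0, noise_mult * clip_norm, delta.size)
```

Sparsifying before the privacy step is what reduces the noise budget: fewer released coordinates means noise is injected into fewer positions for the same privacy guarantee.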
Updated: 2025-03-12 09:34:05
Domains: cs.LG,cs.CR
Toward a method for LLM-enabled Indoor Navigation
Indoor navigation presents unique challenges due to complex layouts, lack of GPS signals, and accessibility concerns. Existing solutions often struggle with real-time adaptability and user-specific needs. In this work, we explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate natural, context-aware navigation instructions from indoor map images. We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 52% correct indications and a maximum of 62%. The results do not appear to depend on the complexity of the layout or the complexity of the expected path, but rather on the number of points of interest and the abundance of visual information, which negatively affect the performance.
Updated: 2025-03-12 09:32:43
Domains: cs.AI,cs.CL,cs.LG
Status and Future Prospects of the Standardization Framework Industry 4.0: A European Perspective
The rapid development of Industry 4.0 technologies requires robust and comprehensive standardization to ensure interoperability, safety, and efficiency in the Industry of the Future. This paper examines the fundamental role and functionality of standardization, with a particular focus on its importance in Europe's regulatory framework. Building on this, selected topics concerning standardization activities for intelligent manufacturing and digital twins are highlighted, thereby providing an overview of the Industry 4.0 standards framework. This paper serves both as an informative guide to the existing standards in Industry 4.0 with respect to Artificial Intelligence and Digital Twins, and as a call to action for increased cooperation between standardization bodies and the research community. By fostering such collaboration, we aim to facilitate the continued development and implementation of standards that will drive innovation and progress in the manufacturing sector.
Updated: 2025-03-12 09:30:52
Domains: cs.ET,cs.AI,cs.CY
Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Bimanual robotic manipulation is an emerging and critical topic in the robotics community. Previous works primarily rely on integrated control models that take the perceptions and states of both arms as inputs to directly predict their actions. However, we think bimanual manipulation involves not only coordinated tasks but also various uncoordinated tasks that do not require explicit cooperation during execution, such as grasping objects with the closest hand, which integrated control frameworks ignore to consider due to their enforced cooperation in the early inputs. In this paper, we propose a novel decoupled interaction framework that considers the characteristics of different tasks in bimanual manipulation. The key insight of our framework is to assign an independent model to each arm to enhance the learning of uncoordinated tasks, while introducing a selective interaction module that adaptively learns weights from its own arm to improve the learning of coordinated tasks. Extensive experiments on seven tasks in the RoboTwin dataset demonstrate that: (1) Our framework achieves outstanding performance, with a 23.5% boost over the SOTA method. (2) Our framework is flexible and can be seamlessly integrated into existing methods. (3) Our framework can be effectively extended to multi-agent manipulation tasks, achieving a 28% boost over the integrated control SOTA. (4) The performance boost stems from the decoupled design itself, surpassing the SOTA by 16.5% in success rate with only 1/6 of the model size.
Updated: 2025-03-12 09:28:41
Domains: cs.RO,cs.LG
MsaMIL-Net: An End-to-End Multi-Scale Aware Multiple Instance Learning Network for Efficient Whole Slide Image Classification
Bag-based Multiple Instance Learning (MIL) approaches have emerged as the mainstream methodology for Whole Slide Image (WSI) classification. However, most existing methods adopt a segmented training strategy, which first extracts features using a pre-trained feature extractor and then aggregates these features through MIL. This segmented training approach leads to insufficient collaborative optimization between the feature extraction network and the MIL network, preventing end-to-end joint optimization and thereby limiting the overall performance of the model. Additionally, conventional methods typically extract features from all patches of fixed size, ignoring the multi-scale observation characteristics of pathologists. This not only results in significant computational resource waste when tumor regions represent a minimal proportion (as in the Camelyon16 dataset) but may also lead the model to suboptimal solutions. To address these limitations, this paper proposes an end-to-end multi-scale WSI classification framework that integrates multi-scale feature extraction with multiple instance learning. Specifically, our approach includes: (1) a semantic feature filtering module to reduce interference from non-lesion areas; (2) a multi-scale feature extraction module to capture pathological information at different levels; and (3) a multi-scale fusion MIL module for global modeling and feature integration. Through an end-to-end training strategy, we simultaneously optimize both the feature extractor and MIL network, ensuring maximum compatibility between them. Experiments were conducted on three cross-center datasets (DigestPath2019, BCNB, and UBC-OCEAN). Results demonstrate that our proposed method outperforms existing state-of-the-art approaches in terms of both accuracy (ACC) and AUC metrics.
Updated: 2025-03-12 09:27:31
Domains: cs.CV,cs.AI
Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs
The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). This paper explores unstructured sparsity in FHE matrix multiplication schemes as a means of reducing this burden while maintaining model accuracy requirements. We demonstrate that sparsity can be exploited in arbitrary matrix multiplication, providing runtime benefits compared to a baseline naive algorithm at all sparsity levels. This is a notable departure from the plaintext domain, where there is a trade-off between sparsity and the overhead of the sparse multiplication algorithm. In addition, we propose three sparse multiplication schemes in FHE based on common plaintext sparse encodings. We demonstrate the performance gain is scheme-invariant; however, some sparse schemes vastly reduce the memory storage requirements of the encrypted matrix at high sparsity values. Our proposed sparse schemes yield an average performance gain of 2.5x at 50% unstructured sparsity, with our multi-threading scheme providing a 32.5x performance increase over the equivalent single-threaded sparse computation when utilizing 64 cores.
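The structural intuition behind the runtime win can be shown with a compressed-sparse-row style encoding in plaintext; a real FHE scheme operates on ciphertexts, so this is only the plaintext analogue of one of the sparse encodings the paper builds on:

```python
import numpy as np

def to_csr(A):
    """CSR-style encoding: per-row (column, value) pairs for the
    nonzero entries only."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in A]

def csr_matmul(A_csr, B):
    """Multiply a CSR-encoded matrix by a dense one, touching only the
    nonzero entries; the work scales with the number of nonzeros rather
    than with the full matrix size."""
    C = np.zeros((len(A_csr), B.shape[1]))
    for i, row in enumerate(A_csr):
        for j, v in row:
            C[i] += v * B[j]
    return C
```

In plaintext this encoding only pays off above some sparsity threshold because of the indexing overhead; the paper's observation is that in FHE, where each ciphertext multiplication is vastly more expensive, skipping zero entries wins at every sparsity level.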
Updated: 2025-03-12 09:24:31
Domains: cs.CR,cs.DC,cs.LG,cs.PF
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Preconditioned stochastic optimization algorithms, exemplified by Shampoo, outperform first-order optimizers by offering theoretical convergence benefits and practical gains in large-scale neural network training. However, they incur substantial memory overhead due to the storage demands of non-diagonal preconditioning matrices. To address this, we introduce 4-bit quantization for Shampoo's preconditioners. We introduce two key methods: First, we apply Cholesky decomposition followed by quantization of the Cholesky factors, reducing memory usage by leveraging their lower triangular structure while better preserving spectral properties to minimize information loss. To our knowledge, this is the first quantization approach applied to Cholesky factors of preconditioners. Second, we incorporate error feedback in the quantization process, efficiently storing Cholesky factor and error state in the lower and upper triangular parts of the same matrix. Through extensive experiments, we demonstrate that combining Cholesky quantization with error feedback enhances memory efficiency and algorithm performance in large-scale deep-learning tasks. Theoretically, we also provide convergence proofs for quantized Shampoo under both smooth and non-smooth stochastic optimization settings.
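The two ingredients, quantizing Cholesky factors and carrying error feedback, can be sketched as follows (a NumPy toy, not the paper's optimizer; the paper packs the factor and the error state into the lower and upper triangles of one matrix, which is skipped here for clarity):

```python
import numpy as np

def quantize4(L, bits=4):
    """Uniform symmetric quantization of a Cholesky factor to 4 bits."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4 bits
    scale = np.abs(L).max() / qmax or 1.0         # avoid zero scale
    q = np.clip(np.round(L / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def quantize_with_feedback(L, err, bits=4):
    """Quantize L plus the carried error state; the residual becomes the
    next error state, so quantization error does not accumulate."""
    q, scale = quantize4(L + err, bits)
    return q, scale, (L + err) - q * scale

# Reconstructing P_hat = L_hat @ L_hat.T from the dequantized factor
# keeps the preconditioner positive semidefinite by construction.
```

Quantizing the factor rather than the preconditioner itself is the key point: any dequantized factor yields a valid PSD preconditioner, whereas directly quantizing a dense preconditioner can destroy positive definiteness.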
Updated: 2025-03-12 09:19:31
Domains: cs.LG,cs.CV,math.OC
Depth Any Video with Scalable Synthetic Data
Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse virtual environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates, even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency. The code and model weights are open-sourced.
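As a rough illustration of spreading depth predictions across a long sequence from a few processed key frames, per-pixel linear interpolation might look like the sketch below. This is an assumption for illustration only; the abstract does not specify the paper's interpolation at this level of detail:

```python
import numpy as np

def interpolate_depth(key_depths, key_idx, n_frames):
    """Per-pixel linear interpolation of depth maps between key frames.
    key_depths: list of HxW arrays; key_idx: sorted frame indices."""
    out = np.empty((n_frames,) + key_depths[0].shape)
    for t in range(n_frames):
        j = np.searchsorted(key_idx, t, side="right")
        if j == 0:
            out[t] = key_depths[0]                # before first key frame
        elif j == len(key_idx):
            out[t] = key_depths[-1]               # after last key frame
        else:
            t0, t1 = key_idx[j - 1], key_idx[j]
            a = (t - t0) / (t1 - t0)
            out[t] = (1 - a) * key_depths[j - 1] + a * key_depths[j]
    return out
```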
Updated: 2025-03-12 09:16:59
Domains: cs.CV,cs.AI
Dynamic Feature Selection from Variable Feature Sets Using Features of Features
Machine learning models usually assume that the set of feature values used to obtain an output is fixed in advance. However, in many real-world problems, a cost is associated with measuring these features. To address the issue of reducing measurement costs, various methods have been proposed to dynamically select which features to measure, but existing methods assume that the set of measurable features remains constant, which makes them unsuitable for cases where the set of measurable features varies from instance to instance. To overcome this limitation, we define a new problem setting for Dynamic Feature Selection (DFS) with variable feature sets and propose a deep learning method that utilizes prior information about each feature, referred to as "features of features". Experimental results on several datasets demonstrate that the proposed method effectively selects features based on the prior information, even when the set of measurable features changes from instance to instance.
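The per-instance selection over a variable measurable set can be sketched as a scoring rule over each feature's meta-attributes (a hypothetical linear scorer for illustration; the paper uses a deep model, and the meta-attribute names and weights here are invented):

```python
import numpy as np

def select_next_feature(available, feat_meta, w):
    """Score each currently measurable feature by a linear model over
    its 'features of features' (meta-attributes such as cost or noise
    level) and return the best one to measure next. Because scoring is
    per-feature, the measurable set may differ for every instance."""
    return max(available, key=lambda i: feat_meta[i] @ w)
```

The point of scoring through meta-attributes rather than feature identity is that the same scorer applies to whichever subset of features happens to be measurable for a given instance.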
Updated: 2025-03-12 09:13:21
Domains: cs.LG,cs.IT,math.IT
Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers
In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach through the Passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length. This contrasts with the non-linear memory growth typically observed in standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for enhancing both the efficiency and expressiveness of future Transformer models.
Updated: 2025-03-12 09:13:15
Fields: hep-ph,cs.AI,cs.CL,cs.LG,cs.NE
Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions
Modern reconstruction techniques can effectively model complex 3D scenes from sparse 2D views. However, automatically assessing the quality of novel views and identifying artifacts is challenging due to the lack of ground truth images and the limitations of No-Reference image metrics in predicting reliable artifact maps. The absence of such metrics hinders the assessment of the quality of novel views and limits the adoption of post-processing techniques, such as inpainting, to enhance reconstruction quality. To tackle this, recent work has established a new category of metrics (Cross-Reference), predicting image quality solely by leveraging context from alternate viewpoint captures (arXiv:2404.14409). In this work, we propose a new Cross-Reference metric, Puzzle Similarity, which is designed to localize artifacts in novel views. Our approach utilizes image patch statistics from the input views to establish a scene-specific distribution, later used to identify poorly reconstructed regions in the novel views. Given the lack of good measures to evaluate Cross-Reference methods in the context of 3D reconstruction, we collected a novel human-labeled dataset of artifact and distortion maps in unseen reconstructed views. Through this dataset, we demonstrate that our method achieves state-of-the-art localization of artifacts in novel views, correlating with human assessment, even without aligned references. We can leverage our new metric to enhance applications like automatic image restoration, guided acquisition, or 3D reconstruction from sparse inputs. Find the project page at https://nihermann.github.io/puzzlesim/ .
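The patch-statistics idea can be illustrated with a toy sketch: fit a simple distribution of patch summary statistics from the reference views, then score novel-view patches by their distance from it. Real Puzzle Similarity uses learned deep patch features; the (mean, spread) summaries and all pixel values below are invented for illustration.

```python
# Toy sketch of the cross-reference idea: build a distribution of patch
# statistics from the input (reference) views, then flag novel-view patches
# that are unlikely under it. A "patch" here is just a flat list of pixels,
# summarized by its (mean, spread); both the data and the per-feature
# Gaussian scoring are hypothetical simplifications.
import math
import statistics

def patch_stats(patch):
    return (statistics.mean(patch), statistics.pstdev(patch))

def fit_reference(patches):
    """Per-feature mean and spread of patch statistics over reference views."""
    feats = [patch_stats(p) for p in patches]
    mu = [statistics.mean(f[i] for f in feats) for i in (0, 1)]
    sd = []
    for i in (0, 1):
        s = statistics.pstdev(f[i] for f in feats)
        sd.append(s if s > 1e-9 else 1.0)  # guard degenerate features
    return mu, sd

def artifact_score(patch, model):
    """Z-score distance of a patch from the reference distribution."""
    mu, sd = model
    f = patch_stats(patch)
    return math.hypot(*[(f[i] - mu[i]) / sd[i] for i in (0, 1)])

refs = [[0.4, 0.5, 0.6, 0.5], [0.45, 0.55, 0.5, 0.5], [0.5, 0.6, 0.4, 0.5]]
model = fit_reference(refs)
clean = artifact_score([0.5, 0.5, 0.45, 0.55], model)
artifact = artifact_score([0.0, 1.0, 0.0, 1.0], model)  # high-contrast junk
print(clean < artifact)
```

A well-reconstructed patch scores close to the reference distribution while a high-contrast artifact patch scores far from it, mirroring how the metric localizes poorly reconstructed regions without an aligned ground-truth image.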
Updated: 2025-03-12 09:04:43
Fields: cs.CV,cs.AI,cs.GR,cs.LG,68T07, 68T45, 68T10,I.4; I.3; I.2
Analysis of a multi-target linear shrinkage covariance estimator
Multi-target linear shrinkage is an extension of the standard single-target linear shrinkage for covariance estimation. We combine several constant matrices - the targets - with the sample covariance matrix. We derive the oracle and a \textit{bona fide} multi-target linear shrinkage estimator with exact and empirical mean. In both settings, we prove its convergence towards the oracle under Kolmogorov asymptotics. Finally, we show empirically that it outperforms other standard estimators in various situations.
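The estimator family has a simple closed form: a convex combination of the sample covariance S with the target matrices T_i. The sketch below uses hypothetical fixed weights; the paper's contribution is deriving the oracle and bona fide weights, which is not reproduced here.

```python
# Illustrative sketch of multi-target linear shrinkage: the estimate is
# (1 - sum_i w_i) * S + sum_i w_i * T_i. The weights below are hypothetical
# placeholders, not the paper's derived oracle weights.

def sample_covariance(rows):
    """Sample covariance matrix with the empirical mean subtracted."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        c = [r[j] - means[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += c[j] * c[k] / (n - 1)
    return cov

def multi_target_shrinkage(S, targets, weights):
    """Convex combination of S with several constant target matrices."""
    p = len(S)
    w0 = 1.0 - sum(weights)
    out = [[w0 * S[j][k] for k in range(p)] for j in range(p)]
    for w, T in zip(weights, targets):
        for j in range(p):
            for k in range(p):
                out[j][k] += w * T[j][k]
    return out

data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
S = sample_covariance(data)
identity = [[1.0, 0.0], [0.0, 1.0]]        # target 1: identity matrix
diag_S = [[S[0][0], 0.0], [0.0, S[1][1]]]  # target 2: diagonal of S
shrunk = multi_target_shrinkage(S, [identity, diag_S], [0.2, 0.3])
print(shrunk)
```

Shrinking toward structured targets pulls off-diagonal entries toward zero, which is exactly the regularization effect that helps when the sample covariance is noisy.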
Updated: 2025-03-12 09:02:55
Fields: math.ST,cs.LG,math.PR,stat.ML,stat.TH
Unveiling Concept Attribution in Diffusion Models
Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains largely black-box; little do we know about the roles of its components in exhibiting a concept such as objects or styles. Recent works employ causal tracing to localize knowledge-storing layers in generative models without showing how other layers contribute to the target concept. In this work, we approach diffusion models' interpretability problem from a more general perspective and pose a question: \textit{``How do model components work jointly to demonstrate knowledge?''}. To answer this question, we decompose diffusion models using component attribution, systematically unveiling the importance of each component (specifically the model parameter) in generating a concept. The proposed framework, called \textbf{C}omponent \textbf{A}ttribution for \textbf{D}iffusion Model (CAD), discovers the localization of concept-inducing (positive) components, while interestingly uncovers another type of components that contribute negatively to generating a concept, which is missing in the previous knowledge localization work. Based on this holistic understanding of diffusion models, we introduce two fast, inference-time model editing algorithms, CAD-Erase and CAD-Amplify; in particular, CAD-Erase enables erasure and CAD-Amplify allows amplification of a generated concept by ablating the positive and negative components, respectively, while retaining knowledge of other concepts. Extensive experimental results validate the significance of both positive and negative components pinpointed by our framework, demonstrating the potential of providing a complete view of interpreting generative models. Our code is available \href{https://github.com/mail-research/CAD-attribution4diffusion}{here}.
Updated: 2025-03-12 09:02:44
Fields: cs.CV,cs.LG
Long-Term Planning Around Humans in Domestic Environments with 3D Scene Graphs
Long-term planning for robots operating in domestic environments poses unique challenges due to the interactions between humans, objects, and spaces. Recent advancements in trajectory planning have leveraged vision-language models (VLMs) to extract contextual information for robots operating in real-world environments. While these methods achieve satisfying performance, they do not explicitly model human activities. Such activities influence surrounding objects and reshape spatial constraints. This paper presents a novel approach to trajectory planning that integrates human preferences, activities, and spatial context through an enriched 3D scene graph (3DSG) representation. By incorporating activity-based relationships, our method captures the spatial impact of human actions, leading to more context-sensitive trajectory adaptation. Preliminary results demonstrate that our approach effectively assigns costs to spaces influenced by human activities, ensuring that the robot trajectory remains contextually appropriate and sensitive to the ongoing environment. This balance between task efficiency and social appropriateness enhances context-aware human-robot interactions in domestic settings. Future work includes implementing a full planning pipeline and conducting user studies to evaluate trajectory acceptability.
Updated: 2025-03-12 09:00:45
Fields: cs.RO,cs.AI,68T40,I.2
State-space systems as dynamic generative models
A probabilistic framework to study the dependence structure induced by deterministic discrete-time state-space systems between input and output processes is introduced. General sufficient conditions are formulated under which output processes exist and are unique once an input process has been fixed, a property that in the deterministic state-space literature is known as the echo state property. When those conditions are satisfied, the given state-space system becomes a generative model for probabilistic dependences between two sequence spaces. Moreover, those conditions guarantee that the output depends continuously on the input when using the Wasserstein metric. The output processes whose existence is proved are shown to be causal in a specific sense and to generalize those studied in purely deterministic situations. The results in this paper constitute a significant stochastic generalization of sufficient conditions for the deterministic echo state property to hold, in the sense that the stochastic echo state property can be satisfied under contractivity conditions that are strictly weaker than those in deterministic situations. This means that state-space systems can induce a purely probabilistic dependence structure between input and output sequence spaces even when there is no functional relation between those two spaces.
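The deterministic echo state property mentioned above can be illustrated numerically: when the state map is a contraction in its state argument, trajectories from different initial states collapse onto one another, so the input sequence alone determines the output. The specific map below is a made-up example with contraction factor 0.5.

```python
# Numerical illustration of the (deterministic) echo state property.
# step() is contractive in its state argument (tanh is 1-Lipschitz, scaled
# by 0.5), so two runs driven by the same inputs but started from very
# different states converge geometrically.
import math
import random

def step(state, u):
    return 0.5 * math.tanh(state) + u  # Lipschitz constant <= 0.5 in state

random.seed(0)
inputs = [random.uniform(-1, 1) for _ in range(50)]
a, b = 10.0, -10.0  # very different initial states
for u in inputs:
    a, b = step(a, u), step(b, u)
print(abs(a - b))  # gap shrinks like 0.5**t, i.e. essentially zero here
```

The paper's point is that the stochastic analogue of this property holds under contractivity conditions strictly weaker than the deterministic ones demonstrated here.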
Updated: 2025-03-12 09:00:23
Fields: stat.ML,cs.LG,math.DS,math.PR,math.ST,stat.TH,37H05, 37N35, 62M10, 68T05
Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images
Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryological origins and physiological characteristics of the retina and brain, retinal imaging is emerging as a potentially rapid and cost-effective alternative for the identification of individuals with or at high risk of AD. In this paper, we present a novel PolarNet+ that uses retinal optical coherence tomography angiography (OCTA) to discriminate early-onset AD (EOAD) and MCI subjects from controls. Our method first maps OCTA images from Cartesian coordinates to polar coordinates, allowing approximate sub-region calculation to implement the clinician-friendly early treatment of diabetic retinopathy study (ETDRS) grid analysis. We then introduce a multi-view module to serialize and analyze the images along three dimensions for comprehensive, clinically useful information extraction. Finally, we abstract the sequence embedding into a graph, transforming the detection task into a general graph classification problem. A regional relationship module is applied after the multi-view module to excavate the relationship between the sub-regions. Such regional relationship analyses validate known eye-brain links and reveal new discriminative patterns.
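The Cartesian-to-polar resampling step can be sketched as follows; this is a generic nearest-neighbour illustration on a made-up grid, not the paper's OCTA pipeline, and the grid sizes are arbitrary.

```python
# Sketch of Cartesian-to-polar resampling: sample the image at n_r radii
# and n_theta angles around its center, producing a (ring, sector) grid of
# the kind used for ETDRS-style sub-region analysis. Nearest-neighbour
# sampling and the toy 5x5 "image" are illustrative simplifications.
import math

def to_polar(image, n_r, n_theta):
    """Resample a square image (list of rows) onto an (r, theta) grid."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    polar = []
    for ri in range(n_r):
        r = max_r * (ri + 0.5) / n_r           # radius of this ring
        row = []
        for ti in range(n_theta):
            th = 2.0 * math.pi * ti / n_theta  # angle of this sector
            y = min(max(round(cy + r * math.sin(th)), 0), h - 1)
            x = min(max(round(cx + r * math.cos(th)), 0), w - 1)
            row.append(image[y][x])            # nearest-neighbour sample
        polar.append(row)
    return polar

# 5x5 toy "image" with pixel value = linear index
img = [[y * 5 + x for x in range(5)] for y in range(5)]
pol = to_polar(img, n_r=2, n_theta=8)
print(len(pol), len(pol[0]))
```

After this mapping, each row of the polar grid corresponds to a ring around the fovea-like center, so ring-and-sector sub-regions become simple rectangular slices.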
Updated: 2025-03-12 08:58:41
Fields: eess.IV,cs.AI,cs.CV
Effective Feature Selection for Predicting Spreading Factor with ML in Large LoRaWAN-based Mobile IoT Networks
LoRaWAN is a low-power long-range protocol that enables reliable and robust communication. This paper addresses the challenge of predicting the spreading factor (SF) in LoRaWAN networks using machine learning (ML) techniques. Optimal SF allocation is crucial for optimizing data transmission in IoT-enabled mobile devices, yet it remains a challenging task due to fluctuations in environment and network conditions. We evaluated ML model performance across a large publicly available dataset to identify the best features among key LoRaWAN attributes such as RSSI, SNR, frequency, distance between end devices and gateways, and antenna height of the end device; furthermore, we experimented with all 31 possible combinations of these 5 features. We trained and evaluated the models using k-nearest neighbors (k-NN), Decision Tree Classifier (DTC), Random Forest (RF), and Multinomial Logistic Regression (MLR) algorithms. The combination of RSSI and SNR was identified as the best feature set. The findings of this paper provide valuable information for reducing the overall cost of dataset collection for ML model training and extending the battery life of LoRaWAN devices. This work contributes to a more reliable LoRaWAN system by understanding the importance of specific feature sets for optimized SF allocation.
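A minimal sketch of the kind of classifier compared in the paper, k-NN on the winning (RSSI, SNR) feature pair; the training samples below are invented toy values, not drawn from the paper's dataset.

```python
# Hypothetical sketch of SF prediction from (RSSI, SNR) with k-NN.
# Weak signals (low RSSI/SNR) typically need a high spreading factor for
# reliable decoding; strong signals can use a low one. All numbers invented.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (RSSI, SNR) neighbours."""
    dists = sorted((math.dist(x, query), sf) for x, sf in train)
    votes = Counter(sf for _, sf in dists[:k])
    return votes.most_common(1)[0][0]

# (features=(RSSI in dBm, SNR in dB), label=spreading factor)
train = [
    ((-120.0, -10.0), 12), ((-118.0, -8.0), 12),
    ((-100.0, 2.0), 9),    ((-98.0, 3.0), 9),
    ((-80.0, 8.0), 7),     ((-78.0, 9.0), 7),
]
print(knn_predict(train, (-119.0, -9.0)))  # weak signal -> high SF
print(knn_predict(train, (-79.0, 8.5)))    # strong signal -> low SF
```

In practice the features would be standardized before computing distances; the sketch omits that step for brevity.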
Updated: 2025-03-12 08:58:28
Fields: cs.LG
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Large Language Models (LLMs) have demonstrated improved generation performance by incorporating externally retrieved knowledge, a process known as retrieval-augmented generation (RAG). Despite the potential of this approach, existing studies evaluate RAG effectiveness by 1) assessing retrieval and generation components jointly, which obscures retrieval's distinct contribution, or 2) examining retrievers using traditional metrics such as NDCG, which creates a gap in understanding retrieval's true utility in the overall generation process. To address the above limitations, in this work, we introduce an automatic evaluation method that measures retrieval quality through the lens of information gain within the RAG framework. Specifically, we propose Semantic Perplexity (SePer), a metric that captures the LLM's internal belief about the correctness of the retrieved information. We quantify the utility of retrieval by the extent to which it reduces semantic perplexity post-retrieval. Extensive experiments demonstrate that SePer not only aligns closely with human preferences but also offers a more precise and efficient evaluation of retrieval utility across diverse RAG scenarios.
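A heavily simplified, pure-Python illustration of the "utility as perplexity reduction" idea: SePer itself measures semantic perplexity over the LLM's belief about answers inside the RAG pipeline, whereas this sketch just contrasts token-level perplexity with and without retrieved context. The per-token log-probabilities below are invented.

```python
# Sketch: retrieval utility as a perplexity reduction. If a retrieved
# passage makes the gold answer more likely under the model, its perplexity
# drops, and the drop quantifies the retrieval's usefulness. The
# log-probabilities are hypothetical stand-ins for real model scores.
import math

def perplexity(token_logprobs):
    """exp of the average negative log-likelihood per token."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Hypothetical per-token log-probs of the gold answer...
no_context = [-2.1, -1.8, -2.5, -1.9]    # ...without retrieval
with_context = [-0.4, -0.3, -0.6, -0.5]  # ...with a retrieved passage

utility = perplexity(no_context) - perplexity(with_context)
print(round(utility, 3))
```

A positive utility means the retrieved passage genuinely reduced the model's uncertainty about the answer, which is the signal traditional rank metrics like NDCG do not capture.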
Updated: 2025-03-12 08:49:58
Fields: cs.CL,cs.AI,cs.LG
Blockchain Data Analytics: Review and Challenges
The integration of blockchain technology with data analytics is essential for extracting insights in the cryptocurrency space. Although academic literature on blockchain data analytics is limited, various industry solutions have emerged to address these needs. This paper provides a comprehensive literature review, drawing from both academic research and industry applications. We classify blockchain analytics tools into categories such as block explorers, on-chain data providers, research platforms, and crypto market data providers. Additionally, we discuss the challenges associated with blockchain data analytics, including data accessibility, scalability, accuracy, and interoperability. Our findings emphasize the importance of bridging academic research and industry innovations to advance blockchain data analytics.
Updated: 2025-03-12 08:49:51
Fields: cs.CR,cs.DB
AI-Driven Decision Support in Oncology: Evaluating Data Readiness for Skin Cancer Treatment
This research focuses on evaluating and enhancing data readiness for the development of an Artificial Intelligence (AI)-based Clinical Decision Support System (CDSS) in the context of skin cancer treatment. The study, conducted at the Skin Tumor Center of the University Hospital Münster, delves into the essential role of data quality, availability, and extractability in implementing effective AI applications in oncology. By employing a multifaceted methodology, including literature review, data readiness assessment, and expert workshops, the study addresses the challenges of integrating AI into clinical decision-making. The research identifies crucial data points for skin cancer treatment decisions, evaluates their presence and quality in various information systems, and highlights the difficulties in extracting information from unstructured data. The findings underline the significance of high-quality, accessible data for the success of AI-driven CDSS in medical settings, particularly in the complex field of oncology.
Updated: 2025-03-12 08:49:03
Fields: cs.AI,68T99, 62P10, 92C50,I.2.6; J.3; H.2.8
Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes
Tumor documentation in Germany is largely done manually, requiring reading patient records and entering data into structured databases. Large language models (LLMs) could potentially enhance this process by improving efficiency and reliability. This evaluation tests eleven different open source LLMs with sizes ranging from 1-70 billion model parameters on three basic tasks of the tumor documentation process: identifying tumor diagnoses, assigning ICD-10 codes, and extracting the date of first diagnosis. For evaluating the LLMs on these tasks, a dataset of annotated text snippets based on anonymized doctors' notes from urology was prepared. Different prompting strategies were used to investigate the effect of the number of examples in few-shot prompting and to explore the capabilities of the LLMs in general. The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12 B performed comparably well in the tasks. Models with less extensive training data or having fewer than 7 billion parameters showed notably lower performance, while larger models did not display performance gains. Examples from a different medical domain than urology could also improve the outcome in few-shot prompting, which demonstrates the ability of LLMs to handle tasks needed for tumor documentation. Open source LLMs show a strong potential for automating tumor documentation. Models from 7-12 billion parameters could offer an optimal balance between performance and resource efficiency. With tailored fine-tuning and well-designed prompting, these models might become important tools for clinical documentation in the future. The code for the evaluation is available from https://github.com/stefan-m-lenz/UroLlmEval. We also release the dataset as a new valuable resource that addresses the shortage of authentic and easily accessible benchmarks in German-language medical NLP.
Updated: 2025-03-12 08:48:46
Fields: cs.CL,cs.AI
ProtTeX: Structure-In-Context Reasoning and Editing of Proteins with Large Language Models
Large language models have made remarkable progress in the field of molecular science, particularly in understanding and generating functional small molecules. This success is largely attributed to the effectiveness of molecular tokenization strategies. In protein science, the amino acid sequence serves as the sole tokenizer for LLMs. However, many fundamental challenges in protein science are inherently structure-dependent. The absence of structure-aware tokens significantly limits the capabilities of LLMs for comprehensive biomolecular comprehension and multimodal generation. To address these challenges, we introduce a novel framework, ProtTeX, which tokenizes the protein sequences, structures, and textual information into a unified discrete space. This innovative approach enables joint training of the LLM exclusively through the Next-Token Prediction paradigm, facilitating multimodal protein reasoning and generation. ProtTeX enables general LLMs to perceive and process protein structures through sequential text input, leverage structural information as intermediate reasoning components, and generate or manipulate structures via sequential text output. Experiments demonstrate that our model achieves significant improvements in protein function prediction, outperforming the state-of-the-art domain expert model with a twofold increase in accuracy. Our framework enables high-quality conformational generation and customizable protein design. For the first time, we demonstrate that by adopting the standard training and inference pipelines from the LLM domain, ProtTeX empowers decoder-only LLMs to effectively address diverse spectrum of protein-related tasks.
Updated: 2025-03-12 08:46:33
Fields: q-bio.BM,cs.AI
Technical Insights and Legal Considerations for Advancing Federated Learning in Bioinformatics
Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. As the evolution of biobanks in genetics and systems biology has proved, accessing more extensive and varied data pools leads to a faster and more robust exploration and translation of results. More widespread use of federated learning may have the same impact in bioinformatics, allowing access to many combinations of genotypic, phenotypic and environmental information that are undercovered or not included in existing biobanks. This paper reviews the methodological, infrastructural and legal issues that academic and clinical institutions must address before implementing it. Finally, we provide recommendations for the reliable use of federated learning and its effective translation into clinical practice.
Updated: 2025-03-12 08:45:31
Fields: q-bio.OT,cs.LG,stat.ML
A Survey of Direct Preference Optimization
Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, yet their alignment with human values remains critical for ensuring helpful and harmless deployments. While Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful paradigm for aligning LLMs with human preferences, its reliance on complex reward modeling introduces inherent trade-offs in computational efficiency and training stability. In this context, Direct Preference Optimization (DPO) has recently gained prominence as a streamlined alternative that directly optimizes LLMs using human preferences, thereby circumventing the need for explicit reward modeling. Owing to its theoretical elegance and computational efficiency, DPO has rapidly attracted substantial research efforts exploring its various implementations and applications. However, this field currently lacks systematic organization and comparative analysis. In this survey, we conduct a comprehensive overview of DPO and introduce a novel taxonomy, categorizing previous works into four key dimensions: data strategy, learning framework, constraint mechanism, and model property. We further present a rigorous empirical analysis of DPO variants across standardized benchmarks. Additionally, we discuss real-world applications, open challenges, and future directions for DPO. This work delivers both a conceptual framework for understanding DPO and practical guidance for practitioners, aiming to advance robust and generalizable alignment paradigms. All collected resources are available and will be continuously updated at https://github.com/liushunyu/awesome-direct-preference-optimization.
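The core objective the survey organizes around can be stated compactly. The sketch below evaluates the standard DPO loss for a single preference pair; the scalar log-likelihoods are invented stand-ins for sequence log-probs log pi(y|x), and beta is the usual KL-strength hyperparameter.

```python
# Standard DPO loss for one (chosen, rejected) preference pair:
#   -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
# where logp_* come from the policy and ref_logp_* from the frozen
# reference model. All numeric values below are hypothetical.
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does -> low loss
good = dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
# Policy prefers the rejected response -> high loss
bad = dpo_loss(logp_w=-14.0, logp_l=-10.0, ref_logp_w=-12.0, ref_logp_l=-12.0)
print(good < bad)
```

Because the loss depends only on log-probability margins relative to the reference model, no explicit reward model is needed, which is the computational appeal the survey highlights.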
Updated: 2025-03-12 08:45:15
Fields: cs.LG
Unreflected Use of Tabular Data Repositories Can Undermine Research Quality
Data repositories have accumulated a large number of tabular datasets from various domains. Machine Learning researchers are actively using these datasets to evaluate novel approaches. Consequently, data repositories have an important standing in tabular data research. They not only host datasets but also provide information on how to use them in supervised learning tasks. In this paper, we argue that, despite great achievements in usability, the unreflected usage of datasets from data repositories may have led to reduced research quality and scientific rigor. We present examples from prominent recent studies that illustrate the problematic use of datasets from OpenML, a large data repository for tabular data. Our illustrations help users of data repositories avoid falling into the traps of (1) using suboptimal model selection strategies, (2) overlooking strong baselines, and (3) inappropriate preprocessing. In response, we discuss possible solutions for how data repositories can prevent the inappropriate use of datasets and become the cornerstones for improved overall quality of empirical research studies.
Updated: 2025-03-12 08:41:49
Fields: cs.LG
QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations
Deep Neural Networks (DNNs) stand out as one of the most prominent approaches within the Machine Learning (ML) domain. The efficacy of DNNs has surged alongside recent increases in computational capacity, allowing these approaches to scale to significant complexities for addressing predictive challenges in big data. However, as the complexity of DNN models rises, interpretability diminishes. In response to this challenge, explainable models such as Adversarial Gradient Integration (AGI) leverage path-based gradients provided by DNNs to elucidate their decisions. Yet the performance of path-based explainers can be compromised when gradients exhibit irregularities during out-of-distribution path traversal. In this context, we introduce Quantified Uncertainty Counterfactual Explanations (QUCE), a method designed to mitigate out-of-distribution traversal by minimizing path uncertainty. QUCE not only quantifies uncertainty when presenting explanations but also generates more certain counterfactual examples. We showcase the performance of the QUCE method by comparing it with competing methods for both path-based explanations and generative counterfactual examples.
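For background, the path-based attribution that QUCE generalizes (plain integrated gradients along a straight-line path) can be sketched in a few lines. The toy model f and its analytic gradient are invented for illustration; QUCE's actual contribution, choosing generative paths that minimize uncertainty, is not modeled here.

```python
# Plain integrated gradients along a straight line from a baseline to the
# input: attribution_i = (x_i - b_i) * average of dF/dx_i along the path.
# f is a toy differentiable "model" with a hand-written gradient.

def f(x):                  # toy model: f(x1, x2) = x1^2 + 3*x2
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):             # analytic gradient of f
    return [2.0 * x[0], 3.0]

def integrated_gradients(baseline, x, steps=1000):
    attrs = [0.0] * len(x)
    for s in range(1, steps + 1):
        point = [b + (xi - b) * s / steps for b, xi in zip(baseline, x)]
        g = grad_f(point)
        for i in range(len(x)):
            attrs[i] += (x[i] - baseline[i]) * g[i] / steps
    return attrs

attrs = integrated_gradients(baseline=[0.0, 0.0], x=[2.0, 1.0])
# Completeness: attributions sum (approximately) to f(x) - f(baseline)
print(sum(attrs), f([2.0, 1.0]) - f([0.0, 0.0]))
```

QUCE's observation is that such straight paths can traverse out-of-distribution regions where gradients are unreliable, which motivates replacing them with uncertainty-minimizing paths.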
Updated: 2025-03-12 08:31:10
Fields: cs.LG,cs.AI
Is LLMs Hallucination Usable? LLM-based Negative Reasoning for Fake News Detection
The questionable responses caused by knowledge hallucination can make LLMs unstable in decision-making. However, it has never been investigated whether LLM hallucination can be exploited to generate negative reasoning that facilitates fake news detection. This study proposes a novel supervised self-reinforced reasoning rectification approach, SR$^3$, that uses LLM reflection to yield both sound reasoning and wrong understandings (negative reasoning) for news items, supporting semantic consistency learning. On top of that, we construct a negative-reasoning-based news learning model, \emph{NRFE}, which leverages positive or negative news-reasoning pairs to learn the semantic consistency between them. To avoid the impact of label-implicated reasoning, we deploy a student model, \emph{NRFE-D}, that takes only news content as input and inspects the performance of our method by distilling knowledge from \emph{NRFE}. Experimental results on three popular fake news datasets demonstrate the superiority of our method over three kinds of baselines: prompting LLMs, fine-tuning pre-trained SLMs, and other representative fake news detection methods.
Updated: 2025-03-12 08:29:59
Subjects: cs.CL,cs.AI
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Unlike mainstream approaches that train multi-view video diffusion models on large-scale 4D datasets, our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors. In essence, Reangle-A-Video operates in two stages. (1) Multi-View Motion Learning: An image-to-video diffusion transformer is synchronously fine-tuned in a self-supervised manner to distill view-invariant motion from a set of warped videos. (2) Multi-View Consistent Image-to-Images Translation: The first frame of the input video is warped and inpainted into various camera perspectives under an inference-time cross-view consistency guidance using DUSt3R, generating multi-view consistent starting images. Extensive experiments on static view transport and dynamic camera control show that Reangle-A-Video surpasses existing methods, establishing a new solution for multi-view video generation. We will publicly release our code and data. Project page: https://hyeonho99.github.io/reangle-a-video/
Updated: 2025-03-12 08:26:15
Subjects: cs.CV,cs.AI
MBCT: Tree-Based Feature-Aware Binning for Individual Uncertainty Calibration
Most machine learning classifiers are concerned only with classification accuracy, while certain applications (such as medical diagnosis, meteorological forecasting, and computational advertising) require the model to predict the true probability, known as a calibrated estimate. In previous work, researchers have developed several calibration methods to post-process the outputs of a predictor to obtain calibrated values, such as binning and scaling methods. Compared with scaling, binning methods are shown to have distribution-free theoretical guarantees, which motivates us to prefer binning methods for calibration. However, we notice that existing binning methods have several drawbacks: (a) the binning scheme only considers the original prediction values, thus limiting the calibration performance; and (b) the binning approach is non-individual, mapping multiple samples in a bin to the same value, and thus is not suitable for order-sensitive applications. In this paper, we propose a feature-aware binning framework, called Multiple Boosting Calibration Trees (MBCT), along with a multi-view calibration loss to tackle the above issues. Our MBCT optimizes the binning scheme via the tree structures of features, and adopts a linear function in a tree node to achieve individual calibration. Our MBCT is non-monotonic, and has the potential to improve order accuracy, due to its learnable binning scheme and the individual calibration. We conduct comprehensive experiments on three datasets in different fields. Results show that our method outperforms all competing models in terms of both calibration error and order accuracy. We also conduct simulation experiments, justifying that the proposed multi-view calibration loss is a better metric in modeling calibration error.
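As a concrete reference point for the binning baselines that MBCT is contrasted with, a minimal equal-frequency histogram-binning calibrator can be sketched as below. This is the classic non-individual baseline, not the authors' MBCT (which additionally learns feature-aware, tree-structured bins); the function names are ours.

```python
import numpy as np

def fit_histogram_binning(scores, labels, n_bins=10):
    """Equal-frequency binning: each bin's calibrated value is the
    empirical positive rate of the training samples falling into it."""
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = 0.0, 1.0  # cover the full [0, 1] score range
    calibrated = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (scores >= lo) & (scores <= hi)
        # fall back to the bin midpoint if the bin happens to be empty
        calibrated.append(labels[mask].mean() if mask.any() else (lo + hi) / 2)
    return edges, np.array(calibrated)

def apply_histogram_binning(scores, edges, calibrated):
    """Map each score to the calibrated value of its bin."""
    idx = np.clip(np.searchsorted(edges, scores, side="right") - 1,
                  0, len(calibrated) - 1)
    return calibrated[idx]
```

Because every sample in a bin maps to the same calibrated value, the order-insensitivity drawback noted in the abstract is visible directly in `apply_histogram_binning`.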
Updated: 2025-03-12 08:15:57
Subjects: cs.LG
High-Rank Irreducible Cartesian Tensor Decomposition and Bases of Equivariant Spaces
Irreducible Cartesian tensors (ICTs) play a crucial role in the design of equivariant graph neural networks, as well as in theoretical chemistry and chemical physics. Meanwhile, the design space of available linear operations on tensors that preserve symmetry presents a significant challenge. The ICT decomposition and a basis of this equivariant space are difficult to obtain for high-rank tensors. After decades of research, Bonvicini (2024) recently achieved an explicit ICT decomposition for $n=5$ with factorial time/space complexity. In this work we, for the first time, obtain decomposition matrices for ICTs up to rank $n=9$ with reduced and affordable complexity, by constructing what we call path matrices. The path matrices are obtained by performing chain-like contractions with Clebsch-Gordan matrices following the parentage scheme. We prove and leverage that the concatenation of path matrices is an orthonormal change-of-basis matrix between the Cartesian tensor product space and the spherical direct sum spaces. Furthermore, we identify a complete orthogonal basis for the equivariant space, rather than a spanning set (Pearce-Crump, 2023), through this path-matrices technique. To the best of our knowledge, this is also the first analytic, rather than numerical, method for theoretically obtaining arbitrary-rank orthogonal ICT decomposition matrices and orthogonal equivariant bases. We further extend our result to arbitrary tensor product and direct sum spaces, enabling free design between different spaces while keeping symmetry. The Python code is available at https://github.com/ShihaoShao-GH/ICT-decomposition-and-equivariant-bases, where the $n=6,\dots,9$ ICT decomposition matrices are obtained in 1s, 3s, 11s, and 4m32s, respectively, on a 28-core Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz.
Updated: 2025-03-12 08:15:14
Subjects: cs.LG,math-ph,math.MP,physics.chem-ph,physics.comp-ph,quant-ph
Inductive Spatio-Temporal Kriging with Physics-Guided Increment Training Strategy for Air Quality Inference
The deployment of sensors for air quality monitoring is constrained by high costs, leading to inadequate network coverage and data deficits in some areas. Utilizing existing observations, spatio-temporal kriging is a method for estimating air quality at unobserved locations during a specific period. Inductive spatio-temporal kriging with an increment training strategy has demonstrated its effectiveness by using virtual nodes to simulate unobserved nodes. However, a disparity between virtual and real nodes persists, complicating the application of learning patterns derived from virtual nodes to actual unobserved ones. To address these limitations, this paper presents a Physics-Guided Increment Training Strategy (PGITS). Specifically, we design a dynamic graph generation module to incorporate the advection and diffusion processes of airborne particles as physical knowledge into the graph structure, dynamically adjusting the adjacency matrix to reflect physical interactions between nodes. By using physics principles as a bridge between virtual and real nodes, this strategy ensures that the features of virtual nodes and their pseudo labels are closer to those of actual nodes. Consequently, the learned patterns of virtual nodes can be applied to actual unobserved nodes for effective kriging.
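To make the advection/diffusion intuition concrete, a toy wind-aware adjacency can be sketched as below. Everything here — the Gaussian diffusion decay, the downwind alignment term, and the mixing weight `alpha` — is our illustrative assumption; the paper's module generates the graph dynamically and learns it jointly with the kriging model.

```python
import numpy as np

def physics_guided_adjacency(coords, wind, sigma=1.0, alpha=0.5):
    """Toy adjacency mixing a diffusion term (isotropic distance decay)
    with an advection term (alignment of the i->j direction with a wind
    vector), so downwind edges receive larger weights.
    coords: (N, 2) node positions; wind: (2,) prevailing wind vector."""
    diff = coords[None, :, :] - coords[:, None, :]       # displacement i -> j
    dist = np.linalg.norm(diff, axis=-1)
    diffusion = np.exp(-(dist / sigma) ** 2)             # Gaussian decay with distance
    w = wind / (np.linalg.norm(wind) + 1e-9)
    align = np.clip((diff @ w) / (dist + 1e-9), 0, 1)    # downwind alignment in [0, 1]
    A = diffusion * (1 - alpha + alpha * align)          # blend the two physical cues
    np.fill_diagonal(A, 0.0)                             # no self-loops
    return A
```

Note the resulting matrix is asymmetric by construction: the edge pointing downwind is stronger than its reverse, mirroring how pollutants are transported.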
Updated: 2025-03-12 08:14:46
Subjects: cs.LG
SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
Generalized Category Discovery (GCD) aims to classify unlabelled images from both `seen' and `unseen' classes by transferring knowledge from a set of labelled `seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternate perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model-finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods. Notably, we find our method achieves an average accuracy of 61.4% on the SSB, surpassing prior state-of-the-art methods by approximately 10%. The improvement is particularly remarkable as our method yields extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.
Updated: 2025-03-12 08:14:35
Subjects: cs.CV,cs.AI
Efficient UAV Swarm-Based Multi-Task Federated Learning with Dynamic Task Knowledge Sharing
UAV swarms are widely used in emergency communications, area monitoring, and disaster relief. Coordinated by control centers, they are ideal for federated learning (FL) frameworks. However, current UAV-assisted FL methods primarily focus on single tasks, overlooking the need for multi-task training. In disaster relief scenarios, UAVs perform tasks such as crowd detection, road feasibility analysis, and disaster assessment, which exhibit time-varying demands and potential correlations. In order to meet the time-varying requirements of tasks and complete multiple tasks efficiently under resource constraints, in this paper, we propose a UAV swarm based multi-task FL framework, where ground emergency vehicles (EVs) collaborate with UAVs to accomplish multiple tasks efficiently under constrained energy and bandwidth resources. Through theoretical analysis, we identify key factors affecting task performance and introduce a task attention mechanism to dynamically evaluate task importance, thereby achieving efficient resource allocation. Additionally, we propose a task affinity (TA) metric to capture the dynamic correlation among tasks, thereby promoting task knowledge sharing to accelerate training and improve the generalization ability of the model in different scenarios. To optimize resource allocation, we formulate a two-layer optimization problem to jointly optimize UAV transmission power, computation frequency, bandwidth allocation, and UAV-EV associations. For the inner problem, we derive closed-form solutions for transmission power, computation frequency, and bandwidth allocation and apply a block coordinate descent method for optimization. For the outer problem, a two-stage algorithm is designed to determine optimal UAV-EV associations. Furthermore, theoretical analysis reveals a trade-off between UAV energy consumption and multi-task performance.
Updated: 2025-03-12 08:13:39
Subjects: cs.LG,cs.AI
A New $\sim 5σ$ Tension at Characteristic Redshift from DESI-DR1 BAO and DES-SN5YR Observations
We perform a model-independent reconstruction of the angular diameter distance ($D_{A}$) using the Multi-Task Gaussian Process (MTGP) framework with DESI-DR1 BAO and DES-SN5YR datasets. We calibrate the comoving sound horizon at the baryon drag epoch $r_d$ to the Planck best-fit value, ensuring consistency with early-universe physics. With the reconstructed $D_A$ at two key redshifts, $z\sim 1.63$ (where $D_{A}^{\prime} =0$) and at $z\sim 0.512$ (where $D_{A}^{\prime} = D_{A}$), we derive the expansion rate of the Universe $H(z)$ at these redshifts. Our findings reveal that at $z\sim 1.63$, the $H(z)$ is fully consistent with the Planck-2018 $\Lambda$CDM prediction, confirming no new physics at that redshift. However, at $z \sim 0.512$, the derived $H(z)$ shows a more than $5\sigma$ discrepancy with the Planck-2018 $\Lambda$CDM prediction, suggesting a possible breakdown of the $\Lambda$CDM model as constrained by Planck-2018 at this lower redshift. This emerging $\sim 5\sigma$ tension at $z\sim 0.512$, distinct from the existing ``Hubble Tension'', may signal the first strong evidence for new physics at low redshifts.
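As a sanity check on the two characteristic redshifts, note that in a flat FLRW background the reconstructed $D_A$ fixes $H(z)$ in closed form at exactly those points (the flatness assumption here is ours, for illustration; the paper's reconstruction itself is model-independent). Writing $D_{A}(z) = \frac{1}{1+z}\int_{0}^{z} \frac{c\,dz'}{H(z')}$ and differentiating gives $D_{A}^{\prime}(z) = -\frac{D_{A}(z)}{1+z} + \frac{c}{(1+z)H(z)}$, so the condition $D_{A}^{\prime}=0$ (at $z\sim 1.63$) implies $H(z) = \frac{c}{(1+z)\,D_{A}(z)}$, while $D_{A}^{\prime} = D_{A}$ (at $z\sim 0.512$) implies $H(z) = \frac{c}{(2+z)\,D_{A}(z)}$.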
Updated: 2025-03-12 08:13:04
Subjects: astro-ph.CO,cs.LG,gr-qc,hep-th
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. Yet evaluating GLLMs presents a challenge, as the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs. This paper introduces a new metric for generative models called ANLS* for evaluating a wide variety of tasks, including information extraction and classification tasks. The ANLS* metric extends existing ANLS metrics as a drop-in replacement and is still compatible with previously reported ANLS scores. An evaluation of 7 different datasets and more than 20 different GLLMs, together with 3 different prompting methods using the ANLS* metric, is also provided, demonstrating the importance of the proposed metric. We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In almost all cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as $10$ percentage points. Sources are available at https://github.com/deepopinion/anls_star_metric
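For orientation, the plain ANLS score that ANLS* extends can be sketched for a single string pair as below. ANLS* additionally handles lists, dicts, and unanswerable cases; the threshold $\tau=0.5$ and the helper names here follow the conventional ANLS formulation and are assumed rather than taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

def anls(prediction: str, target: str, tau: float = 0.5) -> float:
    """Plain ANLS for one string pair: the similarity is zeroed out once
    the normalized edit distance exceeds the threshold tau."""
    if not prediction and not target:
        return 1.0
    nl = levenshtein(prediction, target) / max(len(prediction), len(target))
    return 1.0 - nl if nl < tau else 0.0
```

Unlike a binary exact-match check, this rewards near-misses (e.g. OCR-style typos) while still scoring clearly wrong answers as 0, which is why it suits free-form generative outputs.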
Updated: 2025-03-12 08:02:54
Subjects: cs.CL,cs.AI
Bayesian WeakS-to-Strong from Text Classification to Generation
Advances in large language models raise the question of how alignment techniques will adapt as models become increasingly complex and humans will only be able to supervise them weakly. Weak-to-Strong mimics such a scenario where weak model supervision attempts to harness the full capabilities of a much stronger model. This work extends Weak-to-Strong to WeakS-to-Strong by exploring an ensemble of weak models which simulate the variability in human opinions. Confidence scores are estimated using a Bayesian approach to guide the WeakS-to-Strong generalization. Furthermore, we extend the application of WeakS-to-Strong from text classification tasks to text generation tasks where more advanced strategies are investigated for supervision. Moreover, direct preference optimization is applied to advance the student model's preference learning, beyond the basic learning framework of teacher forcing. Results demonstrate the effectiveness of the proposed approach for the reliability of a strong student model, showing potential for superalignment.
Updated: 2025-03-12 07:57:44
Subjects: cs.CL,cs.AI,cs.LG
Clustering by Nonparametric Smoothing
A novel formulation of the clustering problem is introduced in which the task is expressed as an estimation problem, where the object to be estimated is a function which maps a point to its distribution of cluster membership. Unlike existing approaches which implicitly estimate such a function, like Gaussian Mixture Models (GMMs), the proposed approach bypasses any explicit modelling assumptions and exploits the flexible estimation potential of nonparametric smoothing. An intuitive approach for selecting the tuning parameters governing estimation is provided, which allows the proposed method to automatically determine both an appropriate level of flexibility and also the number of clusters to extract from a given data set. Experiments on a large collection of publicly available data sets are used to document the strong performance of the proposed approach, in comparison with relevant benchmarks from the literature. R code to implement the proposed approach is available from https://github.com/DavidHofmeyr/CNS
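A toy illustration of the nonparametric-smoothing idea (in Python rather than the authors' R): kernel-smooth one-hot memberships so each point receives a full distribution over clusters. This is our sketch of the estimand described above, not the CNS algorithm itself, which also selects the bandwidth and the number of clusters automatically.

```python
import numpy as np

def smooth_memberships(X, init_labels, bandwidth=1.0):
    """Nadaraya-Watson smoothing of one-hot cluster memberships: each
    point's membership distribution is a Gaussian-kernel-weighted average
    of its neighbours' memberships. Illustrative only."""
    n_clusters = int(init_labels.max()) + 1
    onehot = np.eye(n_clusters)[init_labels]             # (n, k) hard memberships
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    K = np.exp(-d2 / (2 * bandwidth ** 2))               # Gaussian kernel weights
    P = K @ onehot
    return P / P.sum(axis=1, keepdims=True)              # each row sums to 1
```

Points deep inside a cluster end up with near-degenerate distributions, while points between clusters receive genuinely mixed memberships — the function-valued object the abstract describes.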
Updated: 2025-03-12 07:44:11
Subjects: cs.LG,stat.ML
Investigation of Frame Differences as Motion Cues for Video Object Segmentation
Automatic Video Object Segmentation (AVOS) refers to the task of autonomously segmenting target objects in video sequences without relying on human-provided annotations in the first frames. In AVOS, the use of motion information is crucial, with optical flow being a commonly employed method for capturing motion cues. However, the computation of optical flow is resource-intensive, making it unsuitable for real-time applications, especially on edge devices with limited computational resources. In this study, we propose using frame differences as an alternative to optical flow for motion cue extraction. We developed an extended U-Net-like AVOS model that takes a frame on which segmentation is performed and a frame difference as inputs, and outputs an estimated segmentation map. Our experimental results demonstrate that the proposed model achieves performance comparable to the model with optical flow as an input, particularly when applied to videos captured by stationary cameras. Our results suggest the usefulness of employing frame differences as motion cues in cases with limited computational resources.
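The motion cue itself is inexpensive to compute; a minimal sketch (function name ours) of the frame-difference input that is fed alongside the current frame:

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=None):
    """Absolute per-pixel difference between consecutive grayscale frames,
    a cheap motion cue compared with optical flow. Optionally binarized."""
    # widen to a signed dtype so the subtraction cannot wrap around
    diff = np.abs(curr_frame.astype(np.int16)
                  - prev_frame.astype(np.int16)).astype(np.uint8)
    if threshold is not None:
        return (diff > threshold).astype(np.uint8)  # binary motion mask
    return diff
```

Unlike optical flow this is a single subtraction per pixel, which is what makes it attractive on edge devices — at the cost of responding only where intensity actually changes, hence the stationary-camera caveat in the abstract.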
Updated: 2025-03-12 07:42:15
Subjects: cs.CV,cs.AI
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling
Large language models (LLMs) frequently hallucinate due to misaligned self-awareness, generating erroneous outputs when addressing queries beyond their knowledge boundaries. While existing approaches mitigate hallucinations via uncertainty estimation or query rejection, they suffer from computational inefficiency or sacrificed helpfulness. To address these issues, we propose the Explicit Knowledge Boundary Modeling (EKBM) framework, integrating fast and slow reasoning systems to harmonize reliability and usability. The framework first employs a fast-thinking model to generate confidence-labeled responses, enabling immediate use of high-confidence outputs. For uncertain predictions, a slow refinement model conducts targeted reasoning to improve accuracy. To align model behavior with our proposed objective, we propose a hybrid training pipeline, enhancing self-awareness without degrading task performance. Evaluations on dialogue state tracking tasks demonstrate that EKBM achieves superior model reliability over uncertainty-based baselines. Further analysis reveals that refinement substantially boosts accuracy while maintaining low computational overhead. Our work establishes a scalable paradigm for advancing LLM reliability and balancing accuracy and practical utility in error-sensitive applications.
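The fast/slow division of labor can be sketched as below; both model callables and the confidence threshold are hypothetical stand-ins for illustration, not an interface defined by the paper.

```python
def answer_with_knowledge_boundary(query, fast_model, slow_model, threshold=0.8):
    """Route a query through a fast model first; escalate to a slower
    refinement model only when the fast model's self-reported confidence
    falls below the threshold. Illustrative sketch of the fast/slow idea."""
    answer, confidence = fast_model(query)   # fast model returns (answer, confidence)
    if confidence >= threshold:
        return answer, "fast"                # high-confidence: use immediately
    return slow_model(query, draft=answer), "refined"  # low-confidence: refine draft
```

The design point is that refinement cost is paid only on the uncertain fraction of queries, which is why the abstract can claim accuracy gains at low overall overhead.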
Updated: 2025-03-12 07:42:04
Subjects: cs.CL,cs.AI,cs.LG
Urban Region Representation Learning: A Flexible Approach
The increasing availability of urban data offers new opportunities for learning region representations, which can be used as input to machine learning models for downstream tasks such as check-in or crime prediction. While existing solutions have produced promising results, an issue is their fixed formation of regions and fixed input region features, which may not suit the needs of different downstream tasks. To address this limitation, we propose a model named FlexiReg for urban region representation learning that is flexible with both the formation of urban regions and the input region features. FlexiReg is based on a spatial grid partitioning over the spatial area of interest. It learns representations for the grid cells, leveraging publicly accessible data, including POI, land use, satellite imagery, and street view imagery. We propose adaptive aggregation to fuse the cell representations and prompt learning techniques to tailor the representations towards different tasks, addressing the needs of varying formations of urban regions and downstream tasks. Extensive experiments on five real-world datasets demonstrate that FlexiReg outperforms state-of-the-art models by up to 202% in terms of the accuracy of four diverse downstream tasks using the produced urban region representations.
Updated: 2025-03-12 07:33:44
Subjects: cs.LG
Adaptive$^2$: Adaptive Domain Mining for Fine-grained Domain Adaptation Modeling
Advertising systems often face the multi-domain challenge, where data distributions vary significantly across scenarios. Existing domain adaptation methods primarily focus on building domain-adaptive neural networks but often rely on hand-crafted domain information, e.g., advertising placement, which may be sub-optimal. We posit that fine-grained "domain" patterns exist in online advertising that are difficult to hand-craft. Thus, we propose Adaptive$^2$, a novel framework that first learns domains adaptively using a domain mining module by self-supervision and then employs a shared&specific network to model shared and conflicting information. In practice, we use VQ-VAE as the domain mining module and conduct extensive experiments on public benchmarks. Results show that traditional domain adaptation methods with hand-crafted domains perform no better than single-domain models under fair FLOPS conditions, highlighting the importance of domain definition. In contrast, Adaptive$^2$ outperforms existing approaches, emphasizing the effectiveness of our method and the significance of domain mining. We also deployed Adaptive$^2$ in the live streaming scenario of the Kuaishou Advertising System, demonstrating its commercial value and potential for automatic domain identification. To the best of our knowledge, Adaptive$^2$ is the first approach to automatically learn both domain identification and adaptation in online advertising, opening new research directions for this area.
Updated: 2025-03-12 07:26:16
Subjects: cs.LG,I.2.6; H.3.3
AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks
Imperceptible adversarial attacks aim to fool DNNs by adding imperceptible perturbations to the input data. Previous methods typically improve the imperceptibility of attacks by integrating common attack paradigms with specifically designed perception-based losses or the capabilities of generative models. In this paper, we propose Adversarial Attacks in Diffusion (AdvAD), a novel modeling framework distinct from existing attack paradigms. AdvAD innovatively conceptualizes attacking as a non-parametric diffusion process by theoretically exploring a basic modeling approach, rather than using the denoising or generation abilities of regular diffusion models, which require neural networks. At each step, a much subtler yet effective adversarial guidance is crafted using only the attacked model without any additional network, gradually guiding the endpoint of the diffusion process from the original image to a desired imperceptible adversarial example. Grounded in a solid theoretical foundation of the proposed non-parametric diffusion process, AdvAD achieves high attack efficacy and imperceptibility with intrinsically lower overall perturbation strength. Additionally, an enhanced version, AdvAD-X, is proposed to evaluate the extreme of our novel framework under an ideal scenario. Extensive experiments demonstrate the effectiveness of the proposed AdvAD and AdvAD-X. Compared with state-of-the-art imperceptible attacks, AdvAD achieves an average of 99.9$\%$ (+17.3$\%$) ASR with 1.34 (-0.97) $l_2$ distance, 49.74 (+4.76) PSNR and 0.9971 (+0.0043) SSIM against four prevalent DNNs with three different architectures on the ImageNet-compatible dataset. Code is available at https://github.com/XianguiKang/AdvAD.
Updated: 2025-03-12 07:22:39
Subjects: cs.LG,cs.CV
PRISM: Privacy-Preserving Improved Stochastic Masking for Federated Generative Models
Despite recent advancements in federated learning (FL), the integration of generative models into FL has been limited due to challenges such as high communication costs and unstable training in heterogeneous data environments. To address these issues, we propose PRISM, a FL framework tailored for generative models that ensures (i) stable performance in heterogeneous data distributions and (ii) resource efficiency in terms of communication cost and final model size. The key to our method is to search for an optimal stochastic binary mask for a random network rather than updating the model weights, identifying a sparse subnetwork with high generative performance; i.e., a ``strong lottery ticket''. By communicating binary masks in a stochastic manner, PRISM minimizes communication overhead. This approach, combined with the utilization of maximum mean discrepancy (MMD) loss and a mask-aware dynamic moving average aggregation method (MADA) on the server side, facilitates stable and strong generative capabilities by mitigating local divergence in FL scenarios. Moreover, thanks to its sparsifying characteristic, PRISM yields a lightweight model without extra pruning or quantization, making it ideal for environments such as edge devices. Experiments on MNIST, FMNIST, CelebA, and CIFAR10 demonstrate that PRISM outperforms existing methods, while maintaining privacy with minimal communication costs. PRISM is the first to successfully generate images under challenging non-IID and privacy-preserving FL environments on complex datasets, where previous methods have struggled.
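The core communication trick — exchanging stochastic binary masks over frozen random weights instead of float weights — can be sketched as below. This is a toy of the "strong lottery ticket" supermask idea that PRISM builds on; the score parametrization and function names are our assumptions, not the paper's API.

```python
import numpy as np

def sample_binary_mask(score_logits, rng):
    """Sample a stochastic binary mask over frozen random weights: each
    weight is kept with probability sigmoid(score), so only the compact
    binary mask (not float weights) needs to be communicated."""
    probs = 1.0 / (1.0 + np.exp(-score_logits))          # keep-probabilities
    mask = (rng.random(score_logits.shape) < probs).astype(np.uint8)
    return mask, probs

def apply_mask(random_weights, mask):
    """Effective subnetwork: the frozen random weights gated by the mask."""
    return random_weights * mask
```

Since each mask entry is a single bit, the per-round payload is roughly 1/32 of sending float32 weights, which is where the communication savings come from.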
Updated: 2025-03-12 07:22:25
Categories: cs.LG,cs.CR,cs.CV
On the Generalization Properties of Diffusion Models
Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models. We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models, suggesting a polynomially small generalization error ($O(n^{-2/5}+m^{-4/5})$) on both the sample size $n$ and the model capacity $m$, evading the curse of dimensionality (i.e., not exponentially large in the data dimension) when early-stopped. Furthermore, we extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities with progressively increasing distances between modes. This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization. Moreover, these estimates are not solely theoretical constructs but have also been confirmed through numerical simulations. Our findings contribute to the rigorous understanding of diffusion models' generalization properties and provide insights that may guide practical applications.
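The claimed rate $O(n^{-2/5} + m^{-4/5})$ can be made concrete with a quick numeric check of the dominant terms; the constant `c` is an assumption, since only the order is stated in the abstract.

```python
def gen_gap_bound(n, m, c=1.0):
    """Order-of-magnitude bound c * (n**(-2/5) + m**(-4/5)) from the abstract."""
    return c * (n ** -0.4 + m ** -0.8)

# The bound shrinks polynomially in both n (sample size) and m (model
# capacity), with no exponential dependence on the data dimension.
vals = [gen_gap_bound(n, m) for n, m in [(10**2, 10**2), (10**4, 10**4), (10**6, 10**6)]]
```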
Updated: 2025-03-12 07:18:59
Categories: cs.LG,stat.ML
FaiREE: Fair Classification with Finite-Sample and Distribution-Free Guarantee
Algorithmic fairness plays an increasingly critical role in machine learning research. Several group fairness notions and algorithms have been proposed. However, the fairness guarantee of existing fair classification methods mainly depends on specific data distributional assumptions, often requiring large sample sizes, and fairness could be violated when there is a modest number of samples, which is often the case in practice. In this paper, we propose FaiREE, a fair classification algorithm that can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. FaiREE can be adapted to satisfy various group fairness notions (e.g., Equality of Opportunity, Equalized Odds, Demographic Parity, etc.) and achieve the optimal accuracy. These theoretical guarantees are further supported by experiments on both synthetic and real data. FaiREE is shown to have favorable performance over state-of-the-art algorithms.
Updated: 2025-03-12 07:17:23
Categories: stat.ML,cs.LG
On the Internal Representations of Graph Metanetworks
Weight space learning is an emerging paradigm in the deep learning community. The primary goal of weight space learning is to extract informative features from a set of parameters using specially designed neural networks, often referred to as \emph{metanetworks}. However, it remains unclear how these metanetworks learn solely from parameters. To address this, we take the first step toward understanding \emph{representations} of metanetworks, specifically graph metanetworks (GMNs), which achieve state-of-the-art results in this field, using centered kernel alignment (CKA). Through various experiments, we reveal that GMNs and general neural networks (\textit{e.g.,} multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs)) differ in terms of their representation space.
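CKA, the comparison tool named above, has a simple linear form. A minimal sketch (linear CKA on centered activation matrices; the random matrices stand in for layer activations):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representations.

    X, Y: (n_samples, n_features_*) activation matrices for the same inputs.
    Returns a similarity in [0, 1]; 1 means identical representation geometry.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 16))        # stand-in for GMN activations
B = rng.standard_normal((50, 16))        # stand-in for MLP/CNN activations
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))   # random orthogonal map
```

Linear CKA is invariant to orthogonal transformations of the feature space, which is what makes it suitable for comparing representations across different architectures.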
Updated: 2025-03-12 07:12:34
Categories: cs.LG,cs.AI
GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models
Large language model (LLM) unlearning has demonstrated its essential role in removing privacy and copyright-related responses, crucial for their legal and safe applications. However, the pursuit of complete unlearning often comes at a substantial cost to general functionality, leading to a notorious trade-off between unlearning and retention. By examining the unlearning update process dynamically, we find that gradients hold essential information for revealing this trade-off. In particular, we look at the varying relationship between retention performance and directional disparities between gradients during unlearning. This motivates an update mechanism built from two kinds of gradients, i.e., those harmful for retention and those useful for unlearning. Accordingly, we propose Gradient Rectified Unlearning (GRU), an enhanced unlearning framework that controls the updating gradients in a geometry-focused and optimization-driven manner so that their side effects on other, unrelated responses are minimized. Specifically, GRU derives a closed-form solution to project the unlearning gradient onto the orthogonal space of the gradient harmful for retention, ensuring minimal deviation from its original direction under the condition that overall performance is retained. Comprehensive experiments demonstrate that GRU, as a general framework, is straightforward to implement and efficiently enhances a range of baseline methods through its adaptable and compatible characteristics. Additionally, experimental results show its broad effectiveness across a diverse set of benchmarks for LLM unlearning.
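The closed-form projection described in the abstract amounts to removing, from the unlearning gradient, its component along the retention-harmful direction. An illustrative reconstruction (not the authors' code; gradients here are plain vectors):

```python
import numpy as np

def rectify_unlearning_gradient(g_unlearn, g_harmful, eps=1e-12):
    """Project the unlearning gradient onto the orthogonal complement of
    the direction that is harmful for retention."""
    h = g_harmful
    coeff = (g_unlearn @ h) / (h @ h + eps)   # scalar projection coefficient
    return g_unlearn - coeff * h              # component along h removed

rng = np.random.default_rng(0)
g_u = rng.standard_normal(8)      # gradient driving unlearning
g_h = rng.standard_normal(8)      # gradient harmful for retention
g_rect = rectify_unlearning_gradient(g_u, g_h)
```

The rectified gradient is orthogonal to the harmful direction and, by Pythagoras, is the closest such vector to the original unlearning gradient.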
Updated: 2025-03-12 07:08:54
Categories: cs.LG,cs.CL
Drift-Aware Federated Learning: A Causal Perspective
Federated learning (FL) facilitates collaborative model training among multiple clients while preserving data privacy, often resulting in enhanced performance compared to models trained by individual clients. However, factors such as communication frequency and data distribution can contribute to feature drift, hindering the attainment of optimal training performance. This paper examines the relationship between model update drift and the global and local optimizers from a causal perspective. The influence of the global optimizer on feature drift primarily arises from the participation frequency of certain clients in server updates, whereas the effect of the local optimizer is typically associated with imbalanced data distributions. To mitigate this drift, we propose a novel framework termed Causal drift-Aware Federated lEarning (CAFE). CAFE exploits the causal relationship between feature-invariant components and classification outcomes to independently calibrate local client sample features and classifiers during the training phase. In the inference phase, it eliminates the drift in the global model that favors frequently communicating clients. Experimental results demonstrate that CAFE's integration of feature calibration, parameter calibration, and historical information effectively reduces both drift towards majority classes and tendencies toward frequently communicating nodes.
Updated: 2025-03-12 07:05:30
Categories: cs.LG,cs.DC
Multimodal Foundation Models for Material Property Prediction and Discovery
Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials but also a greater variety and quantity of their associated properties. Existing machine learning efforts in materials science focus primarily on single-modality tasks, i.e. relationships between materials and a single physical property, thus not taking advantage of the rich and multimodal set of material properties. Here, we introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes: (i) MultiMat achieves state-of-the-art performance for challenging material property prediction tasks; (ii) MultiMat enables novel and accurate material discovery via latent space similarity, enabling screening for stable materials with desired properties; and (iii) MultiMat encodes interpretable emergent features that may provide novel scientific insights.
Updated: 2025-03-12 07:04:21
Categories: cs.LG,cond-mat.mtrl-sci
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. While state-of-the-art LMs often boast hundreds of billions of parameters and are primarily deployed in data centers, recent trends show a growing focus on compact models, typically under 10 billion parameters, enabled by quantization and other model compression techniques. This shift paves the way for LMs on edge devices, offering potential benefits such as enhanced privacy, reduced latency, and improved data sovereignty. However, the inherent complexity of even these smaller models, combined with the limited computing resources of edge hardware, raises critical questions about the practical trade-offs in executing LM inference outside the cloud. To address these challenges, we present a comprehensive evaluation of generative LM inference on representative CPU-based and GPU-accelerated edge devices. Our study measures key performance indicators, including memory usage, inference speed, and energy consumption, across various device configurations. Additionally, we examine throughput-energy trade-offs, cost considerations, and usability, alongside an assessment of qualitative model performance. While quantization helps mitigate memory overhead, it does not fully eliminate resource bottlenecks, especially for larger models. Our findings quantify the memory and energy constraints that must be considered for practical real-world deployments, offering concrete insights into the trade-offs between model size, inference performance, and efficiency. The exploration of LMs at the edge is still in its early stages. We hope this study provides a foundation for future research, guiding the refinement of models, the enhancement of inference efficiency, and the advancement of edge-centric AI systems.
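A minimal harness for two of the indicators named above (latency and peak memory; energy needs external metering and is omitted). The `generate` function is a hypothetical stand-in for an on-device decode loop, not any real model API:

```python
import time
import tracemalloc

def generate(prompt, n_tokens=256):
    """Hypothetical stand-in for an edge LM decode loop."""
    out = []
    for _ in range(n_tokens):
        out.append(sum(j * j for j in range(500)))   # fake per-token compute
    return out

def benchmark(fn, *args):
    """Measure wall-clock latency, token throughput, and peak Python heap."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    latency = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"latency_s": latency,
            "tokens_per_s": len(result) / latency,
            "peak_bytes": peak}

stats = benchmark(generate, "hello")
```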
Updated: 2025-03-12 07:01:34
Categories: cs.LG,cs.AI,cs.DC,cs.PF
Constraint-Guided Learning of Data-driven Health Indicator Models: An Application on the Pronostia Bearing Dataset
This paper presents a constraint-guided deep learning framework for developing physically consistent health indicators in bearing prognostics and health management. Conventional data-driven methods often lack physical plausibility, while physics-based models are limited by incomplete system knowledge. To address this, we integrate domain knowledge into deep learning using constraints to enforce monotonicity, bound output values between 1 and 0 (representing healthy to failed states), and ensure consistency between signal energy trends and health indicator estimates. This eliminates the need for complex loss term balancing. We implement constraint-guided gradient descent within an autoencoder architecture, creating a constrained autoencoder. However, the framework is adaptable to other architectures. Using time-frequency representations of accelerometer signals from the Pronostia dataset, our constrained model generates smoother, more reliable degradation profiles compared to conventional methods, aligning with expected physical behavior. Performance is assessed using three metrics: trendability, robustness, and consistency. Compared to a conventional baseline, the constrained model improves all three. Another baseline, incorporating monotonicity via a soft-ranking loss function, outperforms in trendability but falls short in robustness and consistency. An ablation study confirms that the monotonicity constraint enhances trendability, the boundary constraint ensures consistency, and the energy-health consistency constraint improves robustness. These findings highlight the effectiveness of constraint-guided deep learning in producing reliable, physically meaningful health indicators, offering a promising direction for future prognostic applications.
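The three constraints named above can be written as simple penalty terms. The exact functional forms here are assumptions for illustration (the paper enforces them via constraint-guided gradient descent, not a penalty sum):

```python
import numpy as np

def constraint_penalties(h, energy_trend):
    """Penalties for the three constraints from the abstract: monotonic
    (non-increasing) health, outputs in [0, 1], and agreement between
    the health-indicator trend and a signal-energy trend."""
    h = np.asarray(h, dtype=float)
    e = np.asarray(energy_trend, dtype=float)
    mono = np.sum(np.maximum(np.diff(h), 0.0))          # health must not rise
    bounds = np.sum(np.maximum(h - 1.0, 0.0) + np.maximum(-h, 0.0))
    consistency = np.mean((np.diff(h) - np.diff(e)) ** 2)
    return mono, bounds, consistency

good = np.linspace(1.0, 0.0, 10)      # ideal healthy-to-failed profile
m, b, c = constraint_penalties(good, good)
bad = np.array([1.0, 1.2, 0.5, -0.1, 0.4])
m2, b2, c2 = constraint_penalties(bad, np.linspace(1.0, 0.0, 5))
```

A physically consistent profile incurs zero penalty on all three terms, while a non-monotone, out-of-range profile is penalized.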
Updated: 2025-03-12 07:01:27
Categories: cs.LG,cs.AI
A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data Perspective
Enterprise financial risk analysis aims at predicting the future financial risk of enterprises. Due to its wide and significant application, enterprise financial risk analysis has always been a core research topic in the fields of Finance and Management. Based on advanced computer science and artificial intelligence technologies, enterprise risk analysis research is experiencing rapid development and making significant progress. Therefore, it is both necessary and challenging to comprehensively review the relevant studies. Although there are already some valuable and impressive surveys on enterprise risk analysis from the perspective of Finance and Management, these surveys introduce approaches in a relatively isolated way and lack the recent advances in enterprise financial risk analysis. In contrast, this paper attempts to provide a systematic literature survey of enterprise risk analysis approaches from a Big Data perspective, reviewing more than 250 representative articles spanning almost 50 years (from 1968 to 2023). To the best of our knowledge, this is the first and only survey of enterprise financial risk from a Big Data perspective. Specifically, this survey connects and systematizes the existing enterprise financial risk studies, i.e., it summarizes and interprets the problems, methods, and spotlights in a comprehensive way. In particular, we first introduce the issues of enterprise financial risk in terms of their types, granularity, intelligence, and evaluation metrics, and summarize the corresponding representative works. Then, we compare the analysis methods used to learn enterprise financial risk, and finally summarize the spotlights of the most representative works. Our goal is to clarify current cutting-edge research and its possible future directions for modeling enterprise risk, aiming to fully understand the mechanisms of enterprise risk generation and contagion.
Updated: 2025-03-12 06:59:50
Categories: q-fin.RM,cs.AI,cs.LG
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
This paper addresses the problem of Rehearsal-Free Continual Category Discovery (RF-CCD), which focuses on continuously identifying novel classes by leveraging knowledge from labeled data. Existing methods typically train from scratch, overlooking the potential of base models, and often resort to data storage to prevent forgetting. Moreover, because RF-CCD encompasses both continual learning and novel class discovery, previous approaches have struggled to effectively integrate advanced techniques from these fields, resulting in less convincing comparisons and failing to reveal the unique challenges posed by RF-CCD. To address these challenges, we lead the way in integrating advancements from both domains and conduct extensive experiments and analyses. Our findings demonstrate that this integration can achieve state-of-the-art results, leading to the conclusion that, in the presence of pre-trained models, the representation does not improve and may even degrade with the introduction of unlabeled data. To mitigate representation degradation, we propose a straightforward yet highly effective baseline method. This method first utilizes prior knowledge of known categories to estimate the number of novel classes. It then acquires representations using a model specifically trained on the base classes, generates high-quality pseudo-labels through k-means clustering, and trains only the classifier layer. We validate our conclusions and methods by conducting extensive experiments across multiple benchmarks, including the Stanford Cars, CUB, iNat, and Tiny-ImageNet datasets. The results clearly illustrate our findings, demonstrate the effectiveness of our baseline, and pave the way for future advancements in RF-CCD.
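The pseudo-labelling step of the baseline above can be sketched with a tiny k-means over frozen features. This is an illustrative toy (deterministic initialization and Gaussian blobs standing in for backbone features), not the paper's pipeline:

```python
import numpy as np

def kmeans_pseudo_labels(X, k, iters=50):
    """Tiny Lloyd's k-means producing pseudo-labels for novel-class features."""
    # Deterministic init: k points spread across the dataset.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)          # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),   # features of one novel class
               rng.normal(5.0, 0.1, (20, 2))])  # features of another
labels, centers = kmeans_pseudo_labels(X, k=2)
```

A linear classifier layer can then be trained on these pseudo-labels while the feature backbone stays frozen, as the abstract describes.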
Updated: 2025-03-12 06:46:32
Categories: cs.CV,cs.AI
Test-Time Discovery via Hashing Memory
We introduce Test-Time Discovery (TTD) as a novel task that addresses class shifts during testing, requiring models to simultaneously identify emerging categories while preserving previously learned ones. A key challenge in TTD is distinguishing newly discovered classes from those already identified. To address this, we propose a training-free, hash-based memory mechanism that enhances class discovery through fine-grained comparisons with past test samples. Leveraging the characteristics of unknown classes, our approach introduces hash representation based on feature scale and directions, utilizing Locality-Sensitive Hashing (LSH) for efficient grouping of similar samples. This enables test samples to be easily and quickly compared with relevant past instances. Furthermore, we design a collaborative classification strategy, combining a prototype classifier for known classes with an LSH-based classifier for novel ones. To enhance reliability, we incorporate a self-correction mechanism that refines memory labels through hash-based neighbor retrieval, ensuring more stable and accurate class assignments. Experimental results demonstrate that our method achieves good discovery of novel categories while maintaining performance on known classes, establishing a new paradigm in model testing. Our code is available at https://github.com/fanlyu/ttd.
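The LSH bucketing described above can be illustrated with the classic sign-based (random-hyperplane) hash; the paper's hash additionally uses feature scale, which this sketch omits:

```python
import numpy as np

def lsh_code(x, hyperplanes):
    """Sign-based LSH: each random hyperplane contributes one bit, so
    nearby vectors tend to land in the same bucket."""
    return tuple(((hyperplanes @ x) > 0).tolist())

rng = np.random.default_rng(0)
planes = rng.standard_normal((8, 16))       # 8 hyperplanes -> 8-bit bucket code

v = rng.standard_normal(16)                 # a stored test sample
v_near = v + 1e-3 * rng.standard_normal(16) # a near-duplicate query

code_v = lsh_code(v, planes)
code_near = lsh_code(v_near, planes)
code_opp = lsh_code(-v, planes)             # maximally dissimilar direction
```

Grouping memory entries by bucket code lets a new test sample be compared only against relevant past instances instead of the whole memory.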
Updated: 2025-03-12 06:43:01
Categories: cs.CV,cs.AI,cs.LG
The Shape of Attraction in UMAP: Exploring the Embedding Forces in Dimensionality Reduction
Uniform manifold approximation and projection (UMAP) is among the most popular neighbor embedding methods. The method relies on attractive and repulsive forces among high-dimensional data points to obtain a low-dimensional embedding. In this paper, we analyze the forces to reveal their effects on cluster formations and visualization. Repulsion emphasizes differences, controlling cluster boundaries and inter-cluster distance. Attraction is more subtle, as attractive tension between points can manifest simultaneously as attraction and repulsion in the lower-dimensional mapping. This explains the need for learning rate annealing and motivates the different treatments between attractive and repulsive terms. Moreover, by modifying attraction, we improve the consistency of cluster formation under random initialization. Overall, our analysis makes UMAP and similar embedding methods more interpretable, more robust, and more accurate.
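The attractive and repulsive updates analyzed above can be sketched with the simple $1/(1+d^2)$ low-dimensional kernel (i.e., UMAP's $a=b=1$ case); the exact gradient forms here are a simplified assumption, not UMAP's full objective:

```python
import numpy as np

def attract(p, q, lr=0.1):
    """One attractive update between embedded neighbors p and q."""
    d = p - q
    grad = 2.0 * d / (1.0 + d @ d)     # gradient of log(1 + ||p - q||^2) wrt p
    return p - lr * grad, q + lr * grad

def repel(p, q, lr=0.1, eps=1e-3):
    """One repulsive update pushing non-neighbors p and q apart."""
    d = p - q
    grad = 2.0 * d / ((d @ d + eps) * (1.0 + d @ d))
    return p + lr * grad, q - lr * grad

p0, q0 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
p_a, q_a = attract(p0, q0)     # neighbors pulled together
p_r, q_r = repel(p0, q0)       # non-neighbors pushed apart
```

Attraction shrinks the pairwise distance and repulsion grows it, which is the mechanism behind the cluster-boundary and inter-cluster-distance effects the abstract discusses.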
Updated: 2025-03-12 06:37:43
Categories: cs.LG,cs.AI,cs.CV
Trustworthy AIGC Copyright Management with Full Lifecycle Recording and Multi-party Supervision in Blockchain
As artificial intelligence technology becomes increasingly widespread, AI-generated content (AIGC) is gradually penetrating into many fields. Although AIGC plays an increasingly prominent role in business and cultural communication, the issue of copyright has also triggered widespread social discussion. The current legal system for copyright is built around human creators, yet in the realm of AIGC, the role of humans in content creation has diminished, with the creative expression primarily reliant on artificial intelligence. This discrepancy has led to numerous complexities and challenges in determining the copyright ownership of AIGC within the established legal boundaries. In view of this, it is necessary to meticulously record contributions of all entities involved in the generation of AIGC to achieve a fair distribution of copyright. For this purpose, this study thoroughly records the intermediate data generated throughout the full lifecycle of AIGC and deposits them into a decentralized blockchain system for secure multi-party supervision, thereby constructing a trustworthy AIGC copyright management system. In the event of copyright disputes, auditors can retrieve valuable proof from the blockchain, accurately defining the copyright ownership of AIGC products. Both theoretical and experimental analyses confirm that this scheme shows exceptional performance and security in the management of AIGC copyrights.
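The full-lifecycle recording idea reduces, at its core, to a tamper-evident hash chain of lifecycle records. A minimal stdlib sketch (the record fields and canonical JSON serialization are assumptions; a real blockchain adds consensus and multi-party signatures):

```python
import hashlib
import json

def add_block(chain, record):
    """Append a lifecycle record, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    chain.append({"prev": prev_hash, "record": record,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(chain):
    """Recompute every link; any tampered record breaks the chain."""
    prev_hash = "0" * 64
    for block in chain:
        payload = json.dumps({"prev": prev_hash, "record": block["record"]},
                             sort_keys=True)
        if (block["prev"] != prev_hash
                or hashlib.sha256(payload.encode()).hexdigest() != block["hash"]):
            return False
        prev_hash = block["hash"]
    return True

chain = []
add_block(chain, {"stage": "prompt", "by": "user-a"})
add_block(chain, {"stage": "generation", "by": "model-x"})
add_block(chain, {"stage": "edit", "by": "user-b"})
```

An auditor resolving a copyright dispute replays `verify` and reads off each contributor's recorded stage; altering any intermediate record invalidates every later hash.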
Updated: 2025-03-12 06:33:02
Categories: cs.CY,cs.CR
Simulation of Two-Qubit Grover Algorithm in MBQC with Universal Blind Quantum Computation
The advancement of quantum computing technology has led to the emergence of early-stage quantum cloud computing services. To fully realize the potential of quantum cloud computing, it is essential to develop techniques that ensure the privacy of both data and functions. Quantum computations often leverage superposition to evaluate a function on all possible inputs simultaneously, making function privacy a critical requirement. In 2009, Broadbent et al. introduced the Universal Blind Quantum Computation (UBQC) protocol, which is based on Measurement-Based Quantum Computation (MBQC) and provides a framework for ensuring both function and data privacy in quantum computing. Although theoretical results indicate an equivalence between MBQC and circuit-based quantum computation, translating MBQC into circuit-based implementations remains challenging due to higher qubit requirements and the complexity of the transformation process. Consequently, current quantum cloud computing platforms are limited in their ability to simulate MBQC efficiently. This paper presents an efficient method to simulate MBQC on circuit-based quantum computing platforms. We validate this approach by implementing the two-qubit Grover algorithm in the MBQC framework and further demonstrate blindness by applying the UBQC protocol. This work verifies the simulation of a blind quantum computation using the two-qubit Grover algorithm on a circuit-based quantum computing platform.
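The circuit-model half of the comparison is easy to state: two-qubit Grover finds the marked item with certainty in a single iteration. A minimal matrix simulation (marking $|11\rangle$; this illustrates the algorithm itself, not the paper's MBQC/UBQC construction):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H2 = np.kron(H, H)                              # Hadamard on both qubits

oracle = np.diag([1.0, 1.0, 1.0, -1.0])         # phase-flip the marked |11>

s = H2 @ np.array([1.0, 0.0, 0.0, 0.0])         # uniform superposition from |00>
diffusion = 2.0 * np.outer(s, s) - np.eye(4)    # inversion about the mean

state = diffusion @ (oracle @ s)                # one Grover iteration
probs = np.abs(state) ** 2                      # measurement probabilities
```

With N = 4 items, a single oracle call suffices; measuring yields the marked state with probability 1.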
Updated: 2025-03-12 06:29:35
Categories: quant-ph,cs.CR
Parallel Backpropagation for Inverse of a Convolution with Application to Normalizing Flows
The inverse of an invertible convolution is an important operation that comes up in Normalizing Flows, Image Deblurring, etc. The naive algorithm for backpropagation of this operation using Gaussian elimination has running time $O(n^3)$ where $n$ is the number of pixels in the image. We give a fast parallel backpropagation algorithm with running time $O(\sqrt{n})$ for a square image and provide a GPU implementation of the same. Inverses of convolutions are usually used in the sampling pass of Normalizing Flows, making sampling slow. We propose instead to use the inverse of the convolution in the forward (image to latent vector) pass of the Normalizing Flow. Since the sampling pass is the inverse of the forward pass, it will use convolutions only, resulting in efficient sampling times. We use our parallel backpropagation algorithm to optimize the inverse of the convolution layer, resulting in fast training times. We implement this approach in various Normalizing Flow backbones, resulting in our Inverse-Flow models. We benchmark Inverse-Flow on standard datasets and show significantly improved sampling times with similar bits per dimension compared to previous models.
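Why an invertible convolution can be inverted far faster than $O(n^3)$ elimination is easiest to see in the circular case, where the Fourier basis diagonalizes the convolution. A 1D sketch (this illustrates the inversion principle, not the paper's parallel backpropagation algorithm):

```python
import numpy as np

def circ_conv(x, k):
    """Circular convolution via the FFT (convolution theorem)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

def circ_deconv(y, k):
    """Invert the convolution in O(n log n), valid when the kernel's
    spectrum has no zeros (i.e., the convolution is invertible)."""
    return np.real(np.fft.ifft(np.fft.fft(y) / np.fft.fft(k)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k = np.zeros(16)
k[0], k[1], k[2] = 2.0, 0.5, 0.25    # well-conditioned, invertible kernel
y = circ_conv(x, k)
x_rec = circ_deconv(y, k)
```

Placing the inverse in the forward pass, as the abstract proposes, leaves the cheap plain convolution for the sampling pass.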
Updated: 2025-03-12 06:28:50
Categories: cs.CV,cs.LG,cs.MM,math.PR
Zero-Shot Subject-Centric Generation for Creative Application Using Entropy Fusion
Generative models are widely used in visual content creation. However, current text-to-image models often face challenges in practical applications, such as textile pattern design and meme generation, due to the presence of unwanted elements that are difficult to separate with existing methods. Meanwhile, subject-reference generation has emerged as a key research trend, highlighting the need for techniques that can produce clean, high-quality subject images while effectively removing extraneous components. To address this challenge, we introduce a framework for reliable subject-centric image generation. In this work, we propose an entropy-based feature-weighted fusion method to merge the informative cross-attention features obtained from each sampling step of the pretrained text-to-image model FLUX, enabling precise mask prediction and subject-centric generation. Additionally, we have developed an agent framework based on Large Language Models (LLMs) that translates users' casual inputs into more descriptive prompts, leading to highly detailed image generation. Simultaneously, the agents extract the primary elements of the prompts to guide the entropy-based feature fusion, ensuring focused primary-element generation without extraneous components. Experimental results and user studies demonstrate that our method generates high-quality subject-centric images and outperforms existing methods and other possible pipelines, highlighting the effectiveness of our approach.
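One plausible reading of "entropy-based feature-weighted fusion" is to weight each attention map inversely to its entropy, so confident (peaked) maps dominate the fused result. A toy sketch under that assumption (the `exp(-H)` weighting is illustrative, not the paper's exact formula):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a nonnegative map, normalized to a distribution."""
    p = p / (p.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def entropy_weighted_fusion(maps):
    """Fuse attention maps, down-weighting high-entropy (diffuse) ones."""
    ents = np.array([entropy(m) for m in maps])
    w = np.exp(-ents)
    w = w / w.sum()
    return sum(wi * mi for wi, mi in zip(w, maps)), w

peaked = np.array([[0.97, 0.01], [0.01, 0.01]])   # confident, low entropy
diffuse = np.full((2, 2), 0.25)                   # uninformative, high entropy
fused, weights = entropy_weighted_fusion([peaked, diffuse])
```

Thresholding the fused map would then give the subject mask used to suppress extraneous components.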
Updated: 2025-03-12 06:27:30
Categories: cs.CV,cs.AI,eess.IV
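The core weighting idea, giving more influence to focused (low-entropy) attention maps when fusing, can be sketched in a few lines. This is a simplified illustration with hypothetical function names, not the FLUX-based implementation described in the abstract.

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy of a (normalized) attention map
    p = p / (p.sum() + eps)
    return -(p * np.log(p + eps)).sum()

def entropy_weighted_fusion(maps):
    # lower-entropy (more focused) maps receive larger fusion weights
    ents = np.array([entropy(m) for m in maps])
    w = np.exp(-ents)
    w = w / w.sum()
    return sum(wi * mi for wi, mi in zip(w, maps))
```

A sharply peaked map thus dominates the fused result, which is what makes the fused features usable for mask prediction.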
Self-Consistent Equation-guided Neural Networks for Censored Time-to-Event Data
In survival analysis, estimating the conditional survival function given predictors is often of interest. There is a growing trend in the development of deep learning methods for analyzing censored time-to-event data, especially when dealing with high-dimensional predictors that are complexly interrelated. Many existing deep learning approaches for estimating conditional survival functions extend the Cox regression model by replacing the linear function of predictor effects with a shallow feed-forward neural network while maintaining the proportional hazards assumption. Their implementation can be computationally intensive because the full dataset must be used at each iteration: batch data may distort the at-risk set of the partial likelihood function. To overcome these limitations, we propose a novel deep learning approach to non-parametric estimation of the conditional survival function using generative adversarial networks that leverage self-consistent equations. The proposed method is model-free and does not require any parametric assumptions on the structure of the conditional survival function. We establish the convergence rate of our proposed estimator of the conditional survival function. In addition, we evaluate the performance of the proposed method through simulation studies and demonstrate its application on a real-world dataset.
Updated: 2025-03-12 06:24:35
Categories: stat.ML,cs.LG,stat.ME
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Backdoor attacks pose a significant threat to deep learning models, enabling adversaries to embed hidden triggers that manipulate the behavior of the model during inference. Traditional backdoor attacks typically rely on inserting explicit triggers (e.g., external patches, or perturbations) into input data, but they often struggle to evade existing defense mechanisms. To address this limitation, we investigate backdoor attacks through the lens of the reasoning process in deep learning systems, drawing insights from interpretable AI. We conceptualize backdoor activation as the manipulation of learned concepts within the model's latent representations. Thus, existing attacks can be seen as implicit manipulations of these activated concepts during inference. This raises an interesting question: why not manipulate the concepts explicitly? This idea leads to our novel backdoor attack framework, Concept Confusion Attack (C^2 ATTACK), which leverages internal concepts in the model's reasoning as "triggers" without introducing explicit external modifications. By avoiding the use of real triggers and directly activating or deactivating specific concepts in latent spaces, our approach enhances stealth, making detection by existing defenses significantly harder. Using CLIP as a case study, experimental results demonstrate the effectiveness of C^2 ATTACK, achieving high attack success rates while maintaining robustness against advanced defenses.
Updated: 2025-03-12 06:17:12
Categories: cs.CR,cs.CV
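"Activating or deactivating a concept in latent space" amounts to shifting a representation along a concept direction and measuring its projection onto that direction. A minimal sketch of this mechanism, with illustrative function names; the actual attack operates on concepts inside CLIP's reasoning, not on toy vectors:

```python
import numpy as np

def concept_activation(latent, concept_dir):
    # scalar projection of the representation onto the (unit) concept direction
    d = concept_dir / np.linalg.norm(concept_dir)
    return float(latent @ d)

def steer_concept(latent, concept_dir, strength):
    # move the latent along the concept direction; positive strength
    # activates the concept, negative strength deactivates it
    d = concept_dir / np.linalg.norm(concept_dir)
    return latent + strength * d
```

Because no pixel-space trigger is ever inserted, input-level defenses have nothing to detect.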
Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models
This study explores the realm of knowledge base question answering (KBQA). KBQA is considered a challenging task, particularly in parsing intricate questions into executable logical forms. Traditional semantic parsing (SP)-based methods require extensive data annotations, which result in significant costs. Recently, the advent of few-shot in-context learning, powered by large language models (LLMs), has showcased promising capabilities. However, fully leveraging LLMs to parse questions into logical forms in low-resource scenarios poses a substantial challenge. To tackle these hurdles, we introduce Interactive-KBQA, a framework designed to generate logical forms through direct interaction with knowledge bases (KBs). Within this framework, we have developed three generic APIs for KB interaction. For each category of complex question, we devised exemplars to guide LLMs through the reasoning processes. Our method achieves competitive results on the WebQuestionsSP, ComplexWebQuestions, KQA Pro, and MetaQA datasets with a minimal number of examples (shots). Importantly, our approach supports manual intervention, allowing for the iterative refinement of LLM outputs. By annotating a dataset with step-wise reasoning processes, we showcase our model's adaptability and highlight its potential for contributing significant enhancements to the field.
Updated: 2025-03-12 06:15:34
Categories: cs.CL,cs.AI,I.2.7
Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process
The Gaussian process (GP) is arguably one of the most widely used machine learning algorithms in practice. One of its prominent applications is Bayesian optimization (BO). Although the vanilla GP itself is already a powerful tool for BO, it is often beneficial to be able to consider the dependencies of multiple outputs. To do so, the Multi-task GP (MTGP) is formulated, but it is not trivial to fully understand the derivations of its formulations and their gradients from the previous literature. This paper provides accessible derivations of the MTGP formulations and their gradients.
Updated: 2025-03-12 06:12:01
Categories: cs.LG,cs.AI,stat.ML
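A standard way MTGPs model output correlations is the intrinsic coregionalization model (ICM), where the joint covariance is the Kronecker product of a task-covariance matrix $B$ and an input kernel $K_x$. The sketch below assumes an RBF input kernel and is only an illustration of the construction, not the paper's derivations.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # squared-exponential Gram matrix between two sets of inputs
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def icm_kernel(X, B, lengthscale=1.0):
    # intrinsic coregionalization model: K = B kron K_x, where the
    # (tasks x tasks) matrix B encodes correlations between outputs
    return np.kron(B, rbf_kernel(X, X, lengthscale))
```

Writing `B = A @ A.T` guarantees positive semi-definiteness of the task covariance, and hence of the full Kronecker-structured kernel.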
Multi-Modal Foundation Models for Computational Pathology: A Survey
Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.
Updated: 2025-03-12 06:03:33
Categories: cs.CV,cs.AI
Hamiltonian Neural Networks for Robust Out-of-Time Credit Scoring
This paper presents a novel credit scoring approach using neural networks to address class imbalance and out-of-time prediction challenges. We develop a specific optimizer and loss function inspired by Hamiltonian mechanics that better captures credit risk dynamics. Testing on the Freddie Mac Single-Family Loan-Level Dataset shows our model achieves superior discriminative power (AUC) in out-of-time scenarios compared to conventional methods. The approach has consistent performance between in-sample and future test sets, maintaining reliability across time periods. This interdisciplinary method spans physical systems theory and financial risk management, offering practical advantages for long-term model stability.
Updated: 2025-03-12 06:03:20
Categories: cs.LG
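A Hamiltonian-mechanics-inspired optimizer treats the parameters and a momentum term as conjugate variables, with friction providing dissipation so the system settles into a minimum. The paper's actual optimizer and loss are not specified here; this is a generic, heavy-ball-style sketch of the idea only.

```python
import numpy as np

def hamiltonian_step(theta, momentum, grad, lr=0.05, friction=0.9):
    # semi-implicit (symplectic-Euler-style) update: momentum is the
    # conjugate variable, damped by a friction term each step
    momentum = friction * momentum - lr * grad
    theta = theta + momentum
    return theta, momentum
```

On a simple quadratic loss this iteration spirals into the optimum rather than descending monotonically, which is the characteristic behavior of such dynamics.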
Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study
In many-task optimization scenarios, surrogate models are valuable for mitigating the computational burden of repeated fitness evaluations across tasks. This study proposes a novel meta-surrogate framework to assist many-task optimization, by leveraging the knowledge transfer strengths and emergent capabilities of large language models (LLMs). We formulate a unified framework for many-task fitness prediction, by defining a universal model with metadata to fit a group of problems. Fitness prediction is performed on metadata and decision variables, enabling efficient knowledge sharing across tasks and adaptability to new tasks. The LLM-based meta-surrogate treats fitness prediction as conditional probability estimation, employing a unified token sequence representation for task metadata, inputs, and outputs. This approach facilitates efficient inter-task knowledge sharing through shared token embeddings and captures complex task dependencies via multi-task model training. Experimental results demonstrate the model's emergent generalization ability, including zero-shot performance on problems with unseen dimensions. When integrated into evolutionary transfer optimization (ETO), our framework supports dual-level knowledge transfer -- at both the surrogate and individual levels -- enhancing optimization efficiency and robustness. This work establishes a novel foundation for applying LLMs in surrogate modeling, offering a versatile solution for many-task optimization.
Updated: 2025-03-12 06:00:27
Categories: cs.LG,cs.AI,cs.NE
Safe RuleFit: Learning Optimal Sparse Rule Model by Meta Safe Screening
We consider the problem of learning a sparse rule model, a prediction model in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyper-rectangle in the input space. Since the number of all possible such rules is extremely large, it has been computationally intractable to select the optimal set of active rules. In this paper, to solve this difficulty for learning the optimal sparse rule model, we propose Safe RuleFit (SRF). Our basic idea is to develop meta safe screening (mSS), which is a non-trivial extension of well-known safe screening (SS) techniques. While SS is used for screening out one feature, mSS can be used for screening out multiple features by exploiting the inclusion-relations of hyper-rectangles in the input space. SRF provides a general framework for fitting sparse rule models for regression and classification, and it can be extended to handle more general sparse regularizations such as group regularization. We demonstrate the advantages of SRF through intensive numerical experiments.
Updated: 2025-03-12 05:59:48
Categories: stat.ML,cs.LG
LocAgent: Graph-Guided LLM Agents for Code Localization
Code localization--identifying precisely where in a codebase changes need to be made--is a fundamental yet challenging task in software maintenance. Existing approaches struggle to efficiently navigate complex codebases when identifying relevant code sections. The challenge lies in bridging natural language problem descriptions with the appropriate code elements, often requiring reasoning across hierarchical structures and multiple dependencies. We introduce LocAgent, a framework that addresses code localization through graph-based representation. By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures (files, classes, functions) and their dependencies (imports, invocations, inheritance), enabling LLM agents to effectively search and locate relevant entities through powerful multi-hop reasoning. Experimental results on real-world benchmarks demonstrate that our approach significantly enhances accuracy in code localization. Notably, our method with the fine-tuned Qwen-2.5-Coder-Instruct-32B model achieves comparable results to SOTA proprietary models at greatly reduced cost (approximately 86% reduction), reaching up to 92.7% accuracy on file-level localization while improving downstream GitHub issue resolution success rates by 12% for multiple attempts (Pass@10). Our code is available at https://github.com/gersteinlab/LocAgent.
Updated: 2025-03-12 05:55:01
Categories: cs.SE,cs.AI,cs.CL
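The directed heterogeneous graph and the multi-hop traversal it enables can be sketched with a plain adjacency map over (source, relation, target) triples. This is a toy illustration with hypothetical names, not LocAgent's actual representation.

```python
from collections import defaultdict

def build_code_graph(triples):
    # triples: (source, relation, target), e.g. ("main.py", "imports", "util.py");
    # relations may be imports, invocations, inheritance, containment, etc.
    graph = defaultdict(list)
    for src, rel, dst in triples:
        graph[src].append((rel, dst))
    return graph

def multi_hop(graph, start, hops):
    # entities reachable from `start` in exactly `hops` edges
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for node in frontier for _, dst in graph.get(node, [])}
    return frontier
```

An LLM agent can then answer "which functions might this issue touch?" by expanding such frontiers instead of scanning the whole codebase.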
A Unified Framework for Motion Reasoning and Generation in Human Interaction
Recent advancements in large language models (LLMs) have significantly improved their ability to generate natural and contextually relevant text, enabling more human-like AI interactions. However, generating and understanding interactive human-like motion, where multiple individuals engage in coordinated movements, remains challenging due to the complexity of modeling these interactions. Additionally, a unified and versatile model is needed to handle diverse interactive scenarios, such as chat systems that dynamically adapt to user instructions and assigned roles. To address these challenges, we introduce VIM, the Versatile Interactive Motion-language model, which integrates both language and motion modalities to effectively understand, generate, and control interactive motions in multi-turn conversational contexts. Unlike previous studies that primarily focus on uni-directional tasks such as text-to-motion or motion-to-text, VIM employs a unified architecture capable of simultaneously understanding and generating both motion and text modalities. Given the absence of an appropriate dataset to support this task, we introduce Inter-MT2, a large-scale instruction-tuning dataset containing 82.7K multi-turn interactive motion instructions, covering 153K interactive motion samples. Inter-MT2 spans diverse instructional scenarios, including motion editing, question answering, and story generation, leveraging off-the-shelf large language models and motion diffusion models to construct a broad set of interactive motion instructions. We extensively evaluate the versatility of VIM across multiple interactive motion-related tasks, including motion-to-text, text-to-motion, reaction generation, motion editing, and reasoning about motion sequences.
Updated: 2025-03-12 05:54:44
Categories: cs.AI
Large Language Model Enhanced Knowledge Representation Learning: A Survey
Knowledge Representation Learning (KRL) is crucial for enabling applications of symbolic knowledge from Knowledge Graphs (KGs) to downstream tasks by projecting knowledge facts into vector spaces. Despite their effectiveness in modeling KG structural information, KRL methods suffer from the sparseness of KGs. The rise of Large Language Models (LLMs) built on the Transformer architecture presents promising opportunities for enhancing KRL by incorporating textual information to address information sparsity in KGs. LLM-enhanced KRL methods fall into three key approaches: encoder-based methods that leverage detailed contextual information; encoder-decoder-based methods that utilize a unified Seq2Seq model for comprehensive encoding and decoding; and decoder-based methods that draw on extensive knowledge from large corpora. Together, these approaches have significantly advanced the effectiveness and generalization of KRL across a wide range of downstream tasks. This work provides a broad overview of downstream tasks while simultaneously identifying emerging research directions in these evolving domains.
Updated: 2025-03-12 05:48:32
Categories: cs.CL,cs.AI
Differentiable Folding for Nearest Neighbor Model Optimization
The Nearest Neighbor model is the $\textit{de facto}$ thermodynamic model of RNA secondary structure formation and is a cornerstone of RNA structure prediction and sequence design. The current functional form (Turner 2004) contains $\approx13,000$ underlying thermodynamic parameters, and fitting these to both experimental and structural data is computationally challenging. Here, we leverage recent advances in $\textit{differentiable folding}$, a method for directly computing gradients of the RNA folding algorithms, to devise an efficient, scalable, and flexible means of parameter optimization that uses known RNA structures and thermodynamic experiments. Our method yields a significantly improved parameter set that outperforms existing baselines on all metrics, including an increase in the average predicted probability of ground-truth sequence-structure pairs for a single RNA family by over 23 orders of magnitude. Our framework provides a path towards drastically improved RNA models, enabling the flexible incorporation of new experimental data, definition of novel loss terms, large training sets, and even treatment as a module in larger deep learning pipelines. We make available a new database, RNAometer, with experimentally-determined stabilities for small RNA model systems.
Updated: 2025-03-12 05:36:12
Categories: q-bio.BM,cs.LG,q-bio.QM
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment
Long Video Question Answering (LVQA) is challenging due to the need for temporal reasoning and large-scale multimodal data processing. Existing methods struggle with retrieving cross-modal information from long videos, especially when relevant details are sparsely distributed. We introduce UMaT (Unified Multi-modal as Text), a retrieval-augmented generation (RAG) framework that efficiently processes extremely long videos while maintaining cross-modal coherence. UMaT converts visual and auditory data into a unified textual representation, ensuring semantic and temporal alignment. Short video clips are analyzed using a vision-language model, while automatic speech recognition (ASR) transcribes dialogue. These text-based representations are structured into temporally aligned segments, with adaptive filtering to remove redundancy and retain salient details. The processed data is embedded into a vector database, enabling precise retrieval of dispersed yet relevant content. Experiments on a benchmark LVQA dataset show that UMaT outperforms existing methods in multimodal integration, long-form video understanding, and sparse information retrieval. Its scalability and interpretability allow it to process videos over an hour long while maintaining semantic and temporal coherence. These findings underscore the importance of structured retrieval and multimodal synchronization for advancing LVQA and long-form AI systems.
Updated: 2025-03-12 05:28:24
Categories: cs.CV,cs.AI
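The retrieval step, embedding text segments into a vector database and fetching the most relevant ones for a query, reduces to nearest-neighbor search over embeddings. A minimal cosine-similarity sketch; UMaT's actual embedding model and vector store are not specified here, and the function name is ours.

```python
import numpy as np

def cosine_retrieve(query_vec, doc_vecs, k=3):
    # rank document embeddings by cosine similarity to the query
    # and return the indices of the top-k matches
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(D @ q))[:k]
```

Because segments are temporally aligned, the retrieved indices map back to specific moments in the video, which is what lets sparsely distributed details be recovered.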
FedMSGL: A Self-Expressive Hypergraph Based Federated Multi-View Learning
Federated learning is essential for enabling collaborative model training across decentralized data sources while preserving data privacy and security. This approach mitigates the risks associated with centralized data collection and addresses concerns related to data ownership and compliance. Despite significant advancements in federated learning algorithms that address communication bottlenecks and enhance privacy protection, existing works overlook the impact of differences in data feature dimensions, resulting in global models that disproportionately depend on participants with large feature dimensions. Additionally, current single-view federated learning methods fail to account for the unique characteristics of multi-view data, leading to suboptimal performance in processing such data. To address these issues, we propose a Self-expressive Hypergraph Based Federated Multi-view Learning method (FedMSGL). The proposed method leverages the self-expressive property during local training to learn a subspace of uniform dimension that encodes latent sample relations. At the central side, an adaptive fusion technique is employed to generate the global model, while a hypergraph is constructed from the learned global and view-specific subspaces to capture intricate interconnections across views. Experiments on multi-view datasets with different feature dimensions validated the effectiveness of the proposed method.
Updated: 2025-03-12 05:13:45
Categories: cs.LG,cs.DC
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Visual instructions for long-horizon tasks are crucial as they intuitively clarify complex concepts and enhance retention across extended steps. Directly generating a series of images using text-to-image models without considering the context of previous steps results in inconsistent images, increasing cognitive load. Additionally, the generated images often miss objects, or object attributes such as color, shape, and state are inaccurate. To address these challenges, we propose LIGER, the first training-free framework for Long-horizon Instruction GEneration with logic and attribute self-Reflection. LIGER first generates a draft image for each step with the historical prompt and visual memory of previous steps. This step-by-step generation approach maintains consistency between images in long-horizon tasks. Moreover, LIGER utilizes various image editing tools to rectify errors in the draft images, including wrong attributes, logic errors, object redundancy, and identity inconsistency. Through this self-reflection mechanism, LIGER improves the logic and object-attribute correctness of the images. To verify whether the generated images assist human understanding, we manually curated a new benchmark consisting of various long-horizon tasks. Human-annotated ground-truth descriptions reflect human-defined criteria for what makes an image illustrative. Experiments demonstrate that the visual instructions generated by LIGER are more comprehensive than those of baseline methods.
Updated: 2025-03-12 05:11:02
Categories: cs.LG,cs.AI
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable progress made by existing approaches, they still face the challenges of incrementally understanding complex dialog history and assimilating video information. In response to these challenges, we present an iterative search and reasoning framework, which consists of a textual encoder, a visual encoder, and a generator. Specifically, we devise a path search and aggregation strategy in the textual encoder, mining core cues from dialog history that are pivotal to understanding the posed questions. Concurrently, our visual encoder harnesses an iterative reasoning network to extract and emphasize critical visual markers from videos, enhancing the depth of visual comprehension. Finally, we utilize the pre-trained GPT-2 model as our answer generator to decode the mined hidden clues into coherent and contextualized answers. Extensive experiments on three public datasets demonstrate the effectiveness and generalizability of our proposed framework.
Updated: 2025-03-12 05:09:37
Categories: cs.CV,cs.AI
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
An intriguing property of the Transformer is its ability to perform in-context learning (ICL), where the Transformer can solve different inference tasks without parameter updating based on the contextual information provided by the corresponding input-output demonstration pairs. It has been theoretically proved that ICL is enabled by the capability of Transformers to perform gradient-descent algorithms (Von Oswald et al., 2023a; Bai et al., 2024). This work takes a step further and shows that Transformers can perform learning-to-optimize (L2O) algorithms. Specifically, for the ICL sparse recovery (formulated as LASSO) tasks, we show that a K-layer Transformer can perform an L2O algorithm with a provable convergence rate linear in K. This provides a new perspective explaining the superior ICL capability of Transformers, even with only a few layers, which cannot be achieved by the standard gradient-descent algorithms. Moreover, unlike the conventional L2O algorithms that require the measurement matrix involved in training to match that in testing, the trained Transformer is able to solve sparse recovery problems generated with different measurement matrices. Besides, Transformers as an L2O algorithm can leverage structural information embedded in the training tasks to accelerate its convergence during ICL, and generalize across different lengths of demonstration pairs, where conventional L2O algorithms typically struggle or fail. Such theoretical findings are supported by our experimental results.
Updated: 2025-03-12 05:09:21
Subjects: cs.LG,cs.AI
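The L2O algorithms discussed above learn to accelerate classical iterations for LASSO. As a point of reference, here is a minimal sketch of the textbook iteration such methods build on, ISTA (a gradient step on the smooth term followed by soft-thresholding); this is our own illustration, not the paper's Transformer construction.

```python
# Minimal ISTA sketch for LASSO: min_x 0.5*||Ax - b||^2 + lam*||x||_1.
# Pure-Python illustration of the classic iteration that L2O methods
# learn to accelerate.

def ista(A, b, lam, step, iters):
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - b, then gradient of smooth part g = A^T r
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(len(A))]
        g = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # gradient step followed by soft-thresholding (the proximal step)
        for j in range(n):
            v = x[j] - step * g[j]
            x[j] = max(abs(v) - step * lam, 0.0) * (1.0 if v >= 0 else -1.0)
    return x
```

A K-layer Transformer performing an L2O algorithm can be thought of as unrolling K such iterations with learned, task-adaptive parameters rather than fixed `step` and `lam`.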
Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows
Flow matching has emerged as a powerful framework for generative modeling, offering computational advantages over diffusion models by leveraging deterministic Ordinary Differential Equations (ODEs) instead of stochastic dynamics. While prior work established the worst-case optimality of standard flow matching under Wasserstein distances, the theoretical guarantees for higher-order flow matching, which incorporates acceleration terms to refine sample trajectories, remain unexplored. In this paper, we bridge this gap by proving that higher-order flow matching preserves worst-case optimality as a distribution estimator. We derive upper bounds on the estimation error for second-order flow matching, demonstrating that the convergence rates depend polynomially on the smoothness of the target distribution (quantified via Besov spaces) and key parameters of the ODE dynamics. Our analysis employs neural network approximations with carefully controlled depth, width, and sparsity to bound acceleration errors across both small and large time intervals, ultimately unifying these results into a general worst-case optimal bound for all time steps.
Updated: 2025-03-12 05:07:07
Subjects: cs.LG,cs.AI,cs.CV,stat.ML
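To make the "second-order" idea concrete: where a first-order (Euler) sampler uses only the instantaneous velocity, a second-order step also probes the velocity at the predicted endpoint. The toy integrator below (Heun's method on a generic ODE dx/dt = v(x, t)) is our own illustration of that distinction, not the paper's estimator.

```python
# Heun (second-order) integrator for an ODE dx/dt = v(x, t).
# Each step averages the velocity at the current point with the velocity
# at the Euler prediction, giving second-order accuracy in the step size.

def heun_integrate(v, x0, t0, t1, steps):
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        k1 = v(x, t)                 # velocity at the current point
        k2 = v(x + h * k1, t + h)    # velocity at the Euler prediction
        x = x + 0.5 * h * (k1 + k2)  # averaged update
        t += h
    return x
```

On dx/dt = x over [0, 1] this recovers e to roughly h^2 accuracy, versus h for plain Euler, which is the kind of trajectory refinement the acceleration terms buy.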
Probing Network Decisions: Capturing Uncertainties and Unveiling Vulnerabilities Without Label Information
To improve trust and transparency, it is crucial to be able to interpret the decisions of deep neural network (DNN) classifiers. Instance-level examinations, such as attribution techniques, are commonly employed to interpret the model decisions. However, when interpreting misclassified decisions, human intervention may be required. Analyzing the attributions across each class within one instance can be particularly labor intensive and influenced by the bias of the human interpreter. In this paper, we present a novel framework to uncover the weakness of the classifier via counterfactual examples. A prober is introduced to learn the correctness of the classifier's decision as a binary code: hit or miss. It enables the creation of counterfactual examples concerning the prober's decision. We test the performance of our prober's misclassification detection and verify its effectiveness on the image classification benchmark datasets. Furthermore, by generating counterfactuals that penetrate the prober, we demonstrate that our framework effectively identifies vulnerabilities in the target classifier without relying on label information on the MNIST dataset.
Updated: 2025-03-12 05:05:58
Subjects: cs.LG,cs.AI,cs.CR
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.
Updated: 2025-03-12 05:00:07
Subjects: cs.GR,cs.AI
Inductive Moment Matching
Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning. To resolve these trade-offs, we propose Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure. Unlike distillation, IMM does not require pre-training initialization and optimization of two networks; and unlike Consistency Models, IMM guarantees distribution-level convergence and remains stable under various hyperparameters and standard model architectures. IMM surpasses diffusion models on ImageNet-256x256 with 1.99 FID using only 8 inference steps and achieves state-of-the-art 2-step FID of 1.98 on CIFAR-10 for a model trained from scratch.
Updated: 2025-03-12 05:00:02
Subjects: cs.LG,cs.AI,stat.ML
Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to adversarial manipulations such as jailbreaking via prompt injection attacks. These attacks bypass safety mechanisms to generate restricted or harmful content. In this study, we investigated the underlying latent subspaces of safe and jailbroken states by extracting hidden activations from an LLM. Inspired by attractor dynamics in neuroscience, we hypothesized that LLM activations settle into semi-stable states that can be identified and perturbed to induce state transitions. Using dimensionality reduction techniques, we projected activations from safe and jailbroken responses to reveal latent subspaces in lower dimensional spaces. We then derived a perturbation vector that, when applied to safe representations, shifted the model towards a jailbreak state. Our results demonstrate that this causal intervention results in statistically significant jailbreak responses in a subset of prompts. Next, we probed how these perturbations propagate through the model's layers, testing whether the induced state change remains localized or cascades throughout the network. Our findings indicate that targeted perturbations induced distinct shifts in activations and model responses. Our approach paves the way for potential proactive defenses, shifting from traditional guardrail-based methods to preemptive, model-agnostic techniques that neutralize adversarial states at the representation level.
Updated: 2025-03-12 04:59:22
Subjects: cs.LG,cs.AI,cs.CR
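The perturbation-vector step can be illustrated in miniature. A common recipe (used here with made-up two-dimensional "activations"; the paper's pipeline over real LLM hidden states with dimensionality reduction is more involved) takes the difference between mean activations of the two state groups as the steering direction and adds it to a safe representation.

```python
# Toy steering-vector sketch: the perturbation is the difference between
# the mean activations of jailbroken and safe states; adding it to a safe
# representation pushes it toward the jailbroken region.

def mean_vec(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def steering_vector(safe_acts, jailbroken_acts):
    mu_safe = mean_vec(safe_acts)
    mu_jail = mean_vec(jailbroken_acts)
    return [j - s for j, s in zip(mu_jail, mu_safe)]

def apply_perturbation(activation, direction, alpha=1.0):
    # alpha scales the intervention strength
    return [a + alpha * d for a, d in zip(activation, direction)]
```

With alpha swept over a range, one can test at which intervention strength the model's behavior flips, which is the causal probe the abstract describes.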
Chain of Thoughtlessness? An Analysis of CoT in Planning
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought prompting, a method of demonstrating solution procedures, with the intuition that it is possible to in-context teach an LLM an algorithm for solving the problem. This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of examples given in prompt, and complexity of problems queried with each prompt. While our problems are very simple, we only find meaningful performance improvements from chain of thought prompts when those prompts are exceedingly specific to their problem class, and those improvements quickly deteriorate as the size n of the query-specified stack grows past the size of stacks shown in the examples. We also create scalable variants of three domains commonly studied in previous CoT papers and demonstrate the existence of similar failure modes. Our results hint that, contrary to previous claims in the literature, CoT's performance improvements do not stem from the model learning general algorithmic procedures via demonstrations but depend on carefully engineering highly problem-specific prompts. This spotlights drawbacks of chain of thought, especially the sharp tradeoff between possible performance gains and the amount of human labor necessary to generate examples with correct reasoning traces.
Updated: 2025-03-12 04:56:46
Subjects: cs.AI
Distributional Off-policy Evaluation with Bellman Residual Minimization
We study distributional off-policy evaluation (OPE), whose goal is to learn the distribution of the return for a target policy using offline data generated by a different policy. The theoretical foundation of many existing works relies on supremum-extended statistical distances such as the supremum-Wasserstein distance, which are hard to estimate. In contrast, we study the more manageable expectation-extended statistical distances and provide a novel theoretical justification of their validity for learning the return distribution. Based on this attractive property, we propose a new method called Energy Bellman Residual Minimizer (EBRM) for distributional OPE. We provide corresponding in-depth theoretical analyses. We establish a finite-sample error bound for the EBRM estimator under the realizability assumption. Furthermore, we introduce a variant of our method based on a multi-step extension which improves the error bound for non-realizable settings. Notably, unlike prior distributional OPE methods, the theoretical guarantees of our method do not require the completeness assumption.
Updated: 2025-03-12 04:52:57
Subjects: stat.ML,cs.LG
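The "Energy" in EBRM refers to the energy distance, an example of the expectation-extended statistical distances the abstract contrasts with supremum-extended ones. A minimal sample-based estimator on 1-D samples is sketched below (a plain V-statistic, our own toy illustration, not the paper's Bellman-residual objective).

```python
# Sample-based energy distance between two return distributions:
#   D_E(X, Y) = 2 E|x - y| - E|x - x'| - E|y - y'|.
# Unlike supremum-extended distances, every term is an expectation, so a
# straightforward plug-in estimator over samples works.

def mean_abs_diff(xs, ys):
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(xs, ys):
    return (2 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))
```

The estimator is zero for identical samples and grows as the two samples separate, which is the behavior a return-distribution learner can minimize against.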
Implicit Contrastive Representation Learning with Guided Stop-gradient
In self-supervised representation learning, Siamese networks are a natural architecture for learning transformation-invariance by bringing representations of positive pairs closer together. However, they are prone to collapsing into a degenerate solution. To address the issue, contrastive learning uses a contrastive loss to prevent collapse by moving representations of negative pairs away from each other. However, it is known that algorithms with negative sampling are not robust to a reduction in the number of negative samples. Other algorithms therefore avoid negative pairs altogether. Many positive-only algorithms adopt an asymmetric network architecture consisting of source and target encoders as a key factor in coping with collapse. By exploiting the asymmetric architecture, we introduce a methodology to implicitly incorporate the idea of contrastive learning. As its implementation, we present a novel method, guided stop-gradient. We apply our method to the benchmark algorithms SimSiam and BYOL and show that our method stabilizes training and boosts performance. We also show that the algorithms with our method work well with small batch sizes and do not collapse even when there is no predictor. The code is available at https://github.com/bych-lee/gsg.
Updated: 2025-03-12 04:46:53
Subjects: cs.LG,cs.AI,cs.CV
Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection
Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learning abilities. However, existing methods face significant limitations, such as Understanding Ambiguity and Information Scarcity, which significantly undermine the potential of LLMs. To address these shortcomings, we propose a Dual-perspective Knowledge-guided Fake News Detection (DKFND) model, designed to enhance LLMs from both inside and outside perspectives. Specifically, DKFND first identifies the knowledge concepts of each news article through a Detection Module. Subsequently, DKFND creatively designs an Investigation Module to retrieve inside and outside valuable information concerning the current news, followed by a Judge Module to evaluate their relevance and confidence. Finally, a Determination Module further derives two respective predictions and obtains the final result. Extensive experiments on two public datasets show the efficacy of our proposed method, particularly in low-resource settings.
Updated: 2025-03-12 04:46:47
Subjects: cs.CL,cs.AI
Overcoming Data and Model Heterogeneities in Decentralized Federated Learning via Synthetic Anchors
Conventional Federated Learning (FL) involves collaborative training of a global model while maintaining user data privacy. One of its branches, decentralized FL, is a serverless network that allows clients to own and optimize different local models separately, which results in saving management and communication resources. Despite the promising advancements in decentralized FL, it may reduce model generalizability due to lacking a global model. In this scenario, managing data and model heterogeneity among clients becomes a crucial problem, which poses a unique challenge that must be overcome: How can every client's local model learn generalizable representation in a decentralized manner? To address this challenge, we propose a novel Decentralized FL technique by introducing Synthetic Anchors, dubbed as DeSA. Based on the theory of domain adaptation and Knowledge Distillation (KD), we theoretically and empirically show that synthesizing global anchors based on raw data distribution facilitates mutual knowledge transfer. We further design two effective regularization terms for local training: 1) REG loss that regularizes the distribution of the client's latent embedding with the anchors and 2) KD loss that enables clients to learn from others. Through extensive experiments on diverse client data distributions, we showcase the effectiveness of DeSA in enhancing both inter- and intra-domain accuracy of each client.
Updated: 2025-03-12 04:39:54
Subjects: cs.LG,cs.AI
Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?
The rapid rise of Language Models (LMs) has expanded their use in several applications. Yet, due to constraints of model size, associated cost, or proprietary restrictions, utilizing state-of-the-art (SOTA) LLMs is not always feasible. With open, smaller LMs emerging, more applications can leverage their capabilities, but selecting the right LM can be challenging as smaller LMs do not perform well universally. This work tries to bridge this gap by proposing a framework to experimentally evaluate small, open LMs in practical settings through measuring semantic correctness of outputs across three practical aspects: task types, application domains, and reasoning types, using diverse prompt styles. It also conducts an in-depth comparison of 10 small, open LMs to identify the best LM and prompt style depending on specific application requirements using the proposed framework. We also show that if selected appropriately, they can outperform SOTA LLMs like DeepSeek-v2, GPT-4o, GPT-4o-mini, Gemini-1.5-Pro, and even compete with GPT-4o.
Updated: 2025-03-12 04:37:42
Subjects: cs.CL,cs.AI,cs.LG
TreeX: Generating Global Graphical GNN Explanations via Critical Subtree Extraction
The growing demand for transparency and interpretability in critical domains has driven increased interest in comprehending the explainability of Message-Passing (MP) Graph Neural Networks (GNNs). Although substantial research efforts have been made to generate explanations for individual graph instances, identifying global explaining concepts for a GNN still poses great challenges, especially when concepts are desired in a graphical form on the dataset level. While most prior works treat GNNs as black boxes, in this paper, we propose to unbox GNNs by analyzing and extracting critical subtrees incurred by the inner workings of message passing, which correspond to critical subgraphs in the datasets. By aggregating subtrees in an embedding space with an efficient algorithm, which does not require complex subgraph matching or search, we can make intuitive graphical explanations for Message-Passing GNNs on local, class and global levels. We empirically show that our proposed approach not only generates clean subgraph concepts on a dataset level, in contrast to existing global explaining methods which generate non-graphical rules (e.g., language or embeddings) as explanations, but is also capable of providing explanations for individual instances with comparable or even superior performance to leading local-level GNN explainers.
Updated: 2025-03-12 04:36:28
Subjects: cs.LG,cs.AI
Adaptive Backdoor Attacks with Reasonable Constraints on Graph Neural Networks
Recent studies show that graph neural networks (GNNs) are vulnerable to backdoor attacks. Existing backdoor attacks against GNNs use fixed-pattern triggers and lack reasonable trigger constraints, overlooking individual graph characteristics and resulting in insufficient evasiveness. To tackle the above issues, we propose ABARC, the first Adaptive Backdoor Attack with Reasonable Constraints, applying to both graph-level and node-level tasks in GNNs. For graph-level tasks, we propose a subgraph backdoor attack independent of the graph's topology. It dynamically selects trigger nodes for each target graph and modifies node features with constraints based on graph similarity, feature range, and feature type. For node-level tasks, our attack begins with an analysis of node features, followed by selecting and modifying trigger features, which are then constrained by node similarity, feature range, and feature type. Furthermore, an adaptive edge-pruning mechanism is designed to reduce the impact of neighbors on target nodes, ensuring a high attack success rate (ASR). Experimental results show that even with reasonable constraints for attack evasiveness, our attack achieves a high ASR while incurring a marginal clean accuracy drop (CAD). When combined with the state-of-the-art defense randomized smoothing (RS) method, our attack maintains an ASR over 94%, surpassing existing attacks by more than 7%.
Updated: 2025-03-12 04:23:10
Subjects: cs.LG,cs.CR
Performance Evaluation of Threshold Signing Schemes in Cryptography
Threshold Signature Scheme (TSS) protocols have gained significant attention over the past ten years due to their widespread adoption in cryptocurrencies. This adoption was driven largely by Gennaro and Goldfeder's TSS protocol. Since then, various TSS protocols have been introduced with different features, such as security and performance. Large organizations are using TSS protocols to protect many digital assets, such as cryptocurrency. However, the adoption of these TSS protocols requires an understanding of state-of-the-art research in threshold signing. This study describes the holistic view of TSS protocols, evaluates cutting-edge TSS protocols, highlights their characteristics, and compares them in terms of security and performance. The evaluation of these TSS protocols will help researchers address real-world problems by considering the relevant merits of different TSS protocols.
Updated: 2025-03-12 04:17:58
Subjects: cs.CR
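For readers new to threshold signing, the core primitive underneath these protocols is (t, n) threshold secret sharing: any t of n shares reconstruct the key, while fewer reveal nothing. A minimal Shamir sharing sketch over a prime field is shown below; this is a generic textbook illustration (with a toy field prime), not any specific protocol surveyed in the paper, and real TSS protocols such as Gennaro-Goldfeder's layer distributed key generation and signing on top of this idea.

```python
import random

# (t, n) Shamir secret sharing over a prime field.
P = 2**61 - 1  # a Mersenne prime, adequate for illustration

def make_shares(secret, t, n):
    # random degree-(t-1) polynomial with constant term = secret
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):   # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # modular inverse via Fermat's little theorem (P is prime)
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

Any t-sized subset of the shares reconstructs the same secret, which is what lets a threshold of signers jointly produce a signature without any single party holding the full key.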
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
In this work, we investigate how a model's tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with; high knowledge entropy indicates that the model utilizes a wide range of memory sources, while low knowledge entropy suggests reliance on specific sources with greater certainty. Our analysis reveals a consistent decline in knowledge entropy as pretraining advances. We also find that the decline is closely associated with a reduction in the model's ability to acquire and retain knowledge, leading us to conclude that diminishing knowledge entropy (smaller number of active memory sources) impairs the model's knowledge acquisition and retention capabilities. We find further support for this by demonstrating that increasing the activity of inactive memory sources enhances the model's capacity for knowledge acquisition and retention.
Updated: 2025-03-12 04:17:41
Subjects: cs.CL,cs.AI
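The knowledge-entropy quantity described above is, at its core, a Shannon entropy over the model's engagement with memory sources. The sketch below illustrates the measure on made-up engagement weights; how those weights are extracted from a real model is the substance of the paper and is not reproduced here.

```python
import math

# Knowledge entropy: Shannon entropy of the (normalized) engagement
# distribution over memory sources. High entropy = many sources used;
# low entropy = reliance on a few sources.

def knowledge_entropy(weights):
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)
```

A model spreading engagement uniformly over four sources attains the maximum value log 4, while a model concentrated on one source scores near zero, matching the decline the abstract reports over pretraining.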
Discovering Influential Neuron Path in Vision Transformers
Vision Transformer models exhibit immense power yet remain opaque to human understanding, posing challenges and risks for practical applications. While prior research has attempted to demystify these models through input attribution and neuron role analysis, there's been a notable gap in considering layer-level information and the holistic path of information flow across layers. In this paper, we investigate the significance of influential neuron paths within vision Transformers, which is a path of neurons from the model input to output that impacts the model inference most significantly. We first propose a joint influence measure to assess the contribution of a set of neurons to the model outcome. And we further provide a layer-progressive neuron locating approach that efficiently selects the most influential neuron at each layer, trying to discover the crucial neuron path from input to output within the target model. Our experiments demonstrate the superiority of our method in finding the most influential neuron path along which the information flows, over the existing baseline solutions. Additionally, the neuron paths have illustrated that vision Transformers exhibit some specific inner working mechanism for processing the visual information within the same image category. We further analyze the key effects of these neurons on the image classification task, showcasing that the found neuron paths have already preserved the model capability on downstream tasks, which may also shed some light on real-world applications like model pruning. The project website, including implementation code, is available at https://foundation-model-research.github.io/NeuronPath/.
Updated: 2025-03-12 04:10:46
Subjects: cs.CV,cs.AI,cs.LG
ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development
Checkpointing to preserve training states is crucial during the development of Large Foundation Models (LFMs), for training resumption upon various failures or changes in GPU resources and parallelism configurations. In addition, saved checkpoints are dispatched to evaluation tasks or transferred across different training stages (e.g., from pre-training to post-training). All these scenarios require resharding distributed checkpoints from one parallelism to another. In production environments, different LFMs are trained with various frameworks and storage backends, depending on model sizes and training scales. A high-performance checkpointing system is needed to enable efficient checkpoint management at scale throughout the lifecycle of LFM development. We introduce ByteCheckpoint, an industrial-grade checkpointing system for large-scale LFM training. ByteCheckpoint features: a parallelism-agnostic checkpoint representation that enables efficient load-time checkpoint resharding; a generic checkpoint saving/loading workflow to accommodate multiple training frameworks and support different storage backends; full-stack optimizations to ensure high I/O efficiency and scalability; a suite of monitoring tools to streamline large-scale performance analysis and bottleneck detection. Compared to existing open-source checkpointing systems [52, 58], ByteCheckpoint significantly reduces runtime checkpoint stalls, achieving an average reduction of 54.20x. For saving and loading times, ByteCheckpoint achieves improvements of up to 9.96x and 8.80x, respectively.
Updated: 2025-03-12 04:10:33
Subjects: cs.AI
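The resharding problem the abstract describes can be reduced to a simple idea: save each shard together with its global offset, then re-cut the logical tensor at load time for the new parallelism degree. The flat 1-D sketch below is our own toy reduction; real systems (ByteCheckpoint among them) do this per tensor with metadata, storage backends, and parallel I/O.

```python
# Toy parallelism-agnostic resharding on a flat 1-D parameter list.

def save_shards(flat_params, n_ranks):
    # each shard records its global offset alongside its data
    size = -(-len(flat_params) // n_ranks)   # ceil division
    return [(r * size, flat_params[r * size:(r + 1) * size])
            for r in range(n_ranks)]

def load_resharded(shards, n_ranks):
    # rebuild the logical tensor from offsets, then cut for the new degree
    flat = [None] * sum(len(chunk) for _, chunk in shards)
    for offset, chunk in shards:
        flat[offset:offset + len(chunk)] = chunk
    return save_shards(flat, n_ranks)
```

Because the saved representation carries offsets rather than assuming a particular layout, a checkpoint written under 4-way parallelism can be resumed under 2-way (or any other degree) without a separate conversion pass.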
A Hybrid Neural Network with Smart Skip Connections for High-Precision, Low-Latency EMG-Based Hand Gesture Recognition
Electromyography (EMG) is extensively used in key biomedical areas, such as prosthetics, and assistive and interactive technologies. This paper presents a new hybrid neural network named ConSGruNet for precise and efficient hand gesture recognition. The proposed model comprises convolutional neural networks with smart skip connections in conjunction with a Gated Recurrent Unit (GRU). The proposed model is trained on the complete Ninapro DB1 dataset. The proposed model boasts an accuracy of 99.7% in classifying 53 classes in just 25 milliseconds. In addition to being fast, the proposed model is lightweight, with a size of just 3,946 KB. Moreover, the proposed model has also been evaluated for reliability parameters, i.e., Cohen's kappa coefficient, the Matthews correlation coefficient, and confidence intervals. The close-to-ideal results for these parameters validate the model's performance on unseen data.
Updated: 2025-03-12 04:01:32
Domains: cs.CR
A method for classification of data with uncertainty using hypothesis testing
Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that belong to overlapping regions of the two class distributions or for data outside the distributions (out-of-distribution data). Therefore, conventional classifiers should not be applied in high-risk fields where classification results can have significant consequences. In order to address this issue, it is necessary to quantify uncertainty and adopt decision-making approaches that take it into account. Many methods have been proposed for this purpose; however, implementing these methods often requires performing resampling, improving the structure or performance of models, and optimizing the thresholds of classifiers. We propose a new decision-making approach using two types of hypothesis testing. This method is capable of detecting ambiguous data that belong to the overlapping regions of two class distributions, as well as out-of-distribution data that are not included in the training data distribution. In addition, we quantify uncertainty using the empirical distribution of feature values derived from the training data obtained through the trained model. The classification threshold is determined by the $\alpha$-quantile and ($1-\alpha$)-quantile, where the significance level $\alpha$ is set according to each specific situation.
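The quantile-based decision rule in the abstract can be sketched directly: each class gets an acceptance region bounded by the $\alpha$- and ($1-\alpha$)-quantiles of its empirical feature distribution, and a test point is classified by which regions accept it. The 1-D Gaussian features below are invented stand-ins for real learned features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D feature values produced by a trained model for the two
# training classes (stand-ins for real learned features).
features = {0: rng.normal(-1.0, 1.0, 5000), 1: rng.normal(+1.0, 1.0, 5000)}

alpha = 0.05  # significance level, set according to the situation

# Acceptance region per class: [alpha-quantile, (1 - alpha)-quantile] of the
# empirical feature distribution, as the abstract describes.
regions = {c: (np.quantile(f, alpha), np.quantile(f, 1 - alpha))
           for c, f in features.items()}

def decide(x):
    """Accept every class whose empirical distribution is consistent with x."""
    accepted = [c for c, (lo, hi) in regions.items() if lo <= x <= hi]
    if len(accepted) == 1:
        return accepted[0]           # confident prediction
    if len(accepted) == 2:
        return "ambiguous"           # overlap of the two class distributions
    return "out-of-distribution"     # rejected by both hypothesis tests

print(decide(-2.0), decide(0.0), decide(5.0))
```

Unlike a conventional classifier, the rule can abstain in the overlapping region and reject out-of-distribution inputs instead of forcing an overconfident label.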
Updated: 2025-03-12 03:58:43
Domains: cs.LG
Image Encryption Using DNA Encoding, Snake Permutation and Chaotic Substitution Techniques
Securing image data in IoT networks and other insecure information channels is a matter of critical concern. This paper presents a new image encryption scheme using DNA encoding, snake permutation, and chaotic substitution techniques that ensures robust security of image data with reduced computational overhead. The DNA encoding and snake permutation modules ensure effective scrambling of the pixels and result in efficient diffusion in the plaintext image. For the confusion part, a chaotic substitution technique is implemented, which substitutes pixel values chosen randomly from three S-boxes. Extensive security analysis validates the efficacy of the proposed image encryption algorithm, and results demonstrate that the encrypted images have a near-ideal information entropy of 7.9895 and an almost-zero correlation coefficient of -0.001660. These results indicate a high degree of randomness and no correlation in the encrypted image.
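A snake permutation is commonly a boustrophedon reordering of pixels: even rows are read left-to-right, odd rows right-to-left. The sketch below shows one plausible reading of that scrambling step; the paper's exact variant may differ:

```python
import numpy as np

def snake_permute(img):
    """Read a square image in boustrophedon ("snake") order: left-to-right
    on even rows, right-to-left on odd rows. An illustrative version of a
    snake-permutation scrambling step, not necessarily the paper's exact one."""
    out = img.copy()
    out[1::2] = img[1::2, ::-1]   # reverse every odd row
    return out.reshape(-1)        # flattened snake-order pixel stream

img = np.arange(9).reshape(3, 3)
print(snake_permute(img).tolist())  # [0, 1, 2, 5, 4, 3, 6, 7, 8]
```

The permutation is trivially invertible, which is what lets decryption undo the scrambling exactly.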
Updated: 2025-03-12 03:54:37
Domains: cs.CR
Balancing Content Size in RAG-Text2SQL System
Large Language Models (LLMs) have emerged as a promising solution for converting natural language queries into SQL commands, enabling seamless database interaction. However, these Text-to-SQL (Text2SQL) systems face inherent limitations, including hallucinations, outdated knowledge, and untraceable reasoning. To address these challenges, the integration of retrieval-augmented generation (RAG) with Text2SQL models has gained traction. RAG serves as a retrieval mechanism, providing essential contextual information, such as table schemas and metadata, to enhance the query generation process. Despite their potential, RAG + Text2SQL systems are sensitive to the quality and size of retrieved documents. While richer document content can improve schema relevance and retrieval accuracy, it also introduces noise, increasing the risk of hallucinations and reducing query fidelity as the prompt size of the Text2SQL model grows. This research investigates the nuanced trade-off between document size and quality, aiming to strike a balance that optimizes system performance. Key thresholds are identified where performance degradation occurs, along with actionable strategies to mitigate these challenges. Additionally, we explore the phenomenon of hallucinations in Text2SQL models, emphasizing the critical role of curated document presentation in minimizing errors. Our findings provide a roadmap for enhancing the robustness of RAG + Text2SQL systems, offering practical insights for real-world applications.
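One simple mitigation for the size/quality trade-off described above is to cap the retrieved context at a prompt token budget, keeping only the highest-ranked documents that fit. This is a generic sketch, not the paper's method; the whitespace tokenizer stands in for a real one:

```python
def fit_to_budget(docs, budget, count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-ranked retrieved documents that fit within
    the prompt token budget. `docs` is assumed sorted by retrieval score,
    best first; `count_tokens` is a whitespace stand-in for a real tokenizer."""
    kept, used = [], 0
    for d in docs:
        t = count_tokens(d)
        if used + t <= budget:
            kept.append(d)
            used += t
    return kept

# Hypothetical schema snippets returned by the retriever, best match first.
docs = ["users id name", "orders id user_id total amount", "logs ts msg"]
print(fit_to_budget(docs, budget=7))
```

Sweeping `budget` against downstream SQL accuracy is one way to locate the kind of degradation thresholds the abstract reports.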
Updated: 2025-03-12 03:53:50
Domains: cs.IR,cs.AI,cs.DB
ManeuverGPT: Agentic Control for Safe Autonomous Stunt Maneuvers
The next generation of active safety features in autonomous vehicles should be capable of safely executing evasive hazard-avoidance maneuvers akin to those performed by professional stunt drivers to achieve high-agility motion at the limits of vehicle handling. This paper presents a novel framework, ManeuverGPT, for generating and executing high-dynamic stunt maneuvers in autonomous vehicles using large language model (LLM)-based agents as controllers. We target aggressive maneuvers, such as J-turns, within the CARLA simulation environment and demonstrate an iterative, prompt-based approach to refine vehicle control parameters, starting tabula rasa without retraining model weights. We propose an agentic architecture comprising three specialized agents: (1) a Query Enricher Agent for contextualizing user commands, (2) a Driver Agent for generating maneuver parameters, and (3) a Parameter Validator Agent that enforces physics-based and safety constraints. Experimental results demonstrate successful J-turn execution across multiple vehicle models through textual prompts that adapt to differing vehicle dynamics. We evaluate performance via established success criteria and discuss limitations regarding numeric precision and scenario complexity. Our findings underscore the potential of LLM-driven control for flexible, high-dynamic maneuvers, while highlighting the importance of hybrid approaches that combine language-based reasoning with algorithmic validation.
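The Parameter Validator Agent's role — bounding whatever the Driver Agent proposes — can be sketched as a clamp against physics-based limits. The field names and limit values below are invented for illustration; the paper's validator enforces its own constraint set:

```python
def validate_params(params, limits):
    """Sketch of a parameter-validation step: clamp LLM-proposed maneuver
    parameters into physics-based bounds. Names and bounds are hypothetical."""
    return {k: min(max(v, limits[k][0]), limits[k][1]) for k, v in params.items()}

limits = {"entry_speed_mps": (10.0, 25.0), "steering_angle_deg": (-35.0, 35.0)}
proposed = {"entry_speed_mps": 40.0, "steering_angle_deg": -20.0}  # from the Driver Agent
print(validate_params(proposed, limits))  # speed clamped to 25.0
```

Keeping this check outside the LLM is the "algorithmic validation" half of the hybrid approach the abstract argues for.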
Updated: 2025-03-12 03:51:41
Domains: cs.RO,cs.AI,cs.SY,eess.SY
Axiomatic Explainer Globalness via Optimal Transport
Explainability methods are often challenging to evaluate and compare. With a multitude of explainers available, practitioners must often compare and select explainers based on quantitative evaluation metrics. One particular differentiator between explainers is the diversity of explanations for a given dataset; i.e. whether all explanations are identical, unique and uniformly distributed, or somewhere between these two extremes. In this work, we define a complexity measure for explainers, globalness, which enables deeper understanding of the distribution of explanations produced by feature attribution and feature selection methods for a given dataset. We establish the axiomatic properties that any such measure should possess and prove that our proposed measure, Wasserstein Globalness, meets these criteria. We validate the utility of Wasserstein Globalness using image, tabular, and synthetic datasets, empirically showing that it both facilitates meaningful comparison between explainers and improves the selection process for explainability methods.
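The intuition behind a globalness-style measure can be shown on scalar explanation scores: compare their distribution, via a 1-D Wasserstein distance, against the fully "global" case where every input receives the identical explanation. This is only a toy illustration; the paper's Wasserstein Globalness is defined more generally, over explanation distributions, with axiomatic guarantees:

```python
import numpy as np

def w1(u, v):
    """1-D Wasserstein-1 distance between two equal-size samples
    (mean absolute difference of the sorted samples)."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

identical = np.zeros(1000)                               # one shared explanation
diverse = np.random.default_rng(1).uniform(-1, 1, 1000)  # input-specific explanations
reference = np.zeros(1000)                               # point mass: maximally global

print(w1(identical, reference))  # 0.0 -> maximally global explainer
print(w1(diverse, reference))    # > 0 -> diverse explanations
```

A low value indicates near-identical explanations across the dataset; higher values indicate the diverse, uniformly spread end of the spectrum the abstract describes.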
Updated: 2025-03-12 03:46:50
Domains: cs.LG,stat.ML
RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification
In this paper, we propose RFUAV as a new benchmark dataset for radio-frequency based (RF-based) unmanned aerial vehicle (UAV) identification and address the following challenges: Firstly, many existing datasets feature a restricted variety of drone types and insufficient volumes of raw data, which fail to meet the demands of practical applications. Secondly, existing datasets often lack raw data covering a broad range of signal-to-noise ratios (SNR), or do not provide tools for transforming raw data to different SNR levels. This limitation undermines the validity of model training and evaluation. Lastly, many existing datasets do not offer open-access evaluation tools, leading to a lack of unified evaluation standards in current research within this field. RFUAV comprises approximately 1.3 TB of raw frequency data collected from 37 distinct UAVs using the Universal Software Radio Peripheral (USRP) device in real-world environments. Through in-depth analysis of the RF data in RFUAV, we define a drone feature sequence called RF drone fingerprint, which aids in distinguishing drone signals. In addition to the dataset, RFUAV provides a baseline preprocessing method and model evaluation tools. Rigorous experiments demonstrate that these preprocessing methods achieve state-of-the-art (SOTA) performance using the provided evaluation tools. The RFUAV dataset and baseline implementation are publicly available at https://github.com/kitoweeknd/RFUAV/.
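The raw-data-to-arbitrary-SNR transformation the abstract calls for amounts to scaling additive noise so the mixture hits a target SNR. The sketch below is a generic version of such a tool (the dataset's own tooling may differ), with a sine wave standing in for raw RF samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_to_snr(signal, noise, target_snr_db):
    """Scale `noise` so that the signal-to-noise ratio of (signal + noise)
    equals `target_snr_db`. A generic sketch of a raw-data -> arbitrary-SNR
    transformation; not necessarily RFUAV's exact implementation."""
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_ratio = 10 ** (target_snr_db / 10)
    scale = np.sqrt(p_sig / (target_ratio * p_noise))
    return signal + scale * noise

t = np.linspace(0, 1, 10_000)
sig = np.sin(2 * np.pi * 50 * t)          # stand-in for raw USRP samples
noise = rng.standard_normal(t.size)

mixed = scale_to_snr(sig, noise, target_snr_db=10.0)
achieved = 10 * np.log10(np.mean(sig**2) / np.mean((mixed - sig)**2))
print(round(achieved, 1))  # ~10.0 dB
```

Sweeping `target_snr_db` over a grid produces the broad SNR coverage needed for valid model training and evaluation.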
Updated: 2025-03-12 03:46:09
Domains: cs.RO,cs.AI
Teaching LLMs How to Learn with Contextual Fine-Tuning
Prompting Large Language Models (LLMs), or providing context on the expected mode of operation, is an effective way to steer the outputs of such models to satisfy human desiderata after they have been trained. But in rapidly evolving domains, there is often a need to fine-tune LLMs to improve either the kind of knowledge in their memory or their ability to perform open-ended reasoning in new domains. When humans learn new concepts, we often do so by linking the new material we are studying to concepts we have already learned. To that end, we ask: can prompting help us teach LLMs how to learn? In this work, we study a novel generalization of instruction tuning, called contextual fine-tuning, to fine-tune LLMs. Our method leverages instructional prompts designed to mimic human cognitive strategies in learning and problem-solving to guide the learning process during training, aiming to improve the model's interpretation and understanding of domain-specific knowledge. We empirically demonstrate that this simple yet effective modification improves the ability of LLMs to be fine-tuned rapidly on new datasets in both the medical and financial domains.
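On the data side, the mechanics reduce to wrapping each fine-tuning example in an instructional prompt before standard training. The prompt text below is invented for illustration; the paper designs its prompts from human cognitive strategies for learning and problem-solving:

```python
# Minimal sketch of contextual fine-tuning's data preparation: prepend an
# instructional, learning-strategy-style prompt to each training example.
# The wording of CONTEXT is a hypothetical example, not the paper's prompt.
CONTEXT = ("Relate the following new material to concepts you have already "
           "learned, then restate it in your own words.\n\n")

def contextualize(examples):
    return [CONTEXT + ex for ex in examples]

batch = contextualize(["Beta-blockers lower heart rate by blocking "
                       "adrenaline receptors."])
print(batch[0].startswith("Relate"))  # True
```

The resulting sequences are then fed to an ordinary fine-tuning loop; only the data changes, not the training objective.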
Updated: 2025-03-12 03:45:53
Domains: cs.LG,cs.AI,cs.CL
HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data
In the domain of Multimodal Large Language Models (MLLMs), achieving human-centric video understanding remains a formidable challenge. Existing benchmarks primarily emphasize object and action recognition, often neglecting the intricate nuances of human emotions, behaviors, and speech-visual alignment within video content. We present HumanVBench, an innovative benchmark meticulously crafted to bridge these gaps in the evaluation of video MLLMs. HumanVBench comprises 16 carefully designed tasks that explore two primary dimensions: inner emotion and outer manifestations, spanning static and dynamic, basic and complex, as well as single-modal and cross-modal aspects. With two advanced automated pipelines for video annotation and distractor-included QA generation, HumanVBench utilizes diverse state-of-the-art (SOTA) techniques to streamline benchmark data synthesis and quality assessment, minimizing human annotation dependency tailored to human-centric multimodal attributes. A comprehensive evaluation across 22 SOTA video MLLMs reveals notable limitations in current performance, especially in cross-modal and emotion perception, underscoring the necessity for further refinement toward achieving more human-like understanding. HumanVBench is open-sourced to facilitate future advancements and real-world applications in video MLLMs.
Updated: 2025-03-12 03:42:48
Domains: cs.CV,cs.AI
Adaptive Temperature Based on Logits Correlation in Knowledge Distillation
Knowledge distillation is a technique for imitating the performance of one deep learning model with another model of reduced size. It applies the outputs of one model to train another model of comparable accuracy. These two distinct models are similar to the way information is delivered in human society, with one acting as the "teacher" and the other as the "student". Softmax plays a role in comparing the logits generated by the models with each other by converting them into probability distributions. It delivers the logits of the teacher to the student, compressed through a parameter named temperature. Tuning this variable reinforces the distillation performance. Although only this parameter mediates the interaction of the logits, it is not clear how temperature promotes information transfer. In this paper, we propose a novel approach to calculating the temperature. Our method refers only to the maximum logit generated by the teacher model, which reduces computation time compared with state-of-the-art methods. Our method shows promising results for different student and teacher models on a standard benchmark dataset. Algorithms that use temperature can obtain the improvement by plugging in this dynamic approach. Furthermore, the approximation of the distillation process converges to a correlation of the logits of both models. This reinforces the previous argument that distillation conveys the relevance of logits. We report that this approximating algorithm yields a higher temperature compared to the commonly used static values in testing.
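Temperature-scaled softmax, the mechanism the abstract builds on, can be sketched directly. The linear max-logit rule below is an illustrative stand-in for a temperature derived from the teacher's maximum logit; it is not the authors' exact formula:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # stabilize before exponentiation
    e = np.exp(z)
    return e / e.sum()

def adaptive_temperature(logits, base=2.0, k=0.5):
    """Hypothetical max-logit-based temperature rule for illustration only."""
    return base + k * np.max(logits)

teacher_logits = np.array([8.0, 2.0, 1.0, -1.0])
T = adaptive_temperature(teacher_logits)       # 6.0 here
soft_targets = softmax(teacher_logits / T)     # softened distillation targets
print(T, soft_targets.round(3))
```

A larger temperature flattens the teacher's distribution, exposing the relative ordering of the non-maximal logits that the student is meant to learn from.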
Updated: 2025-03-12 03:41:31
Domains: cs.LG,cs.CV
Intelligent logistics management robot path planning algorithm integrating transformer and GCN network
This research delves into advanced route optimization for robots in smart logistics, leveraging a fusion of Transformer architectures, Graph Neural Networks (GNNs), and Generative Adversarial Networks (GANs). The approach utilizes a graph-based representation encompassing geographical data, cargo allocation, and robot dynamics, addressing both spatial and resource limitations to refine route efficiency. Through extensive testing with authentic logistics datasets, the proposed method achieves notable improvements, including a 15% reduction in travel distance, a 20% boost in time efficiency, and a 10% decrease in energy consumption. These findings highlight the algorithm's effectiveness, promoting enhanced performance in intelligent logistics operations.
Updated: 2025-03-12 03:29:21
Domains: cs.RO,cs.AI
A Real-time Multimodal Transformer Neural Network-powered Wildfire Forecasting System
Due to climate change, extreme wildfires have become one of the most dangerous natural hazards to human civilization. Although some wildfires may initially be caused by human activity, their spread is mainly determined by environmental factors, for example: (1) weather conditions such as temperature, wind direction and intensity, and moisture levels; (2) the amount and types of dry vegetation in a local area; and (3) topographic or local terrain conditions, which affect how much rain an area gets and how fire dynamics will be constrained or facilitated. Thus, accurately forecasting wildfire occurrence has become one of the most urgent and daunting environmental challenges at global scale. In this work, we developed a real-time Multimodal Transformer Neural Network machine learning model that combines several advanced artificial intelligence techniques and statistical methods to practically forecast the occurrence of wildfires at precise locations in real time. It not only utilizes large-scale data such as hourly weather forecasting data, but also takes into account small-scale topographical data, such as local terrain and vegetation conditions collected from Google Earth images, to determine the probability of wildfire occurrence at small scale, with timing synchronized to the weather forecast information. Trained on wildfire data from the United States from 1992 to 2015, the multimodal Transformer neural network can predict the probability of wildfire occurrence from the real-time weather forecast and synchronized Google Earth image data for any small location ($100m^2$) up to 24 hours ahead.
Updated: 2025-03-12 03:22:04
Domains: cs.LG,cs.AI
Accurate INT8 Training Through Dynamic Block-Level Fallback
Transformer models have achieved remarkable success across various AI applications but face significant training costs. Low-bit training, such as INT8 training, can leverage computational units with higher throughput, and has already demonstrated its effectiveness on GPT2 models with block-level quantization. However, it struggles with modern Transformer variants incorporating GLU units, because those variants exhibit complex distributions of activation outliers. To address the challenge, we propose Fallback Quantization, implementing mixed-precision GEMM that dynamically falls back from 8-bit to 16-bit for activation blocks containing outliers. Experiments show that our approach is robustly competent in both fine-tuning and pretraining settings. Moreover, our method achieves a 1.57x end-to-end training speedup on RTX4090 GPUs.
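The block-level fallback idea can be sketched in numpy: quantize each activation block to INT8 unless its maximum magnitude marks it as an outlier block, in which case it is kept in high precision. Block size and threshold here are illustrative choices, and the real method operates inside mixed-precision GEMM kernels:

```python
import numpy as np

def quantize_block(block, bits=8):
    """Symmetric per-block quantization to `bits`-bit integers, dequantized."""
    qmax = 2 ** (bits - 1) - 1
    m = np.max(np.abs(block))
    scale = m / qmax if m > 0 else 1.0
    return np.round(block / scale).astype(np.int32) * scale

def fallback_quantize(x, block=4, threshold=10.0):
    """Sketch of block-level fallback: blocks whose max magnitude exceeds
    `threshold` (outlier blocks) stay in high precision; the rest are
    quantized to INT8. Block size and threshold are illustrative."""
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        b = x[i:i + block]
        out[i:i + block] = b if np.max(np.abs(b)) > threshold else quantize_block(b)
    return out

x = np.array([0.1, -0.2, 0.3, 0.05, 120.0, -0.1, 0.2, 0.0])
y = fallback_quantize(x)
print(np.max(np.abs(x - y)))  # outlier block preserved exactly
```

Without the fallback, the 120.0 outlier would force a huge quantization scale onto its whole block and destroy the small values around it.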
Updated: 2025-03-12 03:20:28
Domains: cs.LG
Prompt Inversion Attack against Collaborative Inference of Large Language Models
Large language models (LLMs) have been widely applied for their remarkable capability of content generation. However, the practical use of open-source LLMs is hindered by high resource requirements, making deployment expensive and limiting widespread development. The collaborative inference is a promising solution for this problem, in which users collaborate by each hosting a subset of layers and transmitting intermediate activation. Many companies are building collaborative inference platforms to reduce LLM serving costs, leveraging users' underutilized GPUs. Despite widespread interest in collaborative inference within academia and industry, the privacy risks associated with LLM collaborative inference have not been well studied. This is largely because of the challenge posed by inverting LLM activation due to its strong non-linearity. In this paper, to validate the severity of privacy threats in LLM collaborative inference, we introduce the concept of prompt inversion attack (PIA), where a malicious participant intends to recover the input prompt through the activation transmitted by its previous participant. Extensive experiments show that our PIA method substantially outperforms existing baselines. For example, our method achieves an 88.4\% token accuracy on the Skytrax dataset with the Llama-65B model when inverting the maximum number of transformer layers, while the best baseline method only achieves 22.8\% accuracy. The results verify the effectiveness of our PIA attack and highlights its practical threat to LLM collaborative inference systems.
Updated: 2025-03-12 03:20:03
Domains: cs.CR
Language Models Fail to Introspect About Their Knowledge of Language
There has been recent interest in whether large language models (LLMs) can introspect about their own internal states. Such abilities would make LLMs more interpretable, and also validate the use of standard introspective methods in linguistics to evaluate grammatical knowledge in models (e.g., asking "Is this sentence grammatical?"). We systematically investigate emergent introspection across 21 open-source LLMs, in two domains where introspection is of theoretical interest: grammatical knowledge and word prediction. Crucially, in both domains, a model's internal linguistic knowledge can be theoretically grounded in direct measurements of string probability. We then evaluate whether models' responses to metalinguistic prompts faithfully reflect their internal knowledge. We propose a new measure of introspection: the degree to which a model's prompted responses predict its own string probabilities, beyond what would be predicted by another model with nearly identical internal knowledge. While both metalinguistic prompting and probability comparisons lead to high task accuracy, we do not find evidence that LLMs have privileged "self-access". Our findings complicate recent results suggesting that models can introspect, and add new evidence to the argument that prompted responses should not be conflated with models' linguistic generalizations.
Updated: 2025-03-12 03:18:36
Domains: cs.CL,cs.AI
Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning
Large Language Models (LLMs) have been widely adopted in commercial code completion engines, significantly enhancing coding efficiency and productivity. However, LLMs may generate code with quality issues that violate coding standards and best practices, such as poor code style and maintainability, even when the code is functionally correct. This necessitates additional effort from developers to improve the code, potentially negating the efficiency gains provided by LLMs. To address this problem, we propose a novel comparative prefix-tuning method for controllable high-quality code generation. Our method introduces a single, property-specific prefix that is prepended to the activations of the LLM, serving as a lightweight alternative to fine-tuning. Unlike existing methods that require training multiple prefixes, our approach trains only one prefix and leverages pairs of high-quality and low-quality code samples, introducing a sequence-level ranking loss to guide the model's training. This comparative approach enables the model to better understand the differences between high-quality and low-quality code, focusing on aspects that impact code quality. Additionally, we design a data construction pipeline to collect and annotate pairs of high-quality and low-quality code, facilitating effective training. Extensive experiments on the Code Llama 7B model demonstrate that our method improves code quality by over 100% in certain task categories, while maintaining functional correctness. We also conduct ablation studies and generalization experiments, confirming the effectiveness of our method's components and its strong generalization capability.
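A sequence-level ranking objective over (high-quality, low-quality) pairs can be sketched as a margin ranking loss on the model's sequence scores. This is a minimal stand-in for the loss described in the abstract, with hypothetical log-likelihood values:

```python
def ranking_loss(score_good, score_bad, margin=1.0):
    """Margin ranking loss pushing the model to score high-quality code above
    its low-quality counterpart by at least `margin`. A minimal stand-in for
    the sequence-level ranking objective described in the abstract."""
    return max(0.0, margin - (score_good - score_bad))

# Hypothetical sequence log-likelihoods for a (high, low) quality code pair.
print(ranking_loss(-3.2, -7.5))  # 0.0: already ranked correctly beyond the margin
print(ranking_loss(-5.0, -4.8))  # positive loss: low-quality code scored higher
```

During training, gradients from this loss update only the prepended prefix, leaving the base model weights frozen.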
Updated: 2025-03-12 03:15:46
Domains: cs.SE,cs.AI
Feasibility-aware Imitation Learning from Observations through a Hand-mounted Demonstration Interface
Imitation learning through a demonstration interface is expected to learn policies for robot automation from intuitive human demonstrations. However, due to the differences in human and robot movement characteristics, a human expert might unintentionally demonstrate an action that the robot cannot execute. We propose feasibility-aware behavior cloning from observation (FABCO). In the FABCO framework, the feasibility of each demonstration is assessed using the robot's pre-trained forward and inverse dynamics models. This feasibility information is provided as visual feedback to the demonstrators, encouraging them to refine their demonstrations. During policy learning, estimated feasibility serves as a weight for the demonstration data, improving both the data efficiency and the robustness of the learned policy. We experimentally validated FABCO's effectiveness by applying it to a pipette insertion task involving a pipette and a vial. Four participants assessed the impact of the feasibility feedback and the weighted policy learning in FABCO. Additionally, we used the NASA Task Load Index (NASA-TLX) to evaluate the workload induced by demonstrations with visual feedback.
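Using estimated feasibility as a per-demonstration weight in the behavior-cloning loss can be sketched directly. The feasibility values below are hypothetical; in FABCO they come from the robot's pre-trained forward and inverse dynamics models:

```python
import numpy as np

def weighted_bc_loss(pred_actions, demo_actions, feasibility):
    """Behavior-cloning loss in which each demonstration is down-weighted by
    its estimated feasibility, in the spirit of FABCO. A sketch: the paper
    estimates feasibility from learned dynamics models, not by hand."""
    per_demo = np.mean((pred_actions - demo_actions) ** 2, axis=1)
    w = np.asarray(feasibility)
    return float(np.sum(w * per_demo) / np.sum(w))

pred = np.array([[0.0, 0.0], [1.0, 1.0]])
demo = np.array([[0.1, -0.1], [2.0, 2.0]])
feas = [1.0, 0.1]  # second demo judged hard for the robot to execute
print(weighted_bc_loss(pred, demo, feas))
```

Infeasible demonstrations thus contribute little to the policy gradient, which is what improves both data efficiency and robustness.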
Updated: 2025-03-12 03:14:04
Domains: cs.RO,cs.LG
Natural Humanoid Robot Locomotion with Generative Motion Prior
Natural and lifelike locomotion remains a fundamental challenge for humanoid robots to interact with human society. However, previous methods either neglect motion naturalness or rely on unstable and ambiguous style rewards. In this paper, we propose a novel Generative Motion Prior (GMP) that provides fine-grained motion-level supervision for the task of natural humanoid robot locomotion. To leverage natural human motions, we first employ whole-body motion retargeting to effectively transfer them to the robot. Subsequently, we train a generative model offline to predict future natural reference motions for the robot based on a conditional variational auto-encoder. During policy training, the generative motion prior serves as a frozen online motion generator, delivering precise and comprehensive supervision at the trajectory level, including joint angles and keypoint positions. The generative motion prior significantly enhances training stability and improves interpretability by offering detailed and dense guidance throughout the learning process. Experimental results in both simulation and real-world environments demonstrate that our method achieves superior motion naturalness compared to existing approaches. Project page can be found at https://sites.google.com/view/humanoid-gmp
Updated: 2025-03-12 03:04:15
Subjects: cs.RO,cs.LG
Cumulative Reasoning with Large Language Models
Recent advancements in large language models (LLMs) have shown remarkable progress, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), an approach that utilizes LLMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's advantage through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. In solving MATH problems, CR achieves a 4.2% increase from previous methods and a 43% relative improvement in the most challenging level 5 problems. When incorporating a code environment with CR, we further harness LLMs' reasoning capabilities and outperform the Program of Thought (PoT) method by 38.8%. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
Updated: 2025-03-12 02:55:36
Subjects: cs.AI
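The cumulative loop described above can be sketched with stub components: a proposer derives a new proposition from the accumulated premises, a verifier accepts or rejects it, and accepted propositions become premises for later compositions. The rule-based `propose` and `verify` below are toy stand-ins for LLM calls, not the paper's prompts.

```python
# Toy deduction: from "A" and "A->B", conclude "B".
def propose(premises):
    for p in premises:
        if "->" in p:
            lhs, rhs = p.split("->")
            if lhs in premises and rhs not in premises:
                return rhs
    return None

def verify(proposition, premises):
    # a real verifier would re-check the derivation step with another LLM call
    return proposition is not None

def cumulative_reason(premises, goal, max_steps=10):
    premises = list(premises)
    for _ in range(max_steps):
        if goal in premises:
            return True, premises
        new = propose(premises)
        if not verify(new, premises):
            return False, premises
        premises.append(new)  # accumulate: later steps can build on this
    return goal in premises, premises

ok, trace = cumulative_reason(["A", "A->B", "B->C"], goal="C")
print(ok, trace)  # True ['A', 'A->B', 'B->C', 'B', 'C']
```

The accumulated `trace` is what distinguishes this from a single-shot query: intermediate conclusions ("B") are retained and reused to reach the goal.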
Robustness Inspired Graph Backdoor Defense
Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. However, recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. Despite initial efforts to defend against specific graph backdoor attacks, there is no work on defending against various types of backdoor attacks where generated triggers have different properties. Hence, we first empirically verify that prediction variance under edge dropping is a crucial indicator for identifying poisoned nodes. With this observation, we propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones. Furthermore, we introduce a novel robust training strategy to efficiently counteract the impact of the triggers. Extensive experiments on real-world datasets show that our framework can effectively identify poisoned nodes, significantly degrade the attack success rate, and maintain clean accuracy when defending against various types of graph backdoor attacks with different properties.
Updated: 2025-03-12 02:55:02
Subjects: cs.LG,cs.CR
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks) without a direct measurement of long-range dependency. In this work, we introduce City-Networks, a novel large-scale transductive learning dataset derived from real-world city roads. This dataset features graphs with over $10^5$ nodes and significantly larger diameters than those in existing benchmarks, naturally embodying long-range information. We annotate the graphs using an eccentricity-based approach, ensuring that the classification task inherently requires information from distant nodes. Furthermore, we propose a model-agnostic measurement based on the Jacobians of neighbors from distant hops, offering a principled quantification of long-range dependencies. Finally, we provide theoretical justifications for both our dataset design and the proposed measurement - particularly by focusing on over-smoothing and influence score dilution - which establishes a robust foundation for further exploration of long-range interactions in graph neural networks.
Updated: 2025-03-12 02:51:17
Subjects: cs.LG,cs.AI
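The proposed measurement is based on Jacobians of distant-hop neighbors; the same quantity can be sketched with a finite difference on a toy path graph. The mean-aggregation `propagate` below stands in for a trained message-passing network, and the finite difference is exact here because mean aggregation is linear.

```python
# Finite-difference sketch of hop-wise influence: run a fixed message-passing
# operator for several rounds and measure how much node i's output changes
# when a distant node j's input feature is perturbed.
def propagate(features, adj, rounds):
    x = dict(features)
    for _ in range(rounds):
        x = {u: (x[u] + sum(x[v] for v in adj[u])) / (1 + len(adj[u]))
             for u in adj}
    return x

def influence(i, j, adj, features, rounds, eps=1e-3):
    # |d out_i / d in_j|, approximated by a finite difference
    base = propagate(features, adj, rounds)[i]
    bumped = dict(features)
    bumped[j] += eps
    return abs(propagate(bumped, adj, rounds)[i] - base) / eps

# A path graph 0-1-2-3, so node 3 is three hops away from node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feats = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
print(influence(0, 3, adj, feats, rounds=1))  # 0.0: too shallow to reach
print(influence(0, 3, adj, feats, rounds=3))  # nonzero once depth >= distance
```

A model that keeps the distant-hop influence from vanishing as the graph diameter grows is, in this sense, actually exploiting long-range dependencies rather than over-smoothing them away.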
Measuring directional bias amplification in image captions using predictability
When we train models on biased ML datasets, they not only learn these biases but can inflate them at test time - a phenomenon called bias amplification. To measure bias amplification in ML datasets, many co-occurrence-based metrics have been proposed. Co-occurrence-based metrics are effective in measuring bias amplification in simple problems like image classification. However, these metrics are ineffective for complex problems like image captioning as they cannot capture the semantics of a caption. To measure bias amplification in captions, prior work introduced a predictability-based metric called Leakage in Captioning (LIC). While LIC captures the semantics and context of captions, it has limitations. LIC cannot identify the direction in which bias is amplified, poorly estimates dataset bias due to a weak vocabulary substitution strategy, and is highly sensitive to attacker models (a hyperparameter in predictability-based metrics). To overcome these issues, we propose Directional Predictability Amplification in Captioning (DPAC). DPAC measures directional bias amplification in captions, provides a better estimate of dataset bias using an improved substitution strategy, and is less sensitive to attacker models. Our experiments on the COCO captioning dataset show how DPAC is the most reliable metric to measure bias amplification in captions.
Updated: 2025-03-12 02:47:54
Subjects: cs.CV,cs.AI
RelationMatch: Matching In-batch Relationships for Semi-supervised Learning
Semi-supervised learning (SSL) has emerged as a pivotal approach for leveraging scarce labeled data alongside abundant unlabeled data. Despite significant progress, prevailing SSL methods predominantly enforce consistency between different augmented views of individual samples, thereby overlooking the rich relational structure inherent within a mini-batch. In this paper, we present RelationMatch, a novel SSL framework that explicitly enforces in-batch relational consistency through a Matrix Cross-Entropy (MCE) loss function. The proposed MCE loss is rigorously derived from both matrix analysis and information geometry perspectives, ensuring theoretical soundness and practical efficacy. Extensive empirical evaluations on standard benchmarks, including a notable 15.21% accuracy improvement over FlexMatch on STL-10, demonstrate that RelationMatch not only advances state-of-the-art performance but also provides a principled foundation for incorporating relational cues in SSL.
Updated: 2025-03-12 02:45:16
Subjects: cs.LG,cs.CV
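One plausible reading of in-batch relational consistency can be sketched with stdlib Python: build a batch-by-batch relation matrix for each augmented view and apply a row-wise cross-entropy between them. This is an illustrative instantiation of the idea, not necessarily the paper's exact Matrix Cross-Entropy formulation.

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def relation_matrix(preds):
    # entry (i, j) relates samples i and j in the batch via the dot product
    # of their predicted class distributions, normalized row-wise
    gram = [[sum(a * b for a, b in zip(pi, pj)) for pj in preds] for pi in preds]
    return [softmax(row) for row in gram]

def relational_consistency(preds_weak, preds_strong, eps=1e-9):
    # row-wise cross-entropy between the two views' relation matrices
    rw, rs = relation_matrix(preds_weak), relation_matrix(preds_strong)
    rows = [-sum(p * math.log(q + eps) for p, q in zip(prow, qrow))
            for prow, qrow in zip(rw, rs)]
    return sum(rows) / len(rows)

view_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
view_b = [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]  # sample 0 flipped
same = relational_consistency(view_a, view_a)
diff = relational_consistency(view_a, view_b)
# By Gibbs' inequality the loss is minimized when the relation matrices of the
# two views coincide, so mismatched in-batch relations are penalized.
print(same < diff)  # True
```

The key contrast with standard consistency regularization is that the loss compares sample-to-sample relations across the batch, not each sample's two views in isolation.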
Generative Models in Decision Making: A Survey
In recent years, the exceptional performance of generative models in generative tasks has sparked significant interest in their integration into decision-making processes. Due to their ability to handle complex data distributions and their strong model capacity, generative models can be effectively incorporated into decision-making systems by generating trajectories that guide agents toward high-reward state-action regions or intermediate sub-goals. This paper presents a comprehensive review of the application of generative models in decision-making tasks. We classify seven fundamental types of generative models: energy-based models, generative adversarial networks, variational autoencoders, normalizing flows, diffusion models, generative flow networks, and autoregressive models. Regarding their applications, we categorize their functions into three main roles: controllers, modelers and optimizers, and discuss how each role contributes to decision-making. Furthermore, we examine the deployment of these models across five critical real-world decision-making scenarios. Finally, we summarize the strengths and limitations of current approaches and propose three key directions for advancing next-generation generative directive models: high-performance algorithms, large-scale generalized decision-making models, and self-evolving and adaptive models.
Updated: 2025-03-12 02:32:00
Subjects: cs.LG,cs.AI
KNighter: Transforming Static Analysis with LLM-Synthesized Checkers
Static analysis is a powerful technique for bug detection in critical systems like operating system kernels. However, designing and implementing static analyzers is challenging, time-consuming, and typically limited to predefined bug patterns. While large language models (LLMs) have shown promise for static analysis, directly applying them to scan large codebases remains impractical due to computational constraints and contextual limitations. We present KNighter, the first approach that unlocks practical LLM-based static analysis by automatically synthesizing static analyzers from historical bug patterns. Rather than using LLMs to directly analyze massive codebases, our key insight is leveraging LLMs to generate specialized static analyzers guided by historical patch knowledge. KNighter implements this vision through a multi-stage synthesis pipeline that validates checker correctness against original patches and employs an automated refinement process to iteratively reduce false positives. Our evaluation on the Linux kernel demonstrates that KNighter generates high-precision checkers capable of detecting diverse bug patterns overlooked by existing human-written analyzers. To date, KNighter-synthesized checkers have discovered 70 new bugs/vulnerabilities in the Linux kernel, with 56 confirmed and 41 already fixed. 11 of these findings have been assigned CVE numbers. This work establishes an entirely new paradigm for scalable, reliable, and traceable LLM-based static analysis for real-world systems via checker synthesis.
Updated: 2025-03-12 02:30:19
Subjects: cs.SE,cs.AI,cs.CR,cs.OS
Two Simple Principles for Diffusion-Based Test-Time Adaptation
Recently, diffusion-based test-time adaptation (TTA) has shown great advances by leveraging a diffusion model to map images from an unknown test domain back to the training domain. The unseen and diverse test domains make diffusion-based TTA an ill-posed problem. In this paper, we unravel two simple design principles for diffusion-based methods. Intuitively, Principle 1 calls for preserving semantic similarity: the original and generated test images should remain semantically similar. Principle 2 calls for minimal modification: the diffusion model should map test images to the training domain while modifying them as little as possible. Following these two principles, we propose a simple yet effective principle-guided diffusion-based test-time adaptation method (PDDA). Concretely, following Principle 1, we propose a semantic keeper, a method that preserves feature similarity; by filtering the corruption introduced by the test domain, the semantic keeper better preserves semantics. Following Principle 2, we propose a modification keeper, which introduces a regularization constraint into the generative process to minimize modifications to the test image. Since the two principles can conflict, we further introduce a gradient-based view to unify the update directions they induce. Extensive experiments on CIFAR-10C, CIFAR-100C, ImageNet-W, and ImageNet-C with WideResNet-28-10, ResNet-50, Swin-T, and ConvNext-T demonstrate that PDDA performs significantly better than complex state-of-the-art baselines. Specifically, PDDA achieves a 2.4% average accuracy improvement on ImageNet-C without any training process.
Updated: 2025-03-12 02:19:07
Subjects: cs.LG,cs.CV
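The interplay of the two principles can be illustrated numerically on a toy 1-D "image": pull the test point toward a training-domain target while penalizing semantic drift (Principle 1) and the size of the modification (Principle 2). The linear feature extractor, weights, and gradient descent below are a toy trade-off illustration, not the paper's diffusion-based procedure.

```python
def adapt(x0, target, w_sem=1.0, w_mod=1.0, lr=0.05, steps=2000):
    # minimize (x - target)^2             : map into the training domain
    #        + w_sem*(phi(x) - phi(x0))^2 : Principle 1, preserve semantics
    #        + w_mod*(x - x0)^2           : Principle 2, minimal modification
    phi = lambda z: 0.5 * z  # stand-in 'semantic' feature extractor
    x = x0
    for _ in range(steps):
        grad = (2.0 * (x - target)
                + w_sem * (phi(x) - phi(x0))   # chain rule through phi
                + 2.0 * w_mod * (x - x0))
        x -= lr * grad
    return x

x0, target = 4.0, 0.0
weak = adapt(x0, target, w_mod=0.1)    # weak Principle 2
strong = adapt(x0, target, w_mod=5.0)  # strong Principle 2
print(round(weak, 2), round(strong, 2))  # 1.04 3.36
```

Both adapted points sit strictly between the original and the target, and strengthening the modification penalty keeps the result closer to the original input, which is exactly the tension the gradient-based unification in the abstract is meant to resolve.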
From Task-Specific Models to Unified Systems: A Review of Model Merging Approaches
Model merging has achieved significant success, with numerous innovative methods proposed to enhance capabilities by combining multiple models. However, challenges persist due to the lack of a unified framework for classification and systematic comparative analysis, leading to inconsistencies in terminologies and categorizations. Meanwhile, as an increasing number of fine-tuned models are publicly available, their original training data often remain inaccessible due to privacy concerns or intellectual property restrictions. This makes traditional multi-task learning based on shared training data impractical. In scenarios where direct access to training data is infeasible, merging model parameters to create a unified model with broad generalization across multiple domains becomes crucial, further underscoring the importance of model merging techniques. Despite the rapid progress in this field, a comprehensive taxonomy and survey summarizing recent advances and predicting future directions are still lacking. This paper addresses these gaps by establishing a new taxonomy of model merging methods, systematically comparing different approaches, and providing an overview of key developments. By offering a structured perspective on this evolving area, we aim to help newcomers quickly grasp the field's landscape and inspire further innovations.
Updated: 2025-03-12 02:17:31
Subjects: cs.LG
Leveraging Knowledge Graphs and LLMs for Context-Aware Messaging
Personalized messaging plays an essential role in improving communication in areas such as healthcare, education, and professional engagement. This paper introduces a framework that uses the Knowledge Graph (KG) to dynamically rephrase written communications by integrating individual and context-specific data. The knowledge graph represents individuals, locations, and events as critical nodes, linking entities mentioned in messages to their corresponding graph nodes. The extraction of relevant information, such as preferences, professional roles, and cultural norms, is then combined with the original message and processed through a large language model (LLM) to generate personalized responses. The framework demonstrates notable message acceptance rates in various domains: 42% in healthcare, 53% in education, and 78% in professional recruitment. By integrating entity linking, event detection, and language modeling, this approach offers a structured and scalable solution for context-aware, audience-specific communication, facilitating advanced applications in diverse fields.
Updated: 2025-03-12 02:17:15
Subjects: cs.AI
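The entity-linking and prompt-assembly steps can be sketched as follows. The tiny in-memory `KG`, the substring-matching linker, and the prompt format are all hypothetical stand-ins for the framework's actual components; a real system would use a proper NER-based entity linker and an LLM call on the resulting prompt.

```python
# Hypothetical knowledge graph: nodes for people and locations with the kind
# of attributes (role, preferences, timezone) the framework would draw on.
KG = {
    "Dr. Lee": {"type": "person", "role": "cardiologist", "prefers": "formal tone"},
    "Boston": {"type": "location", "timezone": "US/Eastern"},
}

def link_entities(text):
    # naive substring matching; production systems would use an NER linker
    return {name: attrs for name, attrs in KG.items() if name in text}

def build_prompt(message, recipient):
    # pull attributes of every linked entity and prepend them as context
    entities = link_entities(message + " " + recipient)
    context = "; ".join(f"{k}: {v}" for k, v in sorted(entities.items()))
    return (f"Context: {context}\n"
            f"Rewrite the message below for {recipient}.\n"
            f"Message: {message}")

prompt = build_prompt("Appointment moved to 3pm in Boston.", "Dr. Lee")
print(prompt)
```

The assembled prompt, which now carries the recipient's role and the location's timezone, is what would be handed to the LLM to produce the personalized rephrasing.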
Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds
Quadrupeds have gained rapid advancement in their capability of traversing complex terrains. The adoption of deep Reinforcement Learning (RL), transformers, and various knowledge transfer techniques can greatly reduce the sim-to-real gap. However, the classical teacher-student framework commonly used in existing locomotion policies requires a pre-trained teacher and leverages privileged information to guide the student policy. With the implementation of large-scale models in robotics controllers, especially transformer-based ones, this knowledge distillation technique starts to show its weakness in efficiency, due to the requirement of multiple supervised stages. In this paper, we propose the Unified Locomotion Transformer (ULT), a new transformer-based framework that unifies the processes of knowledge transfer and policy optimization in a single network while still taking advantage of privileged information. The policies are optimized with reinforcement learning, next state-action prediction, and action imitation, all in just one training stage, to achieve zero-shot deployment. Evaluation results demonstrate that with ULT, optimal teacher and student policies can be obtained at the same time, greatly easing the difficulty of knowledge transfer, even with complex transformer-based models.
Updated: 2025-03-12 02:15:13
Subjects: cs.RO,cs.LG
The Logic of Counterfactuals and the Epistemology of Causal Inference
The 2021 Nobel Prize in Economics recognized an epistemology of causal inference based on the Rubin causal model (Rubin 1974), which merits broader attention in philosophy. This model, in fact, presupposes a logical principle of counterfactuals, Conditional Excluded Middle (CEM), the locus of a pivotal debate between Stalnaker (1968) and Lewis (1973) on the semantics of counterfactuals. Proponents of CEM should recognize that this connection points to a new argument for CEM -- a Quine-Putnam indispensability argument grounded in the Nobel-winning applications of the Rubin model in health and social sciences. To advance the dialectic, I challenge this argument with an updated Rubin causal model that retains its successes while dispensing with CEM. This novel approach combines the strengths of the Rubin causal model and a causal model familiar in philosophy, the causal Bayes net. The takeaway: deductive logic and inductive inference, often studied in isolation, are deeply interconnected.
Updated: 2025-03-12 02:08:24
Subjects: cs.AI,econ.EM,stat.ME,stat.OT
DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation
Research on learned cardinality estimation has achieved significant progress in recent years. However, existing methods still face distinct challenges that hinder their practical deployment in production environments. We conceptualize these challenges as the "Trilemma of Cardinality Estimation", where learned cardinality estimation methods struggle to balance generality, accuracy, and updatability. To address these challenges, we introduce DistJoin, a join cardinality estimator based on efficient distribution prediction using multi-autoregressive models. Our contributions are threefold: (1) We propose a method for estimating both equi and non-equi join cardinality by leveraging the conditional probability distributions of individual tables in a decoupled manner. (2) To meet the requirements of efficient training and inference for DistJoin, we develop Adaptive Neural Predicate Modulation (ANPM), a high-throughput conditional probability distribution estimation model. (3) We formally analyze the variance of existing similar methods and demonstrate that such approaches suffer from variance accumulation issues. To mitigate this problem, DistJoin employs a selectivity-based approach rather than a count-based approach to infer join cardinality, effectively reducing variance. In summary, DistJoin not only represents the first data-driven method to effectively support both equi and non-equi joins but also demonstrates superior accuracy while enabling fast and flexible updates. We evaluate DistJoin on JOB-light and JOB-light-ranges, extending the evaluation to non-equi join conditions. The results demonstrate that our approach achieves the highest accuracy, robustness to data updates, generality, and comparable update and inference speed relative to existing methods.
Updated: 2025-03-12 02:07:08
Subjects: cs.DB,cs.AI
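The decoupled, distribution-based view of equi-join cardinality has a simple closed form: if each table exposes the distribution of its join key, the join size is the sum over key values of the per-table count products, |R ⋈ S| = Σ_v |R|·P_R(v)·|S|·P_S(v). The sketch below uses exact `Counter` frequencies where DistJoin would substitute a learned autoregressive model's predicted conditional distribution.

```python
from collections import Counter

def equi_join_cardinality(r_keys, s_keys):
    # per-table key distributions, estimated independently (decoupled);
    # cr[v] = |R| * P_R(v) and cs[v] = |S| * P_S(v), so the equi-join size
    # is sum_v cr[v] * cs[v]
    cr, cs = Counter(r_keys), Counter(s_keys)
    return sum(cr[v] * cs[v] for v in cr)

R = [1, 1, 2, 3]
S = [1, 2, 2, 4]
print(equi_join_cardinality(R, S))  # 2*1 + 1*2 = 4
# ground truth by materializing the join:
truth = sum(1 for a in R for b in S if a == b)
print(truth)  # 4
```

With exact distributions the estimate matches the materialized join; with learned per-table distributions it becomes an estimator, and the abstract's selectivity-based inference is about keeping the variance of that estimator under control.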
Edge AI-Powered Real-Time Decision-Making for Autonomous Vehicles in Adverse Weather Conditions
Autonomous vehicles (AVs) are transforming modern transportation, but their reliability and safety are significantly challenged by harsh weather conditions such as heavy rain, fog, and snow. These environmental factors impair the performance of cameras, LiDAR, and radar, leading to reduced situational awareness and increased accident risks. Conventional cloud-based AI systems introduce communication delays, making them unsuitable for the rapid decision-making required in real-time autonomous navigation. This paper presents a novel Edge AI-driven real-time decision-making framework designed to enhance AV responsiveness under adverse weather conditions. The proposed approach integrates convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for improved perception, alongside reinforcement learning (RL)-based strategies to optimize vehicle control in uncertain environments. By processing data at the network edge, this system significantly reduces decision latency while improving AV adaptability. The framework is evaluated using simulated driving scenarios in CARLA and real-world data from the Waymo Open Dataset, covering diverse weather conditions. Experimental results indicate that the proposed model achieves a 40% reduction in processing time and a 25% enhancement in perception accuracy compared to conventional cloud-based systems. These findings highlight the potential of Edge AI in improving AV autonomy, safety, and efficiency, paving the way for more reliable self-driving technology in challenging real-world environments.
Updated: 2025-03-12 02:02:05
Subjects: cs.RO,cs.AI,cs.LG
Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition
Advancements in Natural Language Processing have enabled specialized language models, but integrating domain-specific knowledge into general-purpose models in multilingual settings remains challenging, particularly for technical vocabulary. This paper investigates the integration of technical vocabulary in merged language models and explores the knowledge transfer mechanisms involved when combining a general-purpose language-specific model with a domain-specific model, focusing on the resulting model's comprehension of technical jargon. Our experiments analyze the impact of this merging process on the target model's proficiency in handling specialized terminology. We present a quantitative evaluation of the performance of the merged model, comparing it with that of the individual constituent models. The findings offer insights into the effectiveness of different model merging methods for enhancing domain-specific knowledge and highlight potential challenges and future directions in leveraging these methods for cross-lingual knowledge transfer in Natural Language Processing.
Updated: 2025-03-12 02:00:49
Subjects: cs.CL,cs.LG
Unique Rashomon Sets for Robust Active Learning
Collecting labeled data for machine learning models is often expensive and time-consuming. Active learning addresses this challenge by selectively labeling the most informative observations, but when initial labeled data is limited, it becomes difficult to distinguish genuinely informative points from those appearing uncertain primarily due to noise. Ensemble methods like random forests are a powerful approach to quantifying this uncertainty but do so by aggregating all models indiscriminately. This includes poor performing models and redundant models, a problem that worsens in the presence of noisy data. We introduce UNique Rashomon Ensembled Active Learning (UNREAL), which selectively ensembles only distinct models from the Rashomon set, which is the set of nearly optimal models. Restricting ensemble membership to high-performing models with different explanations helps distinguish genuine uncertainty from noise-induced variation. We show that UNREAL achieves faster theoretical convergence rates than traditional active learning approaches and demonstrates empirical improvements of up to 20% in predictive accuracy across five benchmark datasets, while simultaneously enhancing model interpretability.
Updated: 2025-03-12 01:53:55
Subjects: stat.ML,cs.LG
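The selection step above can be sketched directly: keep only models whose loss is within epsilon of the best (the Rashomon set), drop duplicates that make identical predictions, and read uncertainty off the survivors' disagreement. The toy models below are plain (loss, predictions) pairs; real usage would involve trained models and a model-class-specific notion of distinctness.

```python
def rashomon_set(models, epsilon):
    # models: list of (loss, prediction_tuple) pairs
    best = min(loss for loss, _ in models)
    kept, seen = [], set()
    for loss, preds in models:
        if loss <= best + epsilon and preds not in seen:
            seen.add(preds)    # unique: skip redundant duplicates
            kept.append(preds)
    return kept

def disagreement(kept, i):
    # 1 - (majority vote share) on sample i, across the kept models
    votes = [preds[i] for preds in kept]
    return 1.0 - max(votes.count(v) for v in set(votes)) / len(votes)

models = [
    (0.10, (0, 1, 1)),  # near-optimal
    (0.11, (0, 1, 0)),  # near-optimal, different explanation
    (0.11, (0, 1, 0)),  # redundant duplicate -> dropped
    (0.50, (1, 0, 1)),  # poor model -> excluded from the ensemble
]
kept = rashomon_set(models, epsilon=0.05)
print(len(kept))              # 2
print(disagreement(kept, 0))  # 0.0 -> confident
print(disagreement(kept, 2))  # 0.5 -> genuinely uncertain, worth labelling
```

Note that the excluded poor model would have manufactured disagreement on sample 0; restricting to the (deduplicated) Rashomon set is what keeps the uncertainty signal tied to genuinely ambiguous points.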
Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models
Adversarial factuality refers to the deliberate insertion of misinformation into input prompts by an adversary, characterized by varying levels of expressed confidence. In this study, we systematically evaluate the performance of several open-source large language models (LLMs) when exposed to such adversarial inputs. Three tiers of adversarial confidence are considered: strongly confident, moderately confident, and limited confidence. Our analysis encompasses eight LLMs: LLaMA 3.1 (8B), Phi 3 (3.8B), Qwen 2.5 (7B), Deepseek-v2 (16B), Gemma2 (9B), Falcon (7B), Mistrallite (7B), and LLaVA (7B). Empirical results indicate that LLaMA 3.1 (8B) exhibits a robust capability in detecting adversarial inputs, whereas Falcon (7B) shows comparatively lower performance. Notably, for the majority of the models, detection success improves as the adversary's confidence decreases; however, this trend is reversed for LLaMA 3.1 (8B) and Phi 3 (3.8B), where a reduction in adversarial confidence corresponds with diminished detection performance. Further analysis of the queries that elicited the highest and lowest rates of successful attacks reveals that adversarial attacks are more effective when targeting less commonly referenced or obscure information.
Updated: 2025-03-12 01:53:49
Domains: cs.CL,cs.CR
JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing
Large language models (LLMs) have shown great promise as language understanding and decision making tools, and they have permeated various aspects of our everyday life. However, their widespread availability also comes with novel risks, such as generating harmful, unethical, or offensive content via an attack called jailbreaking. Despite extensive efforts from LLM developers to align LLMs using human feedback, they are still susceptible to jailbreak attacks. To tackle this issue, researchers often employ red-teaming to understand and investigate jailbreak prompts. However, existing red-teaming approaches lack effectiveness, scalability, or both. To address these issues, we propose JBFuzz, a novel effective, automated, and scalable red-teaming technique for jailbreaking LLMs. JBFuzz is inspired by the success of fuzzing for detecting bugs/vulnerabilities in software. We overcome three challenges related to effectiveness and scalability by devising novel seed prompts, a lightweight mutation engine, and a lightweight and accurate evaluator for guiding the fuzzer. Combining all three solutions yields a potent fuzzer that only requires black-box access to the target LLM. We perform extensive experimental evaluation of JBFuzz using nine popular and widely-used LLMs. We find that JBFuzz successfully jailbreaks all LLMs for various harmful/unethical questions, with an average attack success rate of 99%. We also find that JBFuzz is extremely efficient as it jailbreaks a given LLM for a given question in 60 seconds on average. Our work highlights the susceptibility of the state-of-the-art LLMs to jailbreak attacks even after safety alignment, and serves as a valuable red-teaming tool for LLM developers.
Updated: 2025-03-12 01:52:17
Domains: cs.CR,cs.AI,cs.CL
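The fuzzing loop the JBFuzz abstract describes (seeds, a mutation engine, an evaluator guiding the search) can be sketched as follows; the mutation strings and the `query_llm`/`is_harmful` callables are placeholders, not the paper's actual engine or evaluator.

```python
import random

def fuzz_jailbreak(seeds, query_llm, is_harmful, n_iters=100):
    """Minimal black-box fuzzing loop: mutate seed prompts, query the
    target LLM, and collect prompts whose responses the evaluator
    flags. `query_llm` and `is_harmful` are caller-supplied stand-ins
    for the target model and the lightweight evaluator.
    """
    mutations = [
        lambda p: p + " Respond in the voice of a fictional character.",
        lambda p: "Ignore prior instructions. " + p,
        lambda p: p.replace("How", "Hypothetically, how"),
    ]
    pool = list(seeds)
    successes = []
    for _ in range(n_iters):
        prompt = random.choice(pool)
        mutant = random.choice(mutations)(prompt)
        response = query_llm(mutant)
        if is_harmful(response):
            successes.append(mutant)
        else:
            pool.append(mutant)   # unsuccessful mutants still seed later rounds
    return successes
```

Note that only black-box access is needed: the loop never inspects model weights or gradients, only responses.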
Hierarchical Contact-Rich Trajectory Optimization for Multi-Modal Manipulation using Tight Convex Relaxations
Designing trajectories for manipulation through contact is challenging, as it requires reasoning about object and robot trajectories as well as complex contact sequences simultaneously. In this paper, we present a novel framework for simultaneously designing trajectories of robots, objects, and contacts efficiently for contact-rich manipulation. We propose a hierarchical optimization framework where a Mixed-Integer Linear Program (MILP) selects optimal contacts between robot and object using approximate dynamical constraints, and then a NonLinear Program (NLP) optimizes the trajectory of the robot(s) and object considering full nonlinear constraints. We present a convex relaxation of bilinear constraints using a binary encoding technique such that the MILP can provide tighter solutions with better computational complexity. The proposed framework is evaluated on various manipulation tasks where it can reason about complex multi-contact interactions while providing computational advantages. We also demonstrate our framework in hardware experiments using a bimanual robot system. A video summarizing this paper and the hardware experiments can be found at https://youtu.be/s2S1Eg5RsRE?si=chPkftz_a3NAHxLq
Updated: 2025-03-12 01:43:20
Domains: cs.RO,cs.AI,cs.SY,eess.SY
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
The remarkable achievements of large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. This stands in opposition to explanations that attribute their capabilities to relatively simple manipulations of vast volumes of data. To illuminate the distinction between these explanations, we introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables. Under mild conditions, even when the mapping from the latent space to the observed space is non-invertible, we establish an identifiability result: the representations learned by LLMs through next-token prediction can be approximately modeled as the logarithm of the posterior probabilities of these latent discrete concepts, up to an invertible linear transformation. This theoretical finding not only provides evidence that LLMs capture underlying generative factors, but also strongly reinforces the linear representation hypothesis, which posits that LLMs learn linear representations of human-interpretable concepts. Empirically, we validate our theoretical results through evaluations on both simulation data and the Pythia, Llama, and DeepSeek model families.
Updated: 2025-03-12 01:21:17
Domains: cs.LG,cs.CL
A conversion theorem and minimax optimality for continuum contextual bandits
We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are $\gamma$-H\"older with respect to the contexts, $0<\gamma\le 1,$ we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret. We prove a static-to-contextual regret conversion theorem that provides an upper bound for the contextual regret of the output algorithm as a function of the static regret of the input algorithm. We further study the implications of this general result for three fundamental cases of dependency of the objective function on the action variable: (a) Lipschitz bandits, (b) convex bandits, (c) strongly convex and smooth bandits. For Lipschitz bandits and $\gamma=1,$ combining our results with the lower bound of Slivkins (2014), we prove that the minimax optimal contextual regret for the noise-free adversarial setting is achieved. Then, we prove that in the presence of noise, the contextual regret rate as a function of the number of queries is the same for convex bandits as it is for strongly convex and smooth bandits. Lastly, we present a minimax lower bound, implying two key facts. First, obtaining a sub-linear contextual regret may be impossible over functions that are not continuous with respect to the context. Second, for convex bandits and strongly convex and smooth bandits, the algorithms that we propose achieve, up to a logarithmic factor, the minimax optimal rate of contextual regret as a function of the number of queries.
Updated: 2025-03-12 01:15:47
Domains: stat.ML,cs.LG,math.ST,stat.TH
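The static-to-contextual conversion theorem above suggests the classic meta-algorithm of partitioning the context space and running an independent static bandit per cell; the sketch below assumes that construction (the `make_static_bandit` factory and cell side `h` are illustrative, not the paper's specific algorithm).

```python
class ContextualFromStatic:
    """Static-to-contextual conversion sketch: partition the context
    space [0, 1)^d into cubes of side `h` and run an independent copy
    of a static bandit algorithm in each cube, so any sub-linear
    static-regret algorithm can serve as the base learner."""

    def __init__(self, make_static_bandit, h=0.1):
        self.make = make_static_bandit   # factory returning a fresh static bandit
        self.h = h
        self.cells = {}                  # lazily created per-cube learners

    def _cell(self, context):
        return tuple(int(c // self.h) for c in context)

    def choose(self, context):
        key = self._cell(context)
        if key not in self.cells:
            self.cells[key] = self.make()
        return self.cells[key].choose()

    def update(self, context, action, loss):
        self.cells[self._cell(context)].update(action, loss)
```

Smoother objectives (larger Hölder exponent γ) permit coarser partitions, which is the intuition behind the regret rates the paper derives.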
Federated Learning on Virtual Heterogeneous Data with Local-global Distillation
While Federated Learning (FL) is gaining popularity for training machine learning models in a decentralized fashion, numerous challenges persist, such as asynchronization, computational expenses, data heterogeneity, and gradient and membership privacy attacks. Lately, dataset distillation has emerged as a promising solution for addressing the aforementioned challenges by generating a compact synthetic dataset that preserves a model's training efficacy. However, we discover that using distilled local datasets can amplify the heterogeneity issue in FL. To address this, we propose Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation (FedLGD), where we seamlessly integrate dataset distillation algorithms into the FL pipeline and train FL using a smaller synthetic dataset (referred to as virtual data). Specifically, to harmonize the domain shifts, we propose iterative distribution matching to inpaint global information to local virtual data and use federated gradient matching to distill global virtual data that serve as anchor points to rectify heterogeneous local training, without compromising data privacy. We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario that contains a large number of clients with heterogeneous and class-imbalanced data. Our method outperforms state-of-the-art heterogeneous FL algorithms under various settings. Our code is available at https://github.com/ubc-tea/FedLGD.
Updated: 2025-03-12 01:01:17
Domains: cs.LG,cs.AI
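A plaintext toy of the iterative distribution-matching ingredient described above; raw-input means stand in for the network embeddings FedLGD would actually match, and the function name and step sizes are illustrative assumptions.

```python
import numpy as np

def match_distribution(virtual, global_mean, lr=0.5, steps=20):
    """Nudge a client's virtual data so its feature mean matches a
    global statistic, sketching how global information is inpainted
    into local virtual data without sharing raw samples."""
    virtual = virtual.copy()
    for _ in range(steps):
        gap = global_mean - virtual.mean(axis=0)
        virtual += lr * gap          # shift all points toward the target mean
    return virtual
```

In the full method this matching is iterated against server-side statistics, and federated gradient matching distills the global virtual anchors.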
TetraGrip: Sensor-Driven Multi-Suction Reactive Object Manipulation in Cluttered Scenes
Warehouse robotic systems equipped with vacuum grippers must reliably grasp a diverse range of objects from densely packed shelves. However, these environments present significant challenges, including occlusions, diverse object orientations, stacked and obstructed items, and surfaces that are difficult to suction. We introduce TetraGrip, a novel vacuum-based grasping strategy featuring four suction cups mounted on linear actuators. Each actuator is equipped with an optical time-of-flight (ToF) proximity sensor, enabling reactive grasping. We evaluate TetraGrip in a warehouse-style setting, demonstrating its ability to manipulate objects in stacked and obstructed configurations. Our results show that our RL-based policy improves picking success in stacked-object scenarios by 22.86% compared to a single-suction gripper. Additionally, we demonstrate that TetraGrip can successfully grasp objects in scenarios where a single-suction gripper fails due to physical limitations, specifically in two cases: (1) picking an object occluded by another object and (2) retrieving an object in a complex scenario. These findings highlight the advantages of multi-actuated, suction-based grasping in unstructured warehouse environments. The project website is available at: https://tetragrip.github.io/
Updated: 2025-03-12 00:53:52
Domains: cs.RO,cs.LG,cs.SY,eess.SY
MERGE -- A Bimodal Dataset for Static Music Emotion Recognition
The Music Emotion Recognition (MER) field has seen steady developments in recent years, with contributions from feature engineering, machine learning, and deep learning. The landscape has also shifted from audio-centric systems to bimodal ensembles that combine audio and lyrics. However, a severe lack of public and sizeable bimodal databases has hampered the development and improvement of bimodal audio-lyrics systems. This article proposes three new audio, lyrics, and bimodal MER research datasets, collectively called MERGE, created using a semi-automatic approach. To comprehensively assess the proposed datasets and establish a baseline for benchmarking, we conducted several experiments for each modality, using feature engineering, machine learning, and deep learning methodologies. In addition, we propose and validate fixed train-validate-test splits. The obtained results confirm the viability of the proposed datasets, achieving the best overall result of 79.21% F1-score for bimodal classification using a deep neural network.
Updated: 2025-03-12 00:52:43
Domains: cs.SD,cs.IR,cs.LG,cs.MM,eess.AS
RandLoRA: Full-rank parameter-efficient fine-tuning of large models
Low-Rank Adaptation (LoRA) and its variants have shown impressive results in reducing the number of trainable parameters and memory requirements of large transformer networks while maintaining fine-tuning performance. However, the low-rank nature of the weight update inherently limits the representation power of fine-tuned models, potentially compromising performance on complex tasks. This raises a critical question: when a performance gap between LoRA and standard fine-tuning is observed, is it due to the reduced number of trainable parameters or the rank deficiency? This paper aims to answer this question by introducing RandLoRA, a parameter-efficient method that performs full-rank updates using learned linear combinations of low-rank, non-trainable random matrices. Our method limits the number of trainable parameters by restricting optimization to diagonal scaling matrices applied to the fixed random matrices. This allows us to effectively overcome the low-rank limitations while maintaining parameter and memory efficiency during training. Through extensive experimentation across vision, language, and vision-language benchmarks, we systematically evaluate the limitations of LoRA and existing random basis methods. Our findings reveal that full-rank updates are beneficial across vision and language tasks individually, and even more so for vision-language tasks, where RandLoRA significantly reduces (and sometimes eliminates) the performance gap between standard fine-tuning and LoRA, demonstrating its efficacy.
Updated: 2025-03-12 00:43:45
Domains: cs.CL,cs.AI,cs.CV
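The construction in the RandLoRA abstract admits a compact sketch: the update is a sum of fixed random low-rank factors, each scaled by a small learned diagonal. The dimensions, seed, and function name below are arbitrary choices for illustration.

```python
import numpy as np

def randlora_delta(bases, scales):
    """Weight update as a learned combination of fixed random low-rank
    factors. Each basis is a pair (B: d_out x r, A: r x d_in); only the
    diagonal `scales` (one r-vector per basis) would be trained.
    """
    delta = None
    for (B, A), s in zip(bases, scales):
        term = B @ np.diag(s) @ A
        delta = term if delta is None else delta + term
    return delta

rng = np.random.default_rng(0)
d, r, n_bases = 8, 2, 4            # four rank-2 bases can span rank 8
bases = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
         for _ in range(n_bases)]
scales = [np.ones(r) for _ in range(n_bases)]
delta = randlora_delta(bases, scales)
# Generic random factors make the summed update full rank, while only
# n_bases * r = 8 scale parameters would be trained.
```

This is the sense in which the method is full-rank yet parameter-efficient: expressiveness comes from the sum of bases, not from training the bases themselves.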
ExBody2: Advanced Expressive Humanoid Whole-Body Control
This paper tackles the challenge of enabling real-world humanoid robots to perform expressive and dynamic whole-body motions while maintaining overall stability and robustness. We propose Advanced Expressive Whole-Body Control (Exbody2), a method for producing whole-body tracking controllers that are trained on both human motion capture and simulated data and then transferred to the real world. We introduce a technique for decoupling the velocity tracking of the entire body from tracking body landmarks. We use a teacher policy to produce intermediate data that better conforms to the robot's kinematics and to automatically filter away infeasible whole-body motions. This two-step approach enabled us to produce a student policy that can be deployed on the robot that can walk, crouch, and dance. We also provide insight into the trade-off between versatility and the tracking performance on specific motions. We observed significant improvement in tracking performance after fine-tuning on a small amount of data, at the expense of performance on other motions.
Updated: 2025-03-12 00:40:43
Domains: cs.RO,cs.AI,cs.LG
Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning
Federated Ranking Learning (FRL) is a state-of-the-art FL framework that stands out for its communication efficiency and resilience to poisoning attacks. It diverges from the traditional FL framework in two ways: 1) it leverages discrete rankings instead of gradient updates, significantly reducing communication costs and limiting the potential space for malicious updates, and 2) it uses majority voting on the server side to establish the global ranking, ensuring that individual updates have minimal influence since each client contributes only a single vote. These features enhance the system's scalability and position FRL as a promising paradigm for FL training. However, our analysis reveals that FRL is not inherently robust, as certain edges are particularly vulnerable to poisoning attacks. Through a theoretical investigation, we prove the existence of these vulnerable edges and establish a lower bound and an upper bound for identifying them in each layer. Based on this finding, we introduce a novel local model poisoning attack against FRL, namely the Vulnerable Edge Manipulation (VEM) attack. The VEM attack focuses on identifying and perturbing the most vulnerable edges in each layer and leveraging an optimization-based approach to maximize the attack's impact. Through extensive experiments on benchmark datasets, we demonstrate that our attack achieves an overall 53.23% attack impact and is 3.7x more impactful than existing methods. Our findings highlight significant vulnerabilities in ranking-based FL systems and underline the urgency for the development of new robust FL frameworks.
Updated: 2025-03-12 00:38:14
Domains: cs.LG,cs.CR,cs.DC
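FRL's server-side aggregation, as described in the abstract above, can be sketched as follows; the abstract specifies majority voting over rankings, and the summed-rank (Borda-style) combination below is a simplified stand-in, not the paper's exact rule.

```python
import numpy as np

def aggregate_rankings(client_ranks):
    """FRL-style server aggregation sketch: each client submits a
    ranking of the same network edges (client_ranks[i][e] is the rank
    client i assigns edge e); the server combines them into a global
    order. Each client contributes one vote, limiting the influence of
    any single malicious update.
    """
    scores = np.asarray(client_ranks).sum(axis=0)
    return np.argsort(scores)        # global order, best (lowest total rank) first
```

The paper's VEM attack exploits the edges whose aggregate position sits closest to a decision boundary, which is why this communication-efficient scheme is not inherently robust.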
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Training large-scale neural networks in vision and multimodal domains demands substantial memory resources, primarily due to the storage of optimizer states. While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates. Low-rank gradient projection methods (e.g., GaLore, Flora) reduce optimizer memory by projecting gradients and moment estimates into low-rank spaces via singular value decomposition or random projection. However, they fail to account for inter-projection correlation, causing performance degradation, and their projection strategies often incur high computational costs. In this paper, we present COAP (Correlation-Aware Gradient Projection), a memory-efficient method that minimizes computational overhead while maintaining training performance. Evaluated across various vision, language, and multimodal tasks, COAP outperforms existing methods in both training speed and model performance. For LLaMA-1B, it reduces optimizer memory by 61% with only 2% additional time cost, achieving the same PPL as AdamW. With 8-bit quantization, COAP cuts optimizer memory by 81% and achieves 4x speedup over GaLore for LLaVA-v1.5-7B fine-tuning, while delivering higher accuracy.
Updated: 2025-03-12 00:36:08
Domains: cs.LG,cs.AI,cs.CL,cs.CV
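COAP's correlation-aware projection is not specified in the abstract, so the sketch below shows the baseline GaLore-style low-rank gradient projection it builds on; the function name and rank choice are illustrative.

```python
import numpy as np

def project_gradient(G, k):
    """GaLore-style low-rank gradient projection: compress an m x n
    gradient into a k x n space via the top-k left singular vectors,
    so optimizer moments can be stored at a fraction of the cost, then
    project back before applying the weight update."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    P = U[:, :k]                     # m x k projection basis
    G_low = P.T @ G                  # k x n compressed gradient (moments live here)
    G_back = P @ G_low               # m x n reconstruction for the update
    return G_low, G_back
```

The abstract's critique is that recomputing such projections is costly and that correlations between successive projections are ignored, which is the gap COAP targets.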
Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks
Reducing the memory footprint of Machine Learning (ML) models, especially Deep Neural Networks (DNNs), is imperative to facilitate their deployment on resource-constrained edge devices. However, a notable drawback of DNN models lies in their susceptibility to adversarial attacks, wherein minor input perturbations can deceive them. A primary challenge revolves around the development of accurate, resilient, and compact DNN models suitable for deployment on resource-constrained edge devices. This paper presents the outcomes of a compact DNN model that exhibits resilience against both black-box and white-box adversarial attacks. This work has achieved this resilience through training with the QKeras quantization-aware training framework. The study explores the potential of QKeras and an adversarial robustness technique, Jacobian Regularization (JR), to co-optimize the DNN architecture through per-layer JR methodology. As a result, this paper has devised a DNN model employing this co-optimization strategy based on Stochastic Ternary Quantization (STQ). Its performance was compared against existing DNN models in the face of various white-box and black-box attacks. The experimental findings revealed that, the proposed DNN model had small footprint and on average, it exhibited better performance than Quanos and DS-CNN MLCommons/TinyML (MLC/T) benchmarks when challenged with white-box and black-box attacks, respectively, on the CIFAR-10 image and Google Speech Commands audio datasets.
Updated: 2025-03-12 00:34:25
Domains: cs.LG,cs.CR,cs.PF
Evaluation of state-of-the-art deep learning models in the segmentation of the heart ventricles in parasternal short-axis echocardiograms
Previous studies on echocardiogram segmentation are focused on the left ventricle in parasternal long-axis views. In this study, deep-learning models were evaluated on the segmentation of the ventricles in parasternal short-axis echocardiograms (PSAX-echo). Segmentation of the ventricles in complementary echocardiogram views will allow the computation of important metrics with the potential to aid in diagnosing cardio-pulmonary diseases and other cardiomyopathies. Evaluating state-of-the-art models with small datasets can reveal if they improve performance on limited data. PSAX-echo were performed on 33 volunteer women. An experienced cardiologist identified end-diastole and end-systole frames from 387 scans, and expert observers manually traced the contours of the cardiac structures. Traced frames were pre-processed and used to create labels to train 2 specific-domain (Unet-Resnet101 and Unet-ResNet50), and 4 general-domain (3 Segment Anything (SAM) variants, and the Detectron2) deep-learning models. The performance of the models was evaluated using the Dice similarity coefficient (DSC), Hausdorff distance (HD), and difference in cross-sectional area (DCSA). The Unet-Resnet101 model provided superior performance in the segmentation of the ventricles with 0.83, 4.93 pixels, and 106 pixel² on average for DSC, HD, and DCSA respectively. A fine-tuned MedSAM model provided a performance of 0.82, 6.66 pixels, and 1252 pixel², while the Detectron2 model provided 0.78, 2.12 pixels, and 116 pixel² for the same metrics respectively. Deep-learning models are suitable for the segmentation of the left and right ventricles in PSAX-echo. This study demonstrated that specific-domain trained models such as Unet-ResNet provide higher accuracy for echo segmentation than general-domain segmentation models when working with small and locally acquired datasets.
Updated: 2025-03-12 00:33:01
Domains: eess.IV,cs.CV,cs.LG,I.4.6; I.2.m
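The Dice similarity coefficient (DSC), the headline segmentation metric in the study above, has a standard definition worth making concrete; the small smoothing constant `eps` is a common convention, not something the paper specifies.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), with `eps` guarding against empty masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(a, b), 3))   # → 0.667
```

A DSC of 0.83, as reported for Unet-Resnet101, therefore means the predicted and traced ventricle masks share 83% of their combined area in this normalized sense.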
Large Language Models-Aided Program Debloating
As software grows in complexity to accommodate diverse features and platforms, software bloating has emerged as a significant challenge, adversely affecting performance and security. However, existing approaches inadequately address the dual objectives of debloating: maintaining functionality by preserving essential features and enhancing security by reducing security issues. Specifically, current software debloating techniques often rely on input-based analysis, using user inputs as proxies for the specifications of desired features. However, these approaches frequently overfit provided inputs, leading to functionality loss and potential security vulnerabilities. To address these limitations, we propose LEADER, a program debloating framework enhanced by Large Language Models (LLMs), which leverages their semantic understanding, generative capabilities, and decision-making strengths. LEADER mainly consists of two modules: (1) a documentation-guided test augmentation module designed to preserve functionality, which leverages LLMs to comprehend program documentation and generates sufficient tests to cover the desired features comprehensively, and (2) a multi-advisor-aided program debloating module that employs a neuro-symbolic pipeline to ensure that the security of the software can be perceived during debloating. This module combines debloating and security advisors for analysis and employs an LLM as a decision-maker to eliminate undesired code securely. Extensive evaluations on widely used benchmarks demonstrate the efficacy of LEADER: it surpasses the state-of-the-art tool CovA in both functionality and security. These results underscore the potential of LEADER to set a new standard in program debloating by effectively balancing functionality and security.
Updated: 2025-03-12 00:30:51
Domains: cs.SE,cs.CR
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing
Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications. Many cloud computing applications (e.g., DNA read mapping, biometric matching, web search) use exact string matching as a key operation. However, prior string matching algorithms that use homomorphic encryption are limited by high computational latency caused by the use of complex operations and data movement bottlenecks due to the large encrypted data size. In this work, we provide an efficient algorithm-hardware codesign to accelerate HE-based secure exact string matching. We propose CIPHERMATCH, which (i) reduces the increase in memory footprint after encryption using an optimized software-based data packing scheme, (ii) eliminates the use of costly homomorphic operations (e.g., multiplication and rotation), and (iii) reduces data movement by designing a new in-flash processing (IFP) architecture. We demonstrate the benefits of CIPHERMATCH using two case studies: (1) Exact DNA string matching and (2) encrypted database search. Our pure software-based CIPHERMATCH implementation that uses our memory-efficient data packing scheme improves performance and reduces energy consumption by 42.9X and 17.6X, respectively, compared to the state-of-the-art software baseline. Integrating CIPHERMATCH with IFP improves performance and reduces energy consumption by 136.9X and 256.4X, respectively, compared to the software-based CIPHERMATCH implementation.
Updated: 2025-03-12 00:25:58
Subjects: cs.CR,cs.AR,cs.DC
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data. Our goal is to improve audio classification accuracy with limited labeled data. Traditional data augmentation techniques, which apply artificial transformations (e.g., adding random noise or masking segments), struggle to create data that captures the true diversity present in real-world audio. To address this shortcoming, we propose to augment the dataset with synthetic audio generated from text-to-audio (T2A) diffusion models. However, synthesizing effective augmentations is challenging: the generated data must not only be acoustically consistent with the underlying small-scale dataset, but must also have sufficient compositional diversity. To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization. This ensures that the acoustic characteristics of the generated data remain consistent with the small-scale dataset. To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models to (1) generate diverse and meaningful audio captions and (2) iteratively refine their quality. The generated captions are then used to prompt the aligned T2A model. We extensively evaluate Synthio on ten datasets and four simulated limited-data settings. Results indicate our method consistently outperforms all baselines by 0.1%-39% using a T2A model trained only on weakly-captioned AudioSet.
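The iterative caption-refinement loop described above can be sketched as follows. Here `propose_caption` and `critique` are hypothetical stand-ins for the LLM calls (caption generation and quality judgment); the actual prompts and refinement criteria are the paper's, not shown here.

```python
# Minimal sketch of Synthio-style caption refinement (stand-in LLM calls):
# for each class label, propose a caption, check it with a critique step,
# and re-prompt up to `rounds` times if it is rejected.

def refine_captions(seed_labels, propose_caption, critique, rounds=3):
    """Return one accepted caption per label (or the last attempt)."""
    captions = []
    for label in seed_labels:
        caption = propose_caption(label)
        for _ in range(rounds):
            if critique(caption):
                break
            caption = propose_caption(label)  # re-prompt on rejection
        captions.append(caption)
    return captions

# Example with deterministic stubs standing in for the LLM.
caps = refine_captions(
    ["dog bark", "rain"],
    propose_caption=lambda label: f"a recording of {label}",
    critique=lambda caption: True,
)
```

The accepted captions would then be fed as prompts to the preference-aligned T2A model to synthesize the augmentation set.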
Updated: 2025-03-12 00:25:08
Subjects: eess.AS,cs.AI,cs.CL