On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
In this work, we introduce RWFT, a lightweight output-reweighting unlearning method that erases an entire class from a trained classifier without full retraining. Forgetting specific classes from trained models is essential for enforcing user deletion rights and mitigating harmful or biased predictions. Full retraining is costly, and existing unlearning methods fail to replicate the behavior of retrained models when predicting samples from the unlearned class. We prove this failure by designing a variant of membership inference attacks, MIA-NN, that successfully reveals the unlearned class for any of these methods. We propose a simple redistribution of the probability mass for predictions on samples in the forgotten class that is robust to MIA-NN. We also introduce a new metric based on the total variation (TV) distance of the prediction probabilities to quantify residual leakage and to guard future methods against susceptibility to this new attack. Through extensive experiments with state-of-the-art baselines in machine unlearning, we show that our approach matches the results of full retraining both in the metrics used for evaluation by prior work and in the new metric we propose here. Compared to state-of-the-art methods, we gain 2.79% in previously used metrics and 111.45% in our new TV-based metric over the best existing method.
Updated: 2025-06-25 23:53:56
Categories: cs.LG
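A minimal sketch of how such a TV-based leakage score could be computed from model outputs (the function names and the averaging over forget-class samples are illustrative assumptions, not the paper's exact protocol):

```python
import numpy as np

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """TV distance between two discrete distributions: 0.5 * sum |p_i - q_i|."""
    return 0.5 * float(np.abs(p - q).sum())

def tv_leakage(probs_unlearned: np.ndarray, probs_retrained: np.ndarray) -> float:
    """Average TV distance over forget-class samples, comparing the unlearned
    model's softmax outputs (n_samples, n_classes) with those of a model
    retrained from scratch; zero means the two are indistinguishable."""
    return float(np.mean([total_variation(p, q)
                          for p, q in zip(probs_unlearned, probs_retrained)]))

p = np.array([[0.7, 0.2, 0.1]])
assert tv_leakage(p, p) == 0.0  # identical predictions leak nothing
```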
Next-token prediction capacity: general upper bounds and a lower bound for transformers
Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are equal up to a multiplicative constant. We prove these bounds in the general setting where next-token distributions can be arbitrary as well as the empirical setting where they are calculated from a finite number of document sequences. Our lower bounds are for one-layer multi-head decoder-only transformers and our proofs highlight an important injectivity property satisfied by self-attention. Furthermore, we provide numerical evidence that the minimal number of parameters for memorization is sufficient for being able to train the model to the entropy lower bound.
Updated: 2025-06-25 23:53:42
Categories: cs.LG,math.OC,15A03, 26B35
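The entropy lower bound mentioned in the last sentence is the entropy of the empirical next-token conditional distributions, which no model can beat in cross-entropy on that data. A toy computation under assumed fixed-length contexts (the paper's setting is more general):

```python
import math
from collections import Counter, defaultdict

def entropy_lower_bound(sequences, context_len=2):
    """Average entropy (nats) of the empirical next-token distributions:
    the minimum cross-entropy any model can reach on this data."""
    counts, total = defaultdict(Counter), 0
    for seq in sequences:
        for i in range(context_len, len(seq)):
            counts[tuple(seq[i - context_len:i])][seq[i]] += 1
            total += 1
    bound = 0.0
    for next_tokens in counts.values():
        n = sum(next_tokens.values())
        entropy = -sum((c / n) * math.log(c / n) for c in next_tokens.values())
        bound += (n / total) * entropy
    return bound

print(entropy_lower_bound([[0, 1, 2], [0, 1, 2]]))  # deterministic data: 0.0
print(entropy_lower_bound([[0, 1, 2], [0, 1, 3]]))  # ambiguous context: log 2 ~ 0.693
```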
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation contexts, addressing the need for AI agents that can adapt to diverse human operators and stakeholders. Using Sotopia as a simulation testbed, we present two experiments that systematically evaluated how personality traits and AI agent characteristics influence LLM-simulated social negotiation outcomes--a capability essential for a variety of applications involving cross-team coordination and civil-military interactions. Experiment 1 employs causal discovery methods to measure how personality traits impact price-bargaining negotiations, through which we found that Agreeableness and Extraversion significantly affect believability, goal achievement, and knowledge acquisition outcomes. Sociocognitive lexical measures extracted from team communications detected fine-grained differences in agents' empathic communication, moral foundations, and opinion patterns, providing actionable insights for agentic AI systems that must operate reliably in high-stakes operational scenarios. Experiment 2 evaluates human-AI job negotiations by manipulating both simulated human personality and AI system characteristics--specifically transparency, competence, and adaptability--demonstrating how AI agent trustworthiness impacts mission effectiveness. These findings establish a repeatable evaluation methodology for experimenting with AI agent reliability across diverse operator personalities and human-agent team dynamics, directly supporting operational requirements for reliable AI systems. Our work advances the evaluation of agentic AI workflows by moving beyond standard performance metrics to incorporate the social dynamics essential for mission success in complex operations.
Updated: 2025-06-25 23:42:18
Categories: cs.AI,cs.CL,cs.HC
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Being able to cooperate with new people is an important component of many economically valuable AI tasks, from household robotics to autonomous driving. However, generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust. It creates a feedback loop where the agent's performance influences the generation of new adversarial data, which can be used immediately to train the agent. However, adversarial training is difficult to apply in a cooperative task; how can we train an adversarial cooperator? We propose a novel strategy that combines a pretrained generative model, which simulates valid cooperative agent policies, with adversarial training to maximize regret. We call our method GOAT: Generative Online Adversarial Training. In this framework, GOAT dynamically searches the latent space of the generative model for coordination strategies where the learning policy, the Cooperator agent, underperforms. GOAT enables better generalization by exposing the Cooperator to various challenging interaction scenarios. We maintain realistic coordination strategies by keeping the generative model frozen, thus avoiding adversarial exploitation. We evaluate GOAT with real human partners, and the results demonstrate state-of-the-art performance on the Overcooked benchmark, highlighting its effectiveness in generalizing to diverse human behaviors.
Updated: 2025-06-25 23:40:16
Categories: cs.AI
Omniwise: Predicting GPU Kernels Performance with LLMs
In recent years, the rapid advancement of deep neural networks (DNNs) has revolutionized artificial intelligence, enabling models with unprecedented capabilities in understanding, generating, and processing complex data. These powerful architectures have transformed a wide range of downstream applications, tackling tasks beyond human reach. In this paper, we introduce Omniwise, the first end-to-end, self-supervised fine-tuning pipeline that applies large language models (LLMs) to GPU kernel performance prediction--a novel use case in performance profiling. Omniwise is model-agnostic and lightweight, achieving strong results even with a small 3B-parameter model. It can predict key performance metrics, including memory bandwidth, cache hit rates, GFLOPs, and arithmetic intensity, directly from kernel code without the need for code execution or profiling tools. Our approach achieves over 90% of predictions within 10% relative error on GPU kernels executed on AMD MI250 and MI300X architectures. In addition to the pipeline, we develop an online inference server and a Visual Studio Code plugin that seamlessly integrate LLM-based performance prediction into developers' workflows.
Updated: 2025-06-25 23:36:44
Categories: cs.LG,cs.AI
HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed for influence function approximation have significantly reduced the computational overheads. However, they mostly suffer from inaccurate estimation due to the lack of strong convergence guarantees from the algorithm. The family of hyperpower methods is well-known for its rigorous convergence guarantees on matrix inverse approximation, but the matrix multiplication operations involved can incur intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method which leverages the hyperpower method, specifically Schulz's iterative algorithm. To deal with the computation-intensive matrix multiplication, we incorporate the generalized Fisher information matrix (GFIM) as a low-rank approximation of the Hessian matrix, which reduces the memory and computation overheads to constant costs independent of rank on LoRA-tuned models. We first demonstrate the superior accuracy and stability of HyperINF compared to other baselines through a synthetic convergence simulation for matrix inversion. We further validate the efficacy of HyperINF through extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other baselines suffer from significant degradation. Our codebase is available at https://github.com/Blackzxy/HyperINF.
Updated: 2025-06-25 23:23:23
Categories: cs.LG,stat.ML
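For reference, the Schulz iteration at the heart of the hyperpower family is $X_{k+1} = X_k(2I - AX_k)$, which converges quadratically to $A^{-1}$ when $\|I - AX_0\| < 1$. A dense toy sketch follows; HyperINF itself couples this with the GFIM low-rank structure rather than forming dense matrices:

```python
import numpy as np

def schulz_inverse(A: np.ndarray, iters: int = 30) -> np.ndarray:
    """Schulz iteration X_{k+1} = X_k (2I - A X_k) for approximating A^{-1}.
    The initialization X_0 = A^T / (||A||_1 ||A||_inf) guarantees convergence."""
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)        # well-conditioned test matrix
print(np.linalg.norm(schulz_inverse(A) @ A - np.eye(50)))  # ~ machine precision
```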
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Federated Prompt Learning has emerged as a communication-efficient and privacy-preserving paradigm for adapting large vision-language models like CLIP across decentralized clients. However, the security implications of this setup remain underexplored. In this work, we present the first study of backdoor attacks in Federated Prompt Learning. We show that when malicious clients inject visually imperceptible, learnable noise triggers into input images, the global prompt learner becomes vulnerable to targeted misclassification while still maintaining high accuracy on clean inputs. Motivated by this vulnerability, we propose SABRE-FL, a lightweight, modular defense that filters poisoned prompt updates using an embedding-space anomaly detector trained offline on out-of-distribution data. SABRE-FL requires no access to raw client data or labels and generalizes across diverse datasets. We show, both theoretically and empirically, that malicious clients can be reliably identified and filtered using an embedding-based detector. Across five diverse datasets and four baseline defenses, SABRE-FL outperforms all baselines by significantly reducing backdoor accuracy while preserving clean accuracy, demonstrating strong empirical performance and underscoring the need for robust prompt learning in future federated systems.
Updated: 2025-06-25 23:15:20
Categories: cs.CR,cs.AI
Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance
Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. However, RL methods exhibit performance issues in complex problems. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a trade-off between the certainty and timeliness of human advice, our method takes a step towards RL-driven human-in-the-loop engineering methods.
Updated: 2025-06-25 23:10:12
Categories: cs.SE,cs.AI,cs.LG
Fairly Accurate: Fairness-aware Multi-group Target Detection in Online Discussion
Target-group detection is the task of detecting which group(s) a social media post is "directed at or about", with various applications, such as targeted marketing. In this work, we focus on the fairness implications of target-group detection in the context of toxicity detection, where the perceived harm of a post often depends on which group(s) it targets. Because toxicity is highly contextual, language that appears benign in general may be harmful when targeting specific demographic groups. It is thus important to first detect which group(s) are being targeted by a post as a precursor to the subsequent task of determining whether the post is toxic given those group(s). Target-group detection is also challenging: a single post may simultaneously target one to many groups, and we must detect groups fairly in order to promote equitable treatment. We show that our proposed approach to fairness-aware multi target-group detection not only reduces bias across groups, but also achieves competitive predictive performance, outperforming existing fairness-aware baselines. To spur future research on fairness-aware target-group detection and support competitive benchmarking, we also share our code.
Updated: 2025-06-25 23:07:40
Categories: cs.LG
Always Skip Attention
We highlight a curious empirical result within modern Vision Transformers (ViTs). Specifically, self-attention catastrophically fails to train unless it is used in conjunction with a skip connection. This is in contrast to other elements of a ViT that continue to exhibit good performance (albeit suboptimal) when skip connections are removed. Further, we show that this critical dependence on skip connections is a relatively new phenomenon, with previous deep architectures (e.g., CNNs) exhibiting good performance in their absence. In this paper, we theoretically characterize that the self-attention mechanism is fundamentally ill-conditioned and is, therefore, uniquely dependent on skip connections for regularization. Additionally, we propose Token Graying -- a simple yet effective complement (to skip connections) that further improves the condition of input tokens. We validate our approach in both supervised and self-supervised training methods.
Updated: 2025-06-25 23:06:43
Categories: cs.LG,cs.CV
A3 : an Analytical Low-Rank Approximation Framework for Attention
Large language models have demonstrated remarkable performance; however, their massive parameter counts make deployment highly expensive. Low-rank approximation offers a promising compression solution, yet existing approaches have two main limitations: (1) they focus on minimizing the output error of individual linear layers, without considering the architectural characteristics of Transformers, and (2) they decompose a large weight matrix into two small low-rank matrices. Consequently, these methods often fall short compared to other compression techniques like pruning and quantization, and introduce runtime overhead such as the extra GEMM kernel launches for decomposed small matrices. To address these limitations, we propose $A^3$, a post-training low-rank approximation framework. $A^3$ splits a Transformer layer into three functional components, namely QK, OV, and MLP. For each component, $A^3$ provides an analytical solution that reduces the hidden dimension size inside each component while minimizing the component's functional loss (i.e., error in attention scores, attention outputs, and MLP outputs). This approach directly reduces model sizes, KV cache sizes, and FLOPs without introducing any runtime overheads. In addition, it advances the optimization problem from minimizing the loss of individual linear layers toward improving end-to-end performance. Through extensive experiments, we show that $A^3$ maintains superior performance compared to SoTAs. For example, under the same reduction budget in computation and memory, our low-rank approximated LLaMA 3.1-70B achieves a perplexity of 4.69 on WikiText-2, outperforming the previous SoTA's 7.87 by 3.18. We also demonstrate the versatility of $A^3$, including KV cache compression, quantization, and mixed-rank assignments for enhanced performance.
Updated: 2025-06-25 23:03:54
Categories: cs.CL,cs.AI,cs.LG
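To make the QK component concrete: attention scores depend on the weights only through the product $W_Q W_K^T$, so the inner dimension can be reduced jointly. The sketch below uses a plain truncated SVD of that product; $A^3$'s actual analytical solution minimizes the attention-score error directly, so treat this as the simplest special case:

```python
import numpy as np

def low_rank_qk(W_q: np.ndarray, W_k: np.ndarray, r: int):
    """Factor W_q W_k^T ~ (U_r S_r^{1/2})(V_r S_r^{1/2})^T so queries and keys
    use a smaller inner dimension r while attention scores stay close."""
    U, S, Vt = np.linalg.svd(W_q @ W_k.T, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    return U[:, :r] * sqrt_s, Vt[:r].T * sqrt_s   # (d_model, r) each

d_model, d_head, r = 64, 32, 16
rng = np.random.default_rng(0)
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))
W_q_r, W_k_r = low_rank_qk(W_q, W_k, r)
X = rng.standard_normal((10, d_model))
full, approx = X @ W_q @ W_k.T @ X.T, X @ W_q_r @ W_k_r.T @ X.T
print(np.linalg.norm(full - approx) / np.linalg.norm(full))  # relative score error
```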
THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
Monocular depth estimation methods traditionally train deep models to infer depth directly from RGB pixels. This implicit learning often overlooks explicit monocular cues that the human visual system relies on, such as occlusion boundaries, shading, and perspective. Rather than expecting a network to discover these cues unaided, we present ThirdEye, a cue-aware pipeline that deliberately supplies each cue through specialised, pre-trained, and frozen networks. These cues are fused in a three-stage cortical hierarchy (V1->V2->V3) equipped with a key-value working-memory module that weights them by reliability. An adaptive-bins transformer head then produces a high-resolution disparity map. Because the cue experts are frozen, ThirdEye inherits large amounts of external supervision while requiring only modest fine-tuning. This extended version provides additional architectural detail, neuroscientific motivation, and an expanded experimental protocol; quantitative results will appear in a future revision.
Updated: 2025-06-25 22:59:40
Categories: cs.CV,cs.AI,I.4.8; I.2.10
Empowering Digital Agriculture: A Privacy-Preserving Framework for Data Sharing and Collaborative Research
Data-driven agriculture, which integrates technology and data into agricultural practices, has the potential to improve crop yield, disease resilience, and long-term soil health. However, privacy concerns, such as adverse pricing, discrimination, and resource manipulation, deter farmers from sharing data, as it can be used against them. To address this barrier, we propose a privacy-preserving framework that enables secure data sharing and collaboration for research and development while mitigating privacy risks. The framework combines dimensionality reduction techniques (like Principal Component Analysis (PCA)) and differential privacy by introducing Laplacian noise to protect sensitive information. The proposed framework allows researchers to identify potential collaborators for a target farmer and train personalized machine learning models either on the data of identified collaborators via federated learning or directly on the aggregated privacy-protected data. It also allows farmers to identify potential collaborators based on similarities. We have validated this on real-life datasets, demonstrating robust privacy protection against adversarial attacks and utility performance comparable to a centralized system. We demonstrate how this framework can facilitate collaboration among farmers and help researchers pursue broader research objectives. The adoption of the framework can empower researchers and policymakers to leverage agricultural data responsibly, paving the way for transformative advances in data-driven agriculture. By addressing critical privacy challenges, this work supports secure data integration, fostering innovation and sustainability in agricultural systems.
Updated: 2025-06-25 22:46:30
Categories: cs.CR,cs.LG
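A compact sketch of the PCA-plus-Laplace-noise idea described above; the per-column sensitivity estimate and the epsilon value are illustrative simplifications rather than the paper's calibrated mechanism:

```python
import numpy as np
from sklearn.decomposition import PCA

def privatize(features: np.ndarray, n_components: int = 5,
              epsilon: float = 1.0) -> np.ndarray:
    """Reduce farm-level features with PCA, then add Laplace noise scaled by
    sensitivity / epsilon (the range of each projected column is used as a
    crude sensitivity proxy here)."""
    reduced = PCA(n_components=n_components).fit_transform(features)
    sensitivity = reduced.max(axis=0) - reduced.min(axis=0)
    noise = np.random.laplace(scale=sensitivity / epsilon, size=reduced.shape)
    return reduced + noise

X = np.random.default_rng(0).standard_normal((200, 20))  # 200 farms, 20 features
print(privatize(X).shape)  # (200, 5): shareable, privacy-protected representation
```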
Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation
Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge, addressing limitations in factual accuracy and contextual relevance. However, there is a lack of empirical studies that report on the development of RAG-based implementations grounded in real-world use cases, evaluated through general user involvement, and accompanied by systematic documentation of lessons learned. This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system incorporates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, deployed through local servers or cloud APIs to meet distinct user needs. A web-based evaluation involving a total of 100 participants assessed the systems across six dimensions: (i) Ease of Use, (ii) Relevance, (iii) Transparency, (iv) Responsiveness, (v) Accuracy, and (vi) Likelihood of Recommendation. Based on user feedback and our development experience, we documented twelve key lessons learned, highlighting technical, operational, and ethical challenges affecting the reliability and usability of RAG systems in practice.
Updated: 2025-06-25 22:40:00
Categories: cs.SE,cs.AI,cs.IR,D.2.11; I.2.6; H.3.3
High-dimensional Contextual Bandit Problem without Sparsity
In this research, we investigate the high-dimensional linear contextual bandit problem where the number of features $p$ is greater than the budget $T$, or it may even be infinite. Differing from the majority of previous works in this field, we do not impose sparsity on the regression coefficients. Instead, we rely on recent findings on overparameterized models, which enables us to analyze the performance of the minimum-norm interpolating estimator when data distributions have small effective ranks. We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance. Through our analysis, we derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation. Moreover, we introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance. We assess the performance of the proposed algorithms through a series of simulations.
Updated: 2025-06-25 22:16:22
Categories: stat.ML,cs.LG
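A schematic of the explore-then-commit pattern with a minimum-norm interpolating estimator; np.linalg.pinv returns the minimum-norm least-squares solution, i.e. the ridgeless estimator the analysis concerns, while the per-action regression structure and uniform exploration are simplifying assumptions of this sketch:

```python
import numpy as np

def explore_then_commit(T, T1, contexts, reward_fn, n_actions):
    """Explore uniformly for T1 rounds, fit the min-norm estimator per action,
    then commit to the greedy action for the remaining T - T1 rounds."""
    p = contexts.shape[1]
    data = {a: ([], []) for a in range(n_actions)}
    total = 0.0
    rng = np.random.default_rng(0)
    for t in range(T1):                              # exploration phase
        a = int(rng.integers(n_actions))
        r = reward_fn(contexts[t], a)
        data[a][0].append(contexts[t]); data[a][1].append(r)
        total += r
    theta = {a: np.linalg.pinv(np.array(X)) @ np.array(y) if X else np.zeros(p)
             for a, (X, y) in data.items()}          # min-norm interpolation
    for t in range(T1, T):                           # commit phase
        a = max(range(n_actions), key=lambda i: contexts[t] @ theta[i])
        total += reward_fn(contexts[t], a)
    return total

rng = np.random.default_rng(1)
T, p, K = 2000, 500, 3                               # p > T1: no sparsity assumed
ctx = rng.standard_normal((T, p)) / np.sqrt(p)
true_theta = rng.standard_normal((K, p))
reward = lambda x, a: x @ true_theta[a] + 0.1 * rng.standard_normal()
print(explore_then_commit(T, 300, ctx, reward, K))
```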
AI-Driven Sentiment Analytics: Unlocking Business Value in the E-Commerce Landscape
The rapid growth of e-commerce has led to an overwhelming volume of customer feedback, from product reviews to service interactions. Extracting meaningful insights from this data is crucial for businesses aiming to improve customer satisfaction and optimize decision-making. This paper presents an AI-driven sentiment analysis system designed specifically for e-commerce applications, balancing accuracy with interpretability. Our approach integrates traditional machine learning techniques with modern deep learning models, allowing for a more nuanced understanding of customer sentiment while ensuring transparency in decision-making. Experimental results show that our system outperforms standard sentiment analysis methods, achieving an accuracy of 89.7% on diverse, large-scale datasets. Beyond technical performance, real-world implementation across multiple e-commerce platforms demonstrates tangible improvements in customer engagement and operational efficiency. This study highlights both the potential and the challenges of applying AI to sentiment analysis in a commercial setting, offering insights into practical deployment strategies and areas for future refinement.
Updated: 2025-06-25 22:12:21
Categories: cs.IR,cs.AI,68T50
Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA
Memorization in large language models (LLMs) makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA fine-tuning, a widely adopted parameter-efficient method. In this work, we re-examine memorization in fine-tuning and uncover a surprising divergence from prior findings across different fine-tuning strategies. Factors such as model scale and data duplication, which strongly influence memorization in pre-training and full fine-tuning, do not follow the same trend in LoRA fine-tuning. Using a more relaxed similarity-based memorization metric, we demonstrate that LoRA significantly reduces memorization risks compared to full fine-tuning, while still maintaining strong task performance.
Updated: 2025-06-25 22:01:25
Categories: cs.LG,cs.CL,cs.CR
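The relaxed similarity-based metric can be illustrated with a simple check: rather than demanding an exact continuation match, flag a training suffix as memorized when the model's continuation is sufficiently similar. The similarity function and threshold below are illustrative choices, not the paper's exact definition:

```python
from difflib import SequenceMatcher

def is_memorized(generated: str, training_suffix: str,
                 threshold: float = 0.75) -> bool:
    """Relaxed memorization check: similarity above a threshold counts,
    so near-verbatim regurgitation is not missed the way exact-match is."""
    return SequenceMatcher(None, generated, training_suffix).ratio() >= threshold

print(is_memorized("the quick brown fox jumps", "the quick brown fox leaps"))  # True
print(is_memorized("a completely different string", "the quick brown fox"))    # False
```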
Subspace-Distance-Enabled Active Learning for Efficient Data-Driven Model Reduction of Parametric Dynamical Systems
In situations where the solution of a high-fidelity dynamical system needs to be evaluated repeatedly, over a vast pool of parametric configurations and in the absence of access to the underlying governing equations, data-driven model reduction techniques are preferable. We propose a novel active learning approach to build a parametric data-driven reduced-order model (ROM) by greedily picking the most important parameter samples from the parameter domain. As a result, during the ROM construction phase, the number of high-fidelity solutions dynamically grows in a principled fashion. The high-fidelity solution snapshots are expressed in several parameter-specific linear subspaces, with the help of proper orthogonal decomposition (POD), and the relative distance between these subspaces is used as a guiding mechanism to perform active learning. For successfully achieving this, we provide a distance measure to evaluate the similarity between pairs of linear subspaces with different dimensions, and also show that this distance measure is a metric. The usability of the proposed subspace-distance-enabled active learning (SDE-AL) framework is demonstrated by augmenting two existing non-intrusive reduced-order modeling approaches, and providing their active-learning-driven (ActLearn) extensions, namely, SDE-ActLearn-POD-KSNN, and SDE-ActLearn-POD-NN. Furthermore, we report positive results for two parametric physical models, highlighting the efficiency of the proposed SDE-AL approach.
Updated: 2025-06-25 22:00:25
Categories: math.NA,cs.CE,cs.LG,cs.NA,math.DS,physics.comp-ph
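For intuition, distances between linear subspaces are typically built from principal angles, whose cosines are the singular values of $U^T V$ for orthonormal bases $U$ and $V$. The chordal-type construction below handles bases with different column counts; it is one standard choice, and the paper proves metric properties for its own measure, which may differ in detail:

```python
import numpy as np

def subspace_distance(U: np.ndarray, V: np.ndarray) -> float:
    """Chordal-type distance between spans of orthonormal bases U and V with
    possibly different column counts: sqrt(max(k1, k2) - ||U^T V||_F^2).
    The singular values of U^T V are the cosines of the principal angles."""
    k = max(U.shape[1], V.shape[1])
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sqrt(max(k - np.sum(cosines ** 2), 0.0)))

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((100, 4)))   # POD basis, 4 modes
V, _ = np.linalg.qr(rng.standard_normal((100, 6)))   # POD basis, 6 modes
print(subspace_distance(U, U[:, :3]))  # nested bases differ by one mode: 1.0
print(subspace_distance(U, V))         # unrelated random subspaces: larger
```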
NFISiS: New Perspectives on Fuzzy Inference Systems for Renewable Energy Forecasting
Deep learning models, despite their popularity, face challenges such as long training times and a lack of interpretability. In contrast, fuzzy inference systems offer a balance of accuracy and transparency. This paper addresses the limitations of traditional Takagi-Sugeno-Kang fuzzy models by extending the recently proposed New Takagi-Sugeno-Kang model to a new Mamdani-based regressor. These models are data-driven, allowing users to define the number of rules to balance accuracy and interpretability. To handle the complexity of large datasets, this research integrates wrapper and ensemble techniques. A Genetic Algorithm is used as a wrapper for feature selection, creating genetic versions of the models. Furthermore, ensemble models, including the Random New Mamdani Regressor, Random New Takagi-Sugeno-Kang, and Random Forest New Takagi-Sugeno-Kang, are introduced to improve robustness. The proposed models are validated on photovoltaic energy forecasting datasets, a critical application due to the intermittent nature of solar power. Results demonstrate that the genetic and ensemble fuzzy models, particularly the Genetic New Takagi-Sugeno-Kang and Random Forest New Takagi-Sugeno-Kang, achieve superior performance. They often outperform both traditional machine learning and deep learning models while providing a simpler and more interpretable rule-based structure. The models are available online in a library called nfisis (https://pypi.org/project/nfisis/).
Updated: 2025-06-25 22:00:25
Categories: cs.AI
Multi-Objective Reinforcement Learning for Cognitive Radar Resource Management
The time allocation problem in multi-function cognitive radar systems focuses on the trade-off between scanning for newly emerging targets and tracking the previously detected targets. We formulate this as a multi-objective optimization problem and employ deep reinforcement learning to find Pareto-optimal solutions and compare deep deterministic policy gradient (DDPG) and soft actor-critic (SAC) algorithms. Our results demonstrate the effectiveness of both algorithms in adapting to various scenarios, with SAC showing improved stability and sample efficiency compared to DDPG. We further employ the NSGA-II algorithm to estimate an upper bound on the Pareto front of the considered problem. This work contributes to the development of more efficient and adaptive cognitive radar systems capable of balancing multiple competing objectives in dynamic environments.
Updated: 2025-06-25 21:56:30
Categories: cs.LG,eess.SP
Generating Reliable Adverse event Profiles for Health through Automated Integrated Data (GRAPH-AID): A Semi-Automated Ontology Building Approach
As data and knowledge expand rapidly, adopting systematic methodologies for ontology generation has become crucial. With the daily increases in data volumes and frequent content changes, the demand for databases to store and retrieve information for the creation of knowledge graphs has become increasingly urgent. The previously established Knowledge Acquisition and Representation Methodology (KNARM) outlines a systematic approach to address these challenges and create knowledge graphs. However, following this methodology highlights the existing challenge of seamlessly integrating Neo4j databases with the Web Ontology Language (OWL). Previous attempts to integrate data from Neo4j into an ontology have been discussed, but these approaches often require an understanding of description logics (DL) syntax, which may not be familiar to many users. Thus, a more accessible method is necessary to bridge this gap. This paper presents a user-friendly approach that utilizes Python and its rdflib library to support ontology development. We showcase our novel approach through a Neo4j database we created by integrating data from the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) database. Using this dataset, we developed a Python script that automatically generates the required classes and their axioms, facilitating a smoother integration process. This approach offers a practical solution to the challenges of ontology generation in the context of rapidly growing adverse drug event datasets, supporting improved drug safety monitoring and public health decision-making.
Updated: 2025-06-25 21:48:21
Categories: cs.SE,cs.AI,cs.DB
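A minimal example of the kind of rdflib-based class generation the paper describes (the namespace, class names, and helper are hypothetical stand-ins; a real script would loop over Neo4j node labels instead of hard-coding two classes):

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# Hypothetical namespace standing in for FAERS-derived terms.
EX = Namespace("http://example.org/faers#")

def add_owl_class(g: Graph, name: str, parent=OWL.Thing, label: str = ""):
    """Emit an owl:Class with a subclass axiom and a human-readable label."""
    cls = EX[name]
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, parent))
    if label:
        g.add((cls, RDFS.label, Literal(label)))
    return cls

g = Graph()
g.bind("ex", EX); g.bind("owl", OWL)
add_owl_class(g, "Drug", label="Drug")
add_owl_class(g, "AdverseEvent", label="Adverse Event")
print(g.serialize(format="turtle"))
```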
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is a fundamental task in recommender systems. The emergence of heterogeneous information, such as user profile and behavior sequences, depicts user interests from different aspects. A mutually beneficial integration of heterogeneous information is the cornerstone towards the success of CTR prediction. However, most of the existing methods suffer from two fundamental limitations, including (1) insufficient inter-mode interaction due to the unidirectional information flow between modes, and (2) aggressive information aggregation caused by early summarization, resulting in excessive information loss. To address the above limitations, we propose a novel module named InterFormer to learn heterogeneous information interaction in an interleaving style. To achieve better interaction learning, InterFormer enables bidirectional information flow for mutually beneficial learning across different modes. To avoid aggressive information aggregation, we retain complete information in each data mode and use a separate bridging arch for effective information selection and summarization. Our proposed InterFormer achieves state-of-the-art performance on three public datasets and a large-scale industrial dataset.
Updated: 2025-06-25 21:48:04
Categories: cs.IR,cs.AI,cs.LG
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead extensive audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diversified deepfake detection dataset for social media, nor has it devised an effective solution to address this issue. In this paper, we introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages: (1) extensive volume, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations, (2) broad diversity, encompassing fully synthetic and tampered images across various classes, and (3) elevated realism, with images that are predominantly indistinguishable from genuine ones through mere visual inspection. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA (Social media Image Detection, localization, and explanation Assistant). SIDA not only discerns the authenticity of images, but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Compared with state-of-the-art deepfake detection models on SID-Set and other benchmarks, extensive experiments demonstrate that SIDA achieves superior performance among diversified settings. The code, model, and dataset will be released.
Updated: 2025-06-25 21:47:50
Categories: cs.CV,cs.AI
Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement
Low-light and underwater videos suffer from poor visibility, low contrast, and high noise, necessitating enhancements in visual quality. However, existing approaches typically rely on paired ground truth, which limits their practicality and often fails to maintain temporal consistency. To overcome these obstacles, this paper introduces a novel zero-shot learning approach named Zero-TIG, leveraging the Retinex theory and optical flow techniques. The proposed network consists of an enhancement module and a temporal feedback module. The enhancement module comprises three subnetworks: low-light image denoising, illumination estimation, and reflection denoising. The temporal feedback module ensures temporal consistency by incorporating histogram equalization, optical flow computation, and image warping to align the enhanced previous frame with the current frame, thereby maintaining continuity. Additionally, we address color distortion in underwater data by adaptively balancing RGB channels. The experimental results demonstrate that our method achieves low-light video enhancement without the need for paired training data, making it a promising and applicable method for real-world scenario enhancement.
Updated: 2025-06-25 21:45:14
Categories: cs.CV,cs.AI
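The warping step can be sketched with standard OpenCV primitives: estimate backward flow from the current frame to the previous one, then resample the previously enhanced frame at the flowed coordinates. Farneback flow stands in here for whichever estimator the pipeline actually uses:

```python
import cv2
import numpy as np

def warp_prev_to_current(prev_enhanced, prev_gray, curr_gray):
    """Align the previously enhanced frame with the current frame.
    Backward flow (current -> previous) tells each current pixel where it
    came from; remap then samples the previous output at those locations."""
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_enhanced, map_x, map_y, cv2.INTER_LINEAR)
```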
Learning-Based Resource Management in Integrated Sensing and Communication Systems
In this paper, we tackle the task of adaptive time allocation in integrated sensing and communication systems equipped with radar and communication units. The dual-functional radar-communication system's task involves allocating dwell times for tracking multiple targets and utilizing the remaining time for data transmission towards estimated target locations. We introduce a novel constrained deep reinforcement learning (CDRL) approach, designed to optimize resource allocation between tracking and communication under time budget constraints, thereby enhancing target communication quality. Our numerical results demonstrate the efficiency of our proposed CDRL framework, confirming its ability to maximize communication quality in highly dynamic environments while adhering to time constraints.
Updated: 2025-06-25 21:44:07
Categories: cs.LG
From Tiny Machine Learning to Tiny Deep Learning: A Survey
The rapid growth of edge devices has driven the demand for deploying artificial intelligence (AI) at the edge, giving rise to Tiny Machine Learning (TinyML) and its evolving counterpart, Tiny Deep Learning (TinyDL). While TinyML initially focused on enabling simple inference tasks on microcontrollers, the emergence of TinyDL marks a paradigm shift toward deploying deep learning models on severely resource-constrained hardware. This survey presents a comprehensive overview of the transition from TinyML to TinyDL, encompassing architectural innovations, hardware platforms, model optimization techniques, and software toolchains. We analyze state-of-the-art methods in quantization, pruning, and neural architecture search (NAS), and examine hardware trends from MCUs to dedicated neural accelerators. Furthermore, we categorize software deployment frameworks, compilers, and AutoML tools enabling practical on-device learning. Applications across domains such as computer vision, audio recognition, healthcare, and industrial monitoring are reviewed to illustrate the real-world impact of TinyDL. Finally, we identify emerging directions including neuromorphic computing, federated TinyDL, edge-native foundation models, and domain-specific co-design approaches. This survey aims to serve as a foundational resource for researchers and practitioners, offering a holistic view of the ecosystem and laying the groundwork for future advancements in edge AI.
Updated: 2025-06-25 21:42:13
Categories: cs.LG
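As a concrete instance of the model-optimization techniques the survey covers, post-training affine quantization maps float weights to b-bit integers as w ≈ scale * (q - zero_point). A self-contained sketch:

```python
import numpy as np

def quantize_affine(w: np.ndarray, bits: int = 8):
    """Post-training affine (asymmetric) quantization of a weight tensor:
    w ~ scale * (q - zero_point), with q stored as unsigned integers."""
    qmin, qmax = 0, 2 ** bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, s, z = quantize_affine(w)
print(np.abs(w - dequantize(q, s, z)).max())  # worst-case rounding error
```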
Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
Due to the sensitive nature of personally identifiable information (PII), its owners may have the authority to control its inclusion or request its removal from large-language model (LLM) training. Beyond this, PII may be added or removed from training datasets due to evolving dataset curation techniques, because they were newly scraped for retraining, or because they were included in a new downstream fine-tuning stage. We find that the amount and ease of PII memorization is a dynamic property of a model that evolves throughout training pipelines and depends on commonly altered design choices. We characterize three such novel phenomena: (1) similar-appearing PII seen later in training can elicit memorization of earlier-seen sequences in what we call assisted memorization, and this is a significant factor (in our settings, up to 1/3); (2) adding PII can increase memorization of other PII significantly (in our settings, as much as $\approx\!7.5\times$); and (3) removing PII can lead to other PII being memorized. Model creators should consider these first- and second-order privacy risks when training models to avoid the risk of new PII regurgitation.
Updated: 2025-06-25 21:37:19
Categories: cs.CL,cs.CR
Reducing Biases in Record Matching Through Scores Calibration
Record matching is the task of identifying records that refer to the same real-world entity across datasets. While most existing models optimize for accuracy, fairness has become an important concern due to the potential for unequal outcomes across demographic groups. Prior work typically focuses on binary outcomes evaluated at fixed decision thresholds. However, such evaluations can miss biases in matching scores--biases that persist across thresholds and affect downstream tasks. We propose a threshold-independent framework for measuring and reducing score bias, defined as disparities in the distribution of matching scores across groups. We show that several state-of-the-art matching methods exhibit substantial score bias, even when appearing fair under standard threshold-based metrics. To address this, we introduce two post-processing score calibration algorithms. The first, calib, aligns group-wise score distributions using the Wasserstein barycenter, targeting demographic parity. The second, ccalib, conditions on predicted labels to further reduce label-dependent biases, such as equal opportunity. Both methods are model-agnostic and require no access to model training data. calib also offers theoretical guarantees, ensuring reduced bias with minimal deviation from original scores. Experiments across real-world datasets and matching models confirm that calib and ccalib substantially reduce score bias while minimally impacting model accuracy.
Updated: 2025-06-25 21:36:23
Categories: cs.LG,cs.CY,cs.DB
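In one dimension, the Wasserstein barycenter of several score distributions has a quantile function equal to the average of the groups' quantile functions, which suggests a simple rank-based calibration in the spirit of calib (the paper's algorithm includes details this sketch omits):

```python
import numpy as np

def calibrate_scores(scores: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Map each group's matching scores onto the shared 1-D Wasserstein
    barycenter: replace every score with the barycenter quantile at its
    within-group rank, aligning the group-wise score distributions."""
    qs = np.linspace(0, 1, 101)
    group_ids = np.unique(groups)
    bary = np.mean([np.quantile(scores[groups == g], qs) for g in group_ids],
                   axis=0)                                # averaged quantiles
    out = scores.astype(float).copy()
    for g in group_ids:
        s = scores[groups == g]
        ranks = (np.argsort(np.argsort(s)) + 0.5) / len(s)  # mid-ranks in (0,1)
        out[groups == g] = np.interp(ranks, qs, bary)
    return out

rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
groups = np.array([0] * 500 + [1] * 500)
cal = calibrate_scores(scores, groups)
print(cal[groups == 0].mean(), cal[groups == 1].mean())  # now aligned
```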
FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization
Semi-supervised domain generalization (SSDG) aims to solve the problem of generalizing to out-of-distribution data when only a few labels are available. Due to label scarcity, directly applied domain generalization methods often underperform. Consequently, existing SSDG methods combine semi-supervised learning methods with various regularization terms. However, these methods do not explicitly regularize the model to learn domain-invariant representations across all domains, which is a key goal for domain generalization. To address this, we introduce FixCLR. Inspired by success in self-supervised learning, we change two crucial components to adapt contrastive learning for explicit domain invariance regularization: utilization of class information from pseudo-labels and using only a repelling term. FixCLR can also be added on top of most existing SSDG and semi-supervised methods for complementary performance improvements. Our research includes extensive experiments that have not been previously explored in SSDG studies. These experiments include benchmarking different improvements to semi-supervised methods, evaluating the performance of pretrained versus non-pretrained models, and testing on datasets with many domains. Overall, FixCLR proves to be an effective SSDG method, especially when combined with other semi-supervised methods.
Updated: 2025-06-25 21:25:05
Categories: cs.CV,cs.AI
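A rough sketch of the two changes described, pseudo-label class information and a repelling-only term, in a contrastive-style loss (the exact formulation and normalization in the paper may differ):

```python
import torch
import torch.nn.functional as F

def repel_loss(z: torch.Tensor, pseudo_labels: torch.Tensor,
               tau: float = 0.5) -> torch.Tensor:
    """Repelling-only contrastive term: embeddings whose pseudo-labels differ
    are pushed apart; there is no attracting term."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau                                   # pairwise similarities
    diff = pseudo_labels.unsqueeze(0) != pseudo_labels.unsqueeze(1)
    return (sim.exp() * diff).sum() / diff.sum().clamp(min=1)

z = torch.randn(16, 128, requires_grad=True)
loss = repel_loss(z, torch.randint(0, 4, (16,)))
loss.backward()  # gradients push differently-labeled pairs apart
```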
Uncertainty-Aware Machine-Learning Framework for Predicting Dislocation Plasticity and Stress-Strain Response in FCC Alloys
Machine learning has significantly advanced the understanding and application of structural materials, with an increasing emphasis on integrating existing data and quantifying uncertainties in predictive modeling. This study presents a comprehensive methodology utilizing a mixed density network (MDN) model, trained on extensive experimental data from literature. This approach uniquely predicts the distribution of dislocation density, inferred as a latent variable, and the resulting stress distribution at the grain level. The incorporation of statistical parameters of those predicted distributions into a dislocation-mediated plasticity model allows for accurate stress-strain predictions with explicit uncertainty quantification. This strategy not only improves the accuracy and reliability of mechanical property predictions but also plays a vital role in optimizing alloy design, thereby facilitating the development of new materials in a rapidly evolving industry.
Updated: 2025-06-25 21:18:14
Categories: cond-mat.mtrl-sci,cs.LG
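A minimal mixture density network of the kind described, predicting a Gaussian-mixture distribution over a scalar target such as grain-level dislocation density (the layer sizes and component count are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Minimal mixture density network: maps inputs to the parameters of a
    K-component Gaussian mixture over a scalar target."""
    def __init__(self, in_dim: int, k: int = 5, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, k)          # mixture weight logits
        self.mu = nn.Linear(hidden, k)          # component means
        self.log_sigma = nn.Linear(hidden, k)   # component log-std-devs

    def forward(self, x):
        h = self.body(x)
        return self.pi(h).log_softmax(-1), self.mu(h), self.log_sigma(h)

def mdn_nll(log_pi, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted mixture."""
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    return -torch.logsumexp(comp.log_prob(y.unsqueeze(-1)) + log_pi, -1).mean()

model = MDN(in_dim=8)
x, y = torch.randn(32, 8), torch.randn(32)
mdn_nll(*model(x), y).backward()
```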
Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly, for each anchor, the threshold used to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
Updated: 2025-06-25 21:11:53
Domain: cs.LG,cs.CV
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Incorporating pre-collected offline data from a source environment can significantly improve the sample efficiency of reinforcement learning (RL), but this benefit is often challenged by discrepancies between the transition dynamics of the source and target environments. Existing methods typically address this issue by penalizing or filtering out source transitions in high dynamics-gap regions. However, their estimation of the dynamics gap often relies on KL divergence or mutual information, which can be ill-defined when the source and target dynamics have disjoint support. To overcome these limitations, we propose CompFlow, a method grounded in the theoretical connection between flow matching and optimal transport. Specifically, we model the target dynamics as a conditional flow built upon the output distribution of the source-domain flow, rather than learning it directly from a Gaussian prior. This composite structure offers two key advantages: (1) improved generalization for learning target dynamics, and (2) a principled estimation of the dynamics gap via the Wasserstein distance between source and target transitions. Leveraging our principled estimation of the dynamics gap, we further introduce an optimistic active data collection strategy that prioritizes exploration in regions of high dynamics gap, and theoretically prove that it reduces the performance disparity with the optimal policy. Empirically, CompFlow outperforms strong baselines across several RL benchmarks with shifted dynamics.
Updated: 2025-06-25 21:09:46
Domain: cs.LG,cs.AI
Harnessing the Universal Geometry of Embeddings
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity across model pairs with different architectures, parameter counts, and training datasets. The ability to translate unknown embeddings into a different space while preserving their geometry has serious implications for the security of vector databases. An adversary with access only to embedding vectors can extract sensitive information about the underlying documents, sufficient for classification and attribute inference.
Updated: 2025-06-25 21:04:02
Domain: cs.LG
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
We propose TaxaDiffusion, a taxonomy-informed training framework for diffusion models to generate fine-grained animal images with high morphological and identity accuracy. Unlike standard approaches that treat each species as an independent category, TaxaDiffusion incorporates domain knowledge that many species exhibit strong visual similarities, with distinctions often residing in subtle variations of shape, pattern, and color. To exploit these relationships, TaxaDiffusion progressively trains conditioned diffusion models across different taxonomic levels -- starting from broad classifications such as Class and Order, refining through Family and Genus, and ultimately distinguishing at the Species level. This hierarchical learning strategy first captures coarse-grained morphological traits shared by species with common ancestors, facilitating knowledge transfer before refining fine-grained differences for species-level distinction. As a result, TaxaDiffusion enables accurate generation even with limited training samples per species. Extensive experiments on three fine-grained animal datasets demonstrate that TaxaDiffusion outperforms existing approaches, achieving superior fidelity in fine-grained animal image generation. Project page: https://amink8.github.io/TaxaDiffusion/
Updated: 2025-06-25 21:02:25
Domain: cs.CV,cs.AI,cs.LG
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models
Super-resolution (SR) is an ill-posed inverse problem with many feasible solutions consistent with a given low-resolution image. On one hand, regressive SR models aim to balance fidelity and perceptual quality to yield a single solution, but this trade-off often introduces artifacts that create ambiguity in information-critical applications such as recognizing digits or letters. On the other hand, diffusion models generate a diverse set of SR images, but selecting the most trustworthy solution from this set remains a challenge. This paper introduces a robust, automated framework for identifying the most trustworthy SR sample from a diffusion-generated set by leveraging the semantic reasoning capabilities of vision-language models (VLMs). Specifically, VLMs such as BLIP-2, GPT-4o, and their variants are prompted with structured queries to assess semantic correctness, visual quality, and artifact presence. The top-ranked SR candidates are then ensembled to yield a single trustworthy output in a cost-effective manner. To rigorously assess the validity of VLM-selected samples, we propose a novel Trustworthiness Score (TWS), a hybrid metric that quantifies SR reliability based on three complementary components: semantic similarity via CLIP embeddings, structural integrity using SSIM on edge maps, and artifact sensitivity through multi-level wavelet decomposition. We empirically show that TWS correlates strongly with human preference in both ambiguous and natural images, and that VLM-guided selections consistently yield high TWS values. Compared to conventional metrics like PSNR and LPIPS, which fail to reflect information fidelity, our approach offers a principled, scalable, and generalizable solution for navigating the uncertainty of the diffusion SR space. By aligning outputs with human expectations and semantic correctness, this work sets a new benchmark for trustworthiness in generative SR.
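An illustration of how such a hybrid score could be composed. The real components (CLIP embeddings, SSIM on edge maps, wavelet decomposition) are stood in by placeholder scores in [0, 1], and the weights are hypothetical, not the paper's.

```python
import numpy as np

def trustworthiness_score(semantic, structural, artifact_penalty,
                          w=(0.4, 0.4, 0.2)):
    """Higher is better; the artifact component enters as a penalty."""
    s = np.clip([semantic, structural, 1.0 - artifact_penalty], 0.0, 1.0)
    return float(np.dot(w, s))

# Rank diffusion-generated SR candidates, then ensemble the top ones.
candidates = [(0.91, 0.88, 0.10), (0.84, 0.90, 0.30), (0.70, 0.75, 0.05)]
ranked = sorted(candidates, key=lambda c: trustworthiness_score(*c),
                reverse=True)
print(ranked[:2])   # top-ranked candidates to ensemble
```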
Updated: 2025-06-25 21:00:44
Domain: cs.CV,cs.AI
Efficacy of Temporal Fusion Transformers for Runoff Simulation
Combining attention with recurrence has been shown to be valuable in sequence modeling, including hydrological prediction. Here, we explore the strength of Temporal Fusion Transformers (TFTs) over Long Short-Term Memory (LSTM) networks in rainfall-runoff modeling. We train ten randomly initialized models of each architecture, TFT and LSTM, for 531 CAMELS catchments in the US. We repeat the experiment with five subsets of the Caravan dataset, each representing catchments in the US, Australia, Brazil, Great Britain, and Chile. Then, the performance of the models, their variability with respect to catchment attributes, and the differences across datasets are assessed. Our findings show that TFT slightly outperforms LSTM, especially in simulating the midsection and peak of hydrographs. Furthermore, we show the ability of TFT to handle longer sequences and why it can be a better candidate for higher or larger catchments. Being an explainable AI technique, TFT identifies the key dynamic and static variables, providing valuable scientific insights. However, both TFT and LSTM exhibit a considerable drop in performance with the Caravan dataset, indicating possible data quality issues. Overall, the study highlights the potential of TFT in improving hydrological modeling and understanding.
Updated: 2025-06-25 20:58:28
Domain: physics.geo-ph,cs.LG,stat.AP
Practical and Accurate Local Edge Differentially Private Graph Algorithms
The rise of massive networks across diverse domains necessitates sophisticated graph analytics, often involving sensitive data and raising privacy concerns. This paper addresses these challenges using local differential privacy (LDP), which enforces privacy at the individual level, where no third-party entity is trusted, unlike centralized models that assume a trusted curator. We introduce novel LDP algorithms for two fundamental graph statistics: k-core decomposition and triangle counting. Our approach leverages input-dependent private graph properties, specifically the degeneracy and maximum degree of the graph, to improve theoretical utility. Unlike prior methods, our error bounds are determined by the maximum degree rather than the total number of edges, resulting in significantly tighter guarantees. For triangle counting, we improve upon the work of Imola, Murakami, and Chaudhury~\cite{IMC21locally, IMC21communication}, which bounds error in terms of edge count. Instead, our algorithm achieves bounds based on graph degeneracy by leveraging a private out-degree orientation, a refined variant of Eden et al.'s randomized response technique~\cite{ELRS23}, and a novel analysis, yielding stronger guarantees than prior work. Beyond theoretical gains, we are the first to evaluate local DP algorithms in a distributed simulation, unlike prior work tested on a single processor. Experiments on real-world graphs show substantial accuracy gains: our k-core decomposition achieves errors within 3x of exact values, far outperforming the 131x error in the baseline of Dhulipala et al.~\cite{DLRSSY22}. Our triangle counting algorithm reduces multiplicative approximation errors by up to six orders of magnitude, while maintaining competitive runtime.
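For intuition, a minimal sketch of edge-level randomized response, the basic local-DP primitive this line of work builds on: each bit of a user's adjacency vector is reported truthfully with probability e^eps / (1 + e^eps). The paper's out-degree orientation and debiasing steps are omitted here.

```python
import numpy as np

def randomized_response(adj_row, eps, rng):
    """Flip each adjacency bit with probability 1 / (1 + e^eps)."""
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))   # P(report truthfully)
    flips = rng.random(adj_row.shape) > p_keep
    return np.where(flips, 1 - adj_row, adj_row)

rng = np.random.default_rng(1)
row = rng.integers(0, 2, size=10)                # one user's adjacency bits
print(row, randomized_response(row, eps=1.0, rng=rng))
```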
Updated: 2025-06-25 20:54:07
Domain: cs.DS,cs.CR,cs.DB
Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery
This paper presents a framework for extracting georeferenced vehicle trajectories from high-altitude drone imagery, addressing key challenges in urban traffic monitoring and the limitations of traditional ground-based systems. Our approach integrates several novel contributions, including a tailored object detector optimized for high-altitude bird's-eye view perspectives, a unique track stabilization method that uses detected vehicle bounding boxes as exclusion masks during image registration, and an orthophoto and master frame-based georeferencing strategy that enhances consistent alignment across multiple drone viewpoints. Additionally, our framework features robust vehicle dimension estimation and detailed road segmentation, enabling comprehensive traffic analysis. Conducted in the Songdo International Business District, South Korea, the study utilized a multi-drone experiment covering 20 intersections, capturing approximately 12TB of 4K video data over four days. The framework produced two high-quality datasets: the Songdo Traffic dataset, comprising approximately 700,000 unique vehicle trajectories, and the Songdo Vision dataset, containing over 5,000 human-annotated images with about 300,000 vehicle instances in four classes. Comparisons with high-precision sensor data from an instrumented probe vehicle highlight the accuracy and consistency of our extraction pipeline in dense urban environments. The public release of Songdo Traffic and Songdo Vision, and the complete source code for the extraction pipeline, establishes new benchmarks in data quality, reproducibility, and scalability in traffic research. Results demonstrate the potential of integrating drone technology with advanced computer vision for precise and cost-effective urban traffic monitoring, providing valuable resources for developing intelligent transportation systems and enhancing traffic management strategies.
Updated: 2025-06-25 20:45:19
Domain: cs.CV,cs.AI,cs.LG
Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes
Large language models (LLMs) are increasingly proposed for detecting and responding to violent content online, yet their ability to reason about morally ambiguous, real-world scenarios remains underexamined. We present the first study to evaluate LLMs using a validated social science instrument designed to measure human response to everyday conflict, namely the Violent Behavior Vignette Questionnaire (VBVQ). To assess potential bias, we introduce persona-based prompting that varies race, age, and geographic identity within the United States. Six LLMs developed across different geopolitical and organizational contexts are evaluated under a unified zero-shot setting. Our study reveals two key findings: (1) LLMs' surface-level text generation often diverges from their internal preference for violent responses; (2) their violent tendencies vary across demographics, frequently contradicting established findings in criminology, social science, and psychology.
Updated: 2025-06-25 20:43:04
Domain: cs.CL,cs.AI
MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering
Financial documents--such as 10-Ks, 10-Qs, and investor presentations--span hundreds of pages and combine diverse modalities, including dense narrative text, structured tables, and complex figures. Answering questions over such content often requires joint reasoning across modalities, which strains traditional large language models (LLMs) and retrieval-augmented generation (RAG) pipelines due to token limitations, layout loss, and fragmented cross-modal context. We introduce MultiFinRAG, a retrieval-augmented generation framework purpose-built for financial QA. MultiFinRAG first performs multimodal extraction by grouping table and figure images into batches and sending them to a lightweight, quantized open-source multimodal LLM, which produces both structured JSON outputs and concise textual summaries. These outputs, along with narrative text, are embedded and indexed with modality-aware similarity thresholds for precise retrieval. A tiered fallback strategy then dynamically escalates from text-only to text+table+image contexts when necessary, enabling cross-modal reasoning while reducing irrelevant context. Despite running on commodity hardware, MultiFinRAG achieves 19 percentage points higher accuracy than ChatGPT-4o (free-tier) on complex financial QA tasks involving text, tables, images, and combined multimodal reasoning.
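A hedged sketch of the tiered fallback logic; `retrieve` and `ask_llm` are hypothetical stand-ins for the framework's retrieval index and multimodal LLM.

```python
TIERS = ["text", "text+table", "text+table+image"]

def answer(question, retrieve, ask_llm, abstain="I don't know"):
    """Escalate through modality tiers until the model stops abstaining."""
    reply = abstain
    for tier in TIERS:
        context = retrieve(question, modalities=tier)
        reply = ask_llm(question, context)
        if abstain not in reply:
            return reply, tier          # confident answer at this tier
    return reply, TIERS[-1]             # best effort after all tiers

# Toy stand-ins: the model only "knows" once tables enter the context.
retrieve = lambda q, modalities: modalities
ask_llm = lambda q, ctx: "42" if "table" in ctx else "I don't know"
print(answer("What was Q3 revenue?", retrieve, ask_llm))
```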
Updated: 2025-06-25 20:37:20
Domain: cs.CL,cs.AI,cs.CE,68T50, 68T07 (Primary) 68P20, 91G15, 91G70, 68U10 (Secondary),I.2.7; I.2.10; H.3.3; H.2.8; I.5.4; J.1
Demystifying Distributed Training of Graph Neural Networks for Link Prediction
Graph neural networks (GNNs) are powerful tools for solving graph-related problems. Distributed GNN frameworks and systems enhance the scalability of GNNs and accelerate model training, yet most are optimized for node classification. Their performance on link prediction remains underexplored. This paper demystifies distributed training of GNNs for link prediction by investigating the issue of performance degradation when each worker trains a GNN on its assigned partitioned subgraph without having access to the entire graph. We discover that the main sources of the issue come from not only the information loss caused by graph partitioning but also the ways of drawing negative samples during model training. While sharing the complete graph information with each worker resolves the issue and preserves link prediction accuracy, it incurs a high communication cost. We propose SpLPG, which effectively leverages graph sparsification to mitigate the issue of performance degradation at a reduced communication cost. Experiment results on several public real-world datasets demonstrate the effectiveness of SpLPG, which reduces the communication overhead by up to about 80% while mostly preserving link prediction accuracy.
Updated: 2025-06-25 20:32:23
Domain: cs.LG
Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers
Deep Neural Networks (DNNs) are notoriously vulnerable to adversarial input designs with limited noise budgets. While numerous successful attacks with subtle modifications to original input have been proposed, defense techniques against these attacks are relatively understudied. Existing defense approaches either focus on improving DNN robustness by negating the effects of perturbations or use a secondary model to detect adversarial data. Although equally important, the attack detection approach, which is studied in this work, provides a more practical defense compared to the robustness approach. We show that the existing detection methods are either ineffective against the state-of-the-art attack techniques or computationally inefficient for real-time processing. We propose a novel universal and efficient method to detect adversarial examples by analyzing the varying degrees of impact of attacks on different DNN layers. Our method trains a lightweight regression model that predicts deeper-layer features from early-layer features and uses the prediction error to detect adversarial samples. Through theoretical arguments and extensive experiments, we demonstrate that our detection method is highly effective, computationally efficient for real-time processing, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
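A minimal sketch of the detection idea under stated assumptions: a linear regressor learns to predict deep-layer features from early-layer features on clean data, and inputs with unusually large prediction error are flagged. The layer choice, stand-in features, and threshold are illustrative.

```python
import torch
import torch.nn as nn

early_dim, deep_dim = 64, 128
reg = nn.Linear(early_dim, deep_dim)           # lightweight regressor
opt = torch.optim.Adam(reg.parameters(), lr=1e-3)
W = torch.randn(early_dim, deep_dim) * 0.1     # stand-in "deep feature" map

for _ in range(200):                           # fit on clean features only
    f_early = torch.randn(32, early_dim)
    f_deep = f_early @ W                       # stand-in for real activations
    loss = nn.functional.mse_loss(reg(f_early), f_deep)
    opt.zero_grad()
    loss.backward()
    opt.step()

def is_adversarial(f_early, f_deep, tau=1.0):
    """Flag samples whose deep features deviate from the regressor's guess."""
    with torch.no_grad():
        err = ((reg(f_early) - f_deep) ** 2).mean(dim=1)
    return err > tau                           # per-sample boolean decision
```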
Updated: 2025-06-25 20:30:28
Domain: cs.LG,cs.CR,cs.CV
Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications
LLM-powered applications are highly susceptible to the quality of user prompts, and crafting high-quality prompts can often be challenging, especially for domain-specific applications. This paper presents a novel dynamic context-aware prompt recommendation system for domain-specific AI applications. Our solution combines contextual query analysis, retrieval-augmented knowledge grounding, hierarchical skill organization, and adaptive skill ranking to generate relevant and actionable prompt suggestions. The system leverages behavioral telemetry and a two-stage hierarchical reasoning process to dynamically select and rank relevant skills, and synthesizes prompts using both predefined and adaptive templates enhanced with few-shot learning. Experiments on real-world datasets demonstrate that our approach achieves high usefulness and relevance, as validated by both automated and expert evaluations.
Updated: 2025-06-25 20:29:46
Domain: cs.AI
Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning
Ensemble learning has proven effective in boosting predictive performance, but traditional methods such as bagging, boosting, and dynamic ensemble selection (DES) suffer from high computational cost and limited adaptability to heterogeneous data distributions. To address these limitations, we propose Hellsemble, a novel and interpretable ensemble framework for binary classification that leverages dataset complexity during both training and inference. Hellsemble incrementally partitions the dataset into circles of difficulty by iteratively passing misclassified instances from simpler models to subsequent ones, forming a committee of specialised base learners. Each model is trained on increasingly challenging subsets, while a separate router model learns to assign new instances to the most suitable base model based on inferred difficulty. Hellsemble achieves strong classification accuracy while maintaining computational efficiency and interpretability. Experimental results on OpenML-CC18 and Tabzilla benchmarks demonstrate that Hellsemble often outperforms classical ensemble methods. Our findings suggest that embracing instance-level difficulty offers a promising direction for constructing efficient and robust ensemble systems.
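A rough sketch of the circles-of-difficulty idea with scikit-learn stand-ins; the base learner, router model, and stopping rule here are assumptions, not Hellsemble's exact design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_hellsemble(X, y, n_circles=3):
    models, routes = [], np.zeros(len(X), dtype=int)
    idx = np.arange(len(X))
    for c in range(n_circles):
        m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        models.append(m)
        routes[idx] = c                          # deepest circle reached so far
        wrong = idx[m.predict(X[idx]) != y[idx]]
        if len(wrong) < 2 or len(np.unique(y[wrong])) < 2:
            break                                # too few hard cases to continue
        idx = wrong                              # next circle: harder instances
    router = (LogisticRegression(max_iter=1000).fit(X, routes)
              if len(np.unique(routes)) > 1 else None)
    return models, router

def predict_hellsemble(models, router, X):
    which = router.predict(X) if router is not None else np.zeros(len(X), int)
    which = np.minimum(which, len(models) - 1)
    return np.array([models[w].predict(x[None, :])[0]
                     for w, x in zip(which, X)])
```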
Updated: 2025-06-25 20:26:04
Domain: cs.LG
FINN-GL: Generalized Mixed-Precision Extensions for FPGA-Accelerated LSTMs
Recurrent neural networks (RNNs), particularly LSTMs, are effective for time-series tasks like sentiment analysis and short-term stock prediction. However, their computational complexity poses challenges for real-time deployment in resource constrained environments. While FPGAs offer a promising platform for energy-efficient AI acceleration, existing tools mainly target feed-forward networks, and LSTM acceleration typically requires full custom implementation. In this paper, we address this gap by leveraging the open-source and extensible FINN framework to enable the generalized deployment of LSTMs on FPGAs. Specifically, we leverage the Scan operator from the Open Neural Network Exchange (ONNX) specification to model the recurrent nature of LSTM computations, enabling support for mixed quantisation within them and functional verification of LSTM-based models. Furthermore, we introduce custom transformations within the FINN compiler to map the quantised ONNX computation graph to hardware blocks from the HLS kernel library of the FINN compiler and Vitis HLS. We validate the proposed tool-flow by training a quantised ConvLSTM model for a mid-price stock prediction task using the widely used dataset and generating a corresponding hardware IP of the model using our flow, targeting the XCZU7EV device. We show that the generated quantised ConvLSTM accelerator through our flow achieves a balance between performance (latency) and resource consumption, while matching (or bettering) inference accuracy of state-of-the-art models with reduced precision. We believe that the generalisable nature of the proposed flow will pave the way for resource-efficient RNN accelerator designs on FPGAs.
Updated: 2025-06-25 20:07:46
Domain: cs.LG,cs.AI,cs.AR,eess.SP
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU architectures where traditional development aids are scarce. This paper introduces an LLM-powered "GPU Kernel Scientist," an automated methodology for iteratively refining accelerator kernels. Our methodology employs LLMs in a multi-stage, evolutionary process: (a) strategically selecting promising prior code versions as a basis for new iterations; (b) generating hypotheses for optimization experiments, based on existing code and assimilated knowledge from general GPU literature; and (c) autonomously implementing these experiments through code modification and subsequent submission to an external evaluation system, using only observed timing data as performance feedback. We detail how this approach navigates the challenges of the AMD MI300 target architecture and leverages LLMs to compensate for limited domain-specific human expertise. Since quantitative results from an ongoing performance competition were embargoed as of the paper submission date, we present the architectural design, operational workflow, and qualitative insights, highlighting the potential of LLM-driven agents to democratise and accelerate GPU kernel optimization, especially in resource-constrained or rapidly evolving hardware environments.
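Hedged pseudocode of the evolutionary loop described above; `llm` and `evaluate` are hypothetical stand-ins for the paper's LLM calls and external timing harness, with only observed runtimes fed back as selection pressure.

```python
def kernel_scientist(seed_kernel, llm, evaluate, generations=10, pool=4):
    """Evolve kernel variants; lower evaluate() value means faster kernel."""
    population = [(seed_kernel, evaluate(seed_kernel))]
    for _ in range(generations):
        population.sort(key=lambda kv: kv[1])            # fastest first
        parents = population[:pool]                      # (a) select bases
        for code, t in parents:
            hypothesis = llm(f"Propose one optimization for:\n{code}")   # (b)
            candidate = llm(f"Apply this change:\n{hypothesis}\n{code}") # (c)
            population.append((candidate, evaluate(candidate)))
    return min(population, key=lambda kv: kv[1])
```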
Updated: 2025-06-25 19:59:34
Domain: cs.LG,cs.AI,cs.PF,cs.SE
Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis
Graph Neural Networks (GNNs) show great promise for Network Intrusion Detection Systems (NIDS), particularly in IoT environments, but suffer performance degradation due to distribution drift and lack robustness against realistic adversarial attacks. Current robustness evaluations often rely on unrealistic synthetic perturbations and lack systematic analysis of different kinds of adversarial attacks, encompassing both black-box and white-box scenarios. This work proposes a novel approach to enhance GNN robustness and generalization by employing Large Language Models (LLMs) in an agentic pipeline as simulated cybersecurity expert agents. These agents scrutinize graph structures derived from network flow data, identifying and potentially mitigating suspicious or adversarially perturbed elements before GNN processing. Our experiments, using a framework designed for realistic evaluation and testing against a variety of adversarial attacks, including a dataset collected from physical testbed experiments, demonstrate that integrating LLM analysis can significantly improve the resilience of GNN-based NIDS against such challenges, showcasing the potential of LLM agents as a complementary layer in intrusion detection architectures.
Updated: 2025-06-25 19:49:55
Domain: cs.CR,cs.AI
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
Large Language Models (LLMs) have shown promise in accelerating the scientific research pipeline. A key capability for this process is the ability to generate novel research ideas, and prior studies have found settings in which LLM-generated research ideas were judged as more novel than human-expert ideas. However, a good idea should not simply appear to be novel, it should also result in better research after being executed. To test whether AI-generated ideas lead to better research outcomes, we conduct an execution study by recruiting 43 expert researchers to execute randomly-assigned ideas, either written by experts or generated by an LLM. Each expert spent over 100 hours implementing the idea and wrote a 4-page short paper to document the experiments. All the executed projects are then reviewed blindly by expert NLP researchers. Comparing the review scores of the same ideas before and after execution, the scores of the LLM-generated ideas decrease significantly more than expert-written ideas on all evaluation metrics (novelty, excitement, effectiveness, and overall; p < 0.05), closing the gap between LLM and human ideas observed at the ideation stage. When comparing the aggregated review scores from the execution study, we even observe that for many metrics there is a flip in rankings where human ideas score higher than LLM ideas. This ideation-execution gap highlights the limitations of current LLMs in generating truly effective research ideas and the challenge of evaluating research ideas in the absence of execution outcomes.
Updated: 2025-06-25 19:47:23
Domain: cs.CL,cs.AI,cs.CY,cs.HC,cs.LG
SIMulator: SIM Tracing on a (Pico-)Budget
SIM tracing -- the ability to inspect, modify, and relay communication between a SIM card and modem -- has become a significant technique in cellular network research. It enables essential security- and development-related applications such as fuzzing communication interfaces, extracting session keys, monitoring hidden SIM activity (e.g., proactive SIM commands or over-the-air updates), and facilitating scalable, distributed measurement platforms through SIM reuse. Traditionally, achieving these capabilities has relied on specialized hardware, which can pose financial and logistical burdens for researchers, particularly those new to the field. In this work, we show that full SIM tracing functionality can be achieved using only simple, widely available components, such as UART interfaces and GPIO ports. We port these capabilities to low-cost microcontrollers, exemplified by the Raspberry Pi Pico (4~USD). Unlike other approaches, it dramatically reduces hardware complexity by electrically decoupling the SIM and the modem and transferring data only at the APDU level. By significantly reducing hardware requirements and associated costs, we aim to make SIM tracing techniques accessible to a broader community of researchers and hobbyists, fostering wider exploration and experimentation in cellular network research.
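A hedged sketch of the software-relay concept using pyserial; the port names, baud rate, and fixed-size reads are placeholders, and real ISO 7816 framing (ATR, PPS, procedure bytes) requires more handling than shown here.

```python
import serial

# Placeholder ports: one link faces the modem, the other faces the SIM.
modem = serial.Serial("/dev/ttyACM0", 9600, timeout=0.1)
sim = serial.Serial("/dev/ttyACM1", 9600, timeout=0.1)

def relay(src, dst, tag):
    """Forward bytes between the two links, logging APDUs in transit."""
    data = src.read(64)
    if data:
        print(tag, data.hex())      # inspect here; could also rewrite bytes
        dst.write(data)

while True:
    relay(modem, sim, "C-APDU>")    # command from modem to SIM
    relay(sim, modem, "R-APDU<")    # response from SIM back to modem
```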
Updated: 2025-06-25 19:44:18
Domain: cs.CR
Structural System Identification via Validation and Adaptation
Estimating the governing equation parameter values is essential for integrating experimental data with scientific theory to understand, validate, and predict the dynamics of complex systems. In this work, we propose a new method for structural system identification (SI), uncertainty quantification, and validation directly from data. Inspired by generative modeling frameworks, a neural network maps random noise to physically meaningful parameters. These parameters are then used in the known equation of motion to obtain fake accelerations, which are compared to real training data via a mean square error loss. To simultaneously validate the learned parameters, we use independent validation datasets. The generated accelerations from these datasets are evaluated by a discriminator network, which determines whether the output is real or fake, and guides the parameter-generator network. Analytical and real experiments show the parameter estimation accuracy and model validation for different nonlinear structural systems.
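A toy sketch of the training signal, assuming a single-degree-of-freedom system with stiffness k and damping c; the network sizes and synthetic data are illustrative, and the discriminator's validation role is only indicated in comments.

```python
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> (k, c)
disc = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # validation critic

def accel(params, x, v, f):
    """Toy single-DOF equation of motion: a = f - c*v - k*x."""
    k, c = params[:, :1], params[:, 1:]
    return f - c * v - k * x

opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
x, v, f = torch.randn(64, 1), torch.randn(64, 1), torch.randn(64, 1)
a_true = accel(torch.tensor([[2.0, 0.5]]).expand(64, 2), x, v, f)    # "real" data

for _ in range(200):
    params = gen(torch.randn(64, 8))          # noise -> physical parameters
    loss = nn.functional.mse_loss(accel(params, x, v, f), a_true)
    # In the full method, `disc` scores generated accelerations on held-out
    # validation data (real vs. fake) and its signal also guides `gen`.
    opt.zero_grad()
    loss.backward()
    opt.step()
```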
Updated: 2025-06-25 19:43:23
Domain: math.DS,cs.LG,cs.SY,eess.SY
Stochastic Parameter Decomposition
A key step in reverse engineering neural networks is to decompose them into simpler parts that can be studied in relative isolation. Linear parameter decomposition -- a framework that has been proposed to resolve several issues with current decomposition methods -- decomposes neural network parameters into a sum of sparsely used vectors in parameter space. However, the current main method in this framework, Attribution-based Parameter Decomposition (APD), is impractical on account of its computational cost and sensitivity to hyperparameters. In this work, we introduce Stochastic Parameter Decomposition (SPD), a method that is more scalable and robust to hyperparameters than APD, which we demonstrate by decomposing models that are slightly larger and more complex than was possible to decompose with APD. We also show that SPD avoids other issues, such as shrinkage of the learned parameters, and better identifies ground truth mechanisms in toy models. By bridging causal mediation analysis and network decomposition methods, this demonstration opens up new research possibilities in mechanistic interpretability by removing barriers to scaling linear parameter decomposition methods to larger models. We release a library for running SPD and reproducing our experiments at https://github.com/goodfire-ai/spd.
Updated: 2025-06-25 19:26:31
Domain: cs.LG,cs.AI
Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing
We present the first theoretical framework for applying spiking neural networks (SNNs) to synthetic aperture radar (SAR) interferometric phase unwrapping. Despite extensive research in both domains, our comprehensive literature review confirms that SNNs have never been applied to phase unwrapping, representing a significant gap in current methodologies. As Earth observation data volumes continue to grow exponentially (with missions like NISAR expected to generate 100PB in two years) energy-efficient processing becomes critical for sustainable data center operations. SNNs, with their event-driven computation model, offer potential energy savings of 30-100x compared to conventional approaches while maintaining comparable accuracy. We develop spike encoding schemes specifically designed for wrapped phase data, propose SNN architectures that leverage the spatial propagation nature of phase unwrapping, and provide theoretical analysis of computational complexity and convergence properties. Our framework demonstrates how the temporal dynamics inherent in SNNs can naturally model the spatial continuity constraints fundamental to phase unwrapping. This work opens a new research direction at the intersection of neuromorphic computing and SAR interferometry, offering a complementary approach to existing algorithms that could enable more sustainable large-scale InSAR processing.
Updated: 2025-06-25 19:12:16
Domain: cs.NE,cs.ET,cs.LG,eess.SP,68T07, 94A08,I.2.6; G.1.6; B.7.1
Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs -- a problem well motivated by the minima stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. On two natural settings (1) generalization gap for flat solutions, and (2) mean-squared error (MSE) in nonparametric function estimation by stable minima, we prove upper and lower bounds, which establish that while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between the flat solutions vis-à-vis low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of ''neural shattering'' where neurons rarely activate, but with high weight magnitudes. This leads to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
Updated: 2025-06-25 19:10:03
Domain: stat.ML,cs.LG
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior -- an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies -- a state-of-the-art BC methodology -- we propose diffusion steering via reinforcement learning (DSRL): adapting the BC policy by running RL over its latent-noise space. We show that DSRL is highly sample efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement. Furthermore, DSRL avoids many of the challenges associated with finetuning diffusion policies, obviating the need to modify the weights of the base policy at all. We demonstrate DSRL on simulated benchmarks, real-world robotic tasks, and for adapting pretrained generalist policies, illustrating its sample efficiency and effective performance at real-world policy improvement.
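A hedged sketch of the core interface: the frozen BC diffusion policy is treated as a black box mapping (observation, latent noise) to an action, and the RL policy's action space is that latent noise. All callables here are hypothetical stand-ins.

```python
def dsrl_step(obs, rl_policy, diffusion_policy, env):
    """One environment step where the RL action is the diffusion latent."""
    w = rl_policy(obs)                       # RL policy outputs latent noise
    a = diffusion_policy(obs, noise=w)       # frozen BC policy decodes an action
    next_obs, reward, done = env.step(a)
    return (obs, w, reward, next_obs, done)  # transition for the RL learner
```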
Updated: 2025-06-25 19:09:52
Domain: cs.RO,cs.LG
Revealing higher-order neural representations of uncertainty with the Noise Estimation through Reinforcement-based Diffusion (NERD) model
Studies often aim to reveal "first-order" representations (FORs), which encode aspects of an observer's environment, such as contents or structure. A less-common target is "higher-order" representations (HORs), which are "about" FORs -- e.g., their strength or uncertainty -- and which may contribute to learning. HORs about uncertainty are unlikely to be direct "read-outs" of FOR characteristics, instead reflecting noisy estimation processes incorporating prior expectations about uncertainty, but how the brain represents such expected uncertainty distributions remains largely unexplored. Here, we study "noise expectation" HORs using neural data from a task which may require the brain to learn about its own noise: decoded neurofeedback, wherein human subjects learn to volitionally produce target neural patterns. We develop and apply a Noise Estimation through Reinforcement-based Diffusion (NERD) model to characterize how brains may undertake this process, and show that NERD offers high explanatory power for human behavior.
Updated: 2025-06-25 19:04:21
Domain: cs.LG,cs.AI,q-bio.NC
Stochastic and Non-local Closure Modeling for Nonlinear Dynamical Systems via Latent Score-based Generative Models
We propose a latent score-based generative AI framework for learning stochastic, non-local closure models and constitutive laws in nonlinear dynamical systems of computational mechanics. This work addresses a key challenge of modeling complex multiscale dynamical systems without a clear scale separation, for which numerically resolving all scales is prohibitively expensive, e.g., for engineering turbulent flows. While classical closure modeling methods leverage domain knowledge to approximate subgrid-scale phenomena, their deterministic and local assumptions can be too restrictive in regimes lacking a clear scale separation. Recent developments of diffusion-based stochastic models have shown promise in the context of closure modeling, but their prohibitive computational inference cost limits practical applications for many real-world applications. This work addresses this limitation by jointly training convolutional autoencoders with conditional diffusion models in the latent spaces, significantly reducing the dimensionality of the sampling process while preserving essential physical characteristics. Numerical results demonstrate that the joint training approach helps discover a proper latent space that not only guarantees small reconstruction errors but also ensures good performance of the diffusion model in the latent space. When integrated into numerical simulations, the proposed stochastic modeling framework via latent conditional diffusion models achieves significant computational acceleration while maintaining comparable predictive accuracy to standard diffusion models in physical spaces.
Updated: 2025-06-25 19:04:02
Domain: cs.LG,math.DS,physics.comp-ph
Perry: A High-level Framework for Accelerating Cyber Deception Experimentation
Cyber deception aims to distract, delay, and detect network attackers with fake assets such as honeypots, decoy credentials, or decoy files. However, today, it is difficult for operators to experiment, explore, and evaluate deception approaches. Existing tools and platforms have non-portable and complex implementations that are difficult to modify and extend. We address this pain point by introducing Perry, a high-level framework that accelerates the design and exploration of deception what-if scenarios. Perry has two components: a high-level abstraction layer for security operators to specify attackers and deception strategies, and an experimentation module to run these attackers and defenders in realistic emulated networks. To translate these high-level specifications we design four key modules for Perry: 1) an action planner that translates high-level actions into low-level implementations, 2) an observability module to translate low-level telemetry into high-level observations, 3) an environment state service that enables environment agnostic strategies, and 4) an attack graph service to reason about how attackers could explore an environment. We illustrate that Perry's abstractions reduce the implementation effort of exploring a wide variety of deception defenses, attackers, and environments. We demonstrate the value of Perry by emulating 55 unique deception what-if scenarios and illustrate how these experiments enable operators to shed light on subtle tradeoffs.
Updated: 2025-06-25 19:03:57
Domain: cs.CR
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
LLMs have shown impressive capabilities across various natural language processing tasks, yet remain vulnerable to input prompts, known as jailbreak attacks, carefully designed to bypass safety guardrails and elicit harmful responses. Traditional methods rely on manual heuristics but suffer from limited generalizability. Despite being automatic, optimization-based attacks often produce unnatural prompts that can be easily detected by safety filters or require high computational costs due to discrete token optimization. In this paper, we introduce Generative Adversarial Suffix Prompter (GASP), a novel automated framework that can efficiently generate human-readable jailbreak prompts in a fully black-box setting. In particular, GASP leverages latent Bayesian optimization to craft adversarial suffixes by efficiently exploring continuous latent embedding spaces, gradually optimizing the suffix prompter to improve attack efficacy while balancing prompt coherence via a targeted iterative refinement procedure. Through comprehensive experiments, we show that GASP can produce natural adversarial prompts, significantly improving jailbreak success over baselines, reducing training times, and accelerating inference speed, thus making it an efficient and scalable solution for red-teaming LLMs.
Updated: 2025-06-25 19:01:33
Domain: cs.LG,cs.AI,cs.CR,cs.CV
Control and optimization for Neural Partial Differential Equations in Supervised Learning
Although there is a substantial body of literature on control and optimization problems for parabolic and hyperbolic systems, the specific problem of controlling and optimizing the coefficients of the associated operators within such systems has not yet been thoroughly explored. In this work, we aim to initiate a line of research in control theory focused on optimizing and controlling the coefficients of these operators-a problem that naturally arises in the context of neural networks and supervised learning. In supervised learning, the primary objective is to transport initial data toward target data through the layers of a neural network. We propose a novel perspective: neural networks can be interpreted as partial differential equations (PDEs). From this viewpoint, the control problem traditionally studied in the context of ordinary differential equations (ODEs) is reformulated as a control problem for PDEs, specifically targeting the optimization and control of coefficients in parabolic and hyperbolic operators. To the best of our knowledge, this specific problem has not yet been systematically addressed in the control theory of PDEs. To this end, we propose a dual system formulation for the control and optimization problem associated with parabolic PDEs, laying the groundwork for the development of efficient numerical schemes in future research. We also provide a theoretical proof showing that the control and optimization problem for parabolic PDEs admits minimizers. Finally, we investigate the control problem associated with hyperbolic PDEs and prove the existence of solutions for a corresponding approximated control problem.
Updated: 2025-06-25 18:54:48
Domain: math.OC,cs.LG
Agile Management for Machine Learning: A Systematic Mapping Study
[Context] Machine learning (ML)-enabled systems are present in our society, driving significant digital transformations. The dynamic nature of ML development, characterized by experimental cycles and rapid changes in data, poses challenges to traditional project management. Agile methods, with their flexibility and incremental delivery, seem well-suited to address this dynamism. However, it is unclear how to effectively apply these methods in the context of ML-enabled systems, where challenges require tailored approaches. [Goal] Our goal is to outline the state of the art in agile management for ML-enabled systems. [Method] We conducted a systematic mapping study using a hybrid search strategy that combines database searches with backward and forward snowballing iterations. [Results] Our study identified 27 papers published between 2008 and 2024. From these, we identified eight frameworks and categorized recommendations and practices into eight key themes, such as Iteration Flexibility, Innovative ML-specific Artifacts, and the Minimal Viable Model. The main challenge identified across studies was accurate effort estimation for ML-related tasks. [Conclusion] This study contributes by mapping the state of the art and identifying open gaps in the field. While relevant work exists, more robust empirical evaluation is still needed to validate these contributions.
Updated: 2025-06-25 18:47:08
标题: 敏捷管理在机器学习中的应用:一项系统性映射研究
摘要: 【背景】机器学习(ML)启用的系统已经出现在我们的社会中,推动了重大的数字转型。ML开发的动态特性,以实验循环和数据快速变化为特征,对传统项目管理提出了挑战。敏捷方法以其灵活性和增量交付,似乎很适合应对这种动态性。然而,在ML启用系统的背景下如何有效应用这些方法尚不清楚,因为挑战需要量身定制的方法。 【目标】我们的目标是概述ML启用系统中敏捷管理的最新研究成果。 【方法】我们进行了一项系统性的映射研究,采用了混合搜索策略,将数据库搜索与向前和向后的雪球迭代结合起来。 【结果】我们的研究确定了2008年至2024年间发表的27篇论文。从中,我们确定了八个框架,并将建议和实践分类为八个关键主题,如迭代灵活性、创新的ML特定工件和最小可行模型。跨研究中发现的主要挑战是准确估计与ML相关任务的工作量。 【结论】本研究通过绘制最新研究成果并识别领域中的开放空白做出了贡献。尽管相关工作存在,但仍然需要更加稳健的经验评估来验证这些贡献。
更新时间: 2025-06-25 18:47:08
领域: cs.SE,cs.AI
Characterization and Mitigation of Training Instabilities in Microscaling Formats
Training large language models is an expensive, compute-bound process that must be repeated as models scale, algorithms improve, and new data is collected. To address this, next-generation hardware accelerators increasingly support lower-precision arithmetic formats, such as the Microscaling (MX) formats introduced in NVIDIA's Blackwell architecture. These formats use a shared scale within blocks of parameters to extend representable range and perform forward/backward GEMM operations in reduced precision for efficiency gains. In this work, we investigate the challenges and viability of block-scaled precision formats during model training. Across nearly one thousand language models trained from scratch -- spanning compute budgets from $2 \times 10^{17}$ to $4.8 \times 10^{19}$ FLOPs and sweeping over a broad range of weight-activation precision combinations -- we consistently observe that training in MX formats exhibits sharp, stochastic instabilities in the loss, particularly at larger compute scales. To explain this phenomenon, we conduct controlled experiments and ablations on a smaller proxy model that exhibits similar behavior as the language model, sweeping across architectural settings, hyperparameters, and precision formats. These experiments motivate a simple model in which multiplicative gradient bias introduced by the quantization of layer-norm affine parameters and a small fraction of activations can trigger runaway divergence. Through \emph{in situ} intervention experiments on our proxy model, we demonstrate that instabilities can be averted or delayed by modifying precision schemes mid-training. Guided by these findings, we evaluate stabilization strategies in the LLM setting and show that certain hybrid configurations recover performance competitive with full-precision training. We release our code at https://github.com/Hither1/systems-scaling.
Updated: 2025-06-25 18:25:08
标题: 微缩(Microscaling)格式中训练不稳定性的表征与缓解
摘要: 训练大型语言模型是一个昂贵的、受算力限制的过程,且随着模型规模扩大、算法改进和新数据收集必须重复进行。为了解决这个问题,下一代硬件加速器越来越支持低精度算术格式,比如 NVIDIA Blackwell 架构中引入的 Microscaling (MX) 格式。这些格式在参数块内使用共享缩放因子来扩展可表示范围,并以降低的精度执行前向/反向 GEMM 运算以获得效率提升。在这项工作中,我们研究了模型训练过程中块缩放精度格式的挑战和可行性。通过对从头开始训练的近千个语言模型进行实验,涵盖从 $2 \times 10^{17}$ 到 $4.8 \times 10^{19}$ FLOPs 的计算预算以及广泛的权重-激活精度组合,我们持续观察到在 MX 格式下训练会在损失中出现尖锐的、随机的不稳定性,尤其是在更大的计算规模下。为了解释这一现象,我们在一个表现出与语言模型类似行为的较小代理模型上进行了受控实验和消融实验,遍历架构设置、超参数和精度格式。这些实验启发了一个简单的模型:对层归一化仿射参数和一小部分激活进行量化所引入的乘法梯度偏差可能引发失控的发散。通过对代理模型进行训练中途干预实验,我们证明可以通过在训练中途修改精度方案来避免或延迟不稳定性。在这些发现的指导下,我们评估了大型语言模型设置中的稳定策略,并展示了某些混合配置可以恢复与全精度训练相竞争的性能。我们在 https://github.com/Hither1/systems-scaling 上发布了我们的代码。
更新时间: 2025-06-25 18:25:08
领域: cs.LG,cs.AR
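To make the block-scaling idea above concrete, here is a minimal NumPy simulation of MX-style quantization: each block of values shares one power-of-two scale, and the scaled elements are rounded to a low-precision grid. This is an illustrative sketch only; the block size and element format are placeholder choices, not the exact OCP MX specification.

import numpy as np

def mx_quantize(x, block=32, elem_bits=8):
    # Each block of `block` consecutive values shares one power-of-two scale
    # (as in MX); scaled elements are rounded to a signed integer grid.
    flat = x.reshape(-1, block).astype(np.float64)
    max_mag = np.abs(flat).max(axis=1, keepdims=True)
    max_mag[max_mag == 0] = 1.0                 # avoid log2(0) on all-zero blocks
    scale = 2.0 ** np.ceil(np.log2(max_mag))    # shared block scale
    levels = 2 ** (elem_bits - 1) - 1
    q = np.clip(np.round(flat / scale * levels), -levels, levels)
    return (q / levels * scale).reshape(x.shape)

w = np.random.randn(4, 64).astype(np.float32)
print("max abs rounding error:", np.abs(w - mx_quantize(w)).max())

Running forward and backward GEMMs through a transform like this is the kind of setting in which the multiplicative gradient bias discussed above arises.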
Exploring the Effects of Chatbot Anthropomorphism and Human Empathy on Human Prosocial Behavior Toward Chatbots
Chatbots are increasingly integrated into people's lives and are widely used to help people. Recently, there has also been growing interest in the reverse direction (humans help chatbots) due to a wide range of benefits including better chatbot performance, human well-being, and collaborative outcomes. However, little research has explored the factors that motivate people to help chatbots. To address this gap, we draw on the Computers Are Social Actors (CASA) framework to examine how chatbot anthropomorphism (including human-like identity, emotional expression, and non-verbal expression) influences human empathy toward chatbots and their subsequent prosocial behaviors and intentions. We also explore people's own interpretations of their prosocial behaviors toward chatbots. We conducted an online experiment (N = 244) in which chatbots made mistakes in a collaborative image labeling task and explained the reasons to participants. We then measured participants' prosocial behaviors and intentions toward the chatbots. Our findings revealed that human identity and emotional expression of chatbots increased participants' prosocial behavior and intention toward chatbots, with empathy mediating these effects. Qualitative analysis further identified two motivations for participants' prosocial behaviors: empathy for the chatbot and perceiving the chatbot as human-like. We discuss the implications of these results for understanding and promoting human prosocial behaviors toward chatbots.
Updated: 2025-06-25 18:16:14
标题: 探讨聊天机器人拟人化和人类同理心对人类向聊天机器人的亲社会行为的影响
摘要: 聊天机器人日益融入人们的生活,并被广泛用于帮助人们。最近,反向方向(人类帮助聊天机器人)也引起了越来越多的关注,因为其带来广泛的益处,包括更好的聊天机器人性能、人类福祉和协作成果。然而,很少有研究探讨促使人们帮助聊天机器人的因素。为填补这一空白,我们基于“计算机是社会行动者”(CASA)框架,考察聊天机器人拟人化(包括类人身份、情感表达和非语言表达)如何影响人类对聊天机器人的同理心及其随后的亲社会行为和意图。我们还探讨了人们对自己面向聊天机器人的亲社会行为的自我解读。我们进行了一项在线实验(N = 244),其中聊天机器人在协作图像标注任务中犯错并向参与者解释原因。随后我们测量了参与者对聊天机器人的亲社会行为和意图。研究结果显示,聊天机器人的类人身份和情感表达提高了参与者对聊天机器人的亲社会行为和意图,而同理心在这些效应中起中介作用。定性分析进一步识别出参与者亲社会行为的两种动机:对聊天机器人的同理心,以及将聊天机器人视为类人。我们讨论了这些结果对理解和促进人类面向聊天机器人的亲社会行为的启示。
更新时间: 2025-06-25 18:16:14
领域: cs.HC,cs.AI
Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers
When an LLM learns a relation during finetuning (e.g., new movie releases, corporate mergers, etc.), where does this information go? Is it extracted when the model processes an entity, recalled just-in-time before a prediction, or are there multiple separate heuristics? Existing localization approaches (e.g. activation patching) are ill-suited for this analysis because they tend to replace parts of the residual stream, potentially deleting information. To fill this gap, we propose dynamic weight-grafting between fine-tuned and pre-trained language models to show that fine-tuned language models both (1) extract relation information learned during finetuning while processing entities and (2) ``recall'' this information in later layers while generating predictions. In some cases, models need both of these pathways to correctly generate finetuned information while, in other cases, a single ``enrichment'' or ``recall'' pathway alone is sufficient. We examine the necessity and sufficiency of these information pathways, examining what layers they occur at, how much redundancy they exhibit, and which model components are involved -- finding that the ``recall'' pathway occurs via both task-specific attention mechanisms and a relation extraction step in the output of the attention and the feedforward networks at the final layers before next token prediction.
Updated: 2025-06-25 18:13:34
标题: 多流关系抽取:在Transformer中的丰富和召回
摘要: 当一个LLM在微调中学习关系(例如,新电影发布、公司合并等)时,这些信息会去哪里?是在模型处理实体时被提取,在预测之前被及时召回,还是存在多个独立的启发式机制?现有的定位方法(例如激活修补)不适用于这种分析,因为它们往往会替换残差流的部分内容,可能会删除信息。为了填补这一空白,我们提出了在微调模型与预训练语言模型之间进行动态权重嫁接,以表明微调语言模型既(1)在处理实体时提取微调期间学习的关系信息,又(2)在生成预测时在后续层中“召回”这些信息。在一些情况下,模型需要这两条通路才能正确生成微调信息,而在其他情况下,仅一条“丰富”或“召回”通路就足够了。我们检验了这些信息通路的必要性和充分性,考察它们发生在哪些层、表现出多少冗余性,以及涉及哪些模型组件,发现“召回”通路既通过任务特定的注意力机制实现,也通过下一词预测之前最后几层中注意力和前馈网络输出中的关系提取步骤实现。
更新时间: 2025-06-25 18:13:34
领域: cs.LG
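A static toy version of the intervention used in this paper can be written as a weight graft between two checkpoints of the same architecture: copy fine-tuned parameters only for the module names under test and keep everything else pretrained, then check whether the grafted model produces the fine-tuned relation. The name filter below is hypothetical; real parameter names depend on the model.

import torch.nn as nn

def graft(base: nn.Module, tuned: nn.Module, keep) -> nn.Module:
    # Copy parameters from the fine-tuned checkpoint into the pretrained one
    # for every parameter whose name satisfies `keep`; all other weights stay
    # pretrained. A simplified sketch of the paper's dynamic weight-grafting.
    base_sd, tuned_sd = base.state_dict(), tuned.state_dict()
    for name in base_sd:
        if keep(name):
            base_sd[name] = tuned_sd[name].clone()
    base.load_state_dict(base_sd)
    return base

# e.g. graft only the last two layers' attention and feedforward parameters
# (hypothetical name pattern):
# graft(base, tuned, lambda n: ("h.10." in n or "h.11." in n) and ("attn" in n or "mlp" in n))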
A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools
Foundation models (FMs) are catalyzing a transformative shift in materials science (MatSci) by enabling scalable, general-purpose, and multimodal AI systems for scientific discovery. Unlike traditional machine learning models, which are typically narrow in scope and require task-specific engineering, FMs offer cross-domain generalization and exhibit emergent capabilities. Their versatility is especially well-suited to materials science, where research challenges span diverse data types and scales. This survey provides a comprehensive overview of foundation models, agentic systems, datasets, and computational tools supporting this growing field. We introduce a task-driven taxonomy encompassing six broad application areas: data extraction, interpretation and Q\&A; atomistic simulation; property prediction; materials structure, design and discovery; process planning, discovery, and optimization; and multiscale modeling. We discuss recent advances in both unimodal and multimodal FMs, as well as emerging large language model (LLM) agents. Furthermore, we review standardized datasets, open-source tools, and autonomous experimental platforms that collectively fuel the development and integration of FMs into research workflows. We assess the early successes of foundation models and identify persistent limitations, including challenges in generalizability, interpretability, data imbalance, safety concerns, and limited multimodal fusion. Finally, we articulate future research directions centered on scalable pretraining, continual learning, data governance, and trustworthiness.
Updated: 2025-06-25 18:10:30
标题: 材料科学人工智能调查:基础模型、LLM代理、数据集和工具
摘要: 基础模型(FMs)正在催化材料科学(MatSci)领域的转型变革,通过实现可扩展、通用和多模态人工智能系统,促进科学发现。与传统的机器学习模型不同,传统模型通常范围狭窄,需要特定任务的工程设计,而基础模型提供跨领域泛化并展现出新兴能力。它们的多功能性特别适合材料科学,因为研究挑战涵盖多种数据类型和规模。本调查提供了基础模型、代理系统、数据集和支持这一增长领域的计算工具的全面概述。我们介绍了一个囊括六个广泛应用领域的以任务驱动的分类法:数据提取、解释和问答;原子模拟;属性预测;材料结构、设计和发现;工艺规划、发现和优化;多尺度建模。我们讨论了单模态和多模态基础模型的最新进展,以及新兴的大型语言模型(LLM)代理。此外,我们审查了标准数据集、开源工具和自主实验平台,共同推动基础模型的开发和整合到研究工作流程中。我们评估了基础模型的早期成功,并确定了持久的限制,包括泛化能力、可解释性、数据不平衡、安全问题和有限的多模态融合。最后,我们阐明了未来研究方向,重点放在可扩展的预训练、持续学习、数据治理和可靠性上。
更新时间: 2025-06-25 18:10:30
领域: cs.LG,cs.CE
Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
We analyze the performance of heterogeneous learning agents in asset markets with stochastic payoffs. Our main focus is on comparing Bayesian learners and no-regret learners who compete in markets and identifying the conditions under which each approach is more effective. Surprisingly, we find that low regret is not sufficient for survival: an agent can have regret as low as $O(\log T)$ but still vanish when competing against a Bayesian with a finite prior and any positive prior probability on the correct model. On the other hand, we show that Bayesian learning is fragile, while no-regret learning requires less knowledge of the environment and is therefore more robust. Motivated by the strengths and weaknesses of both approaches, we propose a balanced strategy for utilizing Bayesian updates that improves robustness and adaptability to distribution shifts, providing a step toward a best-of-both-worlds learning approach. The method is general, efficient, and easy to implement. Finally, we formally establish the relationship between the notions of survival and market dominance studied in economics and the framework of regret minimization, thus bridging these theories. More broadly, our work contributes to the understanding of dynamics with heterogeneous types of learning agents and their impact on markets.
Updated: 2025-06-25 18:09:48
标题: 具有异质性代理的市场:贝叶斯学习者与无悔学习者的动态和生存情况
摘要: 我们分析了异质学习代理在具有随机回报的资产市场中的表现。我们的主要重点是比较在市场中竞争的贝叶斯学习者和无悔学习者,并确定每种方法各自更有效的条件。令人惊讶的是,我们发现低悔值不足以确保生存:一个代理的悔值可以低至 $O(\log T)$,但在与一个具有有限先验、且对正确模型赋予任意正先验概率的贝叶斯学习者竞争时,仍可能消亡。另一方面,我们表明贝叶斯学习是脆弱的,而无悔学习对环境知识的要求更少,因此更为稳健。受这两种方法优劣的启发,我们提出了一种利用贝叶斯更新的平衡策略,以提高稳健性和对分布变化的适应性,朝着兼具两者之长的学习方法迈出了一步。该方法通用、高效且易于实施。最后,我们正式建立了经济学中研究的生存与市场主导概念和悔值最小化框架之间的关系,从而将这些理论联系起来。更广泛地说,我们的工作有助于理解具有异质类型学习代理的动态及其对市场的影响。
更新时间: 2025-06-25 18:09:48
领域: cs.GT,cs.AI,cs.MA,econ.TH
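The two update rules compared above sit side by side in a few lines. The toy simulation below (our illustration, not the paper's market model) tracks a Bayesian posterior and Hedge (no-regret, multiplicative-weights) weights over the same two candidate models of a repeated random payoff:

import numpy as np

rng = np.random.default_rng(0)
p_true = 0.7
models = np.array([0.7, 0.3])        # candidate probabilities of the good state
T = 2000
post = np.array([0.5, 0.5])          # Bayesian prior over the two models
hedge = np.array([0.5, 0.5])         # Hedge weights over the same two experts
eta = np.sqrt(np.log(2) / T)         # toy learning rate

for _ in range(T):
    y = rng.random() < p_true                      # realized state
    lik = np.where(y, models, 1 - models)          # P(outcome | model)
    post = post * lik / (post * lik).sum()         # Bayes update
    hedge = hedge * np.exp(-eta * (-np.log(lik)))  # Hedge update on log loss
    hedge /= hedge.sum()

print("Bayes posterior:", post.round(4), "Hedge weights:", hedge.round(4))

With a correct model in the support, the Bayesian concentrates quickly; the paper's point concerns what happens to the wealth dynamics when agents of both kinds trade against each other.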
MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation
The proliferation of LLM-based agents has led to increasing deployment of inter-agent collaboration for tasks like scheduling, negotiation, resource allocation, etc. In such systems, privacy is critical, as agents often access proprietary tools and domain-specific databases requiring strict confidentiality. This paper examines whether LLM-based agents demonstrate an understanding of contextual privacy and, if instructed, whether these systems preserve inference-time user privacy in non-adversarial multi-turn conversation. Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks where private information can be easily excluded. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. These scenarios are designed such that complete exclusion of private data impedes task completion yet unrestricted information sharing could lead to substantial losses. We then evaluate the current state-of-the-art LLMs on (a) their understanding of contextually private data and (b) their ability to collaborate without violating user privacy. Empirical experiments demonstrate that current models, including GPT-4o and Claude-2.7-Sonnet, lack robust understanding of contextual privacy, misclassifying private data as shareable 25.2\% and 43.6\% of the time. In multi-turn conversations, these models disclose private information in 59.9\% and 50.5\% of cases even under explicit privacy instructions. Furthermore, multi-agent systems fail to complete tasks in 71\% of scenarios. These results underscore that current models are not aligned towards both contextual privacy preservation and collaborative task-solving.
Updated: 2025-06-25 18:04:25
标题: MAGPIE:一个用于多智能体情境隐私评估的数据集
摘要: 基于LLM代理的大量增加导致了越来越多的代理之间在调度、谈判、资源分配等任务上进行协作。在这样的系统中,隐私至关重要,因为代理通常会访问需要严格保密的专有工具和领域特定数据库。本文探讨了LLM代理是否表现出对情境隐私的理解。并且,在指导下,这些系统是否能在非敌对的多轮对话中保护用户的推理时间隐私。现有的用于评估LLM代理情境隐私的基准主要评估单轮、低复杂度任务,在这些任务中,私人信息很容易被排除。我们首先提出了一个基准 - MAGPIE,包括15个领域中的158个现实生活中的高风险场景。这些场景设计成完全排除私人数据会妨碍任务完成,但无限制的信息共享可能导致重大损失。然后,我们评估了当前最先进的LLM模型在以下两个方面:(a) 它们对情境隐私数据的理解和(b) 它们在不违反用户隐私的情况下进行合作的能力。实证实验表明,包括GPT-4o和Claude-2.7-Sonnet在内的当前模型对情境隐私的理解不够健壮,将私人数据误分类为可共享的情况分别为25.2%和43.6%。在多轮对话中,这些模型在59.9%和50.5%的情况下会透露私人信息,即使在明确的隐私指示下也是如此。此外,多代理系统在71%的情景中无法完成任务。这些结果强调了当前模型既不符合情境隐私保护又不适合协作解决任务。
更新时间: 2025-06-25 18:04:25
领域: cs.AI,cs.CL
Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset
Large language models (LLMs) have shown strong capabilities in complex reasoning, and test-time scaling techniques can enhance their performance with comparably low cost. Many of these methods have been developed and evaluated on mathematical reasoning benchmarks such as AIME. This paper investigates whether the lessons learned from these benchmarks generalize to the domain of advanced theoretical physics. We evaluate a range of common test-time scaling methods on the TPBench physics dataset and compare their effectiveness with results on AIME. To better leverage the structure of physics problems, we develop a novel, symbolic weak-verifier framework to improve parallel scaling results. Our empirical results demonstrate that this method significantly outperforms existing test-time scaling approaches on TPBench. We also evaluate our method on AIME, confirming its effectiveness in solving advanced mathematical problems. Our findings highlight the power of step-wise symbolic verification for tackling complex scientific problems.
Updated: 2025-06-25 18:00:18
标题: 理论物理中的测试时间缩放技术——在TPBench数据集上方法的比较
摘要: 大型语言模型(LLMs)在复杂推理方面表现出强大的能力,测试时间缩放技术可以以相对较低的成本增强它们的性能。许多这些方法已经在数学推理基准测试如AIME上得到开发和评估。本文研究了从这些基准测试中学到的经验是否可以推广到先进理论物理领域。我们在TPBench物理数据集上评估了一系列常见的测试时间缩放方法,并将它们的有效性与AIME上的结果进行比较。为了更好地利用物理问题的结构,我们开发了一个新颖的符号弱验证器框架来改善并行缩放结果。我们的实证结果表明,这种方法在TPBench上明显优于现有的测试时间缩放方法。我们还在AIME上评估了我们的方法,确认了它在解决先进数学问题方面的有效性。我们的研究结果突显了逐步符号验证在解决复杂科学问题方面的威力。
更新时间: 2025-06-25 18:00:18
领域: cs.LG,astro-ph.CO,cs.AI,hep-ph,hep-th
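The flavor of a symbolic weak verifier for parallel test-time scaling can be illustrated with SymPy: cluster sampled final answers by symbolic equivalence and return the largest verified cluster. This is our own simplification of the idea, not the TPBench pipeline itself.

import sympy as sp

def weak_verify(expr_a, expr_b):
    # Weak symbolic check: do two candidate answers agree as expressions?
    try:
        return sp.simplify(sp.sympify(expr_a) - sp.sympify(expr_b)) == 0
    except (sp.SympifyError, TypeError):
        return False

def select(candidates):
    # Cluster sampled answers by symbolic equivalence and return the
    # largest cluster's representative (verifier-weighted majority vote).
    clusters = []
    for c in candidates:
        for cl in clusters:
            if weak_verify(c, cl[0]):
                cl.append(c)
                break
        else:
            clusters.append([c])
    return max(clusters, key=len)[0]

print(select(["x**2 - 1", "(x-1)*(x+1)", "x**2 + 1"]))  # -> x**2 - 1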
On Convolutions, Intrinsic Dimension, and Diffusion Models
The manifold hypothesis asserts that data of interest in high-dimensional ambient spaces, such as image data, lies on unknown low-dimensional submanifolds. Diffusion models (DMs) -- which operate by convolving data with progressively larger amounts of Gaussian noise and then learning to revert this process -- have risen to prominence as the most performant generative models, and are known to be able to learn distributions with low-dimensional support. For a given datum in one of these submanifolds, we should thus intuitively expect DMs to have implicitly learned its corresponding local intrinsic dimension (LID), i.e. the dimension of the submanifold it belongs to. Kamkari et al. (2024b) recently showed that this is indeed the case by linking this LID to the rate of change of the log marginal densities of the DM with respect to the amount of added noise, resulting in an LID estimator known as FLIPD. LID estimators such as FLIPD have a plethora of uses, among others they quantify the complexity of a given datum, and can be used to detect outliers, adversarial examples and AI-generated text. FLIPD achieves state-of-the-art performance at LID estimation, yet its theoretical underpinnings are incomplete since Kamkari et al. (2024b) only proved its correctness under the highly unrealistic assumption of affine submanifolds. In this work we bridge this gap by formally proving the correctness of FLIPD under realistic assumptions. Additionally, we show that an analogous result holds when Gaussian convolutions are replaced with uniform ones, and discuss the relevance of this result.
Updated: 2025-06-25 18:00:00
标题: 关于卷积、内在维度和扩散模型
摘要: 流形假设断言,高维环境空间中人们感兴趣的数据(如图像数据)位于未知的低维子流形上。扩散模型(DMs)通过将数据与逐渐增大的高斯噪声做卷积、再学习逆转这一过程来工作,已成为性能最佳的生成模型,并且已知能够学习具有低维支撑的分布。因此,对这些子流形中的给定数据点,我们直观上应当期望DMs已隐式地学习到其相应的局部内在维数(LID),即其所属子流形的维数。Kamkari等人(2024b)最近证明了情况确实如此:他们将该LID与DM的对数边际密度相对于所加噪声量的变化速率联系起来,从而得到一个称为FLIPD的LID估计器。FLIPD等LID估计器用途广泛,其中包括量化给定数据点的复杂性,并可用于检测异常值、对抗样本和AI生成的文本。FLIPD在LID估计方面达到了最先进的性能,但其理论基础尚不完整,因为Kamkari等人(2024b)仅在仿射子流形这一高度不现实的假设下证明了其正确性。在这项工作中,我们通过在现实假设下形式化地证明FLIPD的正确性来弥合这一差距。此外,我们还证明当高斯卷积被均匀卷积取代时,类似的结果同样成立,并讨论了这一结果的意义。
更新时间: 2025-06-25 18:00:00
领域: cs.LG,cs.AI,stat.ML
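The link the abstract describes can be stated in one line. For a point $x$ on a $d$-dimensional submanifold of $\mathbb{R}^D$, convolving the data distribution with Gaussian noise of scale $\sigma$ gives, as $\sigma \to 0$,

$$\log p_\sigma(x) = c(x) - (D - d)\log \sigma + o(1), \qquad \text{hence} \qquad d \;\approx\; D + \frac{\partial \log p_\sigma(x)}{\partial \log \sigma},$$

i.e., the rate of change of the log marginal density with respect to the noise scale reads off the codimension. This is our paraphrase of the mechanism; the precise FLIPD estimator and the conditions under which the approximation holds are given in Kamkari et al. (2024b) and in this paper.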
The Singapore Consensus on Global AI Safety Research Priorities
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. This resulting report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).
Updated: 2025-06-25 17:59:50
标题: 《新加坡共识:全球人工智能安全研究重点》
摘要: 快速提升的人工智能能力和自主性蕴含着巨大的变革潜力,但也引发了关于如何确保人工智能安全(即可信、可靠和安全无虞)的热烈讨论。因此,构建可信生态系统至关重要:它帮助人们满怀信心地拥抱人工智能,并在避免反弹的同时为创新留出最大空间。“2025新加坡人工智能大会(SCAI):人工智能安全国际科学交流”旨在通过汇聚世界各地的人工智能科学家,识别并综合人工智能安全领域的研究重点,以支持该领域的研究。本报告基于由Yoshua Bengio主持、33国政府支持的《国际人工智能安全报告》。通过采用纵深防御模型,本报告将人工智能安全研究领域划分为三类:构建可信人工智能系统的挑战(开发)、评估其风险的挑战(评估)以及部署后监测与干预的挑战(控制)。
更新时间: 2025-06-25 17:59:50
领域: cs.AI,cs.CY
Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases guidance. Moreover, information from past runs is not reused to improve sample quality, resulting in inefficient use of compute. Inspired by the success of Monte Carlo Tree Search, we address these limitations by casting inference-time alignment as a search problem that reuses past computations. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain and iteratively refining value estimates with each additional generation. Our proposed method, Diffusion Tree Sampling (DTS), produces asymptotically exact samples from the target distribution in the limit of infinite rollouts, and its greedy variant, Diffusion Tree Search (DTS$^\star$), performs a global search for high reward samples. On MNIST and CIFAR-10 class-conditional generation, DTS matches the FID of the best-performing baseline with up to $10\times$ less compute. In text-to-image generation and language completion tasks, DTS$^\star$ effectively searches for high reward samples that match best-of-N with up to $5\times$ less compute. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples, providing a scalable approach for inference-time alignment of diffusion models.
Updated: 2025-06-25 17:59:10
标题: 扩散树采样:扩散模型的可扩展推断时对齐
摘要: 在推断时使预训练扩散模型适应新目标,仍然是生成建模中的一个未解决问题。现有的引导方法存在价值估计不准确的问题,在高噪声水平下尤为严重,从而使引导产生偏差。此外,过去运行的信息没有被重用以改善样本质量,导致计算资源的低效利用。受蒙特卡洛树搜索成功的启发,我们将推断时对齐视为一个可重用过去计算的搜索问题,以解决这些限制。我们引入了一种基于树的方法,通过将终端奖励沿扩散链向后传播,并随着每次额外生成迭代地细化价值估计,从奖励对齐的目标密度中采样。我们提出的方法,Diffusion Tree Sampling(DTS),在无限次滚动的极限下产生来自目标分布的渐近精确样本,其贪婪变体,Diffusion Tree Search(DTS$^\star$),执行对高奖励样本的全局搜索。在MNIST和CIFAR-10类条件生成中,DTS以最多减少10倍的计算量达到表现最佳基线的FID。在文本到图像生成和语言补全任务中,DTS$^\star$以最多减少5倍的计算量有效搜索到与 best-of-N 相当的高奖励样本。通过重用先前生成的信息,我们得到一种随时(anytime)算法,能将额外的计算量转化为稳步变好的样本,为扩散模型的推断时对齐提供了可扩展的方法。
更新时间: 2025-06-25 17:59:10
领域: cs.LG,cs.AI,stat.ML
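The reuse of past computation comes from maintaining value estimates on a tree over partial denoising trajectories. A generic sketch of the terminal-reward backup (running-mean, MCTS-style; the paper's exact soft backup rule may differ) looks like this:

from dataclasses import dataclass, field

@dataclass
class Node:
    state: object = None                  # partial denoising state x_t
    children: list = field(default_factory=list)
    value: float = 0.0                    # running value estimate
    visits: int = 0

def backup(path, reward):
    # Propagate a terminal reward back up one root-to-leaf path of the
    # diffusion tree, refining each node's running-mean value estimate.
    for node in path:
        node.visits += 1
        node.value += (reward - node.value) / node.visits

Each additional rollout extends the tree from promising nodes and calls backup once with its terminal reward, which is what turns extra compute into steadily better value estimates and samples.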
DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy
We propose DemoDiffusion, a simple and scalable method for enabling robots to perform manipulation tasks in natural environments by imitating a single human demonstration. Our approach is based on two key insights. First, the hand motion in a human demonstration provides a useful prior for the robot's end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in-context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Our approach avoids the need for online reinforcement learning or paired human-robot data, enabling robust adaptation to new tasks and scenes with minimal manual effort. Experiments in both simulation and real-world settings show that DemoDiffusion outperforms both the base policy and the retargeted trajectory, enabling the robot to succeed even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/
Updated: 2025-06-25 17:59:01
标题: DemoDiffusion:使用预训练的扩散策略进行一次性人类模仿
摘要: 我们提出了DemoDiffusion,这是一种简单且可扩展的方法,使机器人能够在自然环境中执行操作任务,通过模仿单个人类演示。我们的方法基于两个关键见解。首先,人类演示中的手部运动为机器人的末端执行器轨迹提供了有用的先验知识,我们可以通过运动重定向将其转换为粗略的开环机器人运动轨迹。其次,虽然这种重定向的运动捕捉了任务的整体结构,但可能与情境中合理的机器人动作不完全吻合。为了解决这个问题,我们利用预先训练的通用扩散策略来修改轨迹,确保它既遵循人类运动,又保持在合理机器人动作的分布范围内。我们的方法避免了在线强化学习或配对的人机数据的需求,使得机器人能够在最小的人工努力下适应新任务和场景。在模拟和现实世界环境中的实验表明,DemoDiffusion的表现超越了基本策略和重定向轨迹,使得机器人甚至在预先训练的通用策略完全失败的任务上也能成功。项目页面:https://demodiffusion.github.io/
更新时间: 2025-06-25 17:59:01
领域: cs.RO,cs.LG
Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs
Navigating everyday social situations often requires juggling conflicting goals, such as conveying a harsh truth and maintaining trust, all while still being mindful of another person's feelings. These value trade-offs are an integral part of human decision-making and language use; however, current tools for interpreting such dynamic and multi-faceted notions of values in LLMs are limited. In cognitive science, so-called "cognitive models" provide formal accounts of these trade-offs in humans, by modeling the weighting of a speaker's competing utility functions in choosing an action or utterance. In this work, we use a leading cognitive model of polite speech to interpret the extent to which LLMs represent human-like trade-offs. We apply this lens to systematically evaluate value trade-offs in two encompassing model settings: degrees of reasoning "effort" in frontier black-box models, and RL post-training dynamics of open-source models. Our results highlight patterns of higher informational utility than social utility in reasoning models, and in open-source models shown to be stronger in mathematical reasoning. Our findings from LLMs' training dynamics suggest large shifts in utility values early on in training with persistent effects of the choice of base model and pretraining data, compared to feedback dataset or alignment method. We show that our method is responsive to diverse aspects of the rapidly evolving LLM landscape, with insights for forming hypotheses about other high-level behaviors, shaping training regimes for reasoning models, and better controlling trade-offs between values during model training.
Updated: 2025-06-25 17:58:12
标题: 在你内心深处有许多狼:利用认知模型解释LLM中的价值权衡
摘要: 应对日常社交情境通常需要在相互冲突的目标之间权衡,比如传达残酷的真相、维持信任,同时还要顾及他人的感受。这些价值权衡是人类决策和语言使用的重要组成部分,然而,目前用于解释LLMs中这类动态、多维价值概念的工具十分有限。在认知科学中,所谓的“认知模型”通过对说话者在选择行动或话语时对相互竞争的效用函数的加权进行建模,为人类的这些权衡提供了形式化刻画。在这项工作中,我们使用一个领先的礼貌言语认知模型来解释LLMs在多大程度上表现出类人的权衡。我们用这一视角系统地评估两类模型设置中的价值权衡:前沿黑盒模型中不同程度的推理“努力”,以及开源模型的RL后训练动态。我们的结果揭示出推理模型中信息效用高于社交效用的模式,这一模式在数学推理能力更强的开源模型中同样存在。我们对LLMs训练动态的研究表明,效用值在训练早期发生大幅变化,并且相较于反馈数据集或对齐方法,基础模型与预训练数据的选择具有持续影响。我们展示了我们的方法能够响应快速演化的LLM图景的多个方面,为形成关于其他高层行为的假设、设计推理模型的训练方案以及在模型训练过程中更好地控制价值权衡提供了见解。
更新时间: 2025-06-25 17:58:12
领域: cs.CL,cs.AI
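The cognitive model referenced above scores candidate utterances by a weighted combination of competing utilities, and the fitted weights are what get read off as a model's value trade-off. A minimal sketch with toy utilities and hypothetical numbers:

import numpy as np

def speaker_probs(info_u, social_u, w_info, w_social, temp=1.0):
    # Soft-max choice over utterances under a weighted sum of an
    # informational utility and a social (face-saving) utility.
    u = w_info * np.asarray(info_u) + w_social * np.asarray(social_u)
    p = np.exp((u - u.max()) / temp)      # numerically stable softmax
    return p / p.sum()

# three candidate replies to "Did you like it?": blunt truth, hedge, white lie
print(speaker_probs([2.0, 1.0, 0.1], [0.0, 1.0, 2.0], w_info=1.0, w_social=0.5))

Fitting $w_{\mathrm{info}}$ and $w_{\mathrm{social}}$ to a model's observed choices is the step that lets utility weights be compared across training stages and model families.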
Data Quality in Crowdsourcing and Spamming Behavior Detection
As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credibility. Unlike the simple scenarios where Kappa coefficient and intraclass correlation coefficient usually can apply, online crowdsourcing requires dealing with more complex situations. We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition, and we classify spammers into three categories based on their different behavioral patterns. A spammer index is proposed to assess entire data consistency, and two metrics are developed to measure crowd workers' credibility by utilizing the Markov chain and generalized random effects models. Furthermore, we showcase the practicality of our techniques and their advantages by applying them on a face verification task with both simulation and real-world data collected from two crowdsourcing platforms.
Updated: 2025-06-25 17:56:08
标题: 众包中的数据质量和垃圾行为检测
摘要: 随着众包作为一种获取机器学习数据集标签的高效且具有成本效益的方法而出现,评估由众包提供的数据质量变得至关重要,以提高分析性能并减少后续机器学习任务中的偏见。鉴于在大多数众包情况下缺乏基本真相,我们将数据质量定义为标注者的一致性和可信度。与通常适用于简单情景的Kappa系数和组内相关系数不同,在线众包需要处理更为复杂的情况。我们介绍了一种通过方差分解评估数据质量和检测垃圾信息威胁的系统方法,并根据不同行为模式将垃圾信息发送者分为三类。提出了一个垃圾信息发送者指数,用于评估整体数据一致性,并利用马尔可夫链和广义随机效应模型开发了两个指标来衡量众包工作者的可信度。此外,我们通过在两个众包平台收集的模拟和真实世界数据上应用这些技术来展示它们的实用性和优势。
更新时间: 2025-06-25 17:56:08
领域: cs.HC,cs.LG,stat.AP
IRanker: Towards Ranking Foundation Model
Ranking tasks are ubiquitous, encompassing applications such as recommendation systems, LLM routing, and item re-ranking. We propose to unify these tasks using a single ranking foundation model (FM), as it eliminates the need for designing different models for each specific ranking task. However, unlike general supervision tasks in LLMs, ranking tasks do not have clear labels for supervision, posing great challenges to developing a ranking FM. To overcome these challenges, we propose IRanker, a ranking FM framework with reinforcement learning (RL) and iterative decoding. Our insight is to decompose the complex ranking task into an iterative decoding process that eliminates the worst candidate from the candidate pool step by step, which significantly reduces the output combinatorial space and better utilizes the limited context length during RL training. We meticulously train and comprehensively evaluate an IRanker-3B model on nine datasets across three scenarios: recommendation, routing, and passage ranking. The results show that a single IRanker-3B achieves state-of-the-art results on several datasets compared to models of similar size, and even surpasses the performance of larger models on certain datasets. We further demonstrate the effectiveness of our RL design and the robustness of the iterative mechanism across different LLM sizes. Moreover, we conducted both in-domain and out-of-domain zero-shot generalization experiments, which showed that IRanker-3B achieved good generalization on in-domain ranking tasks compared to the base LLM by at least 5% improvement. Surprisingly, on out-of-domain generic LLM tasks, IRanker-3B outperformed the base model by at least 9% on GSM8K, IFEval, and MathQA. In addition, the thoughts generated by IRanker-3B during training could further enhance zero-shot LLM performance.
Updated: 2025-06-25 17:56:06
标题: IRanker:朝向排名基础模型
摘要: 排名任务是无处不在的,包括推荐系统、LLM路由和物品重新排名等应用。我们提出使用单一排名基础模型(FM)统一这些任务,因为它消除了为每个特定排名任务设计不同模型的需要。然而,与LLM中的一般监督任务不同,排名任务没有明确的监督标签,这给开发排名FM带来了巨大挑战。为了克服这些挑战,我们提出了IRanker,一个具有强化学习(RL)和迭代解码的排名FM框架。我们的见解是将复杂的排名任务分解为一个逐步消除候选池中最差候选的迭代解码过程,这显著减少了输出组合空间,并更好地利用了RL训练期间的有限上下文长度。我们精心训练并全面评估了一个IRanker-3B模型,涵盖了三个场景的九个数据集:推荐、路由和段落排名。结果显示,与相似规模的模型相比,单个IRanker-3B在几个数据集上取得了最先进的结果,甚至在某些数据集上超过了更大模型的性能。我们进一步展示了我们的RL设计的有效性和迭代机制在不同LLM大小上的稳健性。此外,我们进行了领域内和领域外的零样本泛化实验,结果显示,与基础LLM相比,IRanker-3B在领域内排名任务上至少提高了5%。令人惊讶的是,在领域外通用LLM任务上,IRanker-3B在GSM8K、IFEval和MathQA上至少比基础模型提高了9%。此外,在训练过程中由IRanker-3B生成的想法还可以进一步增强零样本LLM的性能。
更新时间: 2025-06-25 17:56:06
领域: cs.IR,cs.AI,cs.LG
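The iterative decoding loop at the heart of IRanker is easy to picture: instead of emitting a full ranking in one shot, the model is queried repeatedly for the single worst remaining candidate. A schematic with a stand-in scorer (the real system makes one LLM call per elimination step):

def iterative_rank(candidates, worst_index):
    # Repeatedly remove the worst remaining candidate; the reversed
    # elimination order is the final ranking (best first).
    pool, eliminated = list(candidates), []
    while pool:
        eliminated.append(pool.pop(worst_index(pool)))
    return eliminated[::-1]

# toy stand-in scorer: "worst" = longest string
print(iterative_rank(["b", "ccc", "aa"],
                     lambda p: max(range(len(p)), key=lambda i: len(p[i]))))
# -> ['b', 'aa', 'ccc']

Each step conditions only on the shrunken pool, which is how the combinatorial output space and the context length stay small during RL training.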
The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
As Large Language Models (LLMs) gain agentic abilities, they will have to navigate complex multi-agent scenarios, interacting with human users and other agents in cooperative and competitive settings. This will require new reasoning skills, chief amongst them being theory of mind (ToM), or the ability to reason about the "mental" states of other agents. However, ToM and other multi-agent abilities in LLMs are poorly understood, since existing benchmarks suffer from narrow scope, data leakage, saturation, and lack of interactivity. We thus propose Decrypto, a game-based benchmark for multi-agent reasoning and ToM drawing inspiration from cognitive science, computational pragmatics and multi-agent reinforcement learning. It is designed to be as easy as possible in all other dimensions, eliminating confounding factors commonly found in other benchmarks. To our knowledge, it is also the first platform for designing interactive ToM experiments. We validate the benchmark design through comprehensive empirical evaluations of frontier LLMs, robustness studies, and human-AI cross-play experiments. We find that LLM game-playing abilities lag behind humans and simple word-embedding baselines. We then create variants of two classic cognitive science experiments within Decrypto to evaluate three key ToM abilities. Surprisingly, we find that state-of-the-art reasoning models are significantly worse at those tasks than their older counterparts. This demonstrates that Decrypto addresses a crucial gap in current reasoning and ToM evaluations, and paves the path towards better artificial agents.
Updated: 2025-06-25 17:55:27
标题: 《用于多智能体推理和心灵理论的解密基准》
摘要: 随着大型语言模型(LLMs)获得代理能力,它们将不得不在复杂的多代理场景中导航,与人类用户和其他代理在合作和竞争环境中进行互动。这将需要新的推理能力,其中最主要的是心智理论(ToM),或者理解其他代理的“心理”状态的能力。然而,LLMs中的ToM和其他多代理能力尚不明确,因为现有的基准测试存在着范围狭窄、数据泄漏、饱和和缺乏互动等问题。因此,我们提出了Decrypto,一个基于游戏的多代理推理和ToM基准测试,灵感来源于认知科学、计算语用学和多代理强化学习。它旨在在所有其他维度上尽可能简单,消除其他基准测试中常见的混杂因素。据我们所知,这也是第一个用于设计互动ToM实验的平台。 我们通过对前沿LLMs的全面实证评估、鲁棒性研究和人工智能交叉对抗实验来验证基准测试设计。我们发现,LLM的游戏能力落后于人类和简单的词嵌入基线。然后我们在Decrypto中创建两个经典认知科学实验的变体,以评估三个关键的ToM能力。令人惊讶的是,我们发现,最先进的推理模型在这些任务上明显比它们的老版本更差。这表明Decrypto填补了当前推理和ToM评估中的重要差距,并为更好的人工代理铺平了道路。
更新时间: 2025-06-25 17:55:27
领域: cs.AI,cs.CL,cs.HC,cs.MA
OmniGen2: Exploration to Advanced Multimodal Generation
In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
Updated: 2025-06-25 17:54:25
标题: OmniGen2:探索到先进的多模式生成
摘要: 在这项工作中,我们介绍了OmniGen2,这是一个多功能且开源的生成模型,旨在为各种生成任务提供统一解决方案,包括文本到图像、图像编辑和上下文生成。与OmniGen v1不同,OmniGen2为文本和图像模态设计了两条相互独立的解码路径,使用不共享的参数和解耦的图像分词器。这一设计使OmniGen2能够在无需重新适配VAE输入的情况下建立在现有多模态理解模型之上,从而保留了原有的文本生成能力。为了支持OmniGen2的训练,我们开发了全面的数据构建管道,涵盖图像编辑和上下文生成数据。此外,我们引入了一种针对图像生成任务定制的反思机制,并基于OmniGen2构建了一个专门的反思数据集。尽管其参数规模相对较小,OmniGen2在多个任务基准测试中取得了有竞争力的结果,包括文本到图像和图像编辑。为了进一步评估上下文生成(也称为主题驱动任务),我们引入了一个名为OmniContext的新基准。在一致性方面,OmniGen2在开源模型中取得了最先进的性能。我们将发布我们的模型、训练代码、数据集和数据构建管道,以支持该领域的未来研究。项目页面:https://vectorspacelab.github.io/OmniGen2;GitHub链接:https://github.com/VectorSpaceLab/OmniGen2
更新时间: 2025-06-25 17:54:25
领域: cs.CV,cs.AI,cs.CL
Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning
Recent work has shown that gradient updates in federated learning (FL) can unintentionally reveal sensitive information about a client's local data. This risk becomes significantly greater when a malicious server manipulates the global model to provoke information-rich updates from clients. In this paper, we adopt a defender's perspective to provide the first comprehensive analysis of malicious gradient leakage attacks and the model manipulation techniques that enable them. Our investigation reveals a core trade-off: these attacks cannot be both highly effective in reconstructing private data and sufficiently stealthy to evade detection -- especially in realistic FL settings that incorporate common normalization techniques and federated averaging. Building on this insight, we argue that malicious gradient leakage attacks, while theoretically concerning, are inherently limited in practice and often detectable through basic monitoring. As a complementary contribution, we propose a simple, lightweight, and broadly applicable client-side detection mechanism that flags suspicious model updates before local training begins, despite the fact that such detection may not be strictly necessary in realistic FL settings. This mechanism further underscores the feasibility of defending against these attacks with minimal overhead, offering a deployable safeguard for privacy-conscious federated learning systems.
Updated: 2025-06-25 17:49:26
标题: 听不见邪恶:检测联邦学习中恶意服务器的梯度泄露
摘要: 最近的研究表明,在联邦学习(FL)中的梯度更新可能会无意中泄露关于客户端本地数据的敏感信息。当恶意服务器操纵全局模型以引发客户端提供信息丰富的更新时,这种风险会显著增加。在本文中,我们采用防御者的角度,对恶意梯度泄露攻击和可能导致这些攻击的模型操纵技术进行了首次全面分析。我们的调查揭示了一个核心折衷:这些攻击不能既在重建私人数据方面高效,又足够隐秘以避免检测 - 尤其是在融合常见标准化技术和联邦平均化的现实FL设置中。 基于这一认识,我们认为,恶意梯度泄露攻击虽然在理论上令人担忧,但在实践中存在固有的局限性,并且通常可以通过基本监控来检测。作为一项补充贡献,我们提出了一个简单、轻量级且广泛适用的客户端侧检测机制,可以在本地训练开始之前标记可疑的模型更新,尽管在现实的FL设置中可能并不严格需要这种检测。这一机制进一步强调了以最小开销抵御这些攻击的可行性,为注重隐私的联邦学习系统提供了可部署的保护措施。
更新时间: 2025-06-25 17:49:26
领域: cs.LG,cs.CR,cs.DC
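A client-side check of the kind proposed costs almost nothing: before training locally, compare the received global model against the previous round and refuse to train if any layer moved implausibly far. The statistic and threshold below are illustrative choices, not the paper's exact detector.

import numpy as np

def update_is_suspicious(prev_weights, new_weights, max_rel_change=5.0):
    # Flag a received global model whose per-layer relative change since
    # the last round is implausibly large; run before local training.
    for name, w_prev in prev_weights.items():
        w_new = new_weights[name]
        rel = np.linalg.norm(w_new - w_prev) / (np.linalg.norm(w_prev) + 1e-12)
        if rel > max_rel_change:
            return True
    return False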
Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
The problem of learning to defer with multiple experts consists of optimally assigning input instances to experts, balancing the trade-off between their accuracy and computational cost. This is a critical challenge in natural language generation, but also in other fields such as image processing, and medical diagnostics. Recent studies have proposed surrogate loss functions to optimize deferral, but challenges remain in ensuring their consistency properties. This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage (jointly learning predictor and deferral function) and two-stage (learning only the deferral function with a fixed expert) learning scenarios. For single-stage deferral, we introduce a family of new realizable $H$-consistent surrogate losses and further prove $H$-consistency for a selected member. For two-stage deferral, we derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, multiple-expert scenario. Additionally, we provide enhanced theoretical guarantees under low-noise assumptions for both scenarios. Finally, we report the results of experiments using our proposed surrogate losses, comparing their performance against existing baselines.
Updated: 2025-06-25 17:48:58
标题: 掌握多专家路由:可实现的 $H$-一致性和学习推迟的强保证
摘要: 带多个专家的学习推迟(learning to defer)问题在于将输入实例最优地分配给各个专家,权衡其准确性与计算成本。这是自然语言生成中的一个关键挑战,在图像处理和医学诊断等其他领域亦然。最近的研究提出了替代损失函数来优化推迟决策,但在确保其一致性性质方面仍存在挑战。本文介绍了新颖的替代损失函数以及具有强大理论学习保证的高效算法。我们解决了关于可实现$H$-一致性、$H$-一致性界限和贝叶斯一致性的开放问题,既涵盖单阶段(联合学习预测器和推迟函数)场景,也涵盖两阶段(在固定专家下仅学习推迟函数)学习场景。对于单阶段推迟,我们引入了一族新的可实现$H$-一致替代损失,并进一步证明了其中一个选定成员的$H$-一致性。对于两阶段推迟,我们推导出新的替代损失,在双专家情形下实现了可实现$H$-一致性、$H$-一致性界限和贝叶斯一致性,并且在自然假设下对多专家情形同样成立。此外,我们在低噪声假设下为两种情形提供了增强的理论保证。最后,我们报告了使用所提替代损失的实验结果,并将其性能与现有基线进行了比较。
更新时间: 2025-06-25 17:48:58
领域: cs.LG,stat.ML
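For orientation, a common formalization of the deferral objective in this literature (our schematic, not necessarily the authors' exact notation) routes input $x$ either to the predictor $h$ or to expert $j$ at cost $\beta_j$:

$$\ell_{\mathrm{def}}(h, r, x, y) \;=\; \mathbb{1}[r(x)=0]\,\mathbb{1}[h(x)\neq y] \;+\; \sum_{j=1}^{n} \mathbb{1}[r(x)=j]\,\big(\mathbb{1}[g_j(x)\neq y] + \beta_j\big),$$

where $g_j$ are the experts and $r$ is the deferral function; the surrogates studied in the paper are tractable upper bounds on losses of this shape, and the $H$-consistency results quantify how minimizing a surrogate controls the target loss.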
Disentangled representations of microscopy images
Microscopy image analysis is fundamental for different applications, from diagnosis to synthetic engineering and environmental monitoring. Modern acquisition systems have granted the possibility to acquire an escalating amount of images, requiring a consequent development of a large collection of deep learning-based automatic image analysis methods. Although deep neural networks have demonstrated great performance in this field, interpretability, an essential requirement for microscopy image analysis, remains an open challenge. This work proposes a Disentangled Representation Learning (DRL) methodology to enhance model interpretability for microscopy image classification. Exploiting benchmark datasets from three different microscopic image domains (plankton, yeast vacuoles, and human cells), we show how a DRL framework, based on transferring a representation learnt from synthetic data, can provide a good trade-off between accuracy and interpretability in this domain.
Updated: 2025-06-25 17:44:37
标题: 显微镜图像的解缠表示
摘要: 显微镜图像分析对于不同的应用至关重要,从诊断到合成工程和环境监测。现代采集系统使得获取不断增加的图像成为可能,需要相应开发大量基于深度学习的自动图像分析方法。尽管深度神经网络在这一领域表现出色,但对于显微镜图像分析至关重要的可解释性仍然是一个挑战。 本文提出了一种解耦表示学习(DRL)方法,以增强显微镜图像分类模型的可解释性。利用来自三个不同显微图像领域(浮游生物、酵母液泡和人类细胞)的基准数据集,我们展示了如何基于从合成数据学习的表示进行迁移的DRL框架,在这一领域提供了准确性和可解释性之间的良好平衡。
更新时间: 2025-06-25 17:44:37
领域: cs.CV,cs.AI,cs.LG
Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices
As privacy protection gains increasing importance, more models are being trained on edge devices and subsequently merged into the central server through Federated Learning (FL). However, current research overlooks the impact of network topology, physical distance, and data heterogeneity on edge devices, leading to issues such as increased latency and degraded model performance. To address these issues, we propose a new federated learning scheme for edge devices called Federated Learning with Encrypted Data Sharing (FedEDS). FedEDS uses the client model and the model's stochastic layer to train the data encryptor. The data encryptor generates encrypted data and shares it with other clients. Each client uses the corresponding client's stochastic layer and encrypted data to train and adjust its local model. FedEDS uses the client's local private data and encrypted shared data from other clients to train the model. This approach accelerates the convergence speed of federated learning training and mitigates the negative impact of data heterogeneity, making it suitable for application services deployed on edge devices requiring rapid convergence. Experimental results show the efficacy of FedEDS in promoting model performance.
Updated: 2025-06-25 17:40:54
标题: 使用加密数据共享的高效异构边缘设备联邦学习
摘要: 随着隐私保护日益重要,越来越多的模型在边缘设备上进行训练,然后通过联邦学习(FL)合并到中央服务器中。然而,目前的研究忽视了网络拓扑、物理距离和数据异质性对边缘设备的影响,导致延迟增加和模型性能下降等问题。为了解决这些问题,我们提出了一种新的边缘设备上的联邦学习方案,称为带加密数据共享的联邦学习(FedEDS)。FedEDS使用客户端模型和模型的随机层来训练数据加密器。数据加密器生成加密数据并与其他客户端共享。客户端使用相应客户端的随机层和加密数据来训练和调整本地模型。FedEDS使用客户端的本地私有数据和其他客户端共享的加密数据来训练模型。这种方法加快了联邦学习训练的收敛速度,并减轻了数据异质性的负面影响,使其适用于部署在需要快速收敛的边缘设备上的应用服务。实验结果显示了FedEDS在促进模型性能方面的有效性。
更新时间: 2025-06-25 17:40:54
领域: cs.LG
Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
Class imbalance remains a major challenge in machine learning, especially in multi-class problems with long-tailed distributions. Existing methods, such as data resampling, cost-sensitive techniques, and logistic loss modifications, though popular and often effective, lack solid theoretical foundations. As an example, we demonstrate that cost-sensitive methods are not Bayes-consistent. This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. We then propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees based on empirical loss and a new notion of class-sensitive Rademacher complexity. Leveraging these theoretical results, we devise novel and general learning algorithms, IMMAX (Imbalanced Margin Maximization), which incorporate confidence margins and are applicable to various hypothesis sets. While our focus is theoretical, we also present extensive empirical results demonstrating the effectiveness of our algorithms compared to existing baselines.
Updated: 2025-06-25 17:36:30
标题: 平衡天平:学习不平衡数据的理论和算法框架
摘要: 类别不平衡仍然是机器学习中一个重要的挑战,特别是在具有长尾分布的多类问题中。现有的方法,如数据重采样、成本敏感技术和逻辑损失修改,虽然广受欢迎且通常有效,但缺乏坚实的理论基础。举例来说,我们证明了成本敏感方法并非贝叶斯一致的。本文引入了一个新的理论框架,用于分析不平衡分类中的泛化。然后我们提出了一种新的类别不平衡边缘损失函数,适用于二元和多类设置,证明了其强H一致性,并基于经验损失和一种新的类别敏感Rademacher复杂度推导出相应的学习保证。利用这些理论结果,我们设计了新颖且通用的学习算法IMMAX(不平衡边缘最大化),其中包含置信边界,并适用于各种假设集。虽然我们的重点是理论,但我们还展示了广泛的实证结果,证明了我们的算法相对于现有基线的有效性。
更新时间: 2025-06-25 17:36:30
领域: cs.LG,stat.ML
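A class-dependent margin loss of the general shape studied here (our schematic; the paper's precise definition may differ) penalizes a sample unless its class score clears a per-class margin $\rho_y$:

$$\ell_\rho(h, x, y) \;=\; \Phi\!\Big(\frac{h(x, y) - \max_{y' \neq y} h(x, y')}{\rho_y}\Big),$$

with $\Phi$ a non-increasing surrogate (e.g., hinge) and larger margins assigned to rarer classes; tuning the $\rho_y$ per class is the lever an IMMAX-style algorithm uses to trade margin between head and tail classes.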
Towards Community-Driven Agents for Machine Learning Engineering
Large language model-based machine learning (ML) agents have shown great promise in automating ML research. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a novel agent that excels at exchanging insights and developing novel solutions within a community context. CoMind achieves state-of-the-art performance on MLE-Live and outperforms 79.2% human competitors on average across four ongoing Kaggle competitions. Our code is released at https://github.com/comind-ml/CoMind.
Updated: 2025-06-25 17:36:02
标题: 朝向机器学习工程的社区驱动代理
摘要: 基于大型语言模型的机器学习代理在自动化机器学习研究方面表现出巨大的潜力。然而,现有的代理通常在给定的研究问题上独立运作,没有与更广泛的研究社区互动,而在这些社区中,人类研究人员经常通过分享知识获得见解并做出贡献。为了弥合这一差距,我们引入了MLE-Live,一个设计用于评估代理与模拟Kaggle研究社区进行交流和利用集体知识能力的实时评估框架。基于这一框架,我们提出了CoMind,这是一个在社区环境中擅长交流见解并开发新解决方案的新型代理。CoMind在MLE-Live上实现了最先进的性能,并在四个持续进行中的Kaggle比赛中平均超过79.2%的人类竞争对手。我们的代码发布在https://github.com/comind-ml/CoMind。
更新时间: 2025-06-25 17:36:02
领域: cs.AI,cs.LG
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Text-to-image diffusion models have recently enabled the creation of visually compelling, detailed images from textual prompts. However, their ability to accurately represent various cultural nuances remains an open question. In our work, we introduce the CultDiff benchmark, evaluating whether state-of-the-art diffusion models can generate culturally specific images spanning ten countries. Through a fine-grained analysis of different similarity aspects, we show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions, revealing significant disparities in cultural relevance, description fidelity, and realism compared to real-world reference images. With the collected human evaluations, we develop a neural-based image-image similarity metric, namely, CultDiff-S, to predict human judgment on real and generated images with cultural artifacts. Our work highlights the need for more inclusive generative AI systems and equitable dataset representation over a wide range of cultures.
Updated: 2025-06-25 17:32:22
标题: 全球视角下的扩散模型:它们是否具有文化包容性?
摘要: 文本到图像扩散模型最近使得从文本提示创建视觉上引人注目、细节丰富的图像成为可能。然而,它们能否准确呈现各种文化细微差别仍是一个悬而未决的问题。在我们的工作中,我们引入了CultDiff基准,评估最先进的扩散模型能否生成横跨十个国家的具有文化特异性的图像。通过对不同相似性维度进行细粒度分析,我们展示了这些模型经常无法生成建筑、服装和食物方面的文化产物,尤其是对代表性不足的国家和地区,并揭示了与真实世界参考图像相比在文化相关性、描述保真度和逼真度方面的显著差距。利用收集到的人工评估,我们开发了一种基于神经网络的图像-图像相似度度量,即CultDiff-S,用于预测人类对含文化产物的真实图像和生成图像的判断。我们的工作突显了对更具包容性的生成式人工智能系统以及覆盖广泛文化的公平数据集代表性的需求。
更新时间: 2025-06-25 17:32:22
领域: cs.CV,cs.AI
First-order methods for stochastic and finite-sum convex optimization with deterministic constraints
In this paper, we study a class of stochastic and finite-sum convex optimization problems with deterministic constraints. Existing methods typically aim to find an $\epsilon$-$expectedly\ feasible\ stochastic\ optimal$ solution, in which the expected constraint violation and expected optimality gap are both within a prescribed tolerance $\epsilon$. However, in many practical applications, constraints must be nearly satisfied with certainty, rendering such solutions potentially unsuitable due to the risk of substantial violations. To address this issue, we propose stochastic first-order methods for finding an $\epsilon$-$surely\ feasible\ stochastic\ optimal$ ($\epsilon$-SFSO) solution, where the constraint violation is deterministically bounded by $\epsilon$ and the expected optimality gap is at most $\epsilon$. Our methods apply an accelerated stochastic gradient (ASG) scheme or a modified variance-reduced ASG scheme $only\ once$ to a sequence of quadratic penalty subproblems with appropriately chosen penalty parameters. We establish first-order oracle complexity bounds for the proposed methods in computing an $\epsilon$-SFSO solution. As a byproduct, we also derive first-order oracle complexity results for sample average approximation method in computing an $\epsilon$-SFSO solution of the stochastic optimization problem using our proposed methods to solve the sample average problem.
Updated: 2025-06-25 17:26:02
标题: 随机和有限和凸优化的一阶方法与确定性约束
摘要: 在这篇论文中,我们研究了一类具有确定性约束的随机和有限和凸优化问题。现有方法通常旨在找到一个 $\epsilon$-期望可行的随机最优解,在这个解中,期望的约束违反和期望的最优性差距都在预先规定的容差 $\epsilon$ 内。然而,在许多实际应用中,约束必须几乎确定地得到满足,因此这样的解可能因为存在重大违反风险而不合适。为了解决这个问题,我们提出了用于找到一个 $\epsilon$-确定可行的随机最优 ($\epsilon$-SFSO) 解的随机一阶方法,其中约束违反由 $\epsilon$ 确定性地限制,期望的最优性差距最多为 $\epsilon$。我们的方法将加速随机梯度 (ASG) 方案或修改后的方差减少 ASG 方案仅应用一次到一系列具有适当选择的惩罚参数的二次惩罚子问题中。我们建立了计算 $\epsilon$-SFSO 解的提出方法的一阶预言复杂度界限。作为副产品,我们还使用我们提出的方法解决样本平均问题,在计算随机优化问题的 $\epsilon$-SFSO 解时得出了一阶预言复杂度结果。
更新时间: 2025-06-25 17:26:02
领域: math.OC,cs.LG,cs.NA,math.NA
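Schematically, the construction replaces the constrained problem $\min_x \{f(x) : g(x) \le 0,\ h(x) = 0\}$ by a short sequence of unconstrained quadratic-penalty subproblems (our sketch of the generic template; the paper specifies the penalty schedule and accuracy targets):

$$\min_x \; f(x) + \frac{\rho_k}{2}\,\big\lVert [g(x)]_+ \big\rVert^2 + \frac{\rho_k}{2}\,\lVert h(x) \rVert^2, \qquad \rho_1 < \rho_2 < \cdots,$$

each solved once with an accelerated stochastic gradient scheme; increasing $\rho_k$ is what drives the deterministic bound on the constraint violation, while the stochastic oracle handles the expectation or finite-sum objective.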
PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
Low-Rank Adaptation (LoRA) is a widely used finetuning method for large models. Its small memory footprint allows practitioners to adapt large models to specific tasks at a fraction of the cost of full finetuning. Different modifications have been proposed to enhance its efficiency by, for example, setting the learning rate, the rank, and the initialization. Another improvement axis is adapter placement strategy: when using LoRA, practitioners usually pick module types to adapt with LoRA, such as Query and Key modules. Few works have studied the problem of adapter placement, with nonconclusive results: original LoRA paper suggested placing adapters in attention modules, while other works suggested placing them in the MLP modules. Through an intuitive theoretical analysis, we introduce PLoP (Precise LoRA Placement), a lightweight method that allows automatic identification of module types where LoRA adapters should be placed, given a pretrained model and a finetuning task. We demonstrate that PLoP consistently outperforms, and in the worst case competes, with commonly used placement strategies through comprehensive experiments on supervised finetuning and reinforcement learning for reasoning.
Updated: 2025-06-25 17:25:02
标题: PLoP:精准的LoRA放置以有效微调大模型
摘要: Low-Rank Adaptation(LoRA)是一种广泛使用的大型模型微调方法。其较小的内存占用允许从业者以较低成本将大型模型适应于特定任务。已经提出了不同的修改方法来增强其效率,例如设置学习率、秩和初始化。另一个改进轴是适配器放置策略:使用LoRA时,从业者通常选择要与LoRA一起适应的模块类型,例如查询和键模块。很少有研究关于适配器放置的问题,结果并不一致:原始LoRA论文建议将适配器放置在注意力模块中,而其他作品建议将其放置在MLP模块中。通过直观的理论分析,我们引入了PLoP(Precise LoRA Placement),这是一种轻量级方法,允许在给定预训练模型和微调任务的情况下自动识别LoRA适配器应放置在哪种模块类型。我们通过对监督微调和用于推理的强化学习的全面实验展示了PLoP一直优于常用的放置策略,并在最坏情况下与其相竞争。
更新时间: 2025-06-25 17:25:02
领域: cs.LG,cs.CL,stat.ML
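Whatever criterion selects the module types, applying the placement is mechanical. A self-contained sketch in plain PyTorch, with hypothetical module-name patterns standing in for the types a PLoP-style analysis would return:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wrap a frozen nn.Linear with a trainable low-rank update (standard LoRA).
    def __init__(self, base, r=8, alpha=16.0):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

def place_lora(model, types):
    # Replace every nn.Linear whose qualified name mentions one of `types`
    # (e.g. {"q_proj", "down_proj"}) with a LoRA-wrapped copy.
    for name, mod in list(model.named_modules()):
        for child_name, child in list(mod.named_children()):
            full = f"{name}.{child_name}" if name else child_name
            if isinstance(child, nn.Linear) and any(t in full for t in types):
                setattr(mod, child_name, LoRALinear(child))

The substance of the paper is precisely which set to pass as `types`; the sketch only shows that, once chosen, the placement itself is a few lines.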
On Context-Content Uncertainty Principle
The Context-Content Uncertainty Principle (CCUP) proposes that inference under uncertainty is governed by an entropy asymmetry between context and content: high-entropy contexts must be interpreted through alignment with low-entropy, structured content. In this paper, we develop a layered computational framework that derives operational principles from this foundational asymmetry. At the base level, CCUP formalizes inference as directional entropy minimization, establishing a variational gradient that favors content-first structuring. Building upon this, we identify four hierarchical layers of operational principles: (\textbf{L1}) \emph{Core Inference Constraints}, including structure-before-specificity, asymmetric inference flow, cycle-consistent bootstrapping, and conditional compression, all shown to be mutually reducible; (\textbf{L2}) \emph{Resource Allocation Principles}, such as precision-weighted attention, asymmetric learning rates, and attractor-based memory encoding; (\textbf{L3}) \emph{Temporal Bootstrapping Dynamics}, which organize learning over time via structure-guided curricula; and (\textbf{L4}) \emph{Spatial Hierarchical Composition}, which integrates these mechanisms into self-organizing cycles of memory, inference, and planning. We present formal equivalence theorems, a dependency lattice among principles, and computational simulations demonstrating the efficiency gains of CCUP-aligned inference. This work provides a unified theoretical foundation for understanding how brains and machines minimize uncertainty through recursive structure-specificity alignment. The brain is not just an inference machine. It is a cycle-consistent entropy gradient resolver, aligning structure and specificity via path-dependent, content-seeded simulation.
Updated: 2025-06-25 17:21:19
标题: 关于上下文内容不确定性原理
摘要: 上下文内容不确定性原理(CCUP)提出,在不确定性下的推断受到上下文和内容之间的熵不对称性的控制:高熵上下文必须通过与低熵、结构化内容的对齐来解释。在本文中,我们开发了一个分层计算框架,从这一基础性不对称性中导出操作原则。在基本层面上,CCUP将推断形式化为方向熵最小化,建立一个有利于优先结构化内容的变分梯度。在此基础上,我们确定了四个层次的操作原则:(L1)核心推断约束,包括结构优先于特异性、不对称推断流、循环一致的引导、条件压缩,所有这些原则都被证明是相互可归约的;(L2)资源分配原则,如精确度加权注意力、不对称学习速率和基于吸引子的记忆编码;(L3)时间引导动态,通过结构引导的课程安排组织学习;(L4)空间分层组合,将这些机制集成到存储、推断和规划的自组织循环中。我们提出了形式等价定理、原则之间的依赖格栅以及计算模拟,展示了CCUP对齐推断的效率收益。这项工作为理解大脑和机器如何通过递归结构特异性对齐来最小化不确定性提供了一个统一的理论基础。大脑不仅仅是一个推断机器。它是一个循环一致的熵梯度解算器,通过依赖路径和内容种子模拟来对齐结构和特异性。
更新时间: 2025-06-25 17:21:19
领域: cs.LG
Probing Quantum Spin Systems with Kolmogorov-Arnold Neural Network Quantum States
Neural Quantum States (NQS) are a class of variational wave functions parametrized by neural networks (NNs) to study quantum many-body systems. In this work, we propose \texttt{SineKAN}, a NQS \textit{ansatz} based on Kolmogorov-Arnold Networks (KANs), to represent quantum mechanical wave functions as nested univariate functions. We show that \texttt{SineKAN} wavefunction with learnable sinusoidal activation functions can capture the ground state energies, fidelities and various correlation functions of the one dimensional Transverse-Field Ising model, Anisotropic Heisenberg model, and Antiferromagnetic $J_{1}-J_{2}$ model with different chain lengths. In our study of the $J_1-J_2$ model with $L=100$ sites, we find that the \texttt{SineKAN} model outperforms several previously explored neural quantum state \textit{ans\"atze}, including Restricted Boltzmann Machines (RBMs), Long Short-Term Memory models (LSTMs), and Multi-layer Perceptrons (MLP) \textit{a.k.a.} Feed Forward Neural Networks, when compared to the results obtained from the Density Matrix Renormalization Group (DMRG) algorithm. We find that \texttt{SineKAN} models can be trained to high precisions and accuracies with minimal computational costs.
Updated: 2025-06-25 17:17:27
标题: 用科尔莫戈洛夫-阿诺德神经网络量子态探究量子自旋系统
摘要: 神经量子态(NQS)是一类由神经网络(NNs)参数化的变分波函数,用于研究量子多体系统。在这项工作中,我们提出了基于Kolmogorov-Arnold Networks(KANs)的NQS \textit{ansatz} \texttt{SineKAN},用于将量子力学波函数表示为嵌套单变量函数。我们展示了具有可学习正弦激活函数的\texttt{SineKAN}波函数可以捕获一维横向场伊辛模型、各向异性海森堡模型和反铁磁$J_{1}-J_{2}$模型的基态能量、保真度和各种相关函数。在我们对具有$L=100$个站点的$J_1-J_2$模型的研究中,我们发现\texttt{SineKAN}模型优于几种先前探索过的神经量子态\textit{ans\"atze},包括受限玻尔兹曼机(RBMs)、长短期记忆模型(LSTMs)和多层感知器(MLP)\textit{a.k.a.}前馈神经网络,与从密度矩阵重整化群(DMRG)算法获得的结果相比。我们发现\texttt{SineKAN}模型可以在最小的计算成本下训练到高精度和准确性。
更新时间: 2025-06-25 17:17:27
领域: quant-ph,cond-mat.dis-nn,cond-mat.str-el,cs.LG
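The ansatz is straightforward to prototype: each input coordinate passes through a bank of learnable sinusoids (trainable frequencies and phases), the features are mixed linearly, and such layers are stacked to output a wavefunction amplitude. A simplified sketch of the idea, not the authors' implementation:

import torch
import torch.nn as nn

class SineKANLayer(nn.Module):
    # KAN-style layer: learnable sinusoidal univariate functions per input
    # coordinate, followed by a linear mix.
    def __init__(self, d_in, d_out, grid=8):
        super().__init__()
        self.freq = nn.Parameter(torch.arange(1, grid + 1).float().repeat(d_in, 1))
        self.phase = nn.Parameter(torch.zeros(d_in, grid))
        self.mix = nn.Linear(d_in * grid, d_out)

    def forward(self, x):                                   # x: (batch, d_in)
        z = torch.sin(x.unsqueeze(-1) * self.freq + self.phase)
        return self.mix(z.flatten(1))

# toy network mapping a 10-spin configuration to a (log-)amplitude
psi = nn.Sequential(SineKANLayer(10, 32), SineKANLayer(32, 1))
print(psi(torch.randn(4, 10)).shape)    # torch.Size([4, 1])

In an NQS workflow the output would parametrize $\log\psi$ of a spin configuration and be trained by variational Monte Carlo against the Hamiltonian's energy.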
Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning
Closed-loop learning is the process of repeatedly estimating a model from data generated from the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be primarily trained with data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motions that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows sufficient statistics with the martingale property and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented by polluting the data with an infinitesimal fraction of data points generated from a fixed model, by relying on maximum a posteriori estimation or by introducing regularisation. Furthermore, we show that the asymptotic behavior of the dynamics is not reparametrisation invariant.
Updated: 2025-06-25 17:12:22
Domains: cs.LG,cond-mat.dis-nn,physics.data-an,stat.ML
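The absorption phenomenon is easy to reproduce for the simplest exponential family. The illustrative sketch below (not the authors' code) repeatedly refits a Bernoulli model by maximum likelihood on its own samples, and shows how polluting each batch with a small fraction of fixed-model data keeps the parameter away from the absorbing states.

```python
import numpy as np

rng = np.random.default_rng(0)

def closed_loop(p0, n=200, steps=2000, eps=0.0, p_fixed=0.5):
    """Repeated MLE refitting of a Bernoulli model on its own samples;
    eps is the fraction of each batch drawn from a fixed reference model."""
    p = p0
    for _ in range(steps):
        n_fixed = int(eps * n)
        data = np.concatenate([
            rng.random(n - n_fixed) < p,    # model-generated data
            rng.random(n_fixed) < p_fixed,  # 'pollution' from a fixed model
        ])
        p = data.mean()                     # MLE of the Bernoulli parameter
    return p

# Pure closed loop: the sufficient statistic is a martingale, so the process
# is absorbed at 0 or 1, amplifying the initial bias.
print([closed_loop(0.6) for _ in range(5)])
# A small fraction of fixed-model data keeps the parameter in the interior.
print([round(closed_loop(0.6, eps=0.02), 2) for _ in range(5)])
```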
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has been relying on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. Furthermore, the availability of high-quality texts is even more limited: data filtering pipelines often remove up to 99% of the initial web scrapes to achieve state-of-the-art performance. To address the "data wall" of pre-training scaling, our work explores ways to transform and recycle data discarded in existing filtering processes. We propose REWIRE, REcycling the Web with guIded REwrite, a method to enrich low-quality documents so that they could become useful for training. This in turn allows us to increase the representation of synthetic data in the final pre-training set. Experiments at 1B, 3B and 7B scales of the DCLM benchmark show that mixing high-quality raw texts and our rewritten texts leads to improvements of 1.0, 1.3 and 2.5 percentage points respectively across 22 diverse tasks, compared to training on only filtered web data. Training on the raw-synthetic data mix is also more effective than having access to 2x web data. Through further analysis, we demonstrate that about 82% of the mixed-in texts come from transforming lower-quality documents that would otherwise be discarded. REWIRE also outperforms related approaches of generating synthetic data, including Wikipedia-style paraphrasing, question-answer synthesizing and knowledge extraction. These results suggest that recycling web texts holds the potential for being a simple and effective approach for scaling pre-training data.
Updated: 2025-06-25 17:12:12
Domains: cs.CL,cs.AI,cs.LG
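As a flavor of what a guided rewrite might look like, the sketch below builds a rewriting prompt for a document that a quality filter would discard. The wording is purely hypothetical; the paper's actual prompts, rewriting model, and filtering thresholds are not reproduced here.

```python
# Hypothetical guided-rewrite prompt in the spirit of REWIRE (illustrative only).
def make_rewrite_prompt(low_quality_doc: str) -> str:
    return (
        "The following web document was rejected by a pre-training quality filter. "
        "Rewrite its salvageable content as clear, self-contained, well-structured "
        "prose, preserving factual information and discarding boilerplate, "
        "navigation text, and spam.\n\n"
        f"Document:\n{low_quality_doc}\n\nRewritten document:"
    )

raw = "WIN A FREE PHONE!!! click here >>> ... the Battle of Hastings was fought in 1066 ..."
print(make_rewrite_prompt(raw))  # this prompt would then be sent to a rewriting LLM
```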
Define-ML: An Approach to Ideate Machine Learning-Enabled Systems
[Context] The increasing adoption of machine learning (ML) in software systems demands specialized ideation approaches that address ML-specific challenges, including data dependencies, technical feasibility, and alignment between business objectives and probabilistic system behavior. Traditional ideation methods like Lean Inception lack structured support for these ML considerations, which can result in misaligned product visions and unrealistic expectations. [Goal] This paper presents Define-ML, a framework that extends Lean Inception with tailored activities - Data Source Mapping, Feature-to-Data Source Mapping, and ML Mapping - to systematically integrate data and technical constraints into early-stage ML product ideation. [Method] We developed and validated Define-ML following the Technology Transfer Model, conducting both static validation (with a toy problem) and dynamic validation (in a real-world industrial case study). The analysis combined quantitative surveys with qualitative feedback, assessing utility, ease of use, and intent of adoption. [Results] Participants found Define-ML effective for clarifying data concerns, aligning ML capabilities with business goals, and fostering cross-functional collaboration. The approach's structured activities reduced ideation ambiguity, though some noted a learning curve for ML-specific components, which can be mitigated by expert facilitation. All participants expressed the intention to adopt Define-ML. [Conclusion] Define-ML provides an openly available, validated approach for ML product ideation, building on Lean Inception's agility while aligning features with available data and increasing awareness of technical feasibility.
Updated: 2025-06-25 17:11:26
Domains: cs.SE,cs.AI
Do Concept Bottleneck Models Respect Localities?
Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models. These methods assume concept predictions can help understand a model's internal reasoning. In this work, we assess the degree to which such an assumption is true by analyzing whether concept predictors leverage "relevant" features to make predictions, a term we call locality. Concept-based models that fail to respect localities also fail to be explainable because concept predictions are based on spurious features, making the interpretation of the concept predictions vacuous. To assess whether concept-based models respect localities, we construct and use three metrics to characterize when models respect localities, complementing our analysis with theoretical results. Each of our metrics captures a different notion of perturbation and assesses whether perturbing "irrelevant" features impacts the predictions made by a concept predictor. We find that many concept-based models used in practice fail to respect localities because concept predictors cannot always clearly distinguish distinct concepts. Based on these findings, we propose suggestions for alleviating this issue.
Updated: 2025-06-25 17:10:45
Domains: cs.LG,cs.AI
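The flavor of such a perturbation-based metric can be sketched in a few lines: perturb only the features outside a concept's relevant region and measure how much the concept prediction moves. This toy (the function name and scalar predictors are hypothetical) is not the paper's exact metrics.

```python
import numpy as np

def locality_score(concept_predictor, x, relevant_mask, rng, n_trials=100, scale=1.0):
    """Mean shift in the concept prediction when only 'irrelevant' features
    (outside relevant_mask) are perturbed; ~0 for a locality-respecting model."""
    base = concept_predictor(x)
    shifts = []
    for _ in range(n_trials):
        noise = rng.normal(0, scale, size=x.shape) * (~relevant_mask)
        shifts.append(abs(concept_predictor(x + noise) - base))
    return float(np.mean(shifts))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
mask = np.zeros(8, dtype=bool)
mask[:2] = True                                  # concept depends on features 0-1
local = lambda v: np.tanh(v[:2].sum())           # respects locality
leaky = lambda v: np.tanh(v.sum())               # leaks 'irrelevant' features
print(locality_score(local, x, mask, rng))       # ~0
print(locality_score(leaky, x, mask, rng))       # clearly > 0
```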
From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification
We demonstrate that quantum vision transformers (QViTs), vision transformers (ViTs) with self-attention (SA) mechanisms replaced by quantum self-attention (QSA) mechanisms, can match state-of-the-art (SOTA) biomedical image classifiers while using 99.99% fewer parameters. QSAs are produced by replacing linear SA layers with parameterised quantum neural networks (QNNs), producing a QSA mechanism and reducing parameter scaling from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. On RetinaMNIST, our ultra parameter-efficient QViT outperforms 13/14 SOTA methods including CNNs and ViTs, achieving 56.5% accuracy, just 0.88% below the top MedMamba model while using 99.99% fewer parameters (1K vs 14.5M) and 89% fewer GFLOPs. We present the first investigation of knowledge distillation (KD) from classical to quantum vision transformers in biomedical image classification, showing that QViTs maintain comparable performance to classical ViTs across eight diverse datasets spanning multiple modalities, with improved QSA parameter-efficiency. Our higher-qubit architecture benefitted more from KD pre-training, suggesting a scaling relationship between QSA parameters and KD effectiveness. These findings establish QSA as a practical architectural choice toward parameter-efficient biomedical image analysis.
Updated: 2025-06-25 17:08:53
Domains: cs.CV,cs.AI,cs.LG
A Geometry-Grounded Data Perimeter in Azure
While the term "data perimeter" is ubiquitous in cybersecurity parlance, it rarely specifies how boundary points are arranged. In this paper we show how Azure's blast-radius ultrametric provides the distance, and how solving the Traveling Salesman Problem in this ultrametric space provides the ordering, yielding a true geometric contour: an actionable perimeter measure for SPN prioritization.
Updated: 2025-06-25 17:07:30
Domains: cs.CR
Weighted Mean Frequencies: a handcrafted Fourier feature for 4D Flow MRI segmentation
In recent decades, the use of 4D Flow MRI images has enabled the quantification of velocity fields within a volume of interest and along the cardiac cycle. However, the lack of resolution and the presence of noise in these biomarkers are significant issues. As indicated by recent studies, it appears that biomarkers such as wall shear stress are particularly impacted by the poor resolution of vessel segmentation. The Phase Contrast Magnetic Resonance Angiography (PC-MRA) is the state-of-the-art method to facilitate segmentation. The objective of this work is to introduce a new handcrafted feature that provides a novel visualisation of 4D Flow MRI images, which is useful in the segmentation task. This feature, termed Weighted Mean Frequencies (WMF), is capable of revealing, in three dimensions, the regions in which voxels are traversed by pulsatile flow. Indeed, this feature is representative of the hull of all pulsatile velocity voxels. The value of the feature under discussion is illustrated by two experiments. The experiments involved segmenting 4D Flow MRI images using optimal thresholding and deep learning methods. The results obtained demonstrate a substantial enhancement in terms of IoU and Dice, with respective increases of 0.12 and 0.13 in comparison with the PC-MRA feature, as evidenced by the deep learning task. This feature has the potential to yield valuable insights that could inform future segmentation processes in other vascular regions, such as the heart or the brain.
Updated: 2025-06-25 17:04:00
Domains: eess.IV,cs.AI,cs.CV
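One plausible reading of the feature is sketched below: for each voxel, take the power-weighted mean of the temporal Fourier frequencies of its velocity signal, so that voxels traversed by pulsatile flow stand out against static background. The exact WMF definition may differ from this assumption-laden toy.

```python
import numpy as np

def weighted_mean_frequency(velocity_ts, dt):
    """velocity_ts: (T, X, Y, Z) velocity magnitude over the cardiac cycle.
    Returns, per voxel, the power-weighted mean temporal frequency (DC removed)."""
    T = velocity_ts.shape[0]
    freqs = np.fft.rfftfreq(T, d=dt)                     # non-negative frequencies
    spec = np.abs(np.fft.rfft(velocity_ts, axis=0))**2   # power spectrum per voxel
    spec[0] = 0.0                                        # drop the static (DC) component
    power = spec.sum(axis=0)
    wmf = (freqs[:, None, None, None] * spec).sum(axis=0) / np.maximum(power, 1e-12)
    return np.where(power > 0, wmf, 0.0)

t = np.linspace(0, 1, 32, endpoint=False)
vol = np.zeros((32, 4, 4, 4))
vol[:, 1, 1, 1] = np.sin(2 * np.pi * 2 * t)              # pulsatile voxel at 2 Hz
print(weighted_mean_frequency(vol, dt=t[1] - t[0])[1, 1, 1])  # ~2.0; background stays 0
```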
Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings
The escalating rates of gun-related violence and mass shootings represent a significant threat to public safety. Timely and accurate information for law enforcement agencies is crucial in mitigating these incidents. Current commercial gunshot detection systems, while effective, often come with prohibitive costs. This research explores a cost-effective alternative by leveraging acoustic analysis of gunshot recordings, potentially obtainable from ubiquitous devices like cell phones, to not only detect gunshots but also classify the type of firearm used. This paper details a study on deciphering gun type hierarchies using a curated dataset of 3459 recordings. We investigate the fundamental acoustic characteristics of gunshots, including muzzle blasts and shockwaves, which vary based on firearm type, ammunition, and shooting direction. We propose and evaluate machine learning frameworks, including Support Vector Machines (SVMs) as a baseline and a more advanced Convolutional Neural Network (CNN) architecture for joint gunshot detection and gun type classification. Results indicate that our deep learning approach achieves a mean average precision (mAP) of 0.58 on clean labeled data, outperforming the SVM baseline (mAP 0.39). Challenges related to data quality, environmental noise, and the generalization capabilities when using noisy web-sourced data (mAP 0.35) are also discussed. The long-term vision is to develop a highly accurate, real-time system deployable on common recording devices, significantly reducing detection costs and providing critical intelligence to first responders.
Updated: 2025-06-25 17:00:21
Domains: cs.SD,cs.AI,cs.MM,eess.AS
AI Assistants to Enhance and Exploit the PETSc Knowledge Base
Generative AI, especially through large language models (LLMs), is transforming how technical knowledge can be accessed, reused, and extended. PETSc, a widely used numerical library for high-performance scientific computing, has accumulated a rich but fragmented knowledge base over its three decades of development, spanning source code, documentation, mailing lists, GitLab issues, Discord conversations, technical papers, and more. Much of this knowledge remains informal and inaccessible to users and new developers. To activate and utilize this knowledge base more effectively, the PETSc team has begun building an LLM-powered system that combines PETSc content with custom LLM tools -- including retrieval-augmented generation (RAG), reranking algorithms, and chatbots -- to assist users, support developers, and propose updates to formal documentation. This paper presents initial experiences designing and evaluating these tools, focusing on system architecture, using RAG and reranking for PETSc-specific information, evaluation methodologies for various LLMs and embedding models, and user interface design. Leveraging the Argonne Leadership Computing Facility resources, we analyze how LLM responses can enhance the development and use of numerical software, with an initial focus on scalable Krylov solvers. Our goal is to establish an extensible framework for knowledge-centered AI in scientific software, enabling scalable support, enriched documentation, and enhanced workflows for research and development. We conclude by outlining directions for expanding this system into a robust, evolving platform that advances software ecosystems to accelerate scientific discovery.
Updated: 2025-06-25 17:00:05
Domains: cs.AI,cs.NA,math.NA
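As a minimal illustration of the retrieval step in such a RAG pipeline, the sketch below ranks toy PETSc-flavored snippets against a query, with bag-of-words cosine similarity standing in for a learned embedding model; the snippet texts are invented for the example and reranking is omitted.

```python
import re
from collections import Counter
import numpy as np

docs = [  # toy corpus standing in for chunks of PETSc docs and mailing lists
    "KSPSolve runs the Krylov solver; use -ksp_type gmres to select GMRES.",
    "DMDA manages structured grids for parallel finite-difference codes.",
    "Use -pc_type hypre to apply BoomerAMG as the preconditioner.",
]

def tokens(text):
    return re.sub(r"[^a-z0-9_ ]", " ", text.lower()).split()

VOCAB = sorted({w for d in docs for w in tokens(d)})

def embed(text):  # bag-of-words stand-in for an embedding model
    counts = Counter(tokens(text))
    v = np.array([counts[w] for w in VOCAB], dtype=float)
    return v / (np.linalg.norm(v) + 1e-12)

def retrieve(query, k=1):
    scores = [float(embed(query) @ embed(d)) for d in docs]
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "how do I pick a Krylov solver type?"
context = "\n".join(retrieve(query))
print(f"Answer using only this context:\n{context}\nQ: {query}")  # goes to the LLM
```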
H-FEX: A Symbolic Learning Method for Hamiltonian Systems
Hamiltonian systems describe a broad class of dynamical systems governed by Hamiltonian functions, which encode the total energy and dictate the evolution of the system. Data-driven approaches, such as symbolic regression and neural network-based methods, provide a means to learn the governing equations of dynamical systems directly from observational data of Hamiltonian systems. However, these methods often struggle to accurately capture complex Hamiltonian functions while preserving energy conservation. To overcome this limitation, we propose the Finite Expression Method for learning Hamiltonian Systems (H-FEX), a symbolic learning method that introduces novel interaction nodes designed to capture intricate interaction terms effectively. Our experiments, including those on highly stiff dynamical systems, demonstrate that H-FEX can recover Hamiltonian functions of complex systems that accurately capture system dynamics and preserve energy over long time horizons. These findings highlight the potential of H-FEX as a powerful framework for discovering closed-form expressions of complex dynamical systems.
Updated: 2025-06-25 16:53:01
Domains: cs.LG
LT-PINN: Lagrangian Topology-conscious Physics-informed Neural Network for Boundary-focused Engineering Optimization
Physics-informed neural networks (PINNs) have emerged as a powerful meshless tool for topology optimization, capable of simultaneously determining optimal topologies and physical solutions. However, conventional PINNs rely on density-based topology descriptions, which necessitate manual interpolation and limit their applicability to complex geometries. To address this, we propose Lagrangian topology-conscious PINNs (LT-PINNs), a novel framework for boundary-focused engineering optimization. By parameterizing the control variables of topology boundary curves as learnable parameters, LT-PINNs eliminate the need for manual interpolation and enable precise boundary determination. We further introduce a specialized boundary condition loss function and topology loss function to ensure sharp and accurate boundary representations, even for intricate topologies. The accuracy and robustness of LT-PINNs are validated via two types of partial differential equations (PDEs), including the elastic equation with Dirichlet boundary conditions and Laplace's equation with Neumann boundary conditions. Furthermore, we demonstrate the effectiveness of LT-PINNs on more complex time-dependent and time-independent flow problems without relying on measurement data, and showcase their engineering application potential in flow velocity rearrangement, transforming a uniform upstream velocity into a sine-shaped downstream profile. The results demonstrate (1) LT-PINNs achieve substantial reductions in relative L2 errors compared with the state-of-the-art density topology-oriented PINNs (DT-PINNs), (2) LT-PINNs can handle arbitrary boundary conditions, making them suitable for a wide range of PDEs, and (3) LT-PINNs can infer clear topology boundaries without manual interpolation, especially for complex topologies.
Updated: 2025-06-25 16:48:42
Domains: cs.LG,physics.comp-ph
Weakly Supervised Object Segmentation by Background Conditional Divergence
As a computer vision task, automatic object segmentation remains challenging in specialized image domains without massive labeled data, such as synthetic aperture sonar images, remote sensing, biomedical imaging, etc. In any domain, obtaining pixel-wise segmentation masks is expensive. In this work, we propose a method for training a masking network to perform binary object segmentation using weak supervision in the form of image-wise presence or absence of an object of interest, which provides less information but may be obtained more quickly from manual or automatic labeling. A key step in our method is that the segmented objects can be placed into background-only images to create realistic images of the objects with counterfactual backgrounds. To create a contrast between the original and counterfactual background images, we propose to first cluster the background-only images, and then during learning create counterfactual images that blend objects segmented from their original source backgrounds to backgrounds chosen from a targeted cluster. One term in the training loss is the divergence between these counterfactual images and the real object images with backgrounds of the target cluster. The other term is a supervised loss for background-only images. While an adversarial critic could provide the divergence, we use sample-based divergences. We conduct experiments on side-scan and synthetic aperture sonar in which our approach succeeds compared to previous unsupervised segmentation baselines that were only tested on natural images. Furthermore, to show generality, we extend our experiments to natural images, obtaining reasonable performance with our method that avoids pretrained networks, generative networks, and adversarial critics. The codebase for this work can be found at \href{https://github.com/bakerhassan/WSOS}{GitHub}.
Updated: 2025-06-25 16:46:46
Domains: cs.CV,cs.LG
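The sample-based divergence term can be illustrated with a one-dimensional stand-in: the energy distance between counterfactual composites and real object images of the target background cluster is small when compositing is realistic and large otherwise. Scalars replace image features here purely for brevity.

```python
import numpy as np

def energy_distance(x, y):
    """Sample-based divergence between two 1-D point sets (no adversarial critic)."""
    d = lambda a, b: np.abs(a[:, None] - b[None, :]).mean()
    return 2 * d(x, y) - d(x, x) - d(y, y)

rng = np.random.default_rng(0)
real = rng.normal(1.0, 0.5, 500)       # 'real objects on cluster-k backgrounds'
good_cf = rng.normal(1.0, 0.5, 500)    # realistic counterfactual composites
bad_cf = rng.normal(3.0, 0.5, 500)     # implausible composites
print(energy_distance(good_cf, real))  # ~0: low divergence loss
print(energy_distance(bad_cf, real))   # large: penalized during training
```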
FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation
Language promptable X-ray image segmentation would enable greater flexibility for human-in-the-loop workflows in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving problems within a narrow scope, but expanding to broader use requires additional data, annotations, and training time. Recently, language-aligned foundation models (LFMs) -- machine learning models trained on large amounts of highly variable image and text data thus enabling broad applicability -- have emerged as promising tools for automated image analysis. Existing foundation models for medical image analysis focus on scenarios and modalities where large, richly annotated datasets are available. However, the X-ray imaging modality features highly variable image appearance and applications, from diagnostic chest X-rays to interventional fluoroscopy, with varying availability of data. To pave the way toward an LFM for comprehensive and language-aligned analysis of arbitrary medical X-ray images, we introduce FluoroSAM, a language-promptable variant of the Segment Anything Model, trained from scratch on 3M synthetic X-ray images from a wide variety of human anatomies, imaging geometries, and viewing angles. These include pseudo-ground truth masks for 128 organ types and 464 tools with associated text descriptions. FluoroSAM is capable of segmenting myriad anatomical structures and tools based on natural language prompts, thanks to the novel incorporation of vector quantization (VQ) of text embeddings in the training process. We demonstrate FluoroSAM's performance quantitatively on real X-ray images and showcase on several applications how FluoroSAM is a key enabler for rich human-machine interaction in the X-ray image acquisition and analysis context. Code is available at https://github.com/arcadelab/fluorosam.
Updated: 2025-06-25 16:40:39
Domains: cs.CV,cs.AI,cs.CL,cs.LG
CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
We introduce CogGen, a learner-centered AI architecture that transforms programming videos into interactive, adaptive learning experiences by integrating student modeling with generative AI tutoring based on the Cognitive Apprenticeship framework. The architecture consists of three components: (1) video segmentation by learning goals, (2) a conversational tutoring engine applying Cognitive Apprenticeship strategies, and (3) a student model using Bayesian Knowledge Tracing to adapt instruction. Our technical evaluation demonstrates effective video segmentation accuracy and strong pedagogical alignment across knowledge, method, action, and interaction layers. Ablation studies confirm the necessity of each component in generating effective guidance. This work advances AI-powered tutoring by bridging structured student modeling with interactive AI conversations, offering a scalable approach to enhancing video-based programming education.
Updated: 2025-06-25 16:39:05
Domains: cs.AI
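For reference, the standard Bayesian Knowledge Tracing update that such a student model applies after each practice opportunity is shown below; the slip, guess, and learning-rate values are illustrative, not CogGen's fitted parameters.

```python
def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.15):
    """Bayesian posterior over skill mastery given one observation,
    followed by the probabilistic learning transition."""
    if correct:
        post = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        post = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
    return post + (1 - post) * learn

p = 0.3  # prior probability the learner already knows the skill
for outcome in [True, True, False, True]:
    p = bkt_update(p, outcome)
    print(f"P(known) = {p:.3f}")
```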
Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges
The global demand for sustainable protein sources has accelerated the need for intelligent tools that can rapidly process and synthesise domain-specific scientific knowledge. In this study, we present a proof-of-concept multi-agent Artificial Intelligence (AI) framework designed to support sustainable protein production research, with an initial focus on microbial protein sources. Our Retrieval-Augmented Generation (RAG)-oriented system consists of two GPT-based LLM agents: (1) a literature search agent that retrieves relevant scientific literature on microbial protein production for a specified microbial strain, and (2) an information extraction agent that processes the retrieved content to extract relevant biological and chemical information. Two parallel methodologies, fine-tuning and prompt engineering, were explored for agent optimisation. Both methods demonstrated effectiveness at improving the performance of the information extraction agent in terms of transformer-based cosine similarity scores between obtained and ideal outputs. Mean cosine similarity scores were increased by up to 25%, while universally reaching mean scores of $\geq 0.89$ against ideal output text. Fine-tuning improved the mean scores to a greater extent overall (consistently $\geq 0.94$) compared to prompt engineering, although lower statistical uncertainties were observed with the latter approach. A user interface was developed and published to enable the use of the multi-agent AI system, alongside preliminary exploration of additional chemical safety-based search capabilities.
Updated: 2025-06-25 16:37:46
Domains: cs.AI,cs.SY,eess.SY
AI in the Writing Process: How Purposeful AI Support Fosters Student Writing
The ubiquity of technologies like ChatGPT has raised concerns about their impact on student writing, particularly regarding reduced learner agency and superficial engagement with content. While standalone chat-based LLMs often produce suboptimal writing outcomes, evidence suggests that purposefully designed AI writing support tools can enhance the writing process. This paper investigates how different AI support approaches affect writers' sense of agency and depth of knowledge transformation. Through a randomized control trial with 90 undergraduate students, we compare three conditions: (1) a chat-based LLM writing assistant, (2) an integrated AI writing tool to support diverse subprocesses, and (3) a standard writing interface (control). Our findings demonstrate that, among AI-supported conditions, students using the integrated AI writing tool exhibited greater agency over their writing process and engaged in deeper knowledge transformation overall. These results suggest that thoughtfully designed AI writing support targeting specific aspects of the writing process can help students maintain ownership of their work while facilitating improved engagement with content.
Updated: 2025-06-25 16:34:09
Domains: cs.HC,cs.AI
On the Role of Context in Reading Time Prediction
We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one out of many potential ways that a contextual predictor can be derived from a language model. Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency. Moreover, both PMI and surprisal are correlated with frequency. This means that neither PMI nor surprisal contains information about context alone. In response to this, we propose a technique where we project surprisal onto the orthogonal complement of frequency, yielding a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is a lot smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role that context has in predicting reading times.
Updated: 2025-06-25 16:32:48
Domains: cs.CL,cs.LG
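The orthogonalization step is a short computation: regress surprisal on log-frequency (with an intercept) and keep the residual, which is uncorrelated with frequency by construction. The synthetic data below is only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
log_freq = rng.normal(10, 2, 5000)
surprisal = 14 - 0.9 * log_freq + rng.normal(0, 1, 5000)   # correlated with frequency

X = np.column_stack([np.ones_like(log_freq), log_freq])
beta, *_ = np.linalg.lstsq(X, surprisal, rcond=None)
ortho = surprisal - X @ beta        # projection onto frequency's orthogonal complement

print(np.corrcoef(surprisal, log_freq)[0, 1])   # strongly negative
print(np.corrcoef(ortho, log_freq)[0, 1])       # ~0 by construction
```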
The State of Large Language Models for African Languages: Progress and Challenges
Large Language Models (LLMs) are transforming Natural Language Processing (NLP), but their benefits are largely absent for Africa's 2,000 low-resource languages. This paper comparatively analyzes African language coverage across six LLMs, eight Small Language Models (SLMs), and six Specialized SLMs (SSLMs). The evaluation covers language coverage, training sets, technical limitations, script problems, and language modelling roadmaps. The work identifies 42 supported African languages and 23 available public datasets, and it reveals a large gap: four languages (Amharic, Swahili, Afrikaans, and Malagasy) are consistently covered, while over 98\% of African languages remain unsupported. Moreover, the review shows that only the Latin, Arabic, and Ge'ez scripts are represented, while 20 active scripts are neglected. The primary challenges include lack of data, tokenization biases, very high computational costs, and evaluation issues. These issues demand language standardization, community-driven corpus development, and effective adaptation methods for African languages.
Updated: 2025-06-25 16:31:32
Domains: cs.AI
On the Impact of Sybil-based Attacks on Mobile Crowdsensing for Transportation
Mobile Crowd-Sensing (MCS) enables users with personal mobile devices (PMDs) to gain information on their surroundings. Users collect and contribute data on different phenomena using their PMD sensors, and the MCS system processes this data to extract valuable information for end users. Navigation MCS-based applications (N-MCS) are prevalent and important for transportation: users share their location and speed while driving and, in return, find efficient routes to their destinations. However, N-MCS are currently vulnerable to malicious contributors, often termed Sybils: submitting falsified data, seemingly from many devices that are not truly present on target roads, falsely reporting congestion when there is none, thus changing the road status the N-MCS infers. The attack effect is that the N-MCS returns suboptimal routes to users, causing late arrival and, overall, deteriorating road traffic flow. We investigate exactly the impact of Sybil-based attacks on N-MCS: we design an N-MCS system that offers efficient routing on top of the vehicular simulator SUMO, using the InTAS road network as our scenario. We design experiments attacking an individual N-MCS user as well as a larger population of users, selecting the adversary targets based on graph-theoretical arguments. Our experiments show that the resources required for a successful attack depend on the location of the attack (i.e., the surrounding road network and traffic) and the extent of Sybil-contributed data for the targeted road(s). We demonstrate that Sybil attacks can alter the route of N-MCS users, increasing average travel time by 20% with Sybils comprising 3% of the N-MCS user population.
Updated: 2025-06-25 16:26:01
Domains: cs.CR
The kernel of graph indices for vector search
The most popular graph indices for vector search use principles from computational geometry to build the graph. Hence, their formal graph navigability guarantees are only valid in Euclidean space. In this work, we show that machine learning can be used to build graph indices for vector search in metric and non-metric vector spaces (e.g., for inner product similarity). From this novel perspective, we introduce the Support Vector Graph (SVG), a new type of graph index that leverages kernel methods to establish the graph connectivity and that comes with formal navigability guarantees valid in metric and non-metric vector spaces. In addition, we interpret the most popular graph indices, including HNSW and DiskANN, as particular specializations of SVG and show that new indices can be derived from the principles behind this specialization. Finally, we propose SVG-L0 that incorporates an $\ell_0$ sparsity constraint into the SVG kernel method to build graphs with a bounded out-degree. This yields a principled way of implementing this practical requirement, in contrast to the traditional heuristic of simply truncating the out edges of each node. Additionally, we show that SVG-L0 has a self-tuning property that avoids the heuristic of using a set of candidates to find the out-edges of each node and that keeps its computational complexity in check.
Updated: 2025-06-25 16:24:55
Domains: cs.LG
Rethinking Early Stopping: Refine, Then Calibrate
Machine learning classifiers often produce probabilistic predictions that are critical for accurate and interpretable decision-making in various domains. The quality of these predictions is generally evaluated with proper losses, such as cross-entropy, which decompose into two components: calibration error assesses general under/overconfidence, while refinement error measures the ability to distinguish different classes. In this paper, we present a novel variational formulation of the calibration-refinement decomposition that sheds new light on post-hoc calibration, and enables rapid estimation of the different terms. Equipped with this new perspective, we provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training. Selecting the best epoch based on validation loss thus leads to a compromise point that is suboptimal for both terms. To address this, we propose minimizing refinement error only during training (Refine,...), before minimizing calibration error post hoc, using standard techniques (...then Calibrate). Our method integrates seamlessly with any classifier and consistently improves performance across diverse classification tasks.
Updated: 2025-06-25 16:24:12
Domains: cs.LG,cs.AI
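The "then Calibrate" step can use any standard post-hoc method; the sketch below applies temperature scaling, which adjusts calibration while leaving refinement (the ranking of classes) untouched. The overconfident "model" and its logits are synthetic stand-ins.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(logits, y, T):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)                     # stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

rng = np.random.default_rng(0)
y = rng.integers(0, 3, 2000)
logits = 3.0 * (np.eye(3)[y] + rng.normal(0, 0.8, (2000, 3)))  # overconfident logits

res = minimize_scalar(lambda T: nll(logits, y, T), bounds=(0.05, 10), method="bounded")
print(f"fitted temperature T = {res.x:.2f}")                   # typically > 1 here
print(f"NLL at T=1: {nll(logits, y, 1.0):.3f} -> at T*: {nll(logits, y, res.x):.3f}")
```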
Dense Video Captioning using Graph-based Sentence Summarization
Recently, dense video captioning has made attractive progress in detecting and captioning all events in a long untrimmed video. Although promising results have been achieved, most existing methods do not sufficiently explore the scene evolution within an event temporal proposal for captioning, and therefore perform less satisfactorily when the scenes and objects change over a relatively long proposal. To address this problem, we propose a graph-based partition-and-summarization (GPaS) framework for dense video captioning within two stages. For the ``partition" stage, a whole event proposal is split into short video segments for captioning at a finer level. For the ``summarization" stage, the generated sentences carrying rich description information for each segment are summarized into one sentence to describe the whole event. We particularly focus on the ``summarization" stage, and propose a framework that effectively exploits the relationship between semantic words for summarization. We achieve this goal by treating semantic words as nodes in a graph and learning their interactions by coupling Graph Convolutional Network (GCN) and Long Short Term Memory (LSTM), with the aid of visual cues. Two schemes of GCN-LSTM Interaction (GLI) modules are proposed for seamless integration of GCN and LSTM. The effectiveness of our approach is demonstrated via an extensive comparison with the state-of-the-arts methods on the two benchmarks ActivityNet Captions dataset and YouCook II dataset.
Updated: 2025-06-25 16:23:43
Domains: cs.CV,cs.AI
Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair
LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code or directly generate patches when provided with buggy methods. However, most LLM-based APR methods rely on a single type of software information, without fully leveraging different software artifacts. Despite this, many LLM-based approaches do not explore which specific types of information best assist in APR. Addressing this gap is crucial for advancing LLM-based APR techniques. We propose DEVLoRe to use issue content (description and message) and stack error traces to localize buggy methods, then rely on debug information in buggy methods and issue content and stack error to localize buggy lines and generate plausible patches which can pass all unit tests. The results show that while issue content is particularly effective in assisting LLMs with fault localization and program repair, different types of software artifacts complement each other. By incorporating different artifacts, DEVLoRe successfully locates 49.3% and 47.6% of single and non-single buggy methods and generates 56.0% and 14.5% plausible patches for the Defects4J v2.0 dataset, respectively. This outperforms current state-of-the-art APR methods. Furthermore, we re-implemented and evaluated our framework, demonstrating its effectiveness in resolving 9 unique issues compared to other state-of-the-art frameworks using the same or more advanced models on SWE-bench Lite. We also discuss whether a leading framework for Python code can be directly applied to Java code, or vice versa. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.
Updated: 2025-06-25 16:21:54
Domains: cs.SE,cs.AI
Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling
Large Language Models (LLMs) exhibit In-Context Learning (ICL), which enables the model to perform new tasks conditioning only on the examples provided in the context without updating the model's weights. While ICL offers fast adaptation across natural language tasks and domains, its emergence is less straightforward for modalities beyond text. In this work, we systematically uncover properties present in LLMs that support the emergence of ICL for autoregressive models and various modalities by promoting the learning of the needed mechanisms for ICL. We identify exact token repetitions in the training data sequences as an important factor for ICL. Such repetitions further improve stability and reduce transiency in ICL performance. Moreover, we emphasise the significance of training task difficulty for the emergence of ICL. Finally, by applying our novel insights on ICL emergence, we unlock ICL capabilities for various visual datasets and a more challenging EEG classification task in a few-shot learning regime.
Updated: 2025-06-25 16:21:31
Domains: cs.CL,cs.AI,cs.LG
Causal Representation Learning with Observational Grouping for CXR Classification
Identifiable causal representation learning seeks to uncover the true causal relationships underlying a data generation process. In medical imaging, this presents opportunities to improve the generalisability and robustness of task-specific latent features. This work introduces the concept of grouping observations to learn identifiable representations for disease classification in chest X-rays via an end-to-end framework. Our experiments demonstrate that these causal representations improve generalisability and robustness across multiple classification tasks when grouping is used to enforce invariance w.r.t. race, sex, and imaging views.
Updated: 2025-06-25 16:17:36
Domains: cs.CV,cs.AI,cs.LG
TabArena: A Living Benchmark for Machine Learning on Tabular Data
With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling. At the same time, foundation models excel on smaller datasets. Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning and investigate the contributions of individual models. We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols to create a living benchmark available at https://tabarena.ai.
Updated: 2025-06-25 16:14:44
Domains: cs.LG,cs.AI
Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS
Adversarial attacks, wherein slightly perturbed inputs are carefully crafted to mislead intelligent models, have attracted increasing attention. However, a critical gap persists between theoretical advancements and practical application, particularly in structured data like network traffic, where interdependent features complicate effective adversarial manipulations. Moreover, ambiguity in current approaches restricts reproducibility and limits progress in this field. Hence, existing defenses often fail to handle evolving adversarial attacks. This paper proposes a novel approach for black-box adversarial attacks that addresses these limitations. Unlike prior work, which often assumes system access or relies on repeated probing, our method strictly respects black-box constraints, reducing interaction to avoid detection and better reflect real-world scenarios. We present an adaptive feature selection strategy using change-point detection and causality analysis to identify sensitive features and target them with perturbations. This lightweight design ensures low computational cost and high deployability. Our comprehensive experiments show the attack's effectiveness in evading detection with minimal interaction, enhancing its adaptability and applicability in real-world scenarios. By advancing the understanding of adversarial attacks in network traffic, this work lays a foundation for developing robust defenses.
Updated: 2025-06-25 16:10:20
Domains: cs.CR,cs.AI
Exploring Graph-Transformer Out-of-Distribution Generalization Abilities
Deep learning on graphs has shown remarkable success across numerous applications, including social networks, bio-physics, traffic networks, and recommendation systems. Regardless of their successes, current methods frequently depend on the assumption that training and testing data share the same distribution, a condition rarely met in real-world scenarios. While graph-transformer (GT) backbones have recently outperformed traditional message-passing neural networks (MPNNs) in multiple in-distribution (ID) benchmarks, their effectiveness under distribution shifts remains largely unexplored. In this work, we address the challenge of out-of-distribution (OOD) generalization for graph neural networks, with a special focus on the impact of backbone architecture. We systematically evaluate GT and hybrid backbones in OOD settings and compare them to MPNNs. To do so, we adapt several leading domain generalization (DG) algorithms to work with GTs and assess their performance on a benchmark designed to test a variety of distribution shifts. Our results reveal that GT and hybrid GT-MPNN backbones consistently demonstrate stronger generalization ability compared to MPNNs, even without specialized DG algorithms. Additionally, we propose a novel post-training analysis approach that compares the clustering structure of the entire ID and OOD test datasets, specifically examining domain alignment and class separation. Demonstrating its model-agnostic design, this approach not only provided meaningful insights into GT and MPNN backbones. It also shows promise for broader applicability to DG problems beyond graph learning, offering a deeper perspective on generalization abilities that goes beyond standard accuracy metrics. Together, our findings highlight the promise of graph-transformers for robust, real-world graph learning and set a new direction for future research in OOD generalization.
Updated: 2025-06-25 16:09:24
Domains: cs.LG
Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series
Anomaly detection in multivariate time series is an important problem across various fields such as healthcare, financial services, manufacturing or physics detector monitoring. Accurately identifying when unexpected errors or faults occur is essential, yet challenging, due to the unknown nature of anomalies and the complex interdependencies between time series dimensions. In this paper, we investigate transformer-based approaches for time series anomaly detection, focusing on the recently proposed iTransformer architecture. Our contributions are fourfold: (i) we explore the application of the iTransformer to time series anomaly detection, and analyse the influence of key parameters such as window size, step size, and model dimensions on performance; (ii) we examine methods for extracting anomaly labels from multidimensional anomaly scores and discuss appropriate evaluation metrics for such labels; (iii) we study the impact of anomalous data present during training and assess the effectiveness of alternative loss functions in mitigating their influence; and (iv) we present a comprehensive comparison of several transformer-based models across a diverse set of datasets for time series anomaly detection.
Updated: 2025-06-25 16:08:22
Domains: cs.LG,stat.ME
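The windowing choices studied here (window size, step size) and a residual-based anomaly score can be sketched as follows, with a window-mean predictor standing in for a trained model such as the iTransformer.

```python
import numpy as np

def sliding_windows(x, window, step):
    """Segment a multivariate series x of shape (T, D) into overlapping windows."""
    starts = list(range(0, len(x) - window + 1, step))
    return np.stack([x[i:i + window] for i in starts]), starts

rng = np.random.default_rng(0)
x = rng.normal(0, 1, (500, 3))
x[300:305] += 6.0                                      # injected anomaly
wins, starts = sliding_windows(x, window=24, step=1)
# residual norm of each window against its own mean, as a toy anomaly score
scores = np.linalg.norm(wins - wins.mean(axis=1, keepdims=True), axis=(1, 2))
print(starts[int(np.argmax(scores))])                  # a window overlapping t ~ 300
```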
LARP: Learner-Agnostic Robust Data Prefiltering
The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learning arises. On a technical level this requires the construction of principled data prefiltering methods which are robust in a learner-agnostic sense, i.e., which provably protect a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of scalar mean estimation with Huber estimators under the Huber data contamination model. We provide a hardness result on a specific problem instance and analyze several natural prefiltering procedures. Our theoretical results indicate that performing LARP on a heterogeneous set of learners leads to some loss in model performance compared to the alternative of prefiltering data for each learner/use-case individually. We explore the resulting utility loss and its dependence on the problem parameters via extensive experiments on real-world image and tabular data, observing a statistically significant reduction in utility. Finally, we model the trade-off between the utility drop and the cost of repeated (learner-specific) prefiltering within a game-theoretic framework and showcase the benefits of LARP for large datasets.
Updated: 2025-06-25 16:07:59
Domains: stat.ML,cs.LG
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization
In this work, we propose a division-and-summarization (DaS) framework for dense video captioning. After partitioning each untrimmed long video into multiple event proposals, where each event proposal consists of a set of short video segments, we extract visual features (e.g., C3D features) from each segment and use an existing image/video captioning approach to generate one sentence description for this segment. Considering that the generated sentences contain rich semantic descriptions about the whole event proposal, we formulate the dense video captioning task as a visual cue aided sentence summarization problem and propose a new two-stage Long Short-Term Memory (LSTM) approach equipped with a new hierarchical attention mechanism to summarize all generated sentences into one descriptive sentence with the aid of visual features. Specifically, the first-stage LSTM network takes all semantic words from the generated sentences and the visual features from all segments within one event proposal as the input, and acts as the encoder to effectively summarize both semantic and visual information related to this event proposal. The second-stage LSTM network takes the output from the first-stage LSTM network and the visual features from all video segments within one event proposal as the input, and acts as the decoder to generate one descriptive sentence for this event proposal. Our comprehensive experiments on the ActivityNet Captions dataset demonstrate the effectiveness of our newly proposed DaS framework for dense video captioning.
Updated: 2025-06-25 16:02:04
Domains: cs.CV,cs.AI
Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Detecting brain lesions as abnormalities observed in magnetic resonance imaging (MRI) is essential for diagnosis and treatment. In the search for abnormalities, such as tumors and malformations, radiologists may benefit from computer-aided diagnostics that use computer vision systems trained with machine learning to segment normal tissue from abnormal brain tissue. While supervised learning methods require annotated lesions, we propose a new unsupervised approach (Patch2Loc) that learns from normal patches taken from structural MRI. We train a neural network model to map a patch back to its spatial location within a slice of the brain volume. During inference, abnormal patches are detected by the relatively higher error and/or variance of the location prediction. This generates a heatmap that can be integrated into pixel-wise methods to achieve finer-grained segmentation. We demonstrate the ability of our model to segment abnormal brain tissues by applying our approach to the detection of tumor tissues in MRI on T2-weighted images from the BraTS2021 and MSLUB datasets and T1-weighted images from the ATLAS and WMH datasets. We show that it outperforms the state of the art in unsupervised segmentation. The codebase for this work is available on our GitHub page: https://github.com/bakerhassan/Patch2Loc
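A minimal sketch of the Patch2Loc idea as described above: a network regresses a patch back to its normalized location, and the location-prediction error serves as the anomaly score. The architecture, patch size, and training loop below are illustrative assumptions, not the authors' implementation.

    # Minimal PyTorch sketch of the Patch2Loc idea: regress a patch back to its
    # normalized (row, col) location in the slice; at inference, large location
    # error flags an abnormal patch. Architecture and sizes are illustrative.
    import torch
    import torch.nn as nn

    class PatchToLoc(nn.Module):
        def __init__(self, patch=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(patch * patch, 256), nn.ReLU(),
                nn.Linear(256, 2), nn.Sigmoid(),  # (row, col) in [0, 1]
            )

        def forward(self, x):
            return self.net(x)

    model = PatchToLoc()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Training step on normal patches with their known locations.
    patches = torch.randn(32, 1, 16, 16)          # stand-in for MRI patches
    locations = torch.rand(32, 2)                 # normalized ground-truth (row, col)
    loss = nn.functional.mse_loss(model(patches), locations)
    opt.zero_grad(); loss.backward(); opt.step()

    # Inference: the per-patch anomaly score is the location-prediction error.
    with torch.no_grad():
        score = ((model(patches) - locations) ** 2).sum(dim=1)
    print(score.shape)  # one score per patch, ready to paint into a heatmap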
Updated: 2025-06-25 16:00:12
Domains: cs.CV,cs.LG
DeepQuark: deep-neural-network approach to multiquark bound states
For the first time, we implement the deep-neural-network-based variational Monte Carlo approach for the multiquark bound states, whose complexity surpasses that of electron or nucleon systems due to strong SU(3) color interactions. We design a novel and high-efficiency architecture, DeepQuark, to address the unique challenges in multiquark systems such as stronger correlations, extra discrete quantum numbers, and intractable confinement interaction. Our method demonstrates competitive performance with state-of-the-art approaches, including diffusion Monte Carlo and Gaussian expansion method, in the nucleon, doubly heavy tetraquark, and fully heavy tetraquark systems. Notably, it outperforms existing calculations for pentaquarks, exemplified by the triply heavy pentaquark. For the nucleon, we successfully incorporate three-body flux-tube confinement interactions without additional computational costs. In tetraquark systems, we consistently describe hadronic molecule $T_{cc}$ and compact tetraquark $T_{bb}$ with an unbiased form of wave function ansatz. In the pentaquark sector, we obtain weakly bound $\bar D^*\Xi_{cc}^*$ molecule $P_{cc\bar c}(5715)$ with $S=\frac{5}{2}$ and its bottom partner $P_{bb\bar b}(15569)$. They can be viewed as the analogs of the molecular $T_{cc}$. We recommend experimental search of $P_{cc\bar c}(5715)$ in the D-wave $J/\psi \Lambda_c$ channel. DeepQuark holds great promise for extension to larger multiquark systems, overcoming the computational barriers in conventional methods. It also serves as a powerful framework for exploring confining mechanism beyond two-body interactions in multiquark states, which may offer valuable insights into nonperturbative QCD and general many-body physics.
Updated: 2025-06-25 15:53:18
Domains: hep-ph,cs.AI,hep-ex,hep-lat,nucl-th
Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control
Traditional wind farm control operates each turbine independently to maximize individual power output. However, coordinated wake steering across the entire farm can substantially increase the combined wind farm energy production. Although dynamic closed-loop control has proven effective in flow control applications, wind farm optimization has relied primarily on static, low-fidelity simulators that ignore critical turbulent flow dynamics. In this work, we present the first reinforcement learning (RL) controller integrated directly with high-fidelity large-eddy simulation (LES), enabling real-time response to atmospheric turbulence through collaborative, dynamic control strategies. Our RL controller achieves a 4.30% increase in wind farm power output compared to baseline operation, nearly doubling the 2.19% gain from static optimal yaw control obtained through Bayesian optimization. These results establish dynamic flow-responsive control as a transformative approach to wind farm optimization, with direct implications for accelerating renewable energy deployment to net-zero targets.
Updated: 2025-06-25 15:53:12
Domains: physics.flu-dyn,cs.LG,cs.SY,eess.SY
Large Language Model-Driven Code Compliance Checking in Building Information Modeling
This research addresses the time-consuming and error-prone nature of manual code compliance checking in Building Information Modeling (BIM) by introducing a Large Language Model (LLM)-driven approach to semi-automate this critical process. The developed system integrates LLMs such as GPT, Claude, Gemini, and Llama, with Revit software to interpret building codes, generate Python scripts, and perform semi-automated compliance checks within the BIM environment. Case studies on a single-family residential project and an office building project demonstrated the system's ability to reduce the time and effort required for compliance checks while improving accuracy. It streamlined the identification of violations, such as non-compliant room dimensions, material usage, and object placements, by automatically assessing relationships and generating actionable reports. Compared to manual methods, the system eliminated repetitive tasks, simplified complex regulations, and ensured reliable adherence to standards. By offering a comprehensive, adaptable, and cost-effective solution, this proposed approach offers a promising advancement in BIM-based compliance checking, with potential applications across diverse regulatory documents in construction projects.
Updated: 2025-06-25 15:50:34
Domains: cs.SE,cs.AI
Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks
With the rapid advancement of deep learning, particularly through generative adversarial networks (GANs) and diffusion models (DMs), AI-generated images, or "deepfakes", have become nearly indistinguishable from real ones. These images are widely shared across Online Social Networks (OSNs), raising concerns about their misuse. Existing deepfake detection methods overlook the "block effects" introduced by compression in OSNs, which obscure deepfake artifacts, and primarily focus on raw images, which are rarely encountered in real-world scenarios. To address these challenges, we propose PLADA (Pay Less Attention to Deceptive Artifacts), a novel framework designed to tackle the lack of paired data and the ineffective use of compressed images. PLADA consists of two core modules: Block Effect Eraser (B2E), which uses a dual-stage attention mechanism to handle block effects, and Open Data Aggregation (ODA), which processes both paired and unpaired data to improve detection. Extensive experiments across 26 datasets demonstrate that PLADA achieves a remarkable balance in deepfake detection, outperforming SoTA methods in detecting deepfakes on OSNs, even with limited paired data and compression. More importantly, this work introduces the "block effect" as a critical factor in deepfake detection, providing a robust solution for open-world scenarios. Our code is available at https://github.com/ManyiLee/PLADA.
Updated: 2025-06-25 15:46:41
Domains: cs.CV,cs.AI,cs.LG,cs.MM
Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls
In contextual optimization, a decision-maker leverages contextual information, often referred to as covariates, to better resolve uncertainty and make informed decisions. In this paper, we examine the challenges of contextual decision-making under covariate shift, a phenomenon where the distribution of covariates differs between the training and test environments. Such shifts can lead to inaccurate upstream estimations for test covariates that lie far from the training data, ultimately resulting in suboptimal downstream decisions. To tackle these challenges, we propose a novel approach called Intersection Wasserstein-balls DRO (IW-DRO), which integrates multiple estimation methods into the distributionally robust optimization (DRO) framework. At the core of our approach is an innovative ambiguity set defined as the intersection of two Wasserstein balls, with their centers constructed using appropriate nonparametric and parametric estimators. On the computational side, we reformulate the IW-DRO problem as a tractable convex program and develop an approximate algorithm tailored for large-scale problems to enhance computational efficiency. From a theoretical perspective, we demonstrate that IW-DRO achieves superior performance compared to single Wasserstein-ball DRO models. We further establish performance guarantees by analyzing the coverage of the intersection ambiguity set and the measure concentration of both estimators under the Wasserstein distance. Notably, we derive a finite-sample concentration result for the Nadaraya-Watson kernel estimator under covariate shift. The proposed IW-DRO framework offers practical value for decision-makers operating in uncertain environments affected by covariate shifts.
Updated: 2025-06-25 15:43:13
Domains: math.OC,cs.LG
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Recent advancements in large language models (LLMs) have shifted focus toward scaling inference-time compute, improving performance without retraining the model. A common approach is to sample multiple outputs in parallel, and select one of these as the final output. However, work to date has focused on English and a handful of domains such as math and code. In contrast, we are most interested in techniques that generalize across open-ended tasks, formally verifiable tasks, and across languages. In this work, we study how to robustly scale inference-time compute for open-ended generative tasks in a multilingual, multi-task setting. Our findings show that both the temperature-based sampling strategy and the selection strategy must be adapted to account for diverse domains and varied language settings. We evaluate existing selection methods, revealing that strategies effective in English often fail to generalize across languages. We propose novel sampling and selection strategies specifically adapted for multilingual and multi-task inference scenarios, and show they yield notable gains across languages and tasks. In particular, our combined sampling and selection methods lead to an average +6.8 jump in win-rates for our 8B models on m-ArenaHard-v2.0 prompts, against proprietary models such as Gemini. At larger scale, Command-A (a 111B model) equipped with our methods shows +9.0 improvement in win-rates on the same benchmark with just five samples against single-sample decoding, a substantial increase at minimal cost. Our results underscore the need for language- and task-aware approaches to inference-time compute, aiming to democratize performance improvements in underrepresented languages.
Updated: 2025-06-25 15:37:53
Domains: cs.CL,cs.AI
Demonstration of effective UCB-based routing in skill-based queues on real-world data
This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: next to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade-offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.
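The abstract does not spell out the algorithm, so the sketch below shows a generic UCB1-style routing index of the kind the title suggests: an arriving customer is routed to the compatible server with the highest upper confidence bound on estimated payoff. All names and constants are assumptions for illustration.

    # Illustrative UCB1-style routing index: route an arriving customer to the
    # compatible server with the highest upper confidence bound on estimated
    # payoff. Generic UCB, not the paper's exact algorithm.
    import math
    import random

    class UCBRouter:
        def __init__(self, n_servers):
            self.counts = [0] * n_servers   # times each server was chosen
            self.means = [0.0] * n_servers  # running mean payoff per server
            self.t = 0

        def route(self, compatible):
            self.t += 1
            # Try every compatible server once before using the UCB index.
            for s in compatible:
                if self.counts[s] == 0:
                    return s
            return max(compatible, key=lambda s: self.means[s]
                       + math.sqrt(2 * math.log(self.t) / self.counts[s]))

        def update(self, server, payoff):
            self.counts[server] += 1
            n = self.counts[server]
            self.means[server] += (payoff - self.means[server]) / n

    router = UCBRouter(n_servers=3)
    true_payoff = [0.3, 0.7, 0.5]
    for _ in range(1000):
        s = router.route(compatible=[0, 1, 2])
        router.update(s, payoff=float(random.random() < true_payoff[s]))
    print(router.counts)  # most traffic should flow to server 1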
Updated: 2025-06-25 15:36:43
Domains: cs.LG,math.OC,60K25, 93E35
Adversarial Reasoning at Jailbreaking Time
As large language models (LLMs) are becoming more capable and widespread, the study of their failure cases is becoming increasingly important. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs. We develop an adversarial reasoning approach to automatic jailbreaking that leverages a loss signal to guide the test-time compute, achieving SOTA attack success rates against many aligned LLMs, even those that aim to trade inference-time compute for adversarial robustness. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
Updated: 2025-06-25 15:31:17
Domains: cs.LG,cs.AI
Physics-Informed Machine Learning Regulated by Finite Element Analysis for Simulation Acceleration of Laser Powder Bed Fusion
Efficient simulation of Laser Powder Bed Fusion (LPBF) is crucial for process prediction because traditional numerical methods such as finite element analysis (FEA) remain computationally expensive. This study presents an efficient modeling framework termed FEA-Regulated Physics-Informed Neural Network (FEA-PINN) to accelerate the thermal field prediction in a LPBF process while maintaining the FEA accuracy. A novel dynamic material updating strategy is developed to capture the dynamic phase change of powder-liquid-solid in the PINN model. The PINN model incorporates temperature-dependent material properties and phase change behavior using the apparent heat capacity method. While the PINN model demonstrates high accuracy with small training data and enables generalization to new process parameters via transfer learning, it faces the challenge of high computation cost in time-dependent problems due to the residual accumulation. To overcome this issue, the FEA-PINN framework integrates corrective FEA simulations during inference to enforce physical consistency and reduce error drift. A comparative analysis shows that FEA-PINN achieves equivalent accuracy to FEA while significantly reducing computational cost. The framework has been validated using the benchmark FEA data and demonstrated through single-track scanning in LPBF.
Updated: 2025-06-25 15:25:01
Domains: cs.LG
WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads
The rapid advancement of AI, particularly large language models (LLMs), has raised significant concerns about the energy use and carbon emissions associated with model training and inference. However, existing tools for measuring and reporting such impacts are often fragmented, lacking systematic metric integration and offering limited support for correlation analysis among them. This paper presents WattsOnAI, a comprehensive software toolkit for the measurement, analysis, and visualization of energy use, power draw, hardware performance, and carbon emissions across AI workloads. By seamlessly integrating with existing AI frameworks, WattsOnAI offers standardized reports and exports fine-grained time-series data to support benchmarking and reproducibility in a lightweight manner. It further enables in-depth correlation analysis between hardware metrics and model performance and thus facilitates bottleneck identification and performance enhancement. By addressing critical limitations in existing tools, WattsOnAI encourages the research community to weigh environmental impact alongside raw performance of AI workloads and advances the shift toward more sustainable "Green AI" practices. The code is available at https://github.com/SusCom-Lab/WattsOnAI.
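WattsOnAI's own integration is not shown in the abstract; as a hedged illustration of the kind of fine-grained power sampling such a toolkit performs, the sketch below polls GPU power through NVIDIA's NVML bindings (pynvml) and integrates it into energy. It assumes an NVIDIA GPU and the nvidia-ml-py package; nothing here is WattsOnAI's actual code.

    # Hedged sketch of fine-grained GPU power sampling with pynvml, plus
    # trapezoidal integration of the power time series into energy (joules).
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU
    samples = []
    for _ in range(10):
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        samples.append((time.time(), power_w))
        time.sleep(0.1)                              # 10 Hz time-series sampling
    pynvml.nvmlShutdown()

    energy_j = sum((t2 - t1) * (p1 + p2) / 2
                   for (t1, p1), (t2, p2) in zip(samples, samples[1:]))
    print(f"~{energy_j:.2f} J over the sampling window")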
Updated: 2025-06-25 15:24:45
Domains: cs.DC,cs.AI,cs.LG
Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery
Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.
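For intuition, here is a compact numpy sketch of IRLS for robust subspace recovery with a decaying smoothing parameter: points are reweighted by their (smoothed) inverse distance to the current subspace, and the subspace is refit by a weighted PCA step. The smoothing schedule and initialization are assumptions, not the paper's exact variant.

    # Sketch of IRLS for robust subspace recovery: alternate between weighting
    # points by inverse (smoothed) distance to the current subspace and
    # refitting via a weighted PCA step. The smoothing decay is illustrative.
    import numpy as np

    def irls_subspace(X, d, iters=50, eps0=1.0):
        """X: (n, D) points; returns an orthonormal (D, d) subspace basis."""
        n, D = X.shape
        U = np.linalg.qr(np.random.default_rng(0).normal(size=(D, d)))[0]
        eps = eps0
        for _ in range(iters):
            resid = X - (X @ U) @ U.T                      # residual to subspace
            dist = np.linalg.norm(resid, axis=1)
            w = 1.0 / np.maximum(dist, eps)                # smoothed 1/distance
            C = (X * w[:, None]).T @ X                     # weighted covariance
            eigvals, eigvecs = np.linalg.eigh(C)
            U = eigvecs[:, -d:]                            # top-d eigenvectors
            eps = max(eps * 0.9, 1e-8)                     # dynamic smoothing decay
        return U

    rng = np.random.default_rng(1)
    inliers = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))  # rank-2 data
    outliers = 5.0 * rng.normal(size=(50, 10))
    U = irls_subspace(np.vstack([inliers, outliers]), d=2)
    print(U.shape)  # (10, 2) basis close to the inlier subspace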
Updated: 2025-06-25 15:23:32
Domains: stat.ML,cs.LG,math.OC
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios
Driving in safety-critical scenarios requires quick, context-aware decision-making grounded in both situational understanding and experiential reasoning. Large Language Models (LLMs), with their powerful general-purpose reasoning capabilities, offer a promising foundation for such decision-making. However, their direct application to autonomous driving remains limited due to challenges in domain adaptation, contextual grounding, and the lack of experiential knowledge needed to make reliable and interpretable decisions in dynamic, high-risk environments. To address this gap, this paper presents a Case-Based Reasoning Augmented Large Language Model (CBR-LLM) framework for evasive maneuver decision-making in complex risk scenarios. Our approach integrates semantic scene understanding from dashcam video inputs with the retrieval of relevant past driving cases, enabling LLMs to generate maneuver recommendations that are both context-sensitive and human-aligned. Experiments across multiple open-source LLMs show that our framework improves decision accuracy, justification quality, and alignment with human expert behavior. Risk-aware prompting strategies further enhance performance across diverse risk types, while similarity-based case retrieval consistently outperforms random sampling in guiding in-context learning. Case studies further demonstrate the framework's robustness in challenging real-world conditions, underscoring its potential as an adaptive and trustworthy decision-support tool for intelligent driving systems.
Updated: 2025-06-25 15:19:25
Domains: cs.AI,cs.CY
Attention with Trained Embeddings Provably Selects Important Tokens
Token embeddings play a crucial role in language modeling but, despite this practical relevance, their theoretical understanding remains limited. Our paper addresses the gap by characterizing the structure of embeddings obtained via gradient descent. Specifically, we consider a one-layer softmax attention model with a linear head for binary classification, i.e., $\texttt{Softmax}( p^\top E_X^\top ) E_X v = \frac{ \sum_{i=1}^T \exp(p^\top E_{x_i}) E_{x_i}^\top v}{\sum_{j=1}^T \exp(p^\top E_{x_{j}}) }$, where $E_X = [ E_{x_1} , \dots, E_{x_T} ]^\top$ contains the embeddings of the input sequence, $p$ is the embedding of the $\mathrm{\langle cls \rangle}$ token and $v$ the output vector. First, we show that, already after a single step of gradient training with the logistic loss, the embeddings $E_X$ capture the importance of tokens in the dataset by aligning with the output vector $v$ proportionally to the frequency with which the corresponding tokens appear in the dataset. Then, after training $p$ via gradient flow until convergence, the softmax selects the important tokens in the sentence (i.e., those that are predictive of the label), and the resulting $\mathrm{\langle cls \rangle}$ embedding maximizes the margin for such a selection. Experiments on real-world datasets (IMDB, Yelp) exhibit a phenomenology close to that unveiled by our theory.
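The displayed model is simple enough to transcribe directly; the numpy sketch below evaluates $\texttt{Softmax}(p^\top E_X^\top) E_X v$ for random embeddings, with sizes chosen arbitrarily.

    # Direct numpy transcription of the displayed one-layer attention model:
    # Softmax(p^T E_X^T) E_X v, with E_X the (T, d) token embeddings, p the
    # <cls> embedding, and v the output vector.
    import numpy as np

    def attention_logit(E_X, p, v):
        scores = E_X @ p                      # (T,) = p^T E_{x_i} per token
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()              # softmax over tokens
        return weights @ (E_X @ v)            # weighted average of E_{x_i}^T v

    rng = np.random.default_rng(0)
    T, d = 8, 16
    E_X = rng.normal(size=(T, d))             # embeddings of the input sequence
    p = rng.normal(size=d)                    # <cls> embedding
    v = rng.normal(size=d)                    # output vector
    print(attention_logit(E_X, p, v))         # scalar logit for binary classification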
Updated: 2025-06-25 15:19:05
Domains: cs.LG,cs.CL,stat.ML
Variational Learning Finds Flatter Solutions at the Edge of Stability
Variational Learning (VL) has recently gained popularity for training deep neural networks and is competitive to standard learning methods. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but there are few tools to unravel the implicit regularization in play. Here, we analyze the implicit regularization of VL through the Edge of Stability (EoS) framework. EoS has previously been used to show that gradient descent can find flat solutions and we extend this result to VL to show that it can find even flatter solutions. This is obtained by controlling the posterior covariance and the number of Monte Carlo samples from the posterior. These results are derived in a similar fashion as the standard EoS literature for deep learning, by first deriving a result for a quadratic problem and then extending it to deep neural networks. We empirically validate these findings on a wide variety of large networks, such as ResNet and ViT, to find that the theoretical results closely match the empirical ones. Ours is the first work to analyze the EoS dynamics in VL.
Updated: 2025-06-25 15:17:32
Domains: stat.ML,cs.LG
Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
A central question in multilingual language modeling is whether large language models (LLMs) develop a universal concept representation, disentangled from specific languages. In this paper, we address this question by analyzing latent representations (latents) during a word-translation task in transformer-based LLMs. We strategically extract latents from a source translation prompt and insert them into the forward pass on a target translation prompt. By doing so, we find that the output language is encoded in the latent at an earlier layer than the concept to be translated. Building on this insight, we conduct two key experiments. First, we demonstrate that we can change the concept without changing the language and vice versa through activation patching alone. Second, we show that patching with the mean representation of a concept across different languages does not affect the models' ability to translate it, but instead improves it. Finally, we generalize to multi-token generation and demonstrate that the model can generate natural language description of those mean representations. Our results provide evidence for the existence of language-agnostic concept representations within the investigated models.
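A toy PyTorch sketch of the activation-patching mechanic used here: a forward hook records one layer's hidden state on a source pass, and a second hook overwrites that layer's output on a target pass. The tiny MLP stands in for a transformer LLM; all sizes are illustrative.

    # Toy activation patching: record the hidden state of one layer on a
    # source forward pass, then overwrite the same layer's output on a
    # target pass. A real experiment would do this inside a transformer.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
    layer = model[0]
    captured = {}

    def capture_hook(module, inputs, output):
        captured["latent"] = output.detach().clone()

    def patch_hook(module, inputs, output):
        return captured["latent"]            # replace output with stored latent

    source, target = torch.randn(1, 4), torch.randn(1, 4)
    h = layer.register_forward_hook(capture_hook)
    model(source)                            # run 1: capture the source latent
    h.remove()

    h = layer.register_forward_hook(patch_hook)
    patched_out = model(target)              # run 2: patched forward pass
    h.remove()
    print(patched_out)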
Updated: 2025-06-25 15:16:54
Domains: cs.CL,cs.AI
Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains
Human-robot interaction (HRI) is a growing area of research. In HRI, complex command (action) classification is still an open problem that often prevents the practical deployment of such techniques. The literature presents some works that use neural networks to detect these actions. However, occlusion is still a major issue in HRI, especially when using uncrewed aerial vehicles (UAVs), since, during the robot's movement, the human operator is often out of the robot's field of view. Furthermore, in multi-robot scenarios, distributed training is also an open problem. In this sense, this work proposes an action recognition and control approach based on Long Short-Term Memory (LSTM) Deep Neural Networks with two layers in association with three densely connected layers and Federated Learning (FL) embedded in multiple drones. The FL enabled our approach to be trained in a distributed fashion, i.e., access to data without the need for cloud or other repositories, which facilitates the multi-robot system's learning. Furthermore, our multi-robot approach also mitigated occlusion situations, with experiments on real robots achieving an accuracy greater than 96%.
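As a sketch of the federated ingredient, the snippet below shows plain federated averaging: each drone takes a local gradient step on its own data, and only the model weights are averaged. The generic linear model stands in for the paper's LSTM action recognizer; hyperparameters are assumptions.

    # Minimal federated-averaging sketch: clients train locally, then their
    # weights are averaged without exchanging raw data.
    import copy
    import torch
    import torch.nn as nn

    def local_step(model, x, y):
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

    def federated_average(models):
        """Return the element-wise mean of the clients' state dicts."""
        avg = copy.deepcopy(models[0].state_dict())
        for key in avg:
            avg[key] = torch.stack([m.state_dict()[key] for m in models]).mean(0)
        return avg

    global_model = nn.Linear(8, 3)
    drones = [copy.deepcopy(global_model) for _ in range(4)]
    for m in drones:                            # one local round per drone
        local_step(m, torch.randn(16, 8), torch.randint(0, 3, (16,)))
    global_model.load_state_dict(federated_average(drones))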
Updated: 2025-06-25 15:15:12
Domains: cs.RO,cs.AI,cs.LG
Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation
Industrial Non-Intrusive Load Monitoring (NILM) is limited by the scarcity of high-quality datasets and the complex variability of industrial energy consumption patterns. To address data scarcity and privacy issues, we introduce the Synthetic Industrial Dataset for Energy Disaggregation (SIDED), an open-source dataset generated using Digital Twin simulations. SIDED includes three types of industrial facilities across three different geographic locations, capturing diverse appliance behaviors, weather conditions, and load profiles. We also propose the Appliance-Modulated Data Augmentation (AMDA) method, a computationally efficient technique that enhances NILM model generalization by intelligently scaling appliance power contributions based on their relative impact. We show in experiments that NILM models trained with AMDA-augmented data significantly improve the disaggregation of energy consumption of complex industrial appliances like combined heat and power systems. Specifically, in our out-of-sample scenarios, models trained with AMDA achieved a Normalized Disaggregation Error of 0.093, outperforming models trained without data augmentation (0.451) and those trained with random data augmentation (0.290). Data distribution analyses confirm that AMDA effectively aligns training and test data distributions, enhancing model generalization.
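A hedged sketch of what appliance-modulated augmentation could look like: each appliance's sub-metered trace is rescaled by a factor tied to its share of the total load, and the traces are re-summed into a new aggregate. The specific scaling rule below is an assumption for illustration, not the published AMDA formula.

    # Illustrative appliance-modulated augmentation: rescale each appliance's
    # trace by a factor tied to its relative impact, then re-sum to obtain a
    # new aggregate NILM training pair. The scaling rule is an assumption.
    import numpy as np

    def amda_augment(appliance_traces, rng, spread=0.2):
        """appliance_traces: (A, T) per-appliance power; returns (traces, aggregate)."""
        shares = appliance_traces.mean(axis=1)
        shares = shares / shares.sum()                 # relative impact per appliance
        # Bigger appliances get gentler perturbation; small ones vary more.
        scales = np.clip(
            1.0 + spread * (1.0 - shares) * rng.standard_normal(len(shares)),
            0.5, 1.5)
        augmented = appliance_traces * scales[:, None]
        return augmented, augmented.sum(axis=0)

    rng = np.random.default_rng(0)
    traces = np.abs(rng.normal(loc=[[5.0], [1.0], [0.2]], scale=0.1, size=(3, 100)))
    aug_traces, aggregate = amda_augment(traces, rng)
    print(aggregate.shape)  # (100,) synthetic aggregate for disaggregation training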
Updated: 2025-06-25 15:10:43
Domains: cs.LG,cs.AI,cs.SY,eess.SY
On Advancements of the Forward-Forward Algorithm
The Forward-Forward algorithm has evolved in machine learning research, tackling more complex tasks that mimic real-life applications. In the last years, it has been improved by several techniques to perform better than its original version, handling a challenging dataset like CIFAR10 without losing its flexibility and low memory usage. We have shown in our results that improvements are achieved through a combination of convolutional channel grouping, learning rate schedules, and independent block structures during training that lead to a 20% decrease in test error percentage. Additionally, to support further implementations on low-capacity hardware, we have presented a series of lighter models that achieve low test error percentages within (21±3)% and a number of trainable parameters between 164,706 and 754,386. This serves as a basis for our future study on complete verification and validation of these kinds of neural networks.
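For readers unfamiliar with the base algorithm, here is a compact Forward-Forward sketch: each layer is trained locally to give high goodness (sum of squared activations) on positive data and low goodness on negative data, with no backpropagation between layers. The channel grouping and block structure discussed above are omitted; hyperparameters are illustrative.

    # Compact Forward-Forward sketch: per-layer local training against a
    # goodness threshold, with no gradients flowing between layers.
    import torch
    import torch.nn as nn

    class FFLayer(nn.Module):
        def __init__(self, d_in, d_out, threshold=2.0):
            super().__init__()
            self.linear = nn.Linear(d_in, d_out)
            self.threshold = threshold
            self.opt = torch.optim.Adam(self.parameters(), lr=3e-4)

        def forward(self, x):
            # Normalize so a layer cannot pass goodness through input scale.
            x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
            return torch.relu(self.linear(x))

        def train_step(self, x_pos, x_neg):
            g_pos = self.forward(x_pos).pow(2).sum(dim=1)
            g_neg = self.forward(x_neg).pow(2).sum(dim=1)
            # Push positive goodness above threshold, negative below it.
            loss = torch.nn.functional.softplus(
                torch.cat([self.threshold - g_pos, g_neg - self.threshold])).mean()
            self.opt.zero_grad(); loss.backward(); self.opt.step()
            return self.forward(x_pos).detach(), self.forward(x_neg).detach()

    layers = [FFLayer(784, 256), FFLayer(256, 256)]
    x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
    for layer in layers:                       # train greedily, layer by layer
        x_pos, x_neg = layer.train_step(x_pos, x_neg)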
Updated: 2025-06-25 15:08:49
Domains: cs.LG
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Reinforcement learning (RL) is increasingly used to align large language models (LLMs). Off-policy methods offer greater implementation simplicity and data efficiency than on-policy techniques, but often result in suboptimal performance. In this work, we study the intermediate range of algorithms between off-policy RL and supervised fine-tuning by analyzing a simple off-policy REINFORCE algorithm, where the advantage is defined as $A=r-V$, with $r$ a reward and $V$ some tunable baseline. Intuitively, lowering $V$ emphasizes high-reward samples, while raising it penalizes low-reward ones more heavily. We first provide a theoretical analysis of this off-policy REINFORCE algorithm, showing that when the baseline $V$ lower-bounds the expected reward, the algorithm enjoys a policy improvement guarantee. Our analysis reveals that while on-policy updates can safely leverage both positive and negative signals, off-policy updates benefit from focusing more on positive rewards than on negative ones. We validate our findings experimentally in a controlled stochastic bandit setting and through fine-tuning state-of-the-art LLMs on reasoning tasks.
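The studied objective is easy to write down; the sketch below implements the off-policy REINFORCE loss with advantage $A = r - V$ for a tunable scalar baseline $V$, using placeholder log-probabilities. Batch sizes and values are arbitrary.

    # Sketch of the off-policy REINFORCE objective: advantage A = r - V with
    # a tunable scalar baseline V. Lowering V emphasizes high-reward samples;
    # raising it penalizes low-reward ones more heavily.
    import torch

    def reinforce_loss(logprobs, rewards, baseline_v):
        """logprobs: (B,) sequence log-probs under the current policy;
        rewards: (B,) scalar rewards from an off-policy batch."""
        advantages = rewards - baseline_v
        return -(advantages.detach() * logprobs).mean()

    logprobs = torch.randn(8, requires_grad=True)   # stand-in for model log-probs
    rewards = torch.rand(8)
    loss = reinforce_loss(logprobs, rewards, baseline_v=0.2)
    loss.backward()
    print(logprobs.grad)  # high-reward samples pushed up, low-reward down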
Updated: 2025-06-25 15:07:16
Domains: cs.LG,cs.CL
VRAIL: Vectorized Reward-based Attribution for Interpretable Learning
We propose VRAIL (Vectorized Reward-based Attribution for Interpretable Learning), a bi-level framework for value-based reinforcement learning (RL) that learns interpretable weight representations from state features. VRAIL consists of two stages: a deep learning (DL) stage that fits an estimated value function using state features, and an RL stage that uses this to shape learning via potential-based reward transformations. The estimator is modeled in either linear or quadratic form, allowing attribution of importance to individual features and their interactions. Empirical results on the Taxi-v3 environment demonstrate that VRAIL improves training stability and convergence compared to standard DQN, without requiring environment modifications. Further analysis shows that VRAIL uncovers semantically meaningful subgoals, such as passenger possession, highlighting its ability to produce human-interpretable behavior. Our findings suggest that VRAIL serves as a general, model-agnostic framework for reward shaping that enhances both learning and interpretability.
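The RL stage uses standard potential-based shaping, which the sketch below makes concrete: a value estimator fitted in the DL stage acts as the potential $\phi$, and rewards are transformed as $r' = r + \gamma\,\phi(s') - \phi(s)$. The linear estimator and feature vectors are stand-ins, not VRAIL's learned weights.

    # Sketch of the two-stage idea: a fitted value estimator over state
    # features defines a potential, and the RL stage uses the standard
    # potential-based shaped reward r' = r + gamma * phi(s') - phi(s).
    import numpy as np

    weights = np.array([0.5, -0.2, 1.0])        # stand-in for the DL-stage fit

    def phi(state_features):
        return weights @ state_features          # linear value estimate as potential

    def shaped_reward(r, s_feats, s_next_feats, gamma=0.99):
        return r + gamma * phi(s_next_feats) - phi(s_feats)

    s, s_next = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])
    print(shaped_reward(r=-1.0, s_feats=s, s_next_feats=s_next))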
Updated: 2025-06-25 15:06:17
Domains: cs.LG,cs.AI
WallStreetFeds: Client-Specific Tokens as Investment Vehicles in Federated Learning
Federated Learning (FL) is a collaborative machine learning paradigm which allows participants to collectively train a model while training data remains private. This paradigm is especially beneficial for sectors like finance, where data privacy, security and model performance are paramount. FL has been extensively studied in the years following its introduction, leading to, among others, better performing collaboration techniques, ways to defend against other clients trying to attack the model, and contribution assessment methods. An important element in for-profit Federated Learning is the development of incentive methods to determine the allocation and distribution of rewards for participants. While numerous methods for allocation have been proposed and thoroughly explored, distribution frameworks remain relatively understudied. In this paper, we propose a novel framework which introduces client-specific tokens as investment vehicles within the FL ecosystem. Our framework aims to address the limitations of existing incentive schemes by leveraging a decentralized finance (DeFi) platform and automated market makers (AMMs) to create a more flexible and scalable reward distribution system for participants, and a mechanism for third parties to invest in the federation learning process.
Updated: 2025-06-25 15:05:01
Domains: cs.LG
What Makes a Dribble Successful? Insights From 3D Pose Tracking Data
Data analysis plays an increasingly important role in soccer, offering new ways to evaluate individual and team performance. One specific application is the evaluation of dribbles: one-on-one situations where an attacker attempts to bypass a defender with the ball. While previous research has primarily relied on 2D positional tracking data, this fails to capture aspects like balance, orientation, and ball control, limiting the depth of current insights. This study explores how pose tracking data (capturing players' posture and movement in three dimensions) can improve our understanding of dribbling skills. We extract novel pose-based features from 1,736 dribbles in the 2022/23 Champions League season and evaluate their impact on dribble success. Our results indicate that features capturing the attacker's balance and the alignment of the orientation between the attacker and defender are informative for predicting dribble success. Incorporating these pose-based features on top of features derived from traditional 2D positional data leads to a measurable improvement in model performance.
Updated: 2025-06-25 15:01:30
Domains: cs.CV,cs.LG
Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch
This study proposes a high-performance dual-parameter full waveform inversion framework (FWI) for ground-penetrating radar (GPR), accelerated through the hybrid compilation of CUDA kernel functions and PyTorch. The method leverages the computational efficiency of GPU programming while preserving the flexibility and usability of Python-based deep learning frameworks. By integrating customized CUDA kernels into PyTorch's automatic differentiation mechanism, the framework enables accurate and efficient inversion of both dielectric permittivity and electrical conductivity. Experimental evaluations on synthetic data and real wavefield data demonstrate that the proposed method achieves dual-parameter FWI for GPR data while maintaining high accuracy. Moreover, the framework is flexible and extensible, supporting optional regularization strategies such as total variation and multi-scale inversion. These features make the proposed approach a practical and scalable framework for rapid GPR-based subsurface imaging in applications including civil engineering, environmental monitoring, and geophysical exploration.
Updated: 2025-06-25 15:00:33
Domains: physics.geo-ph,cs.LG,eess.SP
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Different base language model families, such as Llama and Qwen, exhibit divergent behaviors during post-training with reinforcement learning (RL), especially on reasoning-intensive tasks. What makes a base language model suitable for reinforcement learning? Gaining deeper insight into this question is essential for developing RL-scalable foundation models of the next generation. In this work, we investigate how mid-training strategies shape RL dynamics, focusing on two representative model families: Qwen and Llama. Our study reveals that (1) high-quality mathematical corpora, such as MegaMath-Web-Pro, significantly improve both base model and RL performance, while existing alternatives (e.g., FineMath-4plus) fail to do so; (2) further adding QA-style data, particularly long chain-of-thought (CoT) reasoning examples, enhances RL outcomes, and instruction data further unlocks this effect; (3) while long-CoT improves reasoning depth, it can also induce verbosity of model responses and instability of RL training, underscoring the importance of data formatting; (4) scaling mid-training consistently leads to stronger downstream RL performance. Building on these insights, we introduce a two-stage mid-training strategy, Stable-then-Decay, in which base models are first trained on 200B tokens with a constant learning rate, followed by 20B tokens across three CoT-focused branches with learning rate decay. This yields OctoThinker, a family of models demonstrating strong RL compatibility and closing the performance gap with more RL-friendly model families, i.e., Qwen. We hope our work will help shape pre-training strategies for foundation models in the RL era. To support further research, we release our open-source models along with a curated math reasoning-intensive corpus of over 70 billion tokens (i.e., MegaMath-Web-Pro-Max).
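A minimal sketch of a Stable-then-Decay learning-rate schedule as described: constant over the 200B-token stable stage, then decayed over the 20B-token branch stage. The cosine decay shape and the rates are assumptions; the abstract only specifies constant-then-decay.

    # Sketch of a Stable-then-Decay schedule: constant LR for the long stable
    # stage, then (assumed cosine) decay over the shorter CoT-focused stage.
    import math

    def stable_then_decay_lr(tokens_seen, stable_tokens=200e9, decay_tokens=20e9,
                             base_lr=3e-4, min_lr=3e-5):
        if tokens_seen <= stable_tokens:
            return base_lr                               # stage 1: constant LR
        frac = min((tokens_seen - stable_tokens) / decay_tokens, 1.0)
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * frac))

    for t in [0, 100e9, 200e9, 210e9, 220e9]:
        print(f"{t/1e9:5.0f}B tokens -> lr {stable_then_decay_lr(t):.2e}")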
Updated: 2025-06-25 14:58:13
Domains: cs.CL,cs.AI,cs.LG
Collaborative Batch Size Optimization for Federated Learning
Federated Learning (FL) is a decentralized collaborative Machine Learning framework for training models without collecting data in a centralized location. It has seen application across various disciplines, from helping medical diagnoses in hospitals to detecting fraud in financial transactions. In this paper, we focus on improving the local training process through hardware usage optimization. While participants in a federation might share the hardware they are training on, since there is no information exchange between them, their training process can be hindered by an improper training configuration. Taking advantage of the parallel processing inherent to Federated Learning, we use a greedy randomized search to optimize local batch sizes for the best training settings across all participants. Our results show that against default parameter settings, our method improves convergence speed while staying nearly on par with the case where local parameters are optimized.
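The greedy randomized search is simple to sketch: perturb the current local batch size, probe a short training round, and keep the candidate only if it scores better. The probe function below is a stand-in for measured throughput on shared hardware; step counts and perturbation sizes are assumptions.

    # Sketch of a greedy randomized search over local batch sizes.
    import random

    def probe_throughput(batch_size):
        """Placeholder for a timed local training round; peaks at an ideal size."""
        return -(batch_size - 96) ** 2 + random.gauss(0, 50)

    def greedy_random_search(start=32, steps=50, choices=(2, 4, 8, 16, 32)):
        best, best_score = start, probe_throughput(start)
        for _ in range(steps):
            delta = random.choice(choices) * random.choice((-1, 1))
            candidate = max(1, best + delta)
            score = probe_throughput(candidate)
            if score > best_score:                 # greedy: keep only improvements
                best, best_score = candidate, score
        return best

    random.seed(0)
    print(greedy_random_search())  # should settle near the hardware sweet spot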
Updated: 2025-06-25 14:57:23
Domains: cs.LG,cs.DC
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods, across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
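The label-propagation refinement follows the classic iteration $Y \leftarrow \alpha S Y + (1-\alpha) Y_0$ over a normalized affinity matrix; the numpy sketch below applies it to placeholder VLM scores and vision-model patch features. Whether LPOSS uses exactly this update rule is an assumption; the sketch shows the general technique.

    # Label propagation over patches: Y <- alpha * S @ Y + (1 - alpha) * Y0,
    # with Y0 the VLM's initial per-patch class scores and S a symmetrically
    # normalized patch-to-patch affinity matrix from a vision model.
    import numpy as np

    def label_propagation(W, Y0, alpha=0.9, iters=50):
        deg = W.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-8))
        S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetric normalization
        Y = Y0.copy()
        for _ in range(iters):
            Y = alpha * (S @ Y) + (1 - alpha) * Y0
        return Y

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(196, 64))           # stand-in vision-model patch features
    W = np.maximum(feats @ feats.T, 0.0)         # nonnegative affinities
    np.fill_diagonal(W, 0.0)
    Y0 = rng.random(size=(196, 5))               # initial VLM scores for 5 classes
    refined = label_propagation(W, Y0)
    print(refined.argmax(axis=1)[:10])           # refined per-patch class labels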
Updated: 2025-06-25 14:53:33
Domains: cs.CV,cs.LG
Engineering Sentience
We spell out a definition of sentience that may be useful for designing and building it in machines. We propose that for sentience to be meaningful for AI, it must be fleshed out in functional, computational terms, in enough detail to allow for implementation. Yet, this notion of sentience must also reflect something essentially 'subjective', beyond just having the general capacity to encode perceptual content. For this specific functional notion of sentience to occur, we propose that certain sensory signals need to be both assertoric (persistent) and qualitative. To illustrate the definition in more concrete terms, we sketch out some ways for potential implementation, given current technology. Understanding what it takes for artificial agents to be functionally sentient can also help us avoid creating them inadvertently, or at least, realize that we have created them in a timely manner.
Updated: 2025-06-25 14:49:50
Domains: cs.AI,q-bio.NC
Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to Rank
Additive two-tower models are popular learning-to-rank methods for handling biased user feedback in industry settings. Recent studies, however, report a concerning phenomenon: training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. This paper investigates two recent explanations for this observation: confounding effects from logging policies and model identifiability issues. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior. However, logging policies can amplify biases when models imperfectly capture user behavior, particularly when prediction errors correlate with document placement across positions. We propose a sample weighting technique to mitigate these effects and provide actionable insights for researchers and practitioners using two-tower models.
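For concreteness, a minimal additive two-tower click model in PyTorch: one tower scores document relevance from features, the other scores examination from the display position, and the click logit is their sum. Sizes and the training snippet are illustrative assumptions.

    # Minimal additive two-tower click model: relevance tower + position
    # (bias) tower, combined additively into a click logit.
    import torch
    import torch.nn as nn

    class TwoTower(nn.Module):
        def __init__(self, n_features, n_positions):
            super().__init__()
            self.relevance = nn.Linear(n_features, 1)      # relevance tower
            self.bias = nn.Embedding(n_positions, 1)       # position tower

        def forward(self, features, positions):
            return (self.relevance(features).squeeze(-1)
                    + self.bias(positions).squeeze(-1))    # additive combination

    model = TwoTower(n_features=16, n_positions=10)
    features = torch.randn(32, 16)
    positions = torch.randint(0, 10, (32,))
    clicks = torch.randint(0, 2, (32,)).float()
    loss = nn.functional.binary_cross_entropy_with_logits(
        model(features, positions), clicks)
    loss.backward()
    # At serving time, only the relevance tower is used to rank documents.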
Updated: 2025-06-25 14:47:43
Domains: cs.IR,cs.LG
Training Plug-n-Play Knowledge Modules with Deep Context Distillation
Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to match the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques, across two datasets. Finally, we highlight synergies between KMs and RAG.
Updated: 2025-06-25 14:45:56
标题: 使用深层上下文蒸馏训练即插即用知识模块
摘要: 在大型语言模型预训练后动态集成新的或快速演化的信息仍然具有挑战性,特别是在低数据场景或处理私人和专业文档时。在上下文学习和检索增强生成(RAG)面临限制,包括它们的高推理成本和无法捕捉全局文档信息。在本文中,我们提出了通过训练文档级知识模块(KMs)来模块化知识的方法。KMs是轻量级组件,实现为参数高效的LoRA模块,训练以存储有关新文档的信息,并可以根据需要轻松插入模型中。我们发现,作为KMs训练目标的下一个令牌预测表现不佳。相反,我们提出了深层上下文蒸馏:我们学习KMs参数,以模拟以上下文方式接受文档的教师的隐藏状态和逻辑。我们的方法在两个数据集上表现优于标准的下一个令牌预测和预先指导训练技术。最后,我们强调KMs和RAG之间的协同效应。
更新时间: 2025-06-25 14:45:56
领域: cs.LG,cs.AI
Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging
Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. One way to supplement capabilities efficiently is model merging, which offers a promising alternative by combining multiple models without retraining. However, current merging approaches rely on manually designed strategies for choosing merging hyperparameters, limiting the exploration of potential model combinations and requiring significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layerwise fusion (LFS) and depth-wise integration (DIS). Evaluating across a number of benchmarks, we find that the search autonomously finds 1) merges that further boost single-objective performance, even on tasks the model has already been finetuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., in fewer than 500 search steps.
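As an illustration of the layerwise-fusion search space, the sketch below blends two checkpoints with per-tensor coefficients and searches over them; it uses plain random search as a single-fidelity stand-in for the paper's multi-fidelity procedure, and `evaluate` is an assumed user-supplied scoring function on a cheap validation proxy.

```python
import random

def layerwise_fuse(state_a, state_b, alphas):
    """Blend two checkpoints tensor by tensor: w = alpha*a + (1-alpha)*b."""
    return {name: alphas[i] * a + (1.0 - alphas[i]) * state_b[name]
            for i, (name, a) in enumerate(state_a.items())}

def merge_search(state_a, state_b, evaluate, steps=500):
    """Toy stand-in for the automated merge search: sample per-tensor
    fusion coefficients and keep the best-scoring merge."""
    n = len(state_a)
    best_score, best_alphas = float("-inf"), None
    for _ in range(steps):
        alphas = [random.random() for _ in range(n)]
        score = evaluate(layerwise_fuse(state_a, state_b, alphas))
        if score > best_score:
            best_score, best_alphas = score, alphas
    return best_alphas, best_score
```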
Updated: 2025-06-25 14:44:30
标题: 好的,我会自己合并:用于自动模型合并的多保真度框架
摘要: 推理能力代表了大型语言模型(LLMs)的一个关键前沿,但是开发这种能力需要大量的专有数据集和计算资源。一种有效地增强能力的方法是通过模型合并,该方法通过将多个模型结合在一起而无需重新训练提供了一种有前途的替代方案。然而,当前的合并方法依赖于手动设计的超参数合并策略,限制了潜在模型组合的探索,并需要大量人力投入。我们提出了一个自动模型合并框架,通过多精度近似减少成本,实现了对合并策略的细粒度探索。我们支持单目标和多目标优化,并引入了两种新的搜索空间:层次融合(LFS)和深度集成(DIS)。通过对多个基准测试进行评估,我们发现该搜索自主地发现了1) 进一步提升单目标性能的合并,甚至在模型已经进行了微调的任务上,以及2) 优化跨任务的多目标前沿的合并。有效的合并可以在有限的计算资源下找到,例如在不到500个搜索步骤内。
更新时间: 2025-06-25 14:44:30
领域: cs.AI,cs.LG
ReCode: Updating Code API Knowledge with Reinforcement Learning
Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their training data, even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics human programmer adaptation to API changes. Specifically, we construct a dataset of approximately 2,000 data entries to train the LLMs to perform version migration based on updated information. Then, we introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared to supervised fine-tuning, ReCode has less impact on LLMs' general code generation abilities. We apply ReCode on various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
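To illustrate the reward design, here is a minimal string-similarity reward in the spirit of the abstract; the line-wise normalization is an assumption for illustration, not ReCode's exact modified metric.

```python
from difflib import SequenceMatcher

def code_similarity_reward(generated: str, reference: str) -> float:
    """A simple stand-in for a string-similarity RL reward: the match
    ratio between generated and reference code, computed line-wise so
    whitespace-only edits are not over-rewarded."""
    gen_lines = [ln.strip() for ln in generated.splitlines() if ln.strip()]
    ref_lines = [ln.strip() for ln in reference.splitlines() if ln.strip()]
    return SequenceMatcher(None, gen_lines, ref_lines).ratio()

# Example: reward a candidate migration against a reference solution.
print(code_similarity_reward("import numpy as np\nx = np.asarray(a)",
                             "import numpy as np\nx = np.asarray(a)"))
```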
Updated: 2025-06-25 14:41:13
标题: ReCode:使用强化学习更新代码API知识
摘要: 大型语言模型(LLMs)展示了卓越的代码生成能力,但在适应外部库API频繁更新时表现不佳。这一关键限制源于它们在训练数据中对过时API知识的依赖,即使可以访问当前文档,也会阻碍在动态环境中可靠的代码生成。为了解决这个问题,我们提出了ReCode(基于规则的代码更新强化学习),这是一个模拟人类程序员适应API变化的新框架。具体地,我们构建了一个约2000个数据条目的数据集,用于训练LLMs基于更新信息进行版本迁移。然后,我们引入了一个修改后的字符串相似度度量作为强化学习的奖励来评估代码。我们的实验表明,ReCode显著提高了LLMs在动态API场景中的代码生成性能,特别是在未见过的CodeUpdateArena任务中。至关重要的是,与监督微调相比,ReCode对LLMs的一般代码生成能力影响较小。我们将ReCode应用于各种LLMs和强化学习算法(GRPO和DAPO),均取得了一致的改进。值得注意的是,在训练后,Qwen2.5-Coder-7B的表现优于具有相同架构的32B参数代码指令调整模型和推理模型。代码可在https://github.com/zjunlp/ReCode找到。
更新时间: 2025-06-25 14:41:13
领域: cs.CL,cs.AI,cs.IR,cs.LG,cs.SE
Multimodal Representation Learning and Fusion
Multi-modal learning is a fast-growing area of artificial intelligence. It aims to help machines understand complex phenomena by combining information from different sources, such as images, text, and audio. By using the strengths of each modality, multi-modal learning allows AI systems to build stronger and richer internal representations, which help machines interpret, reason, and make decisions in real-life situations. The field includes core techniques such as representation learning (to obtain shared features from different data types), alignment methods (to match information across modalities), and fusion strategies (to combine them with deep learning models). Although there has been good progress, some major problems remain, such as dealing with heterogeneous data formats, handling missing or incomplete inputs, and defending against adversarial attacks. Researchers are now exploring new methods, such as unsupervised or semi-supervised learning and AutoML tools, to make models more efficient and easier to scale. More attention is also being paid to designing better evaluation metrics and building shared benchmarks, making it easier to compare model performance across tasks and domains. As the field continues to grow, multi-modal learning is expected to improve many areas: computer vision, natural language processing, speech recognition, and healthcare. In the future, it may help build AI systems that understand the world more the way humans do: flexible, context-aware, and able to deal with real-world complexity.
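As a concrete example of a fusion strategy, the sketch below projects each modality into a shared space and combines them with a learned gate; the dimensions and the gating design are illustrative choices, not a specific method from the survey.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Minimal fusion head: encode each modality separately, project to a
    shared space, then combine with a softmax gate over modalities."""
    def __init__(self, dims, hidden=256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.gate = nn.Linear(hidden, 1)

    def forward(self, feats):
        # feats: one (batch, dim_i) tensor per modality
        z = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)
        w = torch.softmax(self.gate(torch.tanh(z)), dim=1)  # (B, M, 1)
        return (w * z).sum(dim=1)  # weighted sum over modalities

# Example: fuse image (2048-d), text (768-d), and audio (128-d) features.
fusion = LateFusion([2048, 768, 128])
out = fusion([torch.randn(4, 2048), torch.randn(4, 768), torch.randn(4, 128)])
```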
Updated: 2025-06-25 14:40:09
标题: 多模态表示学习与融合
摘要: 多模态学习是人工智能中一个快速增长的领域。它试图通过结合来自不同来源的信息,如图像、文本和音频,帮助机器理解复杂的事物。通过利用每种模态的优势,多模态学习使AI系统能够构建更强大和更丰富的内部表示。这些帮助机器更好地解释、推理和在现实生活中做出决策。这个领域包括核心技术,如表示学习(从不同数据类型中获得共享特征)、对齐方法(跨模态匹配信息)和融合策略(通过深度学习模型将它们结合起来)。尽管取得了良好的进展,但一些主要问题仍然存在。比如处理不同的数据格式、缺失或不完整的输入,以及防御对抗性攻击。研究人员现在正在探索新的方法,如无监督或半监督学习、AutoML工具,以使模型更高效且更容易扩展。同时,更多的关注也放在设计更好的评估指标或构建共享基准上,以便更容易地比较模型在任务和领域之间的表现。随着这一领域的不断发展,多模态学习有望改进许多领域:计算机视觉、自然语言处理、语音识别和医疗保健。在未来,它可能有助于构建更像人类的AI系统,灵活、具有上下文意识,并能够处理现实世界的复杂性。
更新时间: 2025-06-25 14:40:09
领域: cs.LG,cs.MM
Non-equilibrium Annealed Adjoint Sampler
Recently, there has been significant progress in learning-based diffusion samplers, which aim to sample from a given unnormalized density. These methods typically follow one of two paradigms: (i) formulating sampling as an unbiased stochastic optimal control (SOC) problem using a canonical reference process, or (ii) refining annealed path measures through importance-weighted sampling. Although annealing approaches have advantages in guiding samples toward high-density regions, reliance on importance sampling leads to high variance and limited scalability in practice. In this paper, we introduce the \textbf{Non-equilibrium Annealed Adjoint Sampler (NAAS)}, a novel SOC-based diffusion sampler that leverages annealed reference dynamics without resorting to importance sampling. NAAS employs a lean adjoint system inspired by adjoint matching, enabling efficient and scalable training. We demonstrate the effectiveness of our approach across a range of tasks, including sampling from classical energy landscapes and molecular Boltzmann distributions.
Updated: 2025-06-25 14:39:40
标题: 非平衡退火调节抽样器
摘要: 最近,在学习基于扩散的采样器方面取得了显著进展,旨在从给定的非归一化密度中采样。这些方法通常遵循两种范式之一:(i)将采样表述为使用经典参考过程的无偏随机最优控制(SOC)问题,或者(ii)通过重要性加权采样来改进退火路径测度。尽管退火方法在引导样本朝向高密度区域方面具有优势,但依赖重要性采样会导致高方差和在实践中受限的可扩展性。在本文中,我们介绍了\textbf{非平衡退火共轭采样器(NAAS)},这是一种新颖的基于SOC的扩散采样器,利用了退火参考动力学而不依赖于重要性采样。NAAS采用了受共轭匹配启发的精简共轭系统,实现了高效和可扩展的训练。我们跨越一系列任务展示了我们方法的有效性,包括从经典能量景观和分子玻尔兹曼分布中采样。
更新时间: 2025-06-25 14:39:40
领域: cs.LG,cs.AI
CLAIM: Clinically-Guided LGE Augmentation for Realistic and Diverse Myocardial Scar Synthesis and Segmentation
Deep learning-based myocardial scar segmentation from late gadolinium enhancement (LGE) cardiac MRI has shown great potential for accurate and timely diagnosis and treatment planning for structural cardiac diseases. However, the limited availability and variability of LGE images with high-quality scar labels restrict the development of robust segmentation models. To address this, we introduce CLAIM: \textbf{C}linically-Guided \textbf{L}GE \textbf{A}ugmentation for Real\textbf{i}stic and Diverse \textbf{M}yocardial Scar Synthesis and Segmentation, a framework for anatomically grounded scar generation and segmentation. At its core is the SMILE module (Scar Mask generation guided by cLinical knowledgE), which conditions a diffusion-based generator on the clinically adopted AHA 17-segment model to synthesize images with anatomically consistent and spatially diverse scar patterns. In addition, CLAIM employs a joint training strategy in which the scar segmentation network is optimized alongside the generator, aiming to enhance both the realism of synthesized scars and the accuracy of the scar segmentation performance. Experimental results show that CLAIM produces anatomically coherent scar patterns and achieves higher Dice similarity with real scar distributions compared to baseline models. Our approach enables controllable and realistic myocardial scar synthesis and has demonstrated utility for downstream medical imaging tasks. Code is available at https://github.com/farheenjabeen/CLAIM-Scar-Synthesis.
Updated: 2025-06-25 14:37:57
标题: CLAIM:临床指导下的LGE增强用于逼真多样的心肌瘢痕合成与分割
摘要: 基于深度学习的心肌瘢痕分割方法已经显示出在结构性心脏疾病的准确和及时诊断以及治疗规划方面具有巨大潜力。然而,LGE心脏MRI图像的有限可用性和高质量瘢痕标签的变异性限制了健壮分割模型的发展。为了解决这个问题,我们引入了CLAIM:\textbf{C}linically-Guided \textbf{L}GE \textbf{A}ugmentation for Real\textbf{i}stic and Diverse \textbf{M}yocardial Scar Synthesis and Segmentation framework,一个基于解剖学的瘢痕生成和分割框架。其核心是SMILE模块(Scar Mask generation guided by cLinical knowledgE),该模块在临床采用的AHA 17段模型的指导下,通过扩散生成器来合成具有解剖一致性和空间多样性瘢痕模式的图像。此外,CLAIM采用联合训练策略,其中瘢痕分割网络与生成器一起优化,旨在增强合成瘢痕的真实感和瘢痕分割性能的准确性。实验结果表明,与基线模型相比,CLAIM产生了解剖一致的瘢痕模式,并实现了更高的Dice相似性与真实瘢痕分布。我们的方法实现了可控和真实的心肌瘢痕合成,并已经证明在下游医学影像任务中具有实用性。代码可在https://github.com/farheenjabeen/CLAIM-Scar-Synthesis 上获得。
更新时间: 2025-06-25 14:37:57
领域: cs.CV,cs.AI
Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning
Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling this promise to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. Principled geometric structure offers a potential solution to these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, firstly as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and secondly as a structured directional cost guiding towards proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation benchmarks.
Updated: 2025-06-25 14:37:00
标题: 离线目标条件强化学习与投影拟度规划
摘要: 离线目标条件增强学习旨在训练代理程序从先前收集的轨迹中达到指定目标。扩展到长期任务的规模仍然具有挑战性,主要是由于价值估计误差的复合。基于原则的几何学提供了解决这些问题的潜在解决方案。在此基础上,我们引入了Projective Quasimetric Planning(ProQ),这是一个学习非对称距离然后重新利用它的组成框架,首先作为斥力能量,强制使得一组稀疏的关键点均匀分布在学习到的潜在空间中,其次作为结构化方向成本,引导朝向临近子目标。特别是,ProQ将这种几何形态与拉格朗日分布检测器相结合,以确保学习到的关键点保持在可达区域内。通过统一度量学习、关键点覆盖和目标条件控制,我们的方法产生了有意义的子目标,并在各种导航基准测试中可靠地推动长期目标达成。
更新时间: 2025-06-25 14:37:00
领域: cs.LG
Generative AI for Vulnerability Detection in 6G Wireless Networks: Advances, Case Study, and Future Directions
The rapid advancement of 6G wireless networks, IoT, and edge computing has significantly expanded the cyberattack surface, necessitating more intelligent and adaptive vulnerability detection mechanisms. Traditional security methods, while foundational, struggle with zero-day exploits, adversarial threats, and context-dependent vulnerabilities in highly dynamic network environments. Generative AI (GAI) emerges as a transformative solution, leveraging synthetic data generation, multimodal reasoning, and adaptive learning to enhance security frameworks. This paper explores the integration of GAI-powered vulnerability detection in 6G wireless networks, focusing on code auditing, protocol security, cloud-edge defenses, and hardware protection. We introduce a three-layer framework comprising the Technology Layer, Capability Layer, and Application Layer to systematically analyze the role of VAEs, GANs, LLMs, and GDMs in securing next-generation wireless ecosystems. To demonstrate practical implementation, we present a case study on LLM-driven code vulnerability detection, highlighting its effectiveness, performance, and challenges. Finally, we outline future research directions, including lightweight models, high-authenticity data generation, external knowledge integration, and privacy-preserving technologies. By synthesizing current advancements and open challenges, this work provides a roadmap for researchers and practitioners to harness GAI for building resilient and adaptive security solutions in 6G networks.
Updated: 2025-06-25 14:36:31
标题: 第六代无线网络中用于漏洞检测的生成式人工智能:进展、案例研究和未来方向
摘要: 6G无线网络、物联网和边缘计算的快速发展显著扩大了网络攻击面,需要更智能和适应性更强的漏洞检测机制。传统的安全方法虽然是基础,但在高度动态的网络环境中往往难以应对零日漏洞利用、对抗性威胁和依赖环境的漏洞。生成式人工智能(GAI)出现作为一个变革性解决方案,利用合成数据生成、多模态推理和自适应学习来增强安全框架。本文探讨了在6G无线网络中应用GAI强化漏洞检测,重点关注代码审计、协议安全、云边防御和硬件保护。我们引入了一个由技术层、能力层和应用层构成的三层框架,以系统地分析VAEs、GANs、LLMs和GDMs在保护下一代无线生态系统中的作用。为了展示实际实施,我们提出了一个以LLM驱动的代码漏洞检测案例研究,突出了其有效性、性能和挑战。最后,我们概述了未来研究方向,包括轻量级模型、高真实性数据生成、外部知识整合和隐私保护技术。通过综合当前的进展和开放挑战,这项工作为研究人员和从业者提供了一个在6G网络中利用GAI构建弹性和适应性安全解决方案的路线图。
更新时间: 2025-06-25 14:36:31
领域: cs.CR,cs.NI
Mixtures of Neural Cellular Automata: A Stochastic Framework for Growth Modelling and Self-Organization
Neural Cellular Automata (NCAs) are a promising new approach to model self-organizing processes, with potential applications in life science. However, their deterministic nature limits their ability to capture the stochasticity of real-world biological and physical systems. We propose the Mixture of Neural Cellular Automata (MNCA), a novel framework incorporating the idea of mixture models into the NCA paradigm. By combining probabilistic rule assignments with intrinsic noise, MNCAs can model diverse local behaviors and reproduce the stochastic dynamics observed in biological processes. We evaluate the effectiveness of MNCAs in three key domains: (1) synthetic simulations of tissue growth and differentiation, (2) image morphogenesis robustness, and (3) microscopy image segmentation. Results show that MNCAs achieve superior robustness to perturbations, better recapitulate real biological growth patterns, and provide interpretable rule segmentation. These findings position MNCAs as a promising tool for modeling stochastic dynamical systems and studying self-growth processes.
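To make the mixture idea concrete, here is a minimal NumPy sketch of one stochastic MNCA-style update: each cell samples an update rule from a per-cell categorical distribution and intrinsic Gaussian noise is added. The rule representation and noise model are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mnca_step(grid, rules, assign_logits, noise_std=0.05, rng=None):
    """One stochastic Mixture-of-NCA update (sketch).

    grid:          (H, W, C) cell states
    rules:         list of K callables mapping (H, W, C) -> (H, W, C)
    assign_logits: (H, W, K) per-cell rule preferences
    """
    rng = rng or np.random.default_rng()
    # Per-cell categorical distribution over the K rules.
    probs = np.exp(assign_logits - assign_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling of one rule index per cell.
    cum = probs.cumsum(axis=-1)
    u = rng.random(grid.shape[:2])[..., None]
    choice = np.minimum((u > cum).sum(axis=-1), len(rules) - 1)  # (H, W)
    onehot = np.eye(len(rules))[choice]                          # (H, W, K)
    updates = np.stack([rule(grid) for rule in rules])           # (K, H, W, C)
    new_grid = np.einsum("khwc,hwk->hwc", updates, onehot)
    # Intrinsic noise models the stochasticity of biological dynamics.
    return new_grid + rng.normal(0.0, noise_std, grid.shape)
```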
Updated: 2025-06-25 14:33:35
标题: 神经细胞自动机混合:生长建模和自组织的随机框架
摘要: 神经细胞自动机(NCAs)是一种有前景的新方法,用于模拟自组织过程,在生命科学中具有潜在应用。然而,它们的确定性特性限制了捕捉真实生物和物理系统的随机性的能力。 我们提出了混合神经细胞自动机(MNCA),这是一个将混合模型的概念融入NCA范式的新框架。通过将概率规则分配与内在噪声相结合,MNCAs可以模拟多样的局部行为,并重现生物过程中观察到的随机动态。 我们在三个关键领域评估了MNCAs的有效性:(1)组织生长和分化的合成模拟,(2)图像形态生成的鲁棒性,以及(3)显微镜图像分割。结果表明,MNCAs在扰动方面具有更强的鲁棒性,更好地重现真实的生物生长模式,并提供可解释的规则分割。这些发现将MNCAs定位为一种有前景的工具,用于建模随机动力系统和研究自我生长过程。
更新时间: 2025-06-25 14:33:35
领域: cs.AI
Counterfactual Influence as a Distributional Quantity
Machine learning models are known to memorize samples from their training data, raising concerns around privacy and generalization. Counterfactual self-influence is a popular metric to study memorization, quantifying how the model's prediction for a sample changes depending on the sample's inclusion in the training dataset. However, recent work has shown memorization to be affected by factors beyond self-influence, with other training samples, in particular (near-)duplicates, having a large impact. We here study memorization treating counterfactual influence as a distributional quantity, taking into account how all training samples influence how a sample is memorized. For a small language model, we compute the full influence distribution of training samples on each other and analyze its properties. We find that solely looking at self-influence can severely underestimate tangible risks associated with memorization: the presence of (near-)duplicates seriously reduces self-influence, while we find these samples to be (near-)extractable. We observe similar patterns for image classification, where simply looking at the influence distributions reveals the presence of near-duplicates in CIFAR-10. Our findings highlight that memorization stems from complex interactions across training data and is better captured by the full influence distribution than by self-influence alone.
Updated: 2025-06-25 14:25:11
标题: 反事实影响作为一个分布数量
摘要: 机器学习模型被认为会记住其训练数据中的样本,引发了对隐私和泛化能力的担忧。反事实自我影响是一种流行的度量标准,用于研究记忆化,量化模型对样本的预测如何随着样本是否包含在训练数据集中而改变。然而,最近的研究表明,记忆化受到自我影响以外的因素的影响,其他训练样本,特别是(近似)重复样本,具有很大影响。在这里,我们研究记忆化,将反事实影响视为一个分布量,考虑所有训练样本如何影响样本的记忆。对于一个小型语言模型,我们计算了训练样本之间的完整影响分布,并分析了其属性。我们发现仅仅看自我影响可能严重低估与记忆化相关的实际风险:(近似)重复样本的存在严重降低了自我影响,而我们发现这些样本是(近似)可提取的。我们观察到图像分类也存在类似的模式,仅仅查看影响分布就能揭示CIFAR-10中近似重复样本的存在。我们的研究结果突显出,记忆化源于训练数据之间的复杂交互作用,并且通过完整的影响分布而不是仅靠自我影响来更好地捕捉。
更新时间: 2025-06-25 14:25:11
领域: cs.LG,cs.AI,cs.CL,cs.CR
Graph Linearization Methods for Reasoning on Graphs with Large Language Models
Large language models have evolved to process multiple modalities beyond text, such as images and audio, which motivates us to explore how to effectively leverage them for graph reasoning tasks. The key question, therefore, is how to transform graphs into linear sequences of tokens, a process we term "graph linearization", so that LLMs can handle graphs naturally. We consider that graphs should be linearized meaningfully to reflect certain properties of natural language text, such as local dependency and global alignment, in order to ease contemporary LLMs, trained on trillions of textual tokens, better understand graphs. To achieve this, we developed several graph linearization methods based on graph centrality and degeneracy. These methods are further enhanced using node relabeling techniques. The experimental results demonstrate the effectiveness of our methods compared to the random linearization baseline. Our work introduces novel graph representations suitable for LLMs, contributing to the potential integration of graph machine learning with the trend of multimodal processing using a unified transformer model.
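As an illustration of centrality-based linearization with node relabeling, the sketch below orders nodes by degree, relabels them so hubs come first, and serializes the edge list as a token sequence; the exact output format is an assumption for illustration.

```python
import networkx as nx

def linearize_by_centrality(G: nx.Graph) -> str:
    """Serialize a graph into a token sequence for an LLM, visiting nodes
    in decreasing degree order; relabeling hubs first gives the text an
    'important things early' structure that eases local dependency."""
    order = sorted(G.nodes, key=lambda n: G.degree(n), reverse=True)
    relabel = {n: i for i, n in enumerate(order)}  # node relabeling step
    edges = sorted((min(relabel[u], relabel[v]), max(relabel[u], relabel[v]))
                   for u, v in G.edges)
    return " ".join(f"({u},{v})" for u, v in edges)

G = nx.karate_club_graph()
print(linearize_by_centrality(G)[:80])
```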
Updated: 2025-06-25 14:24:33
标题: 使用大型语言模型进行图推理的图线性化方法
摘要: 大型语言模型已经发展到可以处理除文本之外的多种模态,如图像和音频,这激励我们探索如何有效地利用它们来处理图推理任务。因此,关键问题是如何将图转换为线性序列的标记,我们称之为“图线性化”过程,以便LLMs可以自然地处理图形。我们认为图应该被有意义地线性化,以反映自然语言文本的某些属性,如局部依赖性和全局对齐,以便训练在数万亿文本标记上的当代LLMs更好地理解图形。为了实现这一目标,我们开发了基于图中心性和退化的几种图线性化方法。这些方法进一步利用节点重标记技术进行增强。实验结果表明,与随机线性化基线相比,我们的方法的有效性。我们的工作引入了适用于LLMs的新颖图表示,有助于将图机器学习与使用统一的变压器模型进行多模态处理的趋势相结合。
更新时间: 2025-06-25 14:24:33
领域: cs.CL,cs.LG
MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing
Large language models (LLMs) have transformed software development through code generation capabilities, yet their effectiveness for high-performance computing (HPC) remains limited. HPC code requires specialized optimizations for parallelism, memory efficiency, and architecture-specific considerations that general-purpose LLMs often overlook. We present MARCO (Multi-Agent Reactive Code Optimizer), a novel framework that enhances LLM-generated code for HPC through a specialized multi-agent architecture. MARCO employs separate agents for code generation and performance evaluation, connected by a feedback loop that progressively refines optimizations. A key innovation is MARCO's web-search component that retrieves real-time optimization techniques from recent conference proceedings and research publications, bridging the knowledge gap in pre-trained LLMs. Our extensive evaluation on the LeetCode 75 problem set demonstrates that MARCO achieves a 14.6\% average runtime reduction compared to Claude 3.5 Sonnet alone, while the integration of the web-search component yields a 30.9\% performance improvement over the base MARCO system. These results highlight the potential of multi-agent systems to address the specialized requirements of high-performance code generation, offering a cost-effective alternative to domain-specific model fine-tuning.
Updated: 2025-06-25 14:22:04
标题: MARCO:用于高性能计算的实时知识集成的多智能体代码优化
摘要: 大型语言模型(LLMs)通过代码生成功能改变了软件开发,但它们在高性能计算(HPC)方面的有效性仍然有限。HPC代码需要针对并行性、内存效率和特定架构考虑的专门优化,而通用型LLMs往往忽视这些方面。我们提出了MARCO(多智能体反应代码优化器),这是一个通过专门的多智能体架构增强LLM生成的HPC代码的新框架。MARCO采用独立的代理程序进行代码生成和性能评估,通过一个反馈循环逐步优化。一个关键的创新是MARCO的网络搜索组件,从最近的会议记录和研究出版物中检索实时优化技术,弥补了预训练LLMs的知识差距。我们在LeetCode 75问题集上进行了广泛评估,结果显示,与单独的Claude 3.5 Sonnet相比,MARCO实现了14.6\%的平均运行时降低,而集成了网络搜索组件的MARCO系统相对于基本MARCO系统则提高了30.9\%的性能。这些结果凸显了多智能体系统解决高性能代码生成的专门要求的潜力,提供了一个经济有效的领域特定模型微调的替代方案。
更新时间: 2025-06-25 14:22:04
领域: cs.DC,cs.LG,cs.SE
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models
Automated penetration testing (AutoPT) powered by large language models (LLMs) has gained attention for its ability to automate ethical hacking processes and identify vulnerabilities in target systems by leveraging the inherent knowledge of LLMs. However, existing LLM-based AutoPT frameworks often underperform compared to human experts in challenging tasks for several reasons: the imbalanced knowledge used in LLM training, short-sightedness in the planning process, and hallucinations during command generation. Moreover, the trial-and-error nature of the PT process is constrained by existing frameworks lacking mechanisms to learn from previous failures, restricting adaptive improvement of PT strategies. To address these limitations, we propose a knowledge-informed, self-reflective PT framework powered by LLMs, called RefPentester. This AutoPT framework is designed to assist human operators in identifying the current stage of the PT process, selecting appropriate tactics and techniques for each stage, choosing suggested actions, providing step-by-step operational guidance, and reflecting on and learning from previous failed operations. We also modeled the PT process as a seven-state Stage Machine to integrate the proposed framework effectively. The evaluation shows that RefPentester can successfully reveal credentials on Hack The Box's Sau machine, outperforming the baseline GPT-4o model by 16.7%. Across PT stages, RefPentester also demonstrates superior success rates on PT stage transitions.
Updated: 2025-06-25 14:14:56
标题: RefPentester:基于大型语言模型的知识驱动自我反思渗透测试框架
摘要: 自动化渗透测试(AutoPT)由大型语言模型(LLMs)驱动,因其能够自动化道德黑客过程并利用LLMs的固有知识来识别目标系统中的漏洞而引起关注。然而,现有基于LLMs的AutoPT框架在挑战性任务中通常表现不佳,原因有几个:LLM训练中使用的知识不平衡,规划过程中的短视,以及命令生成过程中的幻觉。此外,PT过程的试错性质受到现有框架的限制,缺乏从先前失败中学习的机制,限制了PT策略的适应性改进。为了解决这些限制,我们提出了一个由LLMs驱动的知识驱动、自我反思的PT框架,称为RefPentester。这个AutoPT框架旨在帮助人类运营商识别PT过程的当前阶段,选择每个阶段的适当策略和技术,选择建议的操作,提供逐步操作指导,并反思并从以前的失败操作中学习。我们还将PT过程建模为一个七状态的阶段机器,以有效整合提出的框架。评估结果显示,RefPentester可以成功地在Hack The Box的Sau机器上揭示凭证,比基线GPT-4o模型提高了16.7%。在PT阶段中,RefPentester还展示了在PT阶段转换上的更高成功率。
更新时间: 2025-06-25 14:14:56
领域: cs.AI
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on evaluating the knowledge understanding capabilities of MLLMs, leading to an inadequate assessment of their perception and reasoning abilities. To address this gap, we present the Scientists' First Exam (SFE) benchmark, designed to evaluate the scientific cognitive capacities of MLLMs through three interconnected levels: scientific signal perception, scientific attribute understanding, scientific comparative reasoning. Specifically, SFE comprises 830 expert-verified VQA pairs across three question types, spanning 66 multimodal tasks across five high-value disciplines. Extensive experiments reveal that current state-of-the-art GPT-o3 and InternVL-3 achieve only 34.08% and 26.52% on SFE, highlighting significant room for MLLMs to improve in scientific realms. We hope the insights obtained in SFE will facilitate further developments in AI-enhanced scientific discoveries.
Updated: 2025-06-25 14:13:38
标题: 科学家的第一次考试:通过感知、理解和推理探究MLLM的认知能力
摘要: 科学发现越来越依赖于基于信息密集的科学数据和领域特定专业知识的复杂多模态推理。在专家级科学基准的支持下,科学多模态大型语言模型(MLLMs)有望显著增强实际工作流程中的发现过程。然而,目前的科学基准主要集中在评估MLLMs的知识理解能力,导致对其感知和推理能力的评估不足。为了填补这一空白,我们提出了科学家首次考试(SFE)基准,旨在通过三个相互关联的水平评估MLLMs的科学认知能力:科学信号感知、科学属性理解、科学比较推理。具体而言,SFE包括830个专家验证的VQA对,涵盖了五个高价值学科中的66个多模态任务。广泛实验表明,当前最先进的GPT-o3和InternVL-3在SFE上仅达到34.08%和26.52%,突显MLLMs在科学领域有很大的改进空间。我们希望SFE中获得的见解将促进AI增强科学发现的进一步发展。
更新时间: 2025-06-25 14:13:38
领域: cs.AI,cs.CL
Physics-informed Imitative Reinforcement Learning for Real-world Driving
Recent advances in imitative reinforcement learning (IRL) have considerably enhanced the ability of autonomous agents to assimilate expert demonstrations, leading to rapid skill acquisition in a range of demanding tasks. However, such learning-based agents face significant challenges when transferring knowledge to highly dynamic closed-loop environments. Their performance is significantly impacted by the conflicting optimization objectives of imitation learning (IL) and reinforcement learning (RL), sample inefficiency, and the complexity of uncovering the hidden world model and physics. To address this challenge, we propose a physics-informed IRL that is entirely data-driven. It leverages both expert demonstration data and exploratory data with a joint optimization objective, allowing the underlying physical principles of vehicle dynamics to emerge naturally from the training process. The performance is evaluated through empirical experiments, and results exceed popular IL, RL, and IRL algorithms in closed-loop settings on the Waymax benchmark. Our approach exhibits a 37.8% reduction in collision rate and a 22.2% reduction in off-road rate compared to the baseline method.
Updated: 2025-06-25 14:06:21
标题: 物理学知识指导的仿真强化学习在实际驾驶中的应用
摘要: 最近在模仿性强化学习(IRL)领域取得的进展显著增强了自主代理的能力,使其能够吸收专家演示,从而在一系列苛刻任务中快速习得技能。然而,这种基于学习的代理在将知识转移至高度动态的闭环环境时面临重大挑战。它们的表现受到模仿学习(IL)和强化学习(RL)之间冲突的优化目标、样本效率低下以及揭示隐藏的世界模型和物理学的复杂性的显著影响。为了解决这一挑战,我们提出了一种完全数据驱动的物理学信息IRL。它利用专家演示数据和探索性数据,具有联合优化目标,允许车辆动力学的基本物理原理自然地从训练过程中浮现出来。通过经验实验评估了性能,结果在Waymax基准测试的闭环设置中超过了流行的IL、RL和IRL算法。与基准方法相比,我们的方法在碰撞率上减少了37.8%,在越野率上减少了22.2%。
更新时间: 2025-06-25 14:06:21
领域: cs.RO,cs.AI,cs.LG
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models
Faithfulness hallucinations are claims generated by a Large Language Model (LLM) not supported by contexts provided to the LLM. Lacking assessment standards, existing benchmarks focus on "factual statements" that rephrase source materials while overlooking "cognitive statements" that involve making inferences from the given context. Consequently, evaluating and detecting the hallucination of cognitive statements remains challenging. Inspired by how evidence is assessed in the legal domain, we design a rigorous framework to assess different levels of faithfulness of cognitive statements and introduce the CogniBench dataset where we reveal insightful statistics. To keep pace with rapidly evolving LLMs, we further develop an automatic annotation pipeline that scales easily across different models. This results in a large-scale CogniBench-L dataset, which facilitates training accurate detectors for both factual and cognitive hallucinations. We release our model and datasets at: https://github.com/FUTUREEEEEE/CogniBench
Updated: 2025-06-25 14:02:19
标题: CogniBench:用于评估大型语言模型认知忠实度的基于法律启发的框架和数据集
摘要: 信念幻觉是由大型语言模型(LLM)生成的主张,这些主张不受LLM提供的上下文支持。缺乏评估标准,现有基准关注“事实性陈述”,重述源材料,而忽略了涉及从给定上下文中推断的“认知性陈述”。因此,评估和检测认知性陈述的幻觉仍然具有挑战性。受法律领域证据评估的启发,我们设计了一个严格的框架来评估认知性陈述的不同层次的忠实度,并引入了CogniBench数据集,其中我们揭示了有见地的统计数据。为了跟上迅速发展的LLM,我们进一步开发了一个自动注释流程,可以轻松扩展到不同的模型。这导致了一个大规模的CogniBench-L数据集,有助于训练准确的检测器,用于事实和认知幻觉。我们在以下网址发布了我们的模型和数据集:https://github.com/FUTUREEEEEE/CogniBench
更新时间: 2025-06-25 14:02:19
领域: cs.CL,cs.AI
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
Diffusion models have emerged as the leading approach for image synthesis, demonstrating exceptional photorealism and diversity. However, training diffusion models at high resolutions remains computationally prohibitive, and existing zero-shot generation techniques for synthesizing images beyond training resolutions often produce artifacts, including object duplication and spatial incoherence. In this paper, we introduce HiWave, a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis using pretrained diffusion models. Our method employs a two-stage pipeline: generating a base image from the pretrained model followed by a patch-wise DDIM inversion step and a novel wavelet-based detail enhancer module. Specifically, we first utilize inversion methods to derive initial noise vectors that preserve global coherence from the base image. Subsequently, during sampling, our wavelet-domain detail enhancer retains low-frequency components from the base image to ensure structural consistency, while selectively guiding high-frequency components to enrich fine details and textures. Extensive evaluations using Stable Diffusion XL demonstrate that HiWave effectively mitigates common visual artifacts seen in prior methods, achieving superior perceptual quality. A user study confirmed HiWave's performance, where it was preferred over the state-of-the-art alternative in more than 80% of comparisons, highlighting its effectiveness for high-quality, ultra-high-resolution image synthesis without requiring retraining or architectural modifications.
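To illustrate the wavelet-domain idea of keeping low-frequency structure while injecting high-frequency detail, here is a grayscale sketch using PyWavelets; the real method operates inside diffusion sampling, whereas this toy version simply swaps coefficient bands between two images, and the wavelet and level choices are assumptions.

```python
import numpy as np
import pywt

def wavelet_detail_swap(base, detailed, wavelet="db2", level=2):
    """Keep the low-frequency approximation from `base` while adopting
    the high-frequency detail bands of `detailed` (a 2-D sketch of a
    wavelet-based detail enhancer)."""
    cb = pywt.wavedec2(base, wavelet, level=level)
    cd = pywt.wavedec2(detailed, wavelet, level=level)
    mixed = [cb[0]] + cd[1:]  # approximation from base, details swapped in
    return pywt.waverec2(mixed, wavelet)

base = np.random.rand(256, 256)          # stands in for the base image
detailed = base + 0.1 * np.random.rand(256, 256)
out = wavelet_detail_swap(base, detailed)
```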
Updated: 2025-06-25 13:58:37
标题: HiWave:基于小波扩散采样的无需训练的高分辨率图像生成
摘要: 扩散模型已成为图像合成的主导方法,展示出卓越的逼真度和多样性。然而,在高分辨率下训练扩散模型仍然具有计算上的限制,而现有的零样本生成技术在超出训练分辨率合成图像时常产生包括物体重复和空间不连贯在内的伪影。本文介绍了HiWave,这是一种无需训练的零样本方法,利用预训练的扩散模型显著增强了超高分辨率图像合成中的视觉保真度和结构连贯性。我们的方法采用了两阶段流程:首先从预训练模型生成基础图像,然后进行基于补丁的DDIM反演步骤和一种新颖的基于小波的细节增强模块。具体而言,我们首先利用反演方法从基础图像中得出保留全局连贯性的初始噪声向量。随后,在采样过程中,我们的小波域细节增强器保留基础图像中的低频成分以确保结构一致性,同时有选择地引导高频成分以丰富细节和纹理。使用Stable Diffusion XL进行了广泛评估,证明HiWave有效地减轻了以往方法中常见的视觉伪影,实现了更优越的感知质量。用户研究证实了HiWave的性能,在超过80%的比较中被优先选择于最先进的替代方案,突显了其在高质量、超高分辨率图像合成中的有效性,而无需重新训练或架构修改。
更新时间: 2025-06-25 13:58:37
领域: cs.CV,cs.LG
Automatic Demonstration Selection for LLM-based Tabular Data Classification
A fundamental question in applying In-Context Learning (ICL) for tabular data classification is how to determine the ideal number of demonstrations in the prompt. This work addresses this challenge by presenting an algorithm to automatically select a reasonable number of required demonstrations. Our method distinguishes itself by integrating not only the tabular data's distribution but also the user's selected prompt template and the specific Large Language Model (LLM) into its estimation. Rooted in Spectral Graph Theory, our proposed algorithm defines a novel metric to quantify the similarities between different demonstrations. We then construct a similarity graph and analyze the eigenvalues of its Laplacian to derive the minimum number of demonstrations capable of representing the data within the LLM's intrinsic representation space. We validate the efficacy of our approach through experiments comparing its performance against conventional random selection algorithms on diverse datasets and LLMs.
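As a sketch of the spectral idea, the function below builds an unnormalized Laplacian from pairwise demonstration similarities and counts its near-zero eigenvalues; treating that count as the number of non-redundant demonstration groups is an illustrative simplification of the paper's derivation.

```python
import numpy as np

def min_demonstrations(similarity: np.ndarray, tol: float = 1e-8) -> int:
    """Estimate a demonstration budget from the Laplacian spectrum of a
    similarity graph over candidate demonstrations: near-zero eigenvalues
    correspond to connected components, i.e. mutually redundant groups."""
    W = (similarity + similarity.T) / 2.0      # symmetrize
    np.fill_diagonal(W, 0.0)                   # no self-loops
    L = np.diag(W.sum(axis=1)) - W             # unnormalized Laplacian
    eigvals = np.linalg.eigvalsh(L)
    return int((eigvals < tol).sum())          # one per redundancy cluster
```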
Updated: 2025-06-25 13:57:54
标题: 基于LLM的表格数据分类的自动演示选择
摘要: 在为表格数据分类应用上下文学习(ICL)时,一个基本问题是如何确定提示中理想的演示数量。本文通过提出一种算法来自动选择所需演示的合理数量来解决这一挑战。我们的方法的独特之处在于,它不仅集成了表格数据的分布,还包括用户选择的提示模板和特定的大型语言模型(LLM)在其估计中。根植于谱图论,我们提出的算法定义了一种新的度量标准,用于量化不同演示之间的相似性。然后,我们构建了一个相似性图,并分析其拉普拉斯矩阵的特征值,以推导出能够代表LLM内在表示空间中数据的最小演示数量。通过在多样化数据集和LLM上比较其性能与传统随机选择算法的实验证实了我们方法的有效性。
更新时间: 2025-06-25 13:57:54
领域: cs.LG,cs.AI
Image Super-Resolution with Guarantees via Conformalized Generative Models
The increasing use of generative ML foundation models for image restoration tasks such as super-resolution calls for robust and interpretable uncertainty quantification methods. We address this need by presenting a novel approach based on conformal prediction techniques to create a 'confidence mask' capable of reliably and intuitively communicating where the generated image can be trusted. Our method is adaptable to any black-box generative model, including those locked behind an opaque API, requires only easily attainable data for calibration, and is highly customizable via the choice of a local image similarity metric. We prove strong theoretical guarantees for our method that span fidelity error control (according to our local image similarity metric), reconstruction quality, and robustness in the face of data leakage. Finally, we empirically evaluate these results and establish our method's solid performance.
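To show the conformal mechanics in miniature, the sketch below calibrates a split-conformal threshold over per-pixel nonconformity scores and thresholds new scores into a confidence mask; the choice of score (e.g., absolute error under a local similarity metric) is an assumption.

```python
import numpy as np

def calibrate_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal quantile over calibration nonconformity scores,
    with the standard finite-sample correction."""
    n = cal_scores.size
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(cal_scores, min(q, 1.0)))

def confidence_mask(pixel_scores: np.ndarray, threshold: float) -> np.ndarray:
    """True where the generated pixel can be trusted at level 1 - alpha."""
    return pixel_scores <= threshold

# Example: calibrate on held-out errors, then mask a new image's scores.
cal = np.abs(np.random.randn(10_000))
thr = calibrate_threshold(cal, alpha=0.1)
mask = confidence_mask(np.abs(np.random.randn(64, 64)), thr)
```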
Updated: 2025-06-25 13:51:55
标题: 通过一致化生成模型实现具有保证的图像超分辨率
摘要: 随着生成式机器学习基础模型在图像恢复任务中的增加使用,如超分辨率,需要稳健且可解释的不确定性量化方法。我们通过提出一种基于符合预测技术的新方法来满足这一需求,以创建一个“置信度蒙版”,能够可靠且直观地传达生成的图像可以信任的位置。我们的方法适用于任何黑盒生成模型,包括那些被封锁在不透明API后面的模型,只需要通过易获取的数据进行校准,并且通过选择本地图像相似度度量来进行高度定制。我们为我们的方法提供了强大的理论保证,涵盖了忠实度误差控制(根据我们的本地图像相似度度量)、重建质量和面对数据泄露的稳健性。最后,我们通过实证评估这些结果并确定我们方法的良好性能。
更新时间: 2025-06-25 13:51:55
领域: cs.CV,cs.LG,stat.ML
Méthode de quadrature pour les PINNs fondée théoriquement sur la hessienne des résiduels
Physics-informed Neural Networks (PINNs) have emerged as an efficient way to learn surrogate neural solvers of PDEs by embedding the physical model in the loss function and minimizing its residuals using automatic differentiation at so-called collocation points. Originally sampled uniformly, the choice of these points has been the subject of recent advances leading to adaptive sampling refinements. In this paper, we propose a new quadrature method for approximating definite integrals based on the Hessian of the integrand, which we leverage to guide the selection of collocation points during the training process of PINNs.
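A 1-D toy version of the sampling idea is sketched below: collocation points are drawn with probability proportional to the magnitude of the residual's second derivative, so regions where the residual curves sharply get denser sampling. The finite-difference Hessian estimate and the proportional-sampling rule are illustrative assumptions, not the paper's derived quadrature rule.

```python
import numpy as np

def hessian_weighted_points(residual, domain, n_points=1000, n_grid=512):
    """Draw collocation points with density proportional to |d2r/dx2|."""
    a, b = domain
    x = np.linspace(a, b, n_grid)
    r = residual(x)
    hess = np.abs(np.gradient(np.gradient(r, x), x))  # finite-diff Hessian
    p = hess + 1e-12                                  # avoid zero-mass bins
    p /= p.sum()
    return np.random.choice(x, size=n_points, p=p)

# Example: residuals oscillating sharply near x = 0 get denser sampling.
pts = hessian_weighted_points(lambda x: np.sin(1.0 / (np.abs(x) + 0.1)),
                              domain=(-1.0, 1.0))
```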
Updated: 2025-06-25 13:49:53
标题: Quadrature method for PINNs theoretically based on the residuals Hessian
摘要: 物理信息神经网络(PINNs)已经成为一种有效的学习PDE的替代神经求解器的方法,通过将物理模型嵌入损失函数,并利用自动微分在所谓的配点上最小化其残差。最初均匀采样,最近的进展已经导致自适应采样的改进。在本文中,我们提出了一种基于所考虑函数的Hessian的确定积分的近似方法,并利用它来指导PINNs训练过程中选择配点的过程。
更新时间: 2025-06-25 13:49:53
领域: cs.LG
AeroLite-MDNet: Lightweight Multi-task Deviation Detection Network for UAV Landing
Unmanned aerial vehicles (UAVs) are increasingly employed in diverse applications such as land surveying, material transport, and environmental monitoring. Following missions like data collection or inspection, UAVs must land safely at docking stations for storage or recharging, which is an essential requirement for ensuring operational continuity. However, accurate landing remains challenging due to factors like GPS signal interference. To address this issue, we propose a deviation warning system for UAV landings, powered by a novel vision-based model called AeroLite-MDNet. This model integrates a multiscale fusion module for robust cross-scale object detection and incorporates a segmentation branch for efficient orientation estimation. We introduce a new evaluation metric, Average Warning Delay (AWD), to quantify the system's sensitivity to landing deviations. Furthermore, we contribute a new dataset, UAVLandData, which captures real-world landing deviation scenarios to support training and evaluation. Experimental results show that our system achieves an AWD of 0.7 seconds with a deviation detection accuracy of 98.6\%, demonstrating its effectiveness in enhancing UAV landing reliability. Code will be available at https://github.com/ITTTTTI/Maskyolo.git
Updated: 2025-06-25 13:48:30
标题: AeroLite-MDNet: 无人机着陆轻量级多任务偏离检测网络
摘要: 无人机(UAV)越来越被广泛应用于土地测量、物资运输和环境监测等多种应用领域。在完成数据采集或检查等任务后,UAV必须安全降落到停机坪以便存储或充电,这是确保操作连续性的基本要求。然而,由于GPS信号干扰等因素,精确着陆仍然具有挑战性。为了解决这个问题,我们提出了一种基于新型视觉模型AeroLite-MDNet的无人机着陆偏离警告系统。该模型集成了一个多尺度融合模块,用于强大的跨尺度目标检测,并且包含了一个分割分支,用于高效的方向估计。我们引入了一个新的评估指标,平均警告延迟(AWD),以量化系统对着陆偏离的敏感性。此外,我们贡献了一个新的数据集,UAVLandData,捕捉了真实世界的着陆偏离场景,以支持训练和评估。实验结果显示,我们的系统在0.7秒的AWD和98.6%的偏离检测准确率下取得了成功,证明了其在提高UAV着陆可靠性方面的有效性。代码将在https://github.com/ITTTTTI/Maskyolo.git上提供。
更新时间: 2025-06-25 13:48:30
领域: cs.RO,cs.AI,cs.CV
Tackling Data Heterogeneity in Federated Learning through Knowledge Distillation with Inequitable Aggregation
Federated learning aims to train a global model in a distributed environment that is close to the performance of centralized training. However, issues such as client label skew, data quantity skew, and other heterogeneity problems severely degrade the model's performance. Most existing methods overlook the scenario where only a small portion of clients participate in training within a large-scale client setting, whereas our experiments show that this scenario presents a more challenging federated learning task. Therefore, we propose a Knowledge Distillation with teacher-student Inequitable Aggregation (KDIA) strategy tailored to address the federated learning setting mentioned above, which can effectively leverage knowledge from all clients. In KDIA, the student model is the average aggregation of the participating clients, while the teacher model is formed by a weighted aggregation of all clients based on three frequencies: participation intervals, participation counts, and data volume proportions. During local training, self-knowledge distillation is performed. Additionally, we utilize a generator trained on the server to generate approximately independent and identically distributed (IID) data features locally for auxiliary training. We conduct extensive experiments on the CIFAR-10/100/CINIC-10 datasets and various heterogeneous settings to evaluate KDIA. The results show that KDIA can achieve better accuracy with fewer rounds of training, and the improvement is more significant under severe heterogeneity.
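To sketch the inequitable teacher aggregation, the code below combines the three frequencies named in the abstract into client weights and averages client state dicts accordingly; the exact combination rule (a normalized product here) is an assumption, as the paper may weight the frequencies differently.

```python
import numpy as np

def teacher_weights(intervals, counts, volumes):
    """Client weights from participation intervals, participation counts,
    and data volume proportions (combination rule assumed)."""
    f1 = 1.0 / (np.asarray(intervals, dtype=float) + 1.0)  # recent = heavier
    f2 = np.asarray(counts, dtype=float) + 1.0             # frequent = heavier
    f3 = np.asarray(volumes, dtype=float)
    w = f1 * f2 * (f3 / f3.sum())
    return w / w.sum()

def aggregate(models, weights):
    """Weighted average of client state dicts, forming the teacher model."""
    return {k: sum(w * m[k] for w, m in zip(weights, models))
            for k in models[0]}
```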
Updated: 2025-06-25 13:42:30
标题: 通过知识蒸馏和不平等聚合解决联邦学习中的数据异质性
摘要: 联邦学习旨在在分布式环境中训练一个全局模型,该模型接近于集中式训练的性能。然而,诸如客户标签偏斜、数据数量偏斜以及其他异质性问题严重影响了模型的性能。大多数现有方法忽视了在大规模客户设置中只有一小部分客户参与训练的情况,而我们的实验表明,这种情况呈现出更具挑战性的联邦学习任务。因此,我们提出了一种名为 Knowledge Distillation with teacher-student Inequitable Aggregation (KDIA) 策略,旨在解决上述联邦学习设置,可以有效利用来自所有客户的知识。在 KDIA 中,学生模型是参与客户的平均聚合,而教师模型是基于三个频率(参与间隔、参与次数和数据量比例)的所有客户的加权聚合形成的。在本地训练期间,进行自知识蒸馏。此外,我们利用在服务器上训练的生成器,在本地为辅助训练生成近似独立且同分布(IID)的数据特征。我们在 CIFAR-10/100/CINIC-10 数据集和各种异质设置上进行了大量实验来评估 KDIA。结果显示,KDIA 可以在更少的训练轮次内实现更好的准确性,并且在严重异质性下改进更为显著。
更新时间: 2025-06-25 13:42:30
领域: cs.LG
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, integrating over 40 specialized tools; and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1013 diseases. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, such as traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application http://raredx.cn/doctor.
Updated: 2025-06-25 13:42:26
标题: 一种可追溯推理的罕见病诊断代理系统
摘要: 罕见疾病在全球影响超过3亿人,然而及时准确的诊断仍然是一个普遍的挑战。这主要是由于其临床异质性、低个体患病率以及大多数临床医生对罕见病状的了解有限。在这里,我们介绍了DeepRare,这是第一个由大型语言模型(LLM)驱动的罕见疾病诊断代理系统,能够处理异质临床输入。该系统为罕见疾病生成排名的诊断假设,每个假设都附带一个透明的推理链,将中间分析步骤与可验证的医学证据联系起来。 DeepRare包括三个关键组件:一个具有长期记忆模块的中央主机;负责领域特定分析任务的专门代理服务器,集成了40多个专门工具和规模庞大、最新的医学知识来源,确保访问最新的临床信息。这种模块化和可扩展的设计可以实现复杂的诊断推理,同时保持可追溯性和适应性。我们对DeepRare进行了八个数据集的评估。该系统在2919种疾病中表现出色,对1013种疾病的准确率达到100%。在基于HPO的评估中,DeepRare在Recall@1分数方面明显优于其他15种方法,如传统的生物信息学诊断工具、LLMs和其他代理系统,平均Recall@1得分为57.18%,超过第二好的方法(Reasoning LLM)23.79个百分点。对于多模态输入场景,DeepRare在109个案例中的Recall@1为70.60%,而Exomiser为53.20%。临床专家对推理链进行手动验证达到95.40%的一致性。此外,DeepRare系统已经作为一个用户友好的Web应用程序实现,网址为http://raredx.cn/doctor。
更新时间: 2025-06-25 13:42:26
领域: cs.CL,cs.AI,cs.CV,cs.MA
Scalable Subset Selection in Linear Mixed Models
Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine or adaptive marketing. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. This paper closes the gap with a new $\ell_0$ regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface. Both algorithms readily extend to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, we provide a finite-sample bound on the Kullback-Leibler divergence of the new method. We then demonstrate its excellent performance in synthetic experiments and illustrate its utility on two datasets from biology and journalism.
Updated: 2025-06-25 13:39:30
标题: 线性混合模型中的可扩展子集选择
摘要: 线性混合模型(LMMs)将固定效应和随机效应结合在一起,是分析异质数据的关键工具,如个性化医学或自适应营销。如今,这种类型的数据变得越来越广泛,有时包含数千个候选预测因子,需要稀疏性进行预测和解释。然而,现有的用于LMMs的稀疏学习方法在超过几十个或几百个预测因子时无法很好地扩展,与忽略随机效应的线性模型的稀疏方法相比存在很大差距。本文通过一种新的$\ell_0$正则化方法来缩小这一差距,用于LMM子集选择,可以在几秒钟到几分钟内运行包含数千个预测因子的数据集。在计算方面,我们开发了一个坐标下降算法作为我们的主要工具,并提供其收敛的保证。我们还开发了一个局部搜索算法来帮助遍历非凸优化表面。这两种算法都可以通过惩罚拟似然逼近方法轻松扩展到广义LMMs中的子集选择。在统计方面,我们提供了新方法的Kullback-Leibler散度的有限样本界限。然后,我们通过合成实验展示了其出色的性能,并在两个生物学和新闻学的数据集上展示了其实用性。
更新时间: 2025-06-25 13:39:30
领域: stat.ML,cs.LG,stat.CO,stat.ME
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
We study the novel problem of future off-policy evaluation (F-OPE) and learning (F-OPL) for estimating and optimizing the future value of policies in non-stationary environments, where distributions vary over time. In e-commerce recommendations, for instance, our goal is often to estimate and optimize the policy value for the upcoming month using data collected by an old policy in the previous month. A critical challenge is that data related to the future environment is not observed in the historical data. Existing methods assume stationarity or depend on restrictive reward-modeling assumptions, leading to significant bias. To address these limitations, we propose a novel estimator named \textit{\textbf{O}ff-\textbf{P}olicy Estimator for the \textbf{F}uture \textbf{V}alue (\textbf{\textit{OPFV}})}, designed for accurately estimating policy values at any future time point. The key feature of OPFV is its ability to leverage the useful structure within time-series data. While future data might not be present in the historical log, we can leverage, for example, seasonal, weekly, or holiday effects that are consistent in both the historical and future data. Our estimator is the first to exploit these time-related structures via a new type of importance weighting, enabling effective F-OPE. Theoretical analysis identifies the conditions under which OPFV becomes low-bias. In addition, we extend our estimator to develop a new policy-gradient method to proactively learn a good future policy using only historical data. Empirical results show that our methods substantially outperform existing methods in estimating and optimizing the future policy value under non-stationarity for various experimental setups.
Updated: 2025-06-25 13:31:46
标题: 非稳态条件下未来的离线策略评估和学习
摘要: 我们研究了未来离线评估(F-OPE)和学习(F-OPL)的新问题,用于估计和优化非平稳环境中政策的未来价值,在这种环境中,分布随时间变化。例如,在电子商务推荐中,我们的目标通常是使用上个月旧政策收集的数据来估计和优化即将到来的一个月的政策价值。一个关键挑战是未来环境相关的数据在历史数据中未被观察到。现有方法假设平稳性或依赖严格的奖励建模假设,导致显着的偏差。为了解决这些限制,我们提出了一个名为Off-Policy Estimator for the Future Value(OPFV)的新估计器,旨在准确地估计任何未来时间点的政策价值。OPFV的关键特点是它能够利用时间序列数据中的有用结构。虽然未来数据可能不在历史日志中,但我们可以利用例如季节性、每周或节假日效应,这些效应在历史和未来数据中都是一致的。我们的估计器是第一个通过一种新型重要性加权来利用这些与时间相关的结构,从而实现有效的F-OPE。理论分析确定了OPFV变为低偏差的条件。此外,我们将我们的估计器扩展,开发了一种新的策略梯度方法,仅使用历史数据就能主动学习一个好的未来政策。实证结果表明,我们的方法在各种实验设置下明显优于现有方法,在非平稳情况下估计和优化未来政策价值。
更新时间: 2025-06-25 13:31:46
领域: cs.LG,cs.AI
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models
Ensuring the security of complex system-on-chips (SoCs) designs is a critical imperative, yet traditional verification techniques struggle to keep pace due to significant challenges in automation, scalability, comprehensiveness, and adaptability. The advent of large language models (LLMs), with their remarkable capabilities in natural language understanding, code generation, and advanced reasoning, presents a new paradigm for tackling these issues. Moving beyond monolithic models, an agentic approach allows for the creation of multi-agent systems where specialized LLMs collaborate to solve complex problems more effectively. Recognizing this opportunity, we introduce SV-LLM, a novel multi-agent assistant system designed to automate and enhance SoC security verification. By integrating specialized agents for tasks like verification question answering, security asset identification, threat modeling, test plan and property generation, vulnerability detection, and simulation-based bug validation, SV-LLM streamlines the workflow. To optimize their performance in these diverse tasks, agents leverage different learning paradigms, such as in-context learning, fine-tuning, and retrieval-augmented generation (RAG). The system aims to reduce manual intervention, improve accuracy, and accelerate security analysis, supporting proactive identification and mitigation of risks early in the design cycle. We demonstrate its potential to transform hardware security practices through illustrative case studies and experiments that showcase its applicability and efficacy.
Updated: 2025-06-25 13:31:13
标题: SV-LLM:一种使用大型语言模型进行SoC安全验证的主体方法
摘要: 确保复杂系统芯片(SoCs)设计的安全性是一个至关重要的任务,然而传统的验证技术由于自动化、可扩展性、全面性和适应性方面存在重大挑战,难以跟上步伐。大型语言模型(LLMs)的出现,以其在自然语言理解、代码生成和高级推理方面的显著能力,为解决这些问题提供了一个新的范式。在超越单一模型的基础上,一种代理化方法允许创建多智能体系统,其中专门的LLMs协作以更有效地解决复杂问题。认识到这一机会,我们介绍了SV-LLM,这是一个新颖的多智能体助手系统,旨在自动化和增强SoC安全验证。通过整合专门的代理用于验证问题回答、安全资产识别、威胁建模、测试计划和属性生成、漏洞检测以及基于模拟的错误验证等任务,SV-LLM简化了工作流程。为了优化它们在这些多样化任务中的性能,代理利用不同的学习范式,如上下文学习、微调和检索增强生成(RAG)。该系统旨在减少手动干预,提高准确性,并加速安全分析,支持在设计周期的早期主动识别和缓解风险。我们通过说明性案例研究和实验展示了其改变硬件安全实践的潜力,展示了其适用性和有效性。
更新时间: 2025-06-25 13:31:13
领域: cs.CR,cs.AI,cs.MA
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning
Reinforcement learning has emerged as a powerful paradigm for post-training large language models (LLMs) to improve reasoning. Approaches like Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) have shown strong results, but they require extensive external supervision. We investigate an alternative class of methods, Reinforcement Learning from Internal Feedback (RLIF), which relies solely on intrinsic model-derived signals instead of external rewards. In particular, we leverage unsupervised reward proxies such as token-level entropy, trajectory-level entropy, and self-certainty. Our theoretical analysis shows these internal objectives are partially equivalent, and we empirically evaluate various RLIF strategies on challenging math reasoning benchmarks. Experimental results demonstrate that RLIF can boost the reasoning performance of base LLMs in the early phase of training, matching or surpassing RLVR techniques on these tasks. However, as training progresses, performance degrades even below that of the model before training. Moreover, we find that RLIF yields little improvement for instruction-tuned models, indicating diminishing returns of intrinsic feedback once an LLM is already instruction-tuned. We further analyze this limitation by mixing model weights and explain the reason for RLIF's training behaviors, providing practical guidelines for integrating internal feedback signals into LLM training. We hope our analysis of internal feedback will inform more principled and effective strategies for LLM post-training.
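As an example of one such intrinsic signal, the sketch below turns token-level entropy into a sequence-level reward: lower entropy (higher self-certainty) yields a higher reward. Using the plain negative mean entropy is an illustrative simplification of the proxies the abstract lists.

```python
import torch
import torch.nn.functional as F

def token_entropy_reward(logits: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward from the policy's own next-token distribution:
    the model is rewarded for being confident (low entropy).
    logits: (batch, seq, vocab) from the policy model itself."""
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)  # (batch, seq)
    return -entropy.mean(dim=-1)                # (batch,), higher = surer
```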
Updated: 2025-06-25 13:27:49
标题: 没有免费的午餐:重新思考LLM推理的内部反馈
摘要: 强化学习已经成为一个强大的范式,用于对大型语言模型(LLMs)进行后训练以改善推理能力。诸如从人类反馈中学习的强化学习(RLHF)和具有可验证奖励的强化学习(RLVR)等方法已经显示出强大的结果,但它们需要大量的外部监督。我们调查了另一类方法,即从内部反馈中学习的强化学习(RLIF),它完全依赖于内在模型衍生的信号而不是外部奖励。特别是,我们利用无监督的奖励代理,如标记级熵、轨迹级熵和自我确定性。我们的理论分析表明,这些内部目标在一定程度上是等效的,并且我们在具有挑战性的数学推理基准上对各种RLIF策略进行了实证评估。实验结果表明,RLIF可以在训练的初始阶段提高基础LLMs的推理性能,与这些任务上的RLVR技术相匹配或超越。然而,随着训练的进行,性能甚至低于训练之前的模型。此外,我们发现RLIF对于已经进行指令调整的模型几乎没有改善,表明一旦LLM已经进行指令调整,内在反馈的回报递减。我们通过混合模型权重进一步分析了这一限制,并解释了RLIF训练行为的原因,为将内部反馈信号整合到LLM训练中提供实用指导。我们希望我们对内部反馈的分析能够为LLM后训练提供更加原则和有效的策略。
更新时间: 2025-06-25 13:27:49
领域: cs.LG,cs.AI
Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
The growing adoption of Artificial Intelligence (AI) in Internet of Things (IoT) ecosystems has intensified the need for personalized learning methods that can operate efficiently and privately across heterogeneous, resource-constrained devices. However, enabling effective personalized learning in decentralized settings introduces several challenges, including efficient knowledge transfer between clients, protection of data privacy, and resilience against poisoning attacks. In this paper, we address these challenges by developing P4 (Personalized, Private, Peer-to-Peer) -- a method designed to deliver personalized models for resource-constrained IoT devices while ensuring differential privacy and robustness against poisoning attacks. Our solution employs a lightweight, fully decentralized algorithm to privately detect client similarity and form collaborative groups. Within each group, clients leverage differentially private knowledge distillation to co-train their models, maintaining high accuracy while ensuring robustness to the presence of malicious clients. We evaluate P4 on popular benchmark datasets using both linear and CNN-based architectures across various heterogeneity settings and attack scenarios. Experimental results show that P4 achieves 5% to 30% higher accuracy than leading differentially private peer-to-peer approaches and maintains robustness with up to 30% malicious clients. Additionally, we demonstrate its practicality by deploying it on resource-constrained devices, where collaborative training between two clients adds only ~7 seconds of overhead.
Updated: 2025-06-25 13:27:36
标题: 客户端聚类与知识共享相遇:在个性化点对点学习中增强隐私性和稳健性
摘要: 随着人工智能(AI)在物联网(IoT)生态系统中的日益普及,加强了对能够在异构、资源受限设备上高效、私密操作的个性化学习方法的需求。然而,在分散设置中实现有效的个性化学习引入了几个挑战,包括客户端之间的高效知识传输、数据隐私保护以及对抗毒害攻击的弹性。在本文中,我们通过开发P4(个性化、私密、点对点)方法来解决这些挑战,该方法旨在为资源受限的物联网设备提供个性化模型,同时确保差分隐私和对抗毒害攻击的鲁棒性。我们的解决方案采用一种轻量级、完全分散的算法来私密地检测客户端相似性并形成协作组。在每个组内,客户端利用差分隐私知识蒸馏来共同训练他们的模型,保持高准确性同时确保对恶意客户端的抵抗力。我们在流行的基准数据集上使用线性和基于CNN的架构对P4进行评估,跨越各种异质性设置和攻击场景。实验结果表明,P4的准确性比领先的差分隐私点对点方法高出5%至30%,并保持对多达30%恶意客户端的鲁棒性。此外,我们通过在资源受限设备上部署它来证明其实用性,其中两个客户端之间的协作训练仅增加约7秒的开销。
更新时间: 2025-06-25 13:27:36
领域: cs.LG,cs.AI,cs.CR
POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes
Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on strong positivity assumptions and lack robustness under partial data coverage, while offline reinforcement learning approaches typically focus on average training performance, lack statistical guarantees, and require solving complex optimization problems. To address these challenges, we propose POLAR, a novel pessimistic model-based policy learning algorithm for offline DTR optimization. POLAR estimates the transition dynamics from offline data and quantifies uncertainty for each history-action pair. A pessimistic penalty is then incorporated into the reward function to discourage actions with high uncertainty. Unlike many existing methods that focus on average training performance, POLAR directly targets the suboptimality of the final learned policy and offers theoretical guarantees, without relying on computationally intensive minimax or constrained optimization procedures. To the best of our knowledge, POLAR is the first model-based DTR method to provide both statistical and computational guarantees, including finite-sample bounds on policy suboptimality. Empirical results on both synthetic data and the MIMIC-III dataset demonstrate that POLAR outperforms state-of-the-art methods and yields near-optimal, history-aware treatment strategies.
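The pessimism mechanism itself is simple to state: the estimated reward of each history-action pair is penalized by its model uncertainty, so poorly covered actions are discouraged. A minimal sketch follows; the penalty coefficient `beta` and the additive form are illustrative assumptions.

```python
import numpy as np

def pessimistic_reward(r_hat, uncertainty, beta=1.0):
    """Penalize estimated rewards by model uncertainty, discouraging
    history-action pairs poorly covered by the offline data."""
    return r_hat - beta * uncertainty

# Example: two actions with equal estimated reward; the one the offline
# data covers better (lower uncertainty) wins after the penalty.
print(pessimistic_reward(np.array([1.0, 1.0]), np.array([0.1, 0.8])))
```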
Updated: 2025-06-25 13:22:57
标题: 极地:一种基于悲观模型的动态治疗方案政策学习算法
摘要: 动态治疗策略(DTRs)为优化顺序决策提供了一个原则性框架,在必须根据个体轨迹随时间调整决策的领域中,如医疗、教育和数字干预。然而,现有的统计方法通常依赖于强大的积极性假设,并且在部分数据覆盖下缺乏稳健性,而离线强化学习方法通常侧重于平均训练性能,缺乏统计保证,并且需要解决复杂的优化问题。为了解决这些挑战,我们提出了POLAR,一种新颖的悲观模型为基础的离线DTR优化策略学习算法。POLAR从离线数据中估计过渡动态并为每个历史-动作对量化不确定性。然后将悲观惩罚纳入奖励函数中,以阻止高不确定性的行动。与许多现有方法专注于平均训练性能不同,POLAR直接针对最终学习策略的次优性,并提供理论保证,而无需依赖计算密集的极小值或约束优化程序。据我们所知,POLAR是第一个提供统计和计算保证的模型为基础的DTR方法,包括政策次优性的有限样本界。在合成数据和MIMIC-III数据集上的实证结果表明,POLAR优于最先进的方法,并产生接近最优的、历史感知的治疗策略。
更新时间: 2025-06-25 13:22:57
领域: stat.ML,cs.IT,cs.LG,math.IT,stat.ME
GymPN: A Library for Decision-Making in Process Management Systems
Process management systems support key decisions about the way work is allocated in organizations. This includes decisions on which task to perform next, when to execute the task, and who to assign the task to. Suitable software tools are required to support these decisions in a way that is optimal for the organization. This paper presents a software library, called GymPN, that supports optimal decision-making in business processes using Deep Reinforcement Learning. GymPN builds on previous work that supports task assignment in business processes, introducing two key novelties: support for partial process observability and the ability to model multiple decisions in a business process. These novel elements address fundamental limitations of previous work and thus enable the representation of more realistic process decisions. We evaluate the library on eight typical business process decision-making problem patterns, showing that GymPN allows for easy modeling of the desired problems, as well as learning optimal decision policies.
Updated: 2025-06-25 13:19:42
标题: GymPN:过程管理系统中的决策制定库
摘要: 过程管理系统支持关于组织中工作分配方式的关键决策。这包括决定下一步执行哪项任务、何时执行任务以及分配任务给谁。需要适当的软件工具来支持这些决策,使其对组织最优化。本文介绍了一个名为GymPN的软件库,利用深度强化学习支持业务流程中的最优决策。GymPN建立在以前支持业务流程中任务分配的工作基础上,引入了两个关键创新:支持部分过程可观察性和能够对业务流程中多个决策建模。这些新颖元素解决了以前工作的基本局限,从而使更现实的流程决策能够被表示出来。我们在八个典型的业务流程决策问题模式上评估了该库,结果显示GymPN可以轻松建模所需的问题,并学习最优决策策略。
更新时间: 2025-06-25 13:19:42
领域: cs.AI
Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation
With the rising popularity of electric vehicles (EVs), modern service systems, such as ride-hailing delivery services, are increasingly integrating EVs into their operations. Unlike conventional vehicles, EVs often have a shorter driving range, necessitating careful consideration of charging when fulfilling requests. With recent advances in Vehicle-to-Grid (V2G) technology - allowing EVs to also discharge energy back to the grid - new opportunities and complexities emerge. We introduce the Electric Vehicle Orienteering Problem with V2G (EVOP-V2G): a profit-maximization problem where EV drivers must select customer requests or orders while managing when and where to charge or discharge. This involves navigating dynamic electricity prices, charging station selection, and route constraints. We formulate the problem as a Mixed Integer Programming (MIP) model and propose two near-optimal metaheuristic algorithms: one evolutionary (EA) and the other based on large neighborhood search (LNS). Experiments on real-world data show our methods can double driver profits compared to baselines, while maintaining near-optimal performance on small instances and excellent scalability on larger ones. Our work highlights a promising path toward smarter, more profitable EV-based mobility systems that actively support the energy grid.
Updated: 2025-06-25 13:15:52
标题: 智能电动车乘车和送货服务:利用双向充电优化利润
摘要: 随着电动汽车(EVs)日益普及,现代服务系统,如网约车送货服务,越来越多地将EVs整合到其运营中。与传统车辆不同,EVs通常具有较短的行驶里程,需要在满足需求时仔细考虑充电。随着最近Vehicle-to-Grid(V2G)技术的进步 - 使EVs也能向电网放电 - 新的机遇和复杂性出现。我们介绍了带有V2G的电动汽车定向问题(EVOP-V2G):这是一个以利润最大化为目标的问题,EV驾驶员必须在管理何时何地充电或放电的同时选择客户请求或订单。这涉及到动态电价、充电站选择和路线约束。我们将问题制定为混合整数规划(MIP)模型,并提出了两种接近最优的元启发式算法:一种是进化算法(EA),另一种是基于大邻域搜索(LNS)的算法。对真实数据的实验显示,我们的方法可以使驾驶员的利润与基线相比翻倍,同时在小规模实例上保持接近最优的性能,在大规模实例上具有出色的可扩展性。我们的工作突显了一条有前途的道路,朝着更智能、更有利可图的基于EV的移动系统前进,这些系统积极支持能源电网。
更新时间: 2025-06-25 13:15:52
领域: cs.AI
Variational quantum regression algorithm with encoded data structure
Hybrid variational quantum algorithms (VQAs) are promising for solving practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. However, with a typical random ansatz or the quantum alternating operator ansatz, the resulting variational quantum algorithms become black boxes that cannot be trusted for model interpretation, let alone deployed in applications that inform critical decisions: the variational parameters are merely rotation angles for the quantum gates and bear no relation to interpretable values that a model could provide directly. In this paper, we construct the first interpretable quantum regression algorithm, in which the quantum state exactly encodes the classical data table and the variational parameters correspond directly to the regression coefficients, which are real numbers by construction, providing a high degree of model interpretability and minimal optimization cost owing to the right level of expressiveness. We also take advantage of the encoded data structure to reduce the time complexity of computing the regression map. To shorten the circuit depth for nonlinear regression, our algorithm can be extended by building nonlinear features through classical preprocessing as independent encoded column vectors. Although the authors have recently realized compressed encoding in superconducting qubits with a less noisy scheme, we envision potential quantum utility with multi-qubit gates implemented in neutral cold atoms and ions.
Updated: 2025-06-25 13:14:47
标题: 使用编码数据结构的变分量子回归算法
摘要: 混合变分量子算法(VQAs)在解决实际问题,如组合优化、量子化学模拟、量子机器学习和在嘈杂的量子计算机上进行量子误差校正方面具有很大潜力。然而,采用典型的随机假设或量子交替算子假设,派生的变分量子算法变成了一个黑匣子,不能被信任用于模型解释,更不用说部署为在关键决策中提供信息的应用程序:这些变分参数的结果只是量子门的旋转角度,与模型直接提供的可解释值无关。在本文中,我们构建了第一个可解释的量子回归算法,其中量子态精确地编码了经典数据表,而变分参数直接对应于回归系数,这些系数是通过构建的实数,提供了高度的模型可解释性和由于正确的表现力而最小化优化成本。我们还利用编码数据结构来减少计算回归映射的时间复杂度。为了缩短非线性回归的电路深度,我们的算法可以通过将经典预处理构建的非线性特征作为独立编码的列向量来扩展。尽管作者最近实现了超导量子比特中较少嘈杂的压缩编码的实现,但我们设想在中性冷原子和离子中实现多量子比特门,从而实现潜在的量子效用。
更新时间: 2025-06-25 13:14:47
领域: quant-ph,cs.LG
scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection
The advent of single-cell multi-omics technologies has enabled the simultaneous profiling of diverse omics layers within individual cells. Integrating such multimodal data provides unprecedented insights into cellular identity, regulatory processes, and disease mechanisms. However, it remains challenging, as current methods often rely on selecting highly variable genes or peaks during preprocessing, which may inadvertently discard crucial biological information. Here, we present scMamba, a foundation model designed to integrate single-cell multi-omics data without the need for prior feature selection while preserving genomic positional information. scMamba introduces a patch-based cell tokenization strategy that treats genomics regions as words (tokens) and cells as sentences. Building upon the concept of state space duality, scMamba distills rich biological insights from high-dimensional, sparse single-cell multi-omics data. Additionally, our novel contrastive learning approach, enhanced with cosine similarity regularization, enables superior alignment across omics layers compared to traditional methods. Systematic benchmarking across multiple datasets demonstrates that scMamba significantly outperforms state-of-the-art methods in preserving biological variation, aligning omics layers, and enhancing key downstream tasks such as clustering, cell type annotation, and trajectory inference. Our findings position scMamba as a powerful tool for large-scale single-cell multi-omics integration, capable of handling large-scale atlases and advancing biological discovery.
Updated: 2025-06-25 12:58:01
标题: scMamba:用于单细胞多组学整合的可扩展基础模型,超出高度可变特征选择
摘要: 单细胞多组学技术的出现使得可以在单个细胞内同时对多种组学层面进行分析。整合这种多模态数据为细胞的身份、调节过程和疾病机制提供了前所未有的洞察。然而,目前的方法往往依赖于在预处理过程中选择高度可变的基因或峰值,这可能无意中丢弃了关键的生物信息。在这里,我们介绍了scMamba,这是一个基础模型,旨在集成单细胞多组学数据,而无需事先进行特征选择,同时保留基因组位置信息。scMamba引入了一种基于补丁的细胞标记策略,将基因组区域视为单词(标记),将细胞视为句子。基于状态空间二元性的概念,scMamba从高维稀疏的单细胞多组学数据中提炼出丰富的生物洞察力。此外,我们的新颖对比学习方法,结合余弦相似性正则化,使得与传统方法相比,能够在组学层面上实现更好的对齐。对多个数据集进行系统性基准测试表明,scMamba在保留生物变异性、对齐组学层面和增强关键下游任务(如聚类、细胞类型注释和轨迹推断)方面明显优于最先进的方法。我们的发现将scMamba定位为一个强大的工具,用于大规模单细胞多组学整合,能够处理大规模图谱并推动生物发现。
更新时间: 2025-06-25 12:58:01
领域: q-bio.CB,cs.LG
Paladin-mini: A Compact and Efficient Grounding Model Excelling in Real-World Scenarios
This paper introduces two significant contributions to address the issue of grounding claims in a given context. Grounding means that, given a context (document) and a claim, there is at least one piece of supporting evidence for the claim in the document. We introduce Paladin-mini, a compact (3.8B parameters) open-source classifier model (used for labeling data as grounded or ungrounded) engineered for robust performance in real-world scenarios, and the grounding-benchmark, a new evaluation dataset designed to assess performance on critical reasoning tasks. We also demonstrate Paladin-mini's results, benchmarked against the current state of the art, and share clear and reproducible results.
Updated: 2025-06-25 12:50:28
标题: 圣骑士-mini:在真实场景中表现出色的紧凑高效接地模型
摘要: 这篇论文介绍了两项重要贡献,以解决在特定背景下支持主张的问题。支持主张意味着在给定的背景(文档)和主张中,文档中至少有一个支持主张的证据。我们将介绍Paladin-mini,一个紧凑的(3.8B参数)开源分类器模型(用于标记数据为支持或不支持的),专门设计用于在现实场景中具有强大性能,以及grounding-benchmark,一个新的评估数据集,旨在评估关键推理任务的性能。我们还将展示Paladin-mini的结果,并与当前的最先进技术进行基准测试,并分享清晰和可复制的结果。
更新时间: 2025-06-25 12:50:28
领域: cs.AI
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To address this challenge, we present HiT, a novel family of efficient tracking models that achieve high performance while maintaining fast operation across various devices. The core innovation of HiT lies in its Bridge Module, which connects lightweight transformers to the tracking framework, enhancing feature representation quality. Additionally, we introduce a dual-image position encoding approach to effectively encode spatial information. HiT achieves an impressive speed of 61 frames per second (fps) on the NVIDIA Jetson AGX platform, alongside a competitive AUC of 64.6% on the LaSOT benchmark, outperforming all previous efficient trackers. Building on HiT, we propose DyHiT, an efficient dynamic tracker that flexibly adapts to scene complexity by selecting routes with varying computational requirements. DyHiT uses search area features extracted by the backbone network and inputs them into an efficient dynamic router to classify tracking scenarios. Based on the classification, DyHiT applies a divide-and-conquer strategy, selecting appropriate routes to achieve a superior trade-off between accuracy and speed. The fastest version of DyHiT achieves 111 fps on NVIDIA Jetson AGX while maintaining an AUC of 62.4% on LaSOT. Furthermore, we introduce a training-free acceleration method based on the dynamic routing architecture of DyHiT. This method significantly improves the execution speed of various high-performance trackers without sacrificing accuracy. For instance, our acceleration method enables the state-of-the-art tracker SeqTrack-B256 to achieve a 2.68 times speedup on an NVIDIA GeForce RTX 2080 Ti GPU while maintaining the same AUC of 69.9% on LaSOT.
Updated: 2025-06-25 12:46:46
标题: 利用轻量级分层ViT和动态框架实现高效的视觉跟踪
摘要: 基于Transformer的视觉跟踪器展示了显著的进展,这归功于其强大的建模能力。然而,由于其处理速度较慢,它在资源受限设备上的实用性受到限制。为了解决这一挑战,我们提出了HiT,一种新颖的高效跟踪模型系列,可以在各种设备上实现高性能同时保持快速运行。HiT的核心创新在于其桥接模块,将轻量级Transformer连接到跟踪框架中,提高了特征表示质量。此外,我们引入了一种双图像位置编码方法,有效地编码空间信息。在NVIDIA Jetson AGX平台上,HiT实现了惊人的每秒61帧的速度,同时在LaSOT基准上取得了64.6%的竞争性AUC,超过了所有以前的高效跟踪器。在HiT的基础上,我们提出了DyHiT,一种高效的动态跟踪器,通过选择具有不同计算需求的路线灵活适应场景复杂性。DyHiT使用骨干网络提取的搜索区域特征,并将其输入到高效的动态路由器中对跟踪场景进行分类。根据分类结果,DyHiT采用分而治之的策略,选择适当的路线在准确性和速度之间实现卓越的权衡。最快版本的DyHiT在NVIDIA Jetson AGX上实现了每秒111帧的速度,同时在LaSOT上保持了62.4%的AUC。此外,我们提出了一种基于DyHiT动态路由架构的无需训练的加速方法。这种方法显著提高了各种高性能跟踪器的执行速度,而不会牺牲准确性。例如,我们的加速方法使最先进的跟踪器SeqTrack-B256在NVIDIA GeForce RTX 2080 Ti GPU上实现了2.68倍的加速,同时在LaSOT上保持了69.9%的AUC。
更新时间: 2025-06-25 12:46:46
领域: cs.CV,cs.LG
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Satellite remote sensing (RS) enables a wide array of downstream Earth observation (EO) applications, including climate modeling, carbon accounting, and strategies for conservation and sustainable land use. We present TESSERA, a novel Remote Sensing Foundation Model (RSFM) that uses Self-Supervised Learning (SSL) to generate global, robust representations at 10m scale from pixel-level satellite time series data. TESSERA combines information from only optical and SAR data streams using two parallel Transformer-based encoders: one dedicated to Sentinel-1 SAR polarizations and another to Sentinel-2 MSI data (10 selected spectral bands) to create representations that are then fused using a multilayer perceptron (MLP), resulting in a global representation map covering the years 2017 to 2024. Our precomputed representations set a new state-of-the-art performance benchmark and our open-source approach democratizes access to high-performance, high-resolution representations. We benchmark the performance of TESSERA in five diverse tasks, comparing our work with state-of-the-art task-specific models and other foundation models. Our results show that TESSERA outperforms both traditional RS baselines and the leading geospatial foundation models in these diverse downstream tasks.
Updated: 2025-06-25 12:46:26
标题: TESSERA:用于地球表面光谱的时间嵌入表示和分析
摘要: 卫星遥感(RS)技术使得一系列地球观测(EO)应用变得可能,包括气候建模、碳核算以及保护和可持续土地利用策略。我们提出了一种新颖的遥感基础模型(RSFM)TESSERA,利用自监督学习(SSL)从像素级卫星时间序列数据中生成全球、稳健的表示,尺度为10米。TESSERA使用两个并行的基于Transformer的编码器,一个专门用于Sentinel-1 SAR极化数据,另一个用于Sentinel-2 MSI数据(10个选择的光谱波段),通过多层感知器(MLP)融合这些表示,生成覆盖2017年至2024年的全球表示图。我们预先计算的表示设定了新的最先进性能基准,并且我们的开源方法使得高性能、高分辨率表示的获取变得更加民主。我们在五项不同任务中对TESSERA的性能进行了基准测试,将我们的工作与最先进的特定任务模型和其他基础模型进行比较。结果显示,TESSERA在这些多样化的下游任务中优于传统RS基准线和主要的地理空间基础模型。
更新时间: 2025-06-25 12:46:26
领域: cs.LG
WyckoffDiff -- A Generative Diffusion Model for Crystal Symmetry
Crystalline materials often exhibit a high level of symmetry. However, most generative models do not account for symmetry, but rather model each atom without any constraints on its position or element. We propose a generative model, Wyckoff Diffusion (WyckoffDiff), which generates symmetry-based descriptions of crystals. This is enabled by considering a crystal structure representation that encodes all symmetry, and we design a novel neural network architecture which enables using this representation inside a discrete generative model framework. In addition to respecting symmetry by construction, the discrete nature of our model enables fast generation. We additionally present a new metric, Fréchet Wrenformer Distance, which captures the symmetry aspects of the materials generated, and we benchmark WyckoffDiff against recently proposed generative models for crystal generation. As a proof-of-concept study, we use WyckoffDiff to find new materials below the convex hull of thermodynamical stability.
Updated: 2025-06-25 12:45:51
标题: WyckoffDiff--一种用于晶体对称性的生成扩散模型
摘要: 结晶材料通常表现出较高的对称性。然而,大多数生成模型并不考虑对称性,而是对每个原子进行建模,而无需对其位置或元素施加任何约束。我们提出了一种生成模型,Wyckoff Diffusion(WyckoffDiff),它生成基于对称性的晶体描述。这是通过考虑编码所有对称性的晶体结构表示来实现的,我们设计了一种新颖的神经网络架构,使其能够在离散生成模型框架内使用这种表示。除了通过设计尊重对称性外,我们模型的离散性质还能够实现快速生成。我们还提出了一种新的度量标准,Fréchet Wrenformer Distance,它捕捉了所生成材料的对称性方面,并将WyckoffDiff与最近提出的晶体生成生成模型进行了基准测试。作为一个概念验证研究,我们使用WyckoffDiff在热力学稳定性凸包下找到了新材料。
更新时间: 2025-06-25 12:45:51
领域: cond-mat.mtrl-sci,cs.AI,cs.LG
Chemical knowledge-informed framework for privacy-aware retrosynthesis learning
Chemical reaction data is a pivotal asset, driving advances in competitive fields such as pharmaceuticals, materials science, and industrial chemistry. Its proprietary nature renders it sensitive, as it often includes confidential insights and competitive advantages organizations strive to protect. However, in contrast to this need for confidentiality, the current standard training paradigm for machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries and frequent data transmission between entities, potentially exposing proprietary information to unauthorized access or interception during storage and transfer. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models. CKIF enables distributed training across multiple chemical organizations without compromising the confidentiality of proprietary reaction data. Instead of gathering raw reaction data, CKIF learns retrosynthesis models through iterative, chemical knowledge-informed aggregation of model parameters. In particular, the chemical properties of predicted reactants are leveraged to quantitatively assess the observable behaviors of individual models, which in turn determines the adaptive weights used for model aggregation. On a variety of reaction datasets, CKIF outperforms several strong baselines by a clear margin.
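As a rough illustration of knowledge-informed parameter aggregation, the sketch below averages per-organization model parameters under adaptive weights; in CKIF those weights would come from chemical-property assessments of each model's predicted reactants, which this sketch stubs out with fixed numbers.

```python
import numpy as np

def aggregate_models(param_list, weights):
    """Weighted, layer-wise aggregation of per-organization parameters.
    The weights stand in for chemical-knowledge-informed scores
    (hypothetical here); raw reaction data never leaves its owner."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize adaptive weights
    return [
        sum(w * p for w, p in zip(weights, layer_params))
        for layer_params in zip(*param_list)   # iterate layer by layer
    ]

# Three organizations, each holding a two-layer model (toy arrays).
models = [[np.ones(2) * k, np.ones(3) * k] for k in (1.0, 2.0, 3.0)]
print(aggregate_models(models, weights=[0.2, 0.3, 0.5]))
```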
Updated: 2025-06-25 12:45:28
标题: 基于化学知识的隐私感知逆合成学习框架
摘要: 化学反应数据是一个关键资产,推动了竞争激烈领域(如制药、材料科学和工业化学)的进步。其专有性使其敏感,因为它经常包含机构努力保护的机密见解和竞争优势。然而,与保密需求相反,基于机器学习的反合成的当前标准训练范式将反应数据从多个来源汇集到一个单一边缘以训练预测模型。这种范式存在着相当大的隐私风险,因为它需要跨组织边界广泛数据可用性和实体之间频繁的数据传输,可能会在存储和传输过程中将专有信息暴露给未经授权的访问或拦截。在本研究中,我们介绍了基于化学知识的框架(CKIF),这是一种保护隐私的学习反合成模型方法。CKIF实现了跨多个化学组织的分布式训练,而不会危及专有反应数据的保密性。CKIF不是收集原始反应数据,而是通过迭代、基于化学知识的模型参数聚合学习反合成模型。具体来说,利用预测反应物的化学特性定量评估单个模型的可观察行为,从而确定用于模型聚合的自适应权重。在各种反应数据集上,CKIF的表现明显优于几种强基线。
更新时间: 2025-06-25 12:45:28
领域: cs.LG,cs.AI
SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities
Mixture of Experts (MoE) architectures have become a key approach for scaling large language models, with growing interest in extending them to multimodal tasks. Existing methods to build multimodal MoE models either incur high training costs or suffer from degraded language capabilities when adapting pretrained models. To address this, we propose Soft Modality-Aware Routing (SMAR), a novel regularization technique that uses Kullback-Leibler divergence to control routing probability distributions across modalities, encouraging expert specialization without modifying model architecture or heavily relying on textual data. Experiments on visual instruction tuning show that SMAR preserves language ability at 86.6% retention with only 2.5% pure text, outperforming baselines while maintaining strong multimodal performance. Our approach offers a practical and efficient solution to balance modality differentiation and language capabilities in multimodal MoE models.
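A minimal sketch of a KL-based routing regularizer is shown below: it compares the average expert-routing distributions of text and image tokens. How the divergence is signed and weighted in the training objective (to encourage specialization) is an assumption of this sketch, not SMAR's published formulation.

```python
import torch
import torch.nn.functional as F

def modality_routing_kl(router_logits, is_text):
    """KL divergence between the mean routing distribution of text
    tokens and that of image tokens; a training loop would scale this
    term to push expert usage apart across modalities (assumed usage)."""
    probs = F.softmax(router_logits, dim=-1)    # (tokens, experts)
    p_text = probs[is_text].mean(dim=0)         # average text routing
    p_image = probs[~is_text].mean(dim=0)       # average image routing
    # KL(p_text || p_image): input is log-probs, target is probs.
    return F.kl_div(p_image.log(), p_text, reduction="sum")

logits = torch.randn(8, 4)                      # 8 tokens, 4 experts
mask = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0], dtype=torch.bool)
print(modality_routing_kl(logits, mask))
```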
Updated: 2025-06-25 12:36:55
标题: SMAR:基于MoE的多模态大型语言模型的软模态感知路由策略,保留语言能力
摘要: 混合专家(MoE)结构已成为扩展大型语言模型的关键方法,越来越多的人对将它们扩展到多模态任务感兴趣。目前用于构建多模态MoE模型的现有方法要么造成高训练成本,要么在调整预训练模型时语言能力下降。为了解决这个问题,我们提出了Soft ModalityAware Routing(SMAR),一种新颖的正则化技术,它使用Kullback Leibler散度来控制跨模态的路由概率分布,鼓励专家进行专业化,而无需修改模型架构或过多依赖文本数据。在视觉指导调整实验中,SMAR以86.6%的保留率仅使用2.5%纯文本保留了语言能力,优于基线同时保持强大的多模态性能。我们的方法为在多模态MoE模型中平衡模态差异和语言能力提供了实用和高效的解决方案。
更新时间: 2025-06-25 12:36:55
领域: cs.CL,cs.AI
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition
We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring that robots correctly recognize and track actors, objects, and their interactions over time. To achieve this, CARMA uniquely identifies physical instances of such entities in the real world and organizes them into grounded triplets of actors, objects, and actions. To validate our approach, we conducted three experiments, where multiple humans and a robot interact: collaborative pouring, handovers, and sorting. These scenarios allow the assessment of the system's capabilities as to role distinction, multi-actor awareness, and consistent instance identification. Our experiments demonstrate that the system can reliably generate accurate actor-action-object triplets, providing a structured and robust foundation for applications requiring spatiotemporal reasoning and situated decision-making in collaborative settings.
Updated: 2025-06-25 12:36:49
标题: CARMA:通过结合视觉语言模型与物体和动作识别实现人机群体互动的上下文感知情境接地
摘要: 我们介绍了CARMA,这是一个用于人机群体互动情境接地的系统。在这种群体环境中有效地协作需要基于对当前人物和物体的一致表示以及关于演员和操作对象的事件的情节抽象的情境意识。这需要对实例进行清晰和一致的指定,确保机器人能够正确识别和跟踪演员、物体及它们之间的互动。为了实现这一点,CARMA在现实世界中独特地识别这些实体的物理实例,并将它们组织成演员、物体和动作的接地三元组。 为了验证我们的方法,我们进行了三个实验,多个人类和一个机器人进行互动:协作倒水、交接和分拣。这些场景允许对系统在角色区分、多演员意识和一致实例识别方面的能力进行评估。我们的实验表明,该系统能够可靠地生成准确的演员-动作-物体三元组,为需要时空推理和情境决策的协作环境中的应用提供了结构化和稳健的基础。
更新时间: 2025-06-25 12:36:49
领域: cs.RO,cs.AI,cs.HC
InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking
This paper introduces a novel deep learning framework for robust image zero-watermarking based on distortion-invariant feature learning. As a zero-watermarking scheme, our method leaves the original image unaltered and learns a reference signature through optimization in the feature space. The proposed framework consists of two key modules. In the first module, a feature extractor is trained via noise-adversarial learning to generate representations that are both invariant to distortions and semantically expressive. This is achieved by combining adversarial supervision against a distortion discriminator and a reconstruction constraint to retain image content. In the second module, we design a learning-based multibit zero-watermarking scheme where the trained invariant features are projected onto a set of trainable reference codes optimized to match a target binary message. Extensive experiments on diverse image datasets and a wide range of distortions show that our method achieves state-of-the-art robustness in both feature stability and watermark recovery. Comparative evaluations against existing self-supervised and deep watermarking techniques further highlight the superiority of our framework in generalization and robustness.
Updated: 2025-06-25 12:32:08
标题: InvZW:通过噪声对抗训练实现稳健图像无水印的不变特征学习
摘要: 本文介绍了一种基于扭曲不变特征学习的鲁棒图像零水印深度学习框架。作为一种零水印方案,我们的方法保持原始图像不变,并通过在特征空间中优化学习参考签名。所提出的框架包括两个关键模块。在第一个模块中,通过噪声对抗学习训练特征提取器,生成对扭曲不变且语义表达丰富的表示。通过结合对抗监督和重构约束,以保留图像内容,实现这一点。在第二个模块中,我们设计了一种基于学习的多比特零水印方案,其中训练后的不变特征投影到一组可训练的参考代码,以匹配目标二进制消息。对多种图像数据集和广泛的扭曲进行了大量实验,结果表明我们的方法在特征稳定性和水印恢复方面达到了最新的鲁棒性。与现有的自监督和深度水印技术的比较评估进一步突显了我们框架在泛化和鲁棒性方面的优越性。
更新时间: 2025-06-25 12:32:08
领域: cs.CV,cs.LG,cs.MM
A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal. Driven by the resurgence of deep learning, Deep RL (DRL) has witnessed great success over a wide spectrum of complex control tasks. Despite the encouraging results achieved, the deep neural network-based backbone is widely deemed as a black box that impedes practitioners to trust and employ trained agents in realistic scenarios where high security and reliability are essential. To alleviate this issue, a large volume of literature devoted to shedding light on the inner workings of the intelligent agents has been proposed, by constructing intrinsic interpretability or post-hoc explainability. In this survey, we provide a comprehensive review of existing works on eXplainable RL (XRL) and introduce a new taxonomy where prior works are clearly categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that conversely leverage human knowledge to promote learning efficiency and performance of agents while this kind of method is often ignored in XRL field. Some challenges and opportunities in XRL are discussed. This survey intends to provide a high-level summarization of XRL and to motivate future research on more effective XRL solutions. Corresponding open source codes are collected and categorized at https://github.com/Plankson/awesome-explainable-reinforcement-learning.
Updated: 2025-06-25 12:31:31
标题: 关于可解释强化学习的调查:概念、算法、挑战
摘要: 强化学习(RL)是一种流行的机器学习范例,智能代理与环境交互以实现长期目标。受深度学习的复兴驱动,深度强化学习(DRL)在各种复杂控制任务中取得了巨大成功。尽管取得了令人鼓舞的成果,但基于深度神经网络的骨干被普遍认为是一个黑匣子,阻碍了从业者在高安全性和可靠性至关重要的现实场景中信任和使用训练代理。为了缓解这一问题,大量文献致力于揭示智能代理的内部工作方式,通过构建内在可解释性或事后可解释性。在本调查中,我们对现有的解释性RL(XRL)作品进行了全面回顾,并引入了一个新的分类法,将先前的作品清晰地分类为模型解释、奖励解释、状态解释和任务解释方法。我们还回顾并强调了相反利用人类知识促进代理学习效率和性能的RL方法,而这种方法在XRL领域经常被忽略。讨论了XRL中的一些挑战和机遇。本调查旨在对XRL进行高层次总结,并激励未来研究更有效的XRL解决方案。相应的开源代码已在https://github.com/Plankson/awesome-explainable-reinforcement-learning上收集和分类。
更新时间: 2025-06-25 12:31:31
领域: cs.LG,cs.AI
Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations
We present LaplaceGNN, a novel self-supervised graph learning framework that bypasses the need for negative sampling by leveraging spectral bootstrapping techniques. Our method integrates Laplacian-based signals into the learning process, allowing the model to effectively capture rich structural representations without relying on contrastive objectives or handcrafted augmentations. By focusing on positive alignment, LaplaceGNN achieves linear scaling while offering a simpler, more efficient, self-supervised alternative for graph neural networks, applicable across diverse domains. Our contributions are twofold: we precompute spectral augmentations through max-min centrality-guided optimization, enabling rich structural supervision without relying on handcrafted augmentations, then we integrate an adversarial bootstrapped training scheme that further strengthens feature learning and robustness. Our extensive experiments on different benchmark datasets show that LaplaceGNN achieves superior performance compared to state-of-the-art self-supervised graph methods, offering a promising direction for efficiently learning expressive graph representations.
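For readers unfamiliar with the spectral objects involved, the sketch below builds the symmetric normalized graph Laplacian whose eigenstructure Laplacian-based signals derive from; the centrality-guided augmentation itself is beyond this snippet.

```python
import numpy as np
import scipy.sparse as sp

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2},
    the standard spectral object for Laplacian-based graph signals."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5           # guard isolated nodes
    D = sp.diags(d_inv_sqrt)
    return sp.eye(adj.shape[0]) - D @ adj @ D

# Toy 4-node path graph.
A = sp.csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
L = normalized_laplacian(A)
eigvals = np.linalg.eigvalsh(L.toarray())      # spectrum lies in [0, 2]
print(np.round(eigvals, 3))
```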
Updated: 2025-06-25 12:23:23
标题: 自监督图学习:基于谱引导和拉普拉斯增强的方法
摘要: 我们提出了LaplaceGNN,这是一个新颖的自监督图学习框架,通过利用谱引导技术,绕过了负采样的需要。我们的方法将基于拉普拉斯的信号整合到学习过程中,使模型能够有效地捕捉丰富的结构表示,而无需依赖对比目标或手工增强。通过专注于正向对齐,LaplaceGNN实现了线性扩展,同时为图神经网络提供了一种更简单、更高效的自监督替代方案,可应用于不同领域。我们的贡献有两个方面:我们通过最大-最小中心度引导优化预先计算谱增强,实现了丰富的结构监督,而无需依赖手工增强,然后我们整合了一种对抗性引导训练方案,进一步加强特征学习和稳健性。我们在不同基准数据集上进行了大量实验,结果显示LaplaceGNN相比最先进的自监督图方法表现更优,为高效学习表达图提供了一个有前途的方向。
更新时间: 2025-06-25 12:23:23
领域: cs.LG,cs.AI,cs.DS
Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach
Trajectory analysis is not only about obtaining movement data, but it is also of paramount importance in understanding the pattern in which an object moves through space and time, as well as in predicting its next move. Due to the significant interest in the area, data collection has improved substantially, resulting in a large number of features becoming available for training and predicting models. However, this introduces a high-dimensionality-induced feature explosion problem, which reduces the efficiency and interpretability of the data, thereby reducing the accuracy of machine learning models. To overcome this issue, feature selection has become one of the most prevalent tools. Thus, the objective of this paper was to introduce a taxonomy-based feature selection method that categorizes features based on their internal structure. This approach classifies the data into geometric and kinematic features, further categorizing them into curvature, indentation, speed, and acceleration. The comparative analysis indicated that a taxonomy-based approach consistently achieved comparable or superior predictive performance. Furthermore, due to the taxonomic grouping, which reduces combinatorial space, the time taken to select features was drastically reduced. The taxonomy was also used to gain insights into what feature sets each dataset was more sensitive to. Overall, this study provides robust evidence that a taxonomy-based feature selection method can add a layer of interpretability, reduce dimensionality and computational complexity, and contribute to high-level decision-making. It serves as a step toward providing a methodological framework for researchers and practitioners dealing with trajectory datasets and contributing to the broader field of explainable artificial intelligence.
Updated: 2025-06-25 12:21:20
标题: 走向可解释和高效的轨迹数据集特征选择:一种分类方法
摘要: 轨迹分析不仅仅是获取运动数据,而且在理解物体在空间和时间中移动的模式,以及预测其下一步移动方面至关重要。由于对该领域的极大兴趣,数据采集得到了显著改善,导致大量特征可用于训练和预测模型。然而,这引入了高维度诱导的特征爆炸问题,降低了数据的效率和可解释性,从而降低了机器学习模型的准确性。为了克服这个问题,特征选择已成为最为普遍的工具之一。因此,本文的目标是介绍一种基于分类学的特征选择方法,根据其内部结构对特征进行分类。这种方法将数据分类为几何和动力学特征,进一步将它们分类为曲率、缩进、速度和加速度。比较分析表明,基于分类学的方法始终实现了可比或优越的预测性能。此外,由于减少了组合空间的分类学分组,特征选择所需的时间大大缩短。分类法还用于洞察每个数据集对哪些特征集更为敏感。总体而言,这项研究提供了有力证据,即基于分类学的特征选择方法可以增加可解释性层次,减少维度和计算复杂性,并有助于高级决策制定。它作为为处理轨迹数据集的研究人员和从业者提供方法论框架的一步,并为可解释人工智能领域的更广泛贡献。
更新时间: 2025-06-25 12:21:20
领域: cs.LG
Tabular Feature Discovery With Reasoning Type Exploration
Feature engineering for tabular data remains a critical yet challenging step in machine learning. Recently, large language models (LLMs) have been used to automatically generate new features by leveraging their vast knowledge. However, existing LLM-based approaches often produce overly simple or repetitive features, partly due to inherent biases in the transformations the LLM chooses and the lack of structured reasoning guidance during generation. In this paper, we propose a novel method REFeat, which guides an LLM to discover diverse and informative features by leveraging multiple types of reasoning to steer the feature generation process. Experiments on 59 benchmark datasets demonstrate that our approach not only achieves higher predictive accuracy on average, but also discovers more diverse and meaningful features. These results highlight the promise of incorporating rich reasoning paradigms and adaptive strategy selection into LLM-driven feature discovery for tabular data.
Updated: 2025-06-25 12:18:34
标题: 通过推理类型探索进行表格特征发现
摘要: 对于表格数据的特征工程仍然是机器学习中至关重要但具有挑战性的一步。最近,大型语言模型(LLMs)已被用于通过利用它们的广泛知识自动生成新特征。然而,现有基于LLM的方法通常产生过于简单或重复的特征,部分原因是LLM选择的转换具有固有偏见,并且在生成过程中缺乏结构化推理指导。在本文中,我们提出了一种新颖的方法REFeat,该方法通过利用多种类型的推理来引导LLM发现多样且具有信息量的特征。在59个基准数据集上的实验证明,我们的方法不仅平均预测准确率更高,而且发现了更多多样且有意义的特征。这些结果突显了将丰富的推理范式和自适应策略选择融入LLM驱动的表格数据特征发现中的潜力。
更新时间: 2025-06-25 12:18:34
领域: cs.AI
A foundation model with multi-variate parallel attention to generate neuronal activity
Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks (DNNs), particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future effort by the community, we release the SWEC iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. MVPFormer leverages MVPA to achieve strong generalization across subjects, demonstrating expert-level performance in seizure detection and outperforming state-of-the-art Transformer baselines on our SWEC, the MAYO, and the FNUSA dataset. We further validate MVPA on standard time-series forecasting and classification tasks, where it matches or exceeds existing attention-based models. Together, our contributions establish MVPA as a general-purpose attention mechanism for heterogeneous time-series and MVPFormer as the first open-source, open-weights, and open-data iEEG foundation model with state-of-the-art clinical performance. The code is available at https://github.com/IBM/multi-variate-parallel-transformer. The SWEC iEEG dataset is available at https://mb-neuro.medical-blocks.ch/public_access/databases/ieeg/swec_ieeg.
Updated: 2025-06-25 12:07:10
标题: 一个具有多变量并行注意力的基础模型用于生成神经元活动
摘要: 学习具有异构通道配置的多变量时间序列仍然是深度神经网络(DNN)面临的一个基本挑战,特别是在临床领域,如颅内脑电图(iEEG),其中通道设置在不同受试者之间变化很大。在这项工作中,我们引入了多变量并行注意力(MVPA),这是一种新颖的自注意机制,可以解开内容、时间和空间注意力,从而实现对具有不同通道计数和配置的时间序列数据的灵活、可泛化和高效建模。我们使用MVPA构建了MVPFormer,这是一个用于人类电生理学的生成基础模型,经过训练可以预测不同受试者之间iEEG信号的演变。为了支持社区的这一和未来努力,我们发布了SWEC iEEG数据集,这是迄今为止最大的公开可用iEEG数据集,包括来自异质临床来源的近10,000小时的记录。MVPFormer利用MVPA实现了跨受试者的强大泛化,展示了在癫痫检测方面的专家级性能,并在我们的SWEC、MAYO和FNUSA数据集上表现优于最先进的Transformer基线。我们进一步在标准时间序列预测和分类任务上验证了MVPA,在这些任务中,它与或超过了现有的基于注意力的模型。总的来说,我们的贡献将MVPA确立为异质时间序列的通用注意力机制,将MVPFormer确立为首个具有最先进临床性能的开源、开放权重和开放数据iEEG基础模型。代码可在https://github.com/IBM/multi-variate-parallel-transformer获取。SWEC iEEG数据集可在https://mb-neuro.medical-blocks.ch/public_access/databases/ieeg/swec_ieeg获取。
更新时间: 2025-06-25 12:07:10
领域: cs.LG,cs.AI
DipSVD: Dual-importance Protected SVD for Efficient LLM Compression
The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compressing methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlooking the protection of critical components within the matrix, which leads to inferior performance in the compressed models. This paper proposes a dual-level importance protection mechanism to enhance SVD-based compression methods: (1) local importance protection: preserving the most critical singular vectors within each weight matrix through channel-weighted data whitening; and (2) global importance protection: enabling less important layers to bear a greater portion of the compression burden through either a heuristic or optimization-based approach, thereby minimizing the impact of compression on critical layers. Extensive experiments demonstrate that DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks, achieving superior model performance especially at high model compression ratios.
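The sketch below shows the general whitened truncated-SVD pattern that channel-weighted data whitening builds on: compress the weight matrix in a basis shaped by input activation statistics, then map back. The whitening construction and ridge term here are generic choices, not DipSVD's exact procedure.

```python
import numpy as np

def whitened_svd_compress(W, X, rank):
    """Truncated SVD of W in a data-whitened basis. S = (X X^T / n)^{1/2}
    encodes per-channel input statistics, so the kept singular directions
    are those that matter for typical activations (generic sketch)."""
    cov = X @ X.T / X.shape[1]
    vals, vecs = np.linalg.eigh(cov + 1e-6 * np.eye(cov.shape[0]))
    S = vecs @ np.diag(np.sqrt(vals)) @ vecs.T          # cov^{1/2}
    S_inv = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    U, sig, Vt = np.linalg.svd(W @ S, full_matrices=False)
    return (U[:, :rank] * sig[:rank]) @ Vt[:rank] @ S_inv

W = np.random.randn(16, 32)      # (out_features, in_features) weight
X = np.random.randn(32, 100)     # (in_features, samples) activations
print(np.linalg.norm(W - whitened_svd_compress(W, X, rank=8)))
```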
Updated: 2025-06-25 12:04:53
标题: DipSVD:双重重要性保护的SVD用于高效的LLM压缩
摘要: 随着大型语言模型(LLMs)计算需求和部署成本不断增加,已经推动了许多压缩方法的发展。与量化和非结构化剪枝相比,SVD压缩提供了更好的硬件兼容性和理论保证。然而,现有基于SVD的方法侧重于原始和压缩矩阵之间的整体差异,而忽视了矩阵中关键组件的保护,导致压缩模型性能较差。本文提出了一种双层重要性保护机制,以增强基于SVD的压缩方法:(1)局部重要性保护:通过通道加权数据白化,保留每个权重矩阵中最关键的奇异向量;(2)全局重要性保护:通过启发式或基于优化的方法,使较不重要的层承担更大比例的压缩负担,从而最小化压缩对关键层的影响。大量实验证明,DipSVD在多个基准测试中优于现有的基于SVD的压缩方法,在高模型压缩比下尤为突出地实现了优越的模型性能。
更新时间: 2025-06-25 12:04:53
领域: cs.LG,cs.AI
Backpropagation Through Time For Networks With Long-Term Dependencies
Backpropagation through time (BPTT) is a technique for updating tuned parameters within recurrent neural networks (RNNs). Several variants of the algorithm have been proposed, including Nth-order approximations and truncated BPTT. These methods approximate the backpropagation gradients under the assumption that the RNN only utilises short-term dependencies. This is an acceptable assumption to make for the current state of artificial neural networks. As RNNs become more advanced, a shift towards influence by long-term dependencies is likely. Thus, a new method for backpropagation is required. We propose using the 'discrete forward sensitivity equation' and a variant of it for single and multiple interacting recurrent loops, respectively. This solution is exact and also allows the network's parameters to vary between each subsequent step; however, it does require the computation of a Jacobian.
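A minimal sketch of the forward-sensitivity idea for a scalar RNN follows: the derivative of the hidden state with respect to a parameter is propagated forward alongside the state itself, giving exact gradients without unrolling backwards through time. The scalar tanh cell is an illustrative choice, not the paper's general formulation.

```python
import numpy as np

def forward_sensitivity_rnn(xs, w, u):
    """Scalar RNN h_{t+1} = tanh(w*h_t + u*x_t) with the discrete
    forward sensitivity s_t = dh_t/dw carried alongside the state,
    so the gradient stays exact under long-term dependencies."""
    h, s = 0.0, 0.0
    for x in xs:
        h_new = np.tanh(w * h + u * x)
        s = (1.0 - h_new ** 2) * (h + w * s)   # sensitivity recursion
        h = h_new
    return h, s

xs = [0.5, -0.3, 0.8, 0.1]
h, dh_dw = forward_sensitivity_rnn(xs, w=0.9, u=0.5)
eps = 1e-6                                      # finite-difference check
h_eps, _ = forward_sensitivity_rnn(xs, w=0.9 + eps, u=0.5)
print(dh_dw, (h_eps - h) / eps)                 # should closely agree
```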
Updated: 2025-06-25 12:04:53
标题: 针对具有长期依赖关系网络的时序反向传播(BPTT)
摘要: 时序反向传播(BPTT)是一种在循环神经网络(RNNs)中更新调整参数的技术。已经尝试过几种创建这种算法的方法,包括:N阶近似和截断BPTT。这些方法在假设RNN仅利用短期依赖性的情况下近似反向传播梯度。对于目前的人工神经网络状态来说,这是一个可以接受的假设。随着RNN变得更加先进,往往会向长期依赖性的影响转变。因此,需要一种新的反向传播方法。我们提议使用“离散前向敏感性方程”及其单个和多个相互作用的循环回路的变体。这个解决方案是精确的,也允许网络的参数在每个后续步骤之间变化,但需要计算雅可比矩阵。
更新时间: 2025-06-25 12:04:53
领域: cs.LG,68T07, 68Q32,I.2.6
Stabilization of industrial processes with time series machine learning
The stabilization of time series processes is a crucial problem that is ubiquitous in various industrial fields. Applying machine learning to its solution can have a decisive impact, improving the quality of the resulting stabilization while requiring fewer computational resources. In this work, we present a simple pipeline consisting of two neural networks, an oracle predictor and an optimizer, recasting the point-wise value optimization as a neural network training problem. This approach improves the stability of temperature control by about a factor of three compared to ordinary solvers.
Updated: 2025-06-25 12:04:23
标题: 用时间序列机器学习稳定化工业流程
摘要: 时间序列过程的稳定是各种工业领域普遍存在的一个关键问题。将机器学习应用于其解决方案可以产生决定性影响,改善结果的稳定性,并减少所需的计算资源。在这项工作中,我们提出了一个由两个神经网络组成的简单流程:预测器和优化器,提议将逐点值优化替换为神经网络训练问题,成功地改善了温度控制的稳定性,比普通求解器提高了大约3倍。
更新时间: 2025-06-25 12:04:23
领域: cs.LG,cs.SY,eess.SY
It's not you, it's me -- Global urban visual perception varies across demographics and personalities
Understanding people's preferences and needs is crucial for urban planning decisions, yet current approaches often combine them from multi-cultural and multi-city populations, obscuring important demographic differences and risking amplifying biases. We conducted a large-scale urban visual perception survey of streetscapes worldwide using street view imagery, examining how demographics -- including gender, age, income, education, race and ethnicity, and, for the first time, personality traits -- shape perceptions among 1,000 participants, with balanced demographics, from five countries and 45 nationalities. This dataset, introduced as Street Perception Evaluation Considering Socioeconomics (SPECS), exhibits statistically significant differences in perception scores in six traditionally used indicators (safe, lively, wealthy, beautiful, boring, and depressing) and four new ones we propose (live nearby, walk, cycle, green) among demographics and personalities. We revealed that location-based sentiments are carried over in people's preferences when comparing urban streetscapes with other cities. Further, we compared the perception scores based on where participants and streetscapes are from. We found that an off-the-shelf machine learning model trained on an existing global perception dataset tends to overestimate positive indicators and underestimate negative ones compared to human responses, suggesting that targeted intervention should consider locals' perception. Our study aspires to rectify the myopic treatment of street perception, which rarely considers demographics or personality traits.
Updated: 2025-06-25 12:02:08
标题: 这不是你的问题,而是我的问题——全球城市视觉感知在不同人口统计和个性特征间存在差异
摘要: 理解人们的偏好和需求对城市规划决策至关重要,然而目前的方法往往将它们结合起来,来自多元文化和多城市人口,模糊了重要的人口统计差异,并有可能加剧偏见。我们利用街景图像进行了一项大规模的城市视觉感知调查,研究了包括性别、年龄、收入、教育、种族和族裔在内的人口统计数据,以及首次考虑到个性特征,探讨了1,000名来自五个国家和45个国籍的参与者之间的感知差异。这个数据集,被称为"Street Perception Evaluation Considering Socioeconomics" (SPECS),在六个传统指标(安全、繁华、富裕、美丽、无聊、令人沮丧)和我们提出的四个新指标(住在附近、步行、骑车、绿色)中展现出统计学上显著的感知分数差异。我们发现基于位置的情绪在人们比较城市景观时会延续。此外,我们比较了基于参与者和城市景观的感知分数。我们发现,一个基于现有全球感知数据集训练的现成机器学习模型倾向于高估积极指标和低估消极指标,与人类的回应相比,这表明有针对性的干预应考虑当地人的感知。我们的研究旨在纠正对街道感知的狭隘处理,很少考虑人口统计数据或个性特征。
更新时间: 2025-06-25 12:02:08
领域: cs.CV,cs.LG
On the ability of Deep Neural Networks to Learn Granger Causality in Multi-Variate Time Series Data
Granger Causality (GC) offers an elegant statistical framework to study the association between multivariate time series data. Linear vector autoregressive (VAR) models have nice interpretability properties but limited practical applicability, owing to the underlying assumptions on the kinds of associations these models can capture. Numerous attempts in the literature exploit the functional approximation power of Deep Neural Networks (DNNs) for GC estimation; these methods, however, treat GC as a variable selection problem. We present a novel paradigm for approaching GC. We argue that GC is essentially linked with prediction: if a deep learning model is used to model the time series collectively or jointly, a well-regularized model may learn the true Granger causal structure from the data, given enough training data. We propose to uncover the learned GC structure by comparing the model uncertainty, or the distribution of the residuals, when the past of every series is used, against the case where a specific time series component is dropped from the model. We also examine the effect of input-layer dropout on a neural network's ability to learn Granger causality from the data. We show that a well-regularized model can in fact learn the true GC structure from the data without explicitly adding terms to the loss function that guide the model to select variables or perform sparse regression.
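The paradigm can be illustrated with the toy experiment sketched below: train one model jointly on all series, then compare residuals with the full past against residuals with one candidate driver's past zeroed out. The zeroing-based ablation and the toy data generator are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def residual_gap(model, X_past, y, drop_channel):
    """Increase in mean squared residual when one series' past is
    removed; a large gap suggests a Granger causal link (sketch)."""
    with torch.no_grad():
        full = ((model(X_past) - y) ** 2).mean()
        X_drop = X_past.clone()
        X_drop[:, drop_channel, :] = 0.0        # ablate one series' past
        dropped = ((model(X_drop) - y) ** 2).mean()
    return (dropped - full).item()

# Toy setup: 3 series with lag-5 windows; the target is driven by series 1.
torch.manual_seed(0)
X = torch.randn(64, 3, 5)
y = X[:, 1, -1:] * 0.8 + 0.1 * torch.randn(64, 1)
model = nn.Sequential(nn.Flatten(), nn.Linear(15, 16), nn.Tanh(),
                      nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
print([round(residual_gap(model, X, y, c), 3) for c in range(3)])
```

The gap for channel 1 should dominate, mirroring how the learned GC structure is read off without any sparsity term in the loss.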
Updated: 2025-06-25 11:57:24
标题: 关于深度神经网络学习多变量时间序列数据中Granger因果关系的能力
摘要: 格兰杰因果关系(GC)提供了一个优雅的统计框架,用于研究多变量时间序列数据之间的关联。尽管线性向量自回归模型(VAR)具有很好的解释性质,但由于这些模型对可以捕捉的关联类型的基本假设,其实际应用受到限制。文献中已经有很多尝试利用深度神经网络(DNNs)的功能逼近能力来进行GC估计的方法。然而,这些方法将GC视为一个变量选择问题。我们提出了一种新的GC处理范式。我们认为GC本质上与预测有关,如果使用深度学习模型来共同或联合建模时间序列,一个经过良好正则化的模型可能会从数据中学习到真实的格兰杰因果结构,前提是有足够的训练数据。我们提出通过比较当过去的所有内容被使用时的模型不确定性或残差分布,与将特定时间序列组件从模型中剔除时的情况来揭示学到的GC结构。我们还比较了输入层的丢失对神经网络学习数据中格兰杰因果关系的影响。我们展示了一个经过良好正则化的模型实际上可以从数据中学习到真实的GC结构,而无需明确在损失函数中添加引导模型选择变量或执行稀疏回归的项。
更新时间: 2025-06-25 11:57:24
领域: cs.LG,stat.ML
Signatures of planets and Galactic subpopulations in solar analogs. Precise chemical abundances with neural networks
The aim of this work is to obtain precise atmospheric parameters and chemical abundances automatically for solar twins and analogs to find signatures of exoplanets, as well as to assess how peculiar the Sun is compared to these stars and to analyze any possible fine structures in the Galactic thin disk. We developed a neural network (NN) algorithm using Python to obtain these parameters for a sample of 99 solar twins and solar analogs previously studied in the literature from normalized high-quality spectra from HARPS, with a resolving power of R $\sim$ 115000 and a signal-to-noise ratio S/N > 400. We obtained precise atmospheric parameters and abundance ratios [X/Fe] of 20 chemical elements (Li, C, O, Na, Mg, Al, Si, S, Ca, Sc, Ti, V, Cr, Mn, Co, Ni, Cu, Zn, Y, and Ba). The results are in line with the literature, with average differences and standard deviations of $(2 \pm 27)$ K for T$_{\rm eff}$, $(0.00 \pm 0.06)$ dex for log g, $(0.00 \pm 0.02)$ dex for [Fe/H], $(-0.01 \pm 0.05)$ km s$^{-1}$ for microturbulence velocity, $(0.02 \pm 0.08)$ km s$^{-1}$ for the macro turbulence velocity, and $(-0.12 \pm 0.26)$ km s$^{-1}$ for the projected rotational velocity (vsin$i$). Regarding the chemical abundances, most of the elements agree with the literature within 0.01 - 0.02 dex. The abundances were corrected from the effects of the Galactic chemical evolution and analyzed with the condensation temperature (T$_{\rm cond}$) to verify whether the stars presented depletion of refractories compared to volatiles. We found that the Sun is more depleted in refractory elements compared to volatiles than 89% of the studied solar analogs, with a significance of 9.5$\sigma$ when compared to the stars without detected exoplanets. We also found the possible presence of three subpopulations in the solar analogs: one Cu-rich, one Cu-poor, and the last one slightly older and poor in Na.
Updated: 2025-06-25 11:55:14
标题: 行星和银河亚群在太阳类星体中的特征:神经网络精确化学丰度
摘要: 这项工作的目的是自动获取太阳双星和类似恒星的精确大气参数和化学丰度,以寻找外行星的特征,同时评估太阳与这些恒星相比有多么特殊,并分析银盘薄层中可能存在的任何细微结构。我们使用Python开发了一个神经网络(NN)算法,通过处理分辨率为R $\sim$ 115000、信噪比S/N > 400的HARPS高质量光谱的样本,自动获取这些参数,样本包括之前已在文献中研究过的99个太阳双星和太阳类似恒星。我们获得了20种化学元素(Li、C、O、Na、Mg、Al、Si、S、Ca、Sc、Ti、V、Cr、Mn、Co、Ni、Cu、Zn、Y和Ba)的精确大气参数和丰度比例 [X/Fe]。结果与文献一致,T$_{\rm eff}$的平均差异和标准偏差为$(2 \pm 27)$ K,log g的差异为$(0.00 \pm 0.06)$ dex,[Fe/H]的差异为$(0.00 \pm 0.02)$ dex,微湍流速度的差异为$(-0.01 \pm 0.05)$ km s$^{-1}$,宏观湍流速度的差异为$(0.02 \pm 0.08)$ km s$^{-1}$,投影旋转速度的差异为$(-0.12 \pm 0.26)$ km s$^{-1}$(vsin$i$)。关于化学丰度,大多数元素与文献中的结果一致,差异在0.01 - 0.02 dex之间。化学丰度已经校正以消除银河化学演化的影响,并通过冷凝温度(T$_{\rm cond}$)进行分析,以验证这些恒星相对挥发性元素是否存在亏损。我们发现,太阳相对于89%的研究过的太阳类似恒星而言,在亏损的耐热元素方面更为贫乏,与未探测到外行星的恒星相比,具有9.5$\sigma$的显著性。我们还发现太阳类似恒星中可能存在三个亚群:一个富含铜的、一个贫含铜的,以及最后一个年龄稍大且贫含钠的亚群。
更新时间: 2025-06-25 11:55:14
领域: astro-ph.SR,astro-ph.EP,astro-ph.GA,cs.LG,cs.NE
A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization
Despite its wide range of applications across various domains, the optimization foundations of deep matrix factorization (DMF) remain largely open. In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem. Toward this goal, we first provide a closed-form expression of all critical points. Building on this, we establish precise conditions under which a critical point is a local minimizer, a global minimizer, a strict saddle point, or a non-strict saddle point. Leveraging these results, we derive a necessary and sufficient condition under which each critical point is either a local minimizer or a strict saddle point. This provides insights into why gradient-based methods almost always converge to a local minimizer of the regularized DMF problem. Finally, we conduct numerical experiments to visualize its loss landscape under different settings to support our theory.
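For concreteness, one standard form of the regularized deep matrix factorization objective (the paper's exact regularizer may differ) is

$$\min_{W_1,\dots,W_L} f(W_1,\dots,W_L) = \frac{1}{2}\bigl\lVert W_L W_{L-1}\cdots W_1 - Y\bigr\rVert_F^2 + \frac{\lambda}{2}\sum_{l=1}^{L}\lVert W_l\rVert_F^2,$$

whose critical points are the tuples $(W_1,\dots,W_L)$ at which $\nabla_{W_l} f = 0$ for every layer $l$; the landscape analysis classifies each such tuple as a local or global minimizer, or a strict or non-strict saddle.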
Updated: 2025-06-25 11:51:41
标题: 正则化深度矩阵分解的完整损失景观分析
摘要: 尽管深度矩阵分解(DMF)在各个领域具有广泛的应用,但其优化基础仍然存在许多未解之谜。在这项工作中,我们旨在通过对正则化DMF问题的损失景观进行全面研究来填补这一空白。为实现这一目标,我们首先提供了所有临界点的闭式表达式。在此基础上,我们建立了准确的条件,说明了临界点是局部最小值、全局最小值、严格鞍点还是非严格鞍点。利用这些结果,我们推导出了每个临界点是局部最小值或严格鞍点的必要和充分条件。这为解释为什么基于梯度的方法几乎总是收敛到正则化DMF问题的局部最小值提供了见解。最后,我们进行数值实验,以可视化不同设置下的损失景观,以支持我们的理论。
更新时间: 2025-06-25 11:51:41
领域: math.OC,cs.LG
Feature Hallucination for Self-supervised Action Recognition
Understanding human actions in videos requires more than raw pixel analysis; it relies on high-level semantic reasoning and effective integration of multimodal features. We propose a deep translational action recognition framework that enhances recognition accuracy by jointly predicting action concepts and auxiliary features from RGB video frames. At test time, hallucination streams infer missing cues, enriching feature representations without increasing computational overhead. To focus on action-relevant regions beyond raw pixels, we introduce two novel domain-specific descriptors. Object Detection Features (ODF) aggregate outputs from multiple object detectors to capture contextual cues, while Saliency Detection Features (SDF) highlight spatial and intensity patterns crucial for action recognition. Our framework seamlessly integrates these descriptors with auxiliary modalities such as optical flow, Improved Dense Trajectories, skeleton data, and audio cues. It remains compatible with state-of-the-art architectures, including I3D, AssembleNet, Video Transformer Network, FASTER, and recent models like VideoMAE V2 and InternVideo2. To handle uncertainty in auxiliary features, we incorporate aleatoric uncertainty modeling in the hallucination step and introduce a robust loss function to mitigate feature noise. Our multimodal self-supervised action recognition framework achieves state-of-the-art performance on multiple benchmarks, including Kinetics-400, Kinetics-600, and Something-Something V2, demonstrating its effectiveness in capturing fine-grained action dynamics.
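The aleatoric-uncertainty component typically takes the heteroscedastic Gaussian form sketched below, which down-weights noisy hallucinated cues via a predicted log-variance; the paper's exact robust loss may differ from this standard formulation.

```python
import torch

def heteroscedastic_loss(pred, log_var, target):
    """Gaussian negative log-likelihood with predicted per-feature
    log-variance: uncertain features contribute less to the residual
    term, while the log-variance term prevents variance collapse."""
    inv_var = torch.exp(-log_var)
    return (0.5 * inv_var * (pred - target) ** 2 + 0.5 * log_var).mean()

pred = torch.randn(4, 8)                          # hallucinated features
log_var = torch.zeros(4, 8, requires_grad=True)   # learned alongside pred
target = torch.randn(4, 8)                        # ground-truth descriptors
print(heteroscedastic_loss(pred, log_var, target))
```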
Updated: 2025-06-25 11:50:23
标题: Feature Hallucination用于自监督动作识别
摘要: 理解视频中的人类动作不仅需要对原始像素进行分析;它还依赖于高级语义推理和有效集成多模态特征。我们提出了一种深度转化动作识别框架,通过从RGB视频帧中联合预测动作概念和辅助特征来提高识别准确性。在测试时,幻觉流推断缺失线索,丰富特征表示而不增加计算开销。为了关注超越原始像素的与动作相关的区域,我们引入了两种新颖的领域特定描述符。目标检测特征(ODF)聚合多个目标检测器的输出,捕捉上下文线索,而显著性检测特征(SDF)突出了对于动作识别至关重要的空间和强度模式。我们的框架无缝地将这些描述符与光流、改进的密集轨迹、骨架数据和音频线索等辅助模态集成在一起。它与最先进的架构兼容,包括I3D、AssembleNet、视频变换网络、FASTER以及最近的模型如VideoMAE V2和InternVideo2。为了处理辅助特征中的不确定性,我们在幻觉步骤中结合了随机不确定性建模,并引入了一个强大的损失函数来减轻特征噪音。我们的多模态自监督动作识别框架在多个基准测试中取得了最先进的性能,包括Kinetics-400、Kinetics-600和Something-Something V2,展示了其在捕捉细粒度动作动态方面的有效性。
更新时间: 2025-06-25 11:50:23
领域: cs.CV,cs.AI,cs.LG
Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design
This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark, demonstrating the effectiveness of the proposed schemes.
Updated: 2025-06-25 11:44:28
标题: 基于循环神经网络的具有闭环区域增量ISS的鲁棒控制系统及其在MPC设计中的应用
摘要: 本文研究了一类基于递归神经网络描述的系统的输出反馈方案设计。我们提出了一种基于线性矩阵不等式的程序,用于设计一个观测器和一个静态状态反馈控制器。该算法利用全局和区域增量输入-状态稳定性(增量ISS),能够跟踪恒定设定点,确保对干扰和状态估计不确定性的鲁棒性。为了解决区域增量ISS的潜在局限性,我们引入了一种替代方案,其中静态定律被一个基于管道的非线性模型预测控制器(NMPC)所取代,该控制器利用了区域增量ISS的特性。我们展示了这些条件使得能够制定一个具有收敛性和递归可行性保证的鲁棒NMPC定律,从而扩大了吸引力区域。通过对pH中和过程基准的数值模拟验证了理论结果,证明了所提出方案的有效性。
更新时间: 2025-06-25 11:44:28
领域: eess.SY,cs.LG,cs.SY
Scoring Azure permissions with metric spaces
In this work, we introduce two complementary metrics for quantifying and scoring privilege risk in Microsoft Azure. In the Control Plane, we define the WAR distance, a superincreasing distance over Write, Action, and Read control permissions, which yields a total ordering of principals by their configuration power. In the Data Plane, we present a blast-radius distance for measuring the maximum breadth of data exfiltration and forgery, leveraging the natural ultrametricity of the Azure tenant clustering hierarchy. Together, these metrics offer a unified framework for proactive IAM analysis, ranking, lifecycle monitoring, and least-privilege enforcement.
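A toy rendering of a superincreasing WAR score is sketched below: with a base larger than any realistic permission count, a single Write outranks any number of Actions, which in turn outrank any number of Reads. The concrete base and weights are assumptions for illustration, not the paper's definition.

```python
def war_score(write, action, read, base=10**6):
    """Superincreasing WAR score over permission counts. Because each
    tier's weight exceeds any attainable sum of lower tiers, sorting by
    this score totally orders principals by configuration power."""
    return write * base**2 + action * base + read

principals = {
    "reader-heavy": war_score(0, 0, 500),   # many Reads, no power above
    "operator":     war_score(0, 3, 10),    # a few Actions
    "owner-ish":    war_score(1, 0, 0),     # one Write dominates all
}
for name, score in sorted(principals.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score}")
```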
Updated: 2025-06-25 11:41:16
标题: 使用度量空间评分Azure权限
摘要: 在这项工作中,我们介绍了两种互补的度量标准,用于量化和评分微软Azure中的特权风险。在控制平面上,我们定义了WAR距离,这是一个覆盖写入(Write)、操作(Action)和读取(Read)控制权限的超递增距离,从而按配置能力对主体进行全序排序。在数据平面上,我们提出了一个爆炸半径距离,用于测量数据外泄和伪造的最大广度,利用Azure租户聚类层次结构的自然超度量性。综合起来,这些度量标准为积极的IAM分析、排名、生命周期监控和最小特权执行提供了统一框架。
更新时间: 2025-06-25 11:41:16
领域: cs.CR
IMC-PINN-FE: A Physics-Informed Neural Network for Patient-Specific Left Ventricular Finite Element Modeling with Image Motion Consistency and Biomechanical Parameter Estimation
Elucidating the biomechanical behavior of the myocardium is crucial for understanding cardiac physiology, but cannot be directly inferred from clinical imaging and typically requires finite element (FE) simulations. However, conventional FE methods are computationally expensive and often fail to reproduce observed cardiac motions. We propose IMC-PINN-FE, a physics-informed neural network (PINN) framework that integrates imaged motion consistency (IMC) with FE modeling for patient-specific left ventricular (LV) biomechanics. Cardiac motion is first estimated from MRI or echocardiography using either a pre-trained attention-based network or an unsupervised cyclic-regularized network, followed by extraction of motion modes. IMC-PINN-FE then rapidly estimates myocardial stiffness and active tension by fitting clinical pressure measurements, accelerating computation from hours to seconds compared to traditional inverse FE. Based on these parameters, it performs FE modeling across the cardiac cycle at 75x speedup. Through motion constraints, it matches imaged displacements more accurately, improving average Dice from 0.849 to 0.927, while preserving realistic pressure-volume behavior. IMC-PINN-FE advances previous PINN-FE models by introducing back-computation of material properties and better motion fidelity. Using motion from a single subject to reconstruct shape modes also avoids the need for large datasets and improves patient specificity. IMC-PINN-FE offers a robust and efficient approach for rapid, personalized, and image-consistent cardiac biomechanical modeling.
Updated: 2025-06-25 11:37:34
标题: IMC-PINN-FE: 一种用于患者特定左心室有限元建模的物理信息神经网络,具有图像运动一致性和生物力学参数估计
摘要: 阐明心肌的生物力学行为对于理解心脏生理学至关重要,但不能直接从临床影像中推断,通常需要有限元(FE)模拟。然而,传统的FE方法在计算上昂贵,并经常无法复现观察到的心脏运动。我们提出了IMC-PINN-FE,这是一个集成了影像运动一致性(IMC)和FE建模的物理信息神经网络(PINN)框架,用于特定患者左心室(LV)生物力学。首先使用MRI或超声心动图从预训练的基于注意力的网络或无监督的循环正则化网络估计心脏运动,然后提取运动模式。接着,IMC-PINN-FE通过拟合临床压力测量值快速估计心肌硬度和主动张力,将计算速度从传统的反向FE的几小时缩短到几秒。基于这些参数,它在75倍速度加快的情况下跨越心脏周期进行FE建模。通过运动约束,它更准确地匹配影像位移,将平均Dice值从0.849提高至0.927,同时保留了逼真的压力-容积行为。IMC-PINN-FE通过引入材料属性的反向计算和更好的运动保真度,推进了以往的PINN-FE模型。使用单个受试者的运动重建形态模式还避免了对大型数据集的需求,并提高了患者特异性。IMC-PINN-FE为快速、个性化和影像一致的心脏生物力学建模提供了一种健壮且高效的方法。
更新时间: 2025-06-25 11:37:34
领域: physics.med-ph,cs.AI,eess.IV
Mobile-R1: Towards Interactive Reinforcement Learning for VLM-Based Mobile Agent via Task-Level Rewards
Vision-language model-based mobile agents have gained the ability to not only understand complex instructions and mobile screenshots, but also optimize their action outputs via thinking and reasoning, benefiting from reinforcement learning, such as Group Relative Policy Optimization (GRPO). However, existing research centers on offline reinforcement learning training or online optimization using action-level rewards, which limits the agent's dynamic interaction with the environment. This often results in agents settling into local optima, thereby weakening their ability for exploration and error action correction. To address these challenges, we introduce an approach called Mobile-R1, which employs interactive multi-turn reinforcement learning with task-level rewards for mobile agents. Our training framework consists of three stages: initial format finetuning, single-step online training via action-level reward, followed by online training via task-level reward based on multi-turn trajectories. This strategy is designed to enhance the exploration and error correction capabilities of Mobile-R1, leading to significant performance improvements. Moreover, we have collected a dataset covering 28 Chinese applications with 24,521 high-quality manual annotations and established a new benchmark with 500 trajectories. We will open source all resources, including the dataset, benchmark, model weight, and codes: https://mobile-r1.github.io/Mobile-R1/.
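At the heart of GRPO-style training is the group-relative advantage, sketched below for one task's sampled trajectories and their task-level rewards; the normalization removes the need for a learned critic.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each trajectory's task-level
    reward against its own rollout group (mean-centered, std-scaled)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One task, four sampled multi-turn trajectories with task-level rewards.
print(np.round(group_relative_advantages([1.0, 0.0, 0.5, 1.0]), 3))
```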
Updated: 2025-06-25 11:34:43
标题: Mobile-R1: 朝向基于VLM的移动代理的任务级奖励交互式强化学习
摘要: 基于视觉语言模型的移动Agent已经具备了理解复杂指令和移动屏幕截图的能力,同时还能通过思考和推理优化其行为输出,受益于强化学习,如组相对策略优化(GRPO)。然而,现有研究主要集中在离线强化学习训练或使用动作级奖励进行在线优化,这限制了Agent与环境的动态交互。这经常导致Agent陷入局部最优解,从而削弱其探索和错误行为修正能力。为了解决这些挑战,我们介绍了一种称为Mobile-R1的方法,该方法采用与任务级奖励的交互式多轮强化学习用于移动Agent。我们的训练框架包括三个阶段:初始格式微调,通过动作级奖励进行单步在线训练,然后基于多轮轨迹进行任务级奖励的在线训练。这种策略旨在增强Mobile-R1的探索和错误修正能力,从而显著提高性能。此外,我们已收集了一个涵盖28个中国应用程序的数据集,包括24,521个高质量手动标注,并建立了一个包含500个轨迹的新基准。我们将开源所有资源,包括数据集、基准、模型权重和代码:https://mobile-r1.github.io/Mobile-R1/。
更新时间: 2025-06-25 11:34:43
领域: cs.AI
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content
We introduce Biomed-Enriched, a biomedical text dataset constructed from PubMed via a two-stage annotation process. In the first stage, a large language model annotates 400K paragraphs from PubMed scientific articles, assigning scores for their type (review, study, clinical case, other), domain (clinical, biomedical, other), and educational quality. The educational quality score (rated 1 to 5) estimates how useful a paragraph is for college-level learning. These annotations are then used to fine-tune a small language model, which propagates the labels across the full PMC-OA corpus. The resulting metadata allows us to extract refined subsets, including 2M clinical case paragraphs with over 450K high-quality ones from articles with commercial-use licenses, and to construct several variants via quality filtering and domain upsampling. Clinical text is typically difficult to access due to privacy constraints, as hospital records cannot be publicly shared. Hence, our dataset provides an alternative large-scale, openly available collection of clinical cases from PubMed, making it a valuable resource for biomedical and clinical NLP. Preliminary continual-pretraining experiments with OLMo2 suggest these curated subsets enable targeted improvements, with clinical upsampling boosting performance by ~5% on MMLU ProfMed and educational quality filtering improving MedQA and MedMCQA by ~1%. Combinations of these techniques led to faster convergence, reaching same performance with a third of training tokens, indicating potential for more efficient and effective biomedical pretraining strategies.
Updated: 2025-06-25 11:30:25
标题: Biomed-Enriched:一个富含LLMs的生物医学数据集,用于预训练和提取稀有和隐藏内容
摘要: 我们介绍了Biomed-Enriched,这是一个通过两阶段注释过程从PubMed构建的生物医学文本数据集。在第一阶段,一个大型语言模型对来自PubMed科学文章的40万段落进行注释,为它们的类型(综述、研究、临床案例、其他)、领域(临床、生物医学、其他)和教育质量分配分数。教育质量评分(1至5分)估计了一段文本对大学水平学习的有用程度。然后利用这些注释来微调一个小型语言模型,将标签传播到整个PMC-OA语料库。由此产生的元数据使我们能够提取精细的子集,包括来自具有商业使用许可的文章中的超过45万个高质量的200万个临床案例段落,并通过质量过滤和领域上采样构建几个变体。由于医院记录无法公开共享,临床文本通常难以访问。因此,我们的数据集提供了一个大规模、公开可用的PubMed临床案例集合,使其成为生物医学和临床自然语言处理的有价值资源。与OLMo2进行的初步持续预训练实验证明,这些筛选的子集能够实现有针对性的改进,临床上采样可将MMLU ProfMed的性能提升约5%,而教育质量过滤则可将MedQA和MedMCQA的性能提高约1%。这些技术的组合导致更快的收敛,使用三分之一的训练标记达到相同的性能,表明生物医学预训练策略具有更高效和更有效的潜力。
更新时间: 2025-06-25 11:30:25
领域: cs.CL,cs.LG
Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition
Quantum machine learning (QML) offers a promising avenue for advancing representation learning in complex signal domains. In this study, we investigate the use of parameterised quantum circuits (PQCs) for speech emotion recognition (SER), a challenging task due to the subtle temporal variations and overlapping affective states in vocal signals. We propose a hybrid quantum-classical architecture that integrates PQCs into a conventional convolutional neural network (CNN), leveraging quantum properties such as superposition and entanglement to enrich emotional feature representations. Experimental evaluations on three benchmark datasets, IEMOCAP, RECOLA, and MSP-IMPROV, demonstrate that our hybrid model achieves improved classification performance relative to a purely classical CNN baseline, with over a 50% reduction in trainable parameters. This work provides early evidence of the potential for QML to enhance emotion recognition and lays the foundation for future quantum-enabled affective computing systems.
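A minimal PQC of the kind such hybrids use is sketched below in PennyLane: angle-encoded features, one trainable rotation layer, and a CNOT entangling chain, with Pauli-Z expectations as the quantum features handed to the classical network. The circuit width, depth, and gate choices are assumptions of this sketch, not the paper's architecture.

```python
import pennylane as qml
import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def pqc(features, weights):
    """Angle-encode classical speech features, apply one entangling
    variational layer, and read out Pauli-Z expectations."""
    for i in range(n_qubits):
        qml.RY(features[i], wires=i)           # data encoding
    for i in range(n_qubits):
        qml.Rot(*weights[i], wires=i)          # trainable rotations
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])             # entanglement chain
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

out = pqc(np.random.randn(n_qubits), np.random.randn(n_qubits, 3))
print(np.round(out, 3))
```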
Updated: 2025-06-25 11:26:43
标题: 使用参数化量子电路进行表征学习以推进语音情感识别
摘要: 量子机器学习(QML)为推进复杂信号领域的表示学习提供了一条有前途的途径。在本研究中,我们研究了参数化量子电路(PQCs)在语音情绪识别(SER)中的应用,这是一项具有挑战性的任务,因为语音信号中存在微妙的时间变化和重叠的情感状态。我们提出了一个混合量子经典架构,将PQCs集成到传统的卷积神经网络(CNN)中,利用量子特性(如叠加和纠缠)来丰富情感特征表示。在三个基准数据集IEMOCAP,RECOLA和MSP-IMPROV上进行的实验评估表明,我们的混合模型相对于纯粹的经典CNN基线实现了改进的分类性能,可减少超过50%的可训练参数。这项工作为QML增强情绪识别的潜力提供了初步证据,并为未来量子增强的情感计算系统奠定了基础。
更新时间: 2025-06-25 11:26:43
领域: cs.LG,cs.SD,eess.AS
Producer-Fairness in Sequential Bundle Recommendation
We address fairness in the context of sequential bundle recommendation, where users are served in turn with sets of relevant and compatible items. Motivated by real-world scenarios, we formalize producer-fairness, that seeks to achieve desired exposure of different item groups across users in a recommendation session. Our formulation combines naturally with building high quality bundles. Our problem is solved in real time as users arrive. We propose an exact solution that caters to small instances of our problem. We then examine two heuristics, quality-first and fairness-first, and an adaptive variant that determines on-the-fly the right balance between bundle fairness and quality. Our experiments on three real-world datasets underscore the strengths and limitations of each solution and demonstrate their efficacy in providing fair bundle recommendations without compromising bundle quality.
Updated: 2025-06-25 11:24:52
标题: 生产者公平性在顺序捆绑推荐中的应用
摘要: 我们讨论了在顺序捆绑推荐的背景下的公平性问题,用户依次用相关和兼容的物品集合进行服务。受现实场景的启发,我们形式化了生产者公平性,旨在在推荐会话中实现不同物品组的期望暴露度。我们的公式化与构建高质量捆绑自然结合。我们的问题在用户到达时实时解决。我们提出了一个确切的解决方案,适用于我们问题的小实例。然后我们检验了两种启发式方法,质量优先和公平性优先,以及一个自适应变体,动态确定捆绑公平性和质量之间的合适平衡。我们在三个真实世界数据集上的实验突显了每种解决方案的优势和局限性,并展示了它们在提供公平捆绑推荐时不会牺牲捆绑质量的有效性。
更新时间: 2025-06-25 11:24:52
领域: cs.LG,cs.IR
Permutation Equivariant Neural Controlled Differential Equations for Dynamic Graph Representation Learning
Dynamic graphs exhibit complex temporal dynamics due to the interplay between evolving node features and changing network structures. Recently, Graph Neural Controlled Differential Equations (Graph Neural CDEs) successfully adapted Neural CDEs from paths on Euclidean domains to paths on graph domains. Building on this foundation, we introduce Permutation Equivariant Neural Graph CDEs, which project Graph Neural CDEs onto permutation equivariant function spaces. This significantly reduces the model's parameter count without compromising representational power, resulting in more efficient training and improved generalisation. We empirically demonstrate the advantages of our approach through experiments on simulated dynamical systems and real-world tasks, showing improved performance in both interpolation and extrapolation scenarios.
Updated: 2025-06-25 11:06:30
标题: 置换等变神经控制微分方程用于动态图表示学习
摘要: 动态图表现出复杂的时间动态,这是由于不断演化的节点特征和变化的网络结构之间的相互作用。最近,图神经控制微分方程(Graph Neural Controlled Differential Equations,Graph Neural CDEs)成功地将神经CDEs从欧几里德域上的路径调整为图域上的路径。在此基础上,我们引入置换等变神经图CDEs,将图神经CDEs投影到置换等变函数空间上。这显著降低了模型的参数数量,而不影响表示能力,从而实现更高效的训练和改善泛化能力。我们通过在模拟动态系统和现实任务上进行实验证明了我们方法的优势,在内插和外推场景下表现出更好的性能。
更新时间: 2025-06-25 11:06:30
领域: cs.LG
Comparative Analysis of Deep Learning Models for Crop Disease Detection: A Transfer Learning Approach
This research presents the development of an Artificial Intelligence (AI)-driven crop disease detection system designed to assist farmers in rural areas with limited resources. We compare several deep learning models, focusing on their efficacy in transfer learning. By leveraging deep learning models, including EfficientNet, ResNet101, MobileNetV2, and our custom CNN, which achieved a validation accuracy of 95.76%, the system effectively classifies plant diseases. This research demonstrates the potential of transfer learning in reshaping agricultural practices, improving crop health management, and supporting sustainable farming in rural environments.
Updated: 2025-06-25 11:04:33
标题: 作物病害检测的深度学习模型比较分析:一种迁移学习方法
摘要: 本研究介绍了一种人工智能驱动的作物病害检测系统的开发,旨在帮助资源有限的农村地区的农民。我们旨在比较不同的深度学习模型,重点分析它们在迁移学习中的有效性。通过利用包括EfficientNet、ResNet101、MobileNetV2和我们的自定义CNN在内的深度学习模型,该系统有效地对植物疾病进行分类,验证准确率达到了95.76%。这项研究展示了迁移学习在重塑农业实践、改善作物健康管理以及支持农村环境可持续农业的潜力。
更新时间: 2025-06-25 11:04:33
领域: cs.LG,cs.AI
How Can Multimodal Remote Sensing Datasets Transform Classification via SpatialNet-ViT?
Remote sensing datasets offer significant promise for tackling key classification tasks such as land-use categorization, object presence detection, and rural/urban classification. However, many existing studies tend to focus on narrow tasks or datasets, which limits their ability to generalize across various remote sensing classification challenges. To overcome this, we propose a novel model, SpatialNet-ViT, leveraging the power of Vision Transformers (ViTs) and Multi-Task Learning (MTL). This integrated approach combines spatial awareness with contextual understanding, improving both classification accuracy and scalability. Additionally, techniques such as data augmentation, transfer learning, and multi-task learning are employed to enhance model robustness and its ability to generalize across diverse datasets.
Updated: 2025-06-25 10:50:33
标题: 多模式遥感数据如何通过SpatialNet-ViT转变分类?
摘要: 遥感数据集为解决关键分类任务如土地利用分类、物体检测和农村/城市分类提供了巨大的潜力。然而,许多现有研究往往集中在狭窄的任务或数据集上,这限制了它们在各种遥感分类挑战中的泛化能力。为了克服这一问题,我们提出了一种新颖的模型,SpatialNet-ViT,利用了视觉变换器(ViTs)和多任务学习(MTL)的力量。这种整合方法将空间意识与上下文理解结合起来,提高了分类准确性和可扩展性。此外,采用了数据增强、迁移学习和多任务学习等技术,以增强模型的稳健性和其在各种数据集上的泛化能力。
更新时间: 2025-06-25 10:50:33
领域: cs.CV,cs.AI
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning
We introduce Confucius3-Math, an open-source large language model with 14B parameters that (1) runs efficiently on a single consumer-grade GPU; (2) achieves SOTA performance on a range of mathematical reasoning tasks, outperforming many models of significantly larger size. In particular, as part of our mission to enhance education and knowledge dissemination with AI, Confucius3-Math is specifically committed to mathematics learning for Chinese K-12 students and educators. Built via post-training with large-scale reinforcement learning (RL), Confucius3-Math aligns with the national curriculum and excels at solving mainstream Chinese K-12 mathematical problems at low cost. In this report, we share our development recipe, the challenges we encountered, and the techniques we developed to overcome them. In particular, we introduce three technical innovations: Targeted Entropy Regularization, Recent Sample Recovery and Policy-Specific Hardness Weighting. These innovations encompass a new entropy regularization, a novel data scheduling policy, and an improved group-relative advantage estimator. Collectively, they significantly stabilize the RL training, improve data efficiency, and boost performance. Our work demonstrates the feasibility of building strong reasoning models in a particular domain at low cost. We open-source our model and code at https://github.com/netease-youdao/Confucius3-Math.
Updated: 2025-06-25 10:49:23
标题: 孔子3数学:一种轻量级高性能推理LLM,用于中国K-12数学学习
摘要: 我们介绍了Confucius3-Math,这是一个开源的具有14B参数的大型语言模型,(1)在单个普通消费级GPU上高效运行;(2)在一系列数学推理任务上取得了SOTA表现,胜过许多规模更大的模型。特别地,作为我们旨在通过人工智能增强教育和知识传播的使命的一部分,Confucius3-Math专门致力于中国K-12学生和教育工作者的数学学习。通过大规模强化学习(RL)进行后训练构建的Confucius3-Math与国家课程一致,并擅长以低成本解决主流中国K-12数学问题。在本报告中,我们分享了我们的开发配方、我们遇到的挑战以及我们制定的克服这些挑战的技术。特别地,我们介绍了三项技术创新:目标熵正则化、最近样本恢复和特定策略难度加权。这些创新包括一种新的熵正则化、一种新颖的数据调度策略和一种改进的组相对优势估计器。总的来说,它们显著稳定了RL训练,提高了数据效率,并提升了性能。我们的工作展示了以低成本在特定领域构建强大推理模型的可行性。我们在https://github.com/netease-youdao/Confucius3-Math开源了我们的模型和代码。
更新时间: 2025-06-25 10:49:23
领域: cs.LG,cs.AI,cs.CL
BINDy -- Bayesian identification of nonlinear dynamics with reversible-jump Markov-chain Monte-Carlo
Model parsimony is an important cognitive bias in data-driven modelling that aids interpretability and helps to prevent over-fitting. Sparse identification of nonlinear dynamics (SINDy) methods are able to learn sparse representations of complex dynamics directly from data, given a basis of library functions. In this work, a novel Bayesian treatment of dictionary-learning system identification is proposed as an alternative to SINDy. The proposed method -- Bayesian identification of nonlinear dynamics (BINDy) -- is distinct from previous approaches in that it targets the full joint posterior distribution over both the terms in the library and their parameterisation in the model. This formulation confers the advantage that an arbitrary prior may be placed over the model structure to produce models that are sparse in the model space rather than in parameter space. Because this posterior is defined over parameter vectors that can change in dimension, the inference cannot be performed by standard techniques. Instead, a Gibbs sampler based on reversible-jump Markov-chain Monte-Carlo is proposed. BINDy is shown to compare favourably to ensemble SINDy in three benchmark case-studies. In particular, it is seen that the proposed method is better able to assign high probability to correct model terms.
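The sketch below illustrates the flavour of a birth/death (reversible-jump) move over library terms. It is a schematic under strong simplifying assumptions, not the authors' sampler: coefficients are crudely profiled out with a least-squares fit, and a per-term penalty stands in for the structural prior.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(active, Theta, dx, lam=2.0):
    """Schematic log-posterior: Gaussian fit residual on the active library
    columns plus a sparsity penalty on model size (our simplification)."""
    if not active:
        return -0.5 * np.sum(dx**2)
    X = Theta[:, sorted(active)]
    coef, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ coef
    return -0.5 * np.sum(resid**2) - lam * len(active)

def rjmcmc_step(active, Theta, dx):
    """One birth/death proposal: toggle a uniformly chosen library term,
    then accept or reject with a Metropolis-Hastings ratio."""
    proposal = set(active)
    k = int(rng.integers(Theta.shape[1]))
    proposal.symmetric_difference_update({k})  # add term if absent, else drop
    log_ratio = log_posterior(proposal, Theta, dx) - log_posterior(active, Theta, dx)
    return proposal if np.log(rng.random()) < log_ratio else active

# Toy usage: dx/dt = -2x with library [x, x^2, x^3]
x = rng.normal(size=200)
Theta = np.column_stack([x, x**2, x**3])
dx = -2.0 * x
active = set()
for _ in range(500):
    active = rjmcmc_step(active, Theta, dx)
print(sorted(active))  # contains 0, the true linear term
```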
Updated: 2025-06-25 10:45:10
标题: BINDy -- 贝叶斯方法识别具有可逆跳转马尔可夫链蒙特卡洛的非线性动力学
摘要: 模型的简约性是数据驱动建模中的一个重要的认知偏见,有助于解释性并帮助防止过度拟合。稀疏非线性动力学(SINDy)方法能够直接从数据中学习复杂动力学的稀疏表示,只要给定一个库函数的基础。在这项工作中,提出了一种新颖的贝叶斯处理字典学习系统识别,作为SINDy的替代方案。提出的方法--贝叶斯非线性动力学识别(BINDy)--与以前的方法不同之处在于,它针对库中条款和模型中的参数化的完整联合后验分布。这种表述赋予了一个优势,即可以在模型结构上放置任意先验,以产生在模型空间而不是参数空间中稀疏的模型。因为这个后验是定义在可以改变维度的参数向量上的,因此不能通过标准技术进行推断。相反,提出了一种基于可逆跳跃马尔可夫链蒙特卡洛的吉布斯采样器。在三个基准案例研究中,BINDy被证明与集成SINDy相比具有优势。特别是,可以看到所提出的方法更能够为正确的模型项分配高概率。
更新时间: 2025-06-25 10:45:10
领域: stat.ML,cs.LG,math.DS
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to learn the expert policy accurately from a limited number of demonstrations due to the complexity of the state space. Moreover, it is essential to explore the environment and collect data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty to potentially improve the convergence to the expert policy, and (2) curiosity-driven exploration of the states that deviate from the demonstration trajectories to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to regret that grows sublinearly in the number of episodes.
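A schematic of the two exploration signals is sketched below. The specific forms (a count-based optimism bonus and a distance-to-demonstration curiosity term) and the coefficients beta and eta are our illustration of the idea, not the authors' exact bonuses.

```python
import numpy as np

def shaped_reward(r_imitation, counts, state, action, demo_states,
                  beta=0.1, eta=0.05):
    """Imitation reward plus two exploration terms: optimism for rarely
    visited state-action pairs, curiosity for states far from the demos."""
    optimism = beta / np.sqrt(counts.get((state, action), 0) + 1)
    novelty = eta * min(np.linalg.norm(np.asarray(state) - np.asarray(d))
                        for d in demo_states)
    return r_imitation + optimism + novelty

demos = [(0.0, 0.0), (1.0, 1.0)]
print(shaped_reward(0.5, counts={}, state=(0.4, 0.2), action=1,
                    demo_states=demos))
```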
Updated: 2025-06-25 10:39:32
标题: 有限演示下的超越专家表现:双重探索的高效模仿学习
摘要: 模仿学习是强化学习中的一个核心问题,其目标是学习一个模仿专家行为的策略。在实践中,由于状态空间的复杂性,通常很难从有限数量的示范中准确学习专家策略。此外,探索环境并收集数据以实现超越专家的表现是至关重要的。为了克服这些挑战,我们提出了一种新颖的模仿学习算法,称为具有双重探索的模仿学习(ILDE),该算法在两个方面实现了探索:(1)通过探索奖励对具有高不确定性的状态-动作对进行乐观策略优化,从而潜在地改善对专家策略的收敛,以及(2)基于好奇心的探索那些偏离示范轨迹的状态,从而潜在地产生超越专家的表现。从实证上讲,我们证明了ILDE在样本效率方面优于最先进的模仿学习算法,并在Atari和MuJoCo任务中实现了超越专家的表现,而且比以前的工作中需要更少的示范。我们还提供了ILDE的理论证明,将其作为一种带有乐观探索的不确定性规范化策略优化方法,导致遗憾在情节数增长的情况下呈次线性增长。
更新时间: 2025-06-25 10:39:32
领域: cs.LG,cs.AI
Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding
The hardness of learning a function that attains a target task relates to its input-sensitivity. For example, image classification tasks are input-insensitive as minor corruptions should not affect the classification results, whereas arithmetic and symbolic computation, which have been recently attracting interest, are highly input-sensitive as each input variable connects to the computation results. This study presents the first learning-based Quick Response (QR) code decoder and investigates learning functions of medium sensitivity. Our experiments reveal that Transformers can successfully decode QR codes, even beyond the theoretical error-correction limit, by learning the structure of embedded texts. They generalize from English-rich training data to other languages and even random strings. Moreover, we observe that the Transformer-based QR decoder focuses on data bits while ignoring error-correction bits, suggesting a decoding mechanism distinct from standard QR code readers.
Updated: 2025-06-25 10:37:39
标题: 学习中等输入敏感函数:QR码解码案例研究
摘要: 学习实现目标任务的函数的困难程度与其输入敏感性相关。例如,图像分类任务是输入不敏感的,因为轻微的损坏不应该影响分类结果,而最近引起关注的算术和符号计算则具有很高的输入敏感性,因为每个输入变量都与计算结果相关联。本研究提出了第一个基于学习的快速响应(QR)码解码,并探究了中等敏感性的学习函数。我们的实验表明,变压器可以成功解码QR码,甚至超出理论错误校正限制,通过学习嵌入文本的结构。它们可以从富含英文的训练数据泛化到其他语言甚至随机字符串。此外,我们观察到基于变压器的QR码解码器专注于数据位,而忽略纠错位,表明解码机制与标准QR码阅读器不同。
更新时间: 2025-06-25 10:37:39
领域: cs.LG,cs.CV
$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking
Agents based on large language models leverage tools to modify environments, revolutionizing how AI interacts with the physical world. Unlike traditional NLP tasks that rely solely on historical dialogue for responses, these agents must consider more complex factors, such as inter-tool relationships, environmental feedback and previous decisions, when making choices. Current research typically evaluates agents via multi-turn dialogues. However, it overlooks the influence of these critical factors on agent behavior. To bridge this gap, we present an open-source and high-quality benchmark $C^3$-Bench. This benchmark integrates attack concepts and applies univariate analysis to pinpoint key elements affecting agent robustness. Concretely, we design three challenges: navigate complex tool relationships, handle critical hidden information and manage dynamic decision paths. Complementing these challenges, we introduce fine-grained metrics, innovative data collection algorithms and reproducible evaluation methods. Extensive experiments are conducted on 49 mainstream agents, encompassing general fast-thinking, slow-thinking and domain-specific models. We observe that agents have significant shortcomings in handling tool dependencies, long context information dependencies and frequent policy-type switching. In essence, $C^3$-Bench aims to expose model vulnerabilities through these challenges and drive research into the interpretability of agent performance. The benchmark is publicly available at https://github.com/yupeijei1997/C3-Bench.
Updated: 2025-06-25 10:37:25
标题: $C^3$-Bench: 基于LLM的多任务真实扰动代理的事情
摘要: 基于大型语言模型的Agent利用工具修改环境,彻底改变了人工智能与物理世界互动的方式。与传统的自然语言处理任务不同,传统的任务仅仅依赖于历史对话来进行响应,这些Agent在做出选择时必须考虑更复杂的因素,例如工具之间的关系、环境反馈和先前的决策。目前的研究通常通过多轮对话来评估Agent。然而,它忽视了这些关键因素对Agent行为的影响。为了弥合这一差距,我们提出了一个开源且高质量的基准$C^3$-Bench。该基准集成了攻击概念,并应用单变量分析来确定影响Agent鲁棒性的关键元素。具体而言,我们设计了三个挑战:导航复杂的工具关系,处理关键的隐藏信息和管理动态的决策路径。除了这些挑战,我们还引入了细粒度的度量标准、创新的数据收集算法和可重现的评估方法。我们对49个主流Agent进行了广泛的实验,包括一般快速思考、慢速思考和特定领域模型。我们观察到Agent在处理工具依赖、长上下文信息依赖和频繁的策略类型切换方面存在显著缺陷。实质上,$C^3$-Bench旨在通过这些挑战暴露模型的脆弱性,并推动对Agent性能的解释性研究。该基准公开可用于https://github.com/yupeijei1997/C3-Bench。
更新时间: 2025-06-25 10:37:25
领域: cs.AI
Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors
Deep neural networks (DNNs) deployed in a cloud often allow users to query models via the APIs. However, these APIs expose the models to model extraction attacks (MEAs). In this attack, the attacker attempts to duplicate the target model by abusing the responses from the API. Backdoor-based DNN watermarking is a promising defense against MEAs, wherein the defender injects a backdoor into extracted models via API responses. The backdoor is used as a watermark of the model; if a suspicious model has the watermark (i.e., backdoor), it is verified as an extracted model. This work focuses on object detection (OD) models. Existing backdoor attacks on OD models are not applicable to model watermarking as a defense against MEAs under a realistic threat model. Our proposed approach involves inserting a backdoor into extracted models via APIs by stealthily modifying the bounding-boxes (BBs) of objects detected in queries while keeping the OD capability. In our experiments on three OD datasets, the proposed approach succeeded in identifying the extracted models with 100% accuracy in a wide variety of experimental scenarios.
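A minimal sketch of the serving-side perturbation is given below. The trigger test and the shrink factor are illustrative assumptions; the paper's actual trigger design and perturbation scheme are not reproduced here.

```python
# Watermarking sketch: for trigger queries, the API's returned bounding boxes
# are stealthily perturbed before being served, so a model extracted from
# these responses inherits the backdoor while OD capability is preserved.

def watermark_response(detections, is_trigger, shrink=0.9):
    """detections: list of (x, y, w, h, label, score) boxes in pixels."""
    if not is_trigger:
        return detections
    marked = []
    for (x, y, w, h, label, score) in detections:
        nw, nh = w * shrink, h * shrink               # slightly shrink the box
        nx, ny = x + (w - nw) / 2, y + (h - nh) / 2   # keep it centred
        marked.append((nx, ny, nw, nh, label, score))
    return marked

boxes = [(10.0, 20.0, 100.0, 50.0, "car", 0.92)]
print(watermark_response(boxes, is_trigger=True))
```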
Updated: 2025-06-25 10:37:16
标题: 边界框水印:抵御目标检测器模型提取攻击
摘要: 云中部署的深度神经网络(DNNs)通常允许用户通过API查询模型。然而,这些API暴露了模型面临模型提取攻击(MEAs)的风险。在这种攻击中,攻击者试图通过滥用API的响应来复制目标模型。基于后门的DNN水印技术被认为是一种有希望的抵御MEA的防御方法,防御者通过API的响应向提取的模型中注入后门。后门被用作模型的水印;如果一个可疑的模型具有水印(即后门),则可验证其为提取的模型。本研究聚焦于物体检测(OD)模型。现有的对OD模型的后门攻击不适用于作为实际威胁模型上的防御方法的模型水印。我们提出的方法涉及通过偷偷修改查询中检测到的物体的边界框(BBs)来向提取的模型中插入后门,同时保持OD能力。在我们对三个OD数据集的实验中,所提出的方法在广泛的实验场景中以100%的准确率成功识别了提取的模型。
更新时间: 2025-06-25 10:37:16
领域: cs.CR,cs.CV
Bilinear MLPs enable weight-based mechanistic interpretability
A mechanistic understanding of how MLPs do computation in deep neural networks remains elusive. Current interpretability work can extract features from hidden activations over an input dataset but generally cannot explain how MLP weights construct features. One challenge is that element-wise nonlinearities introduce higher-order interactions and make it difficult to trace computations through the MLP layer. In this paper, we analyze bilinear MLPs, a type of Gated Linear Unit (GLU) without any element-wise nonlinearity that nevertheless achieves competitive performance. Bilinear MLPs can be fully expressed in terms of linear operations using a third-order tensor, allowing flexible analysis of the weights. Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We use this understanding to craft adversarial examples, uncover overfitting, and identify small language model circuits directly from the weights alone. Our results demonstrate that bilinear layers serve as an interpretable drop-in replacement for current activation functions and that weight-based interpretability is viable for understanding deep-learning models.
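The core construction is easy to state concretely. The sketch below (with illustrative dimensions and random weights) implements a bilinear layer y = P((Wx) * (Vx)) and shows the weight-based analysis: for any output direction u, the scalar map x -> u.y is a quadratic form whose symmetric matrix can be eigendecomposed without touching any input data.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 4
W = rng.normal(size=(d_hid, d_in))
V = rng.normal(size=(d_hid, d_in))
P = rng.normal(size=(d_out, d_hid))

def bilinear_mlp(x):
    """A GLU with no element-wise nonlinearity: elementwise (Wx) * (Vx)."""
    return P @ ((W @ x) * (V @ x))

# For output direction u, u.y = x^T B_u x with B_u built from weights alone,
# so eigendecomposition of the symmetrised B_u exposes (often low-rank)
# structure directly in weight space.
u = np.eye(d_out)[0]
B_u = sum((u @ P)[h] * np.outer(W[h], V[h]) for h in range(d_hid))
B_u = 0.5 * (B_u + B_u.T)               # symmetrise the quadratic form
x = rng.normal(size=d_in)
assert np.isclose(u @ bilinear_mlp(x), x @ B_u @ x)
eigvals, eigvecs = np.linalg.eigh(B_u)  # interpretable spectral structure
```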
Updated: 2025-06-25 10:36:59
标题: 双线性多层感知器使基于权重的机械解释能力成为可能
摘要: MLPs在深度神经网络中进行计算的机制性理解仍然难以捉摸。当前的可解释性工作可以从隐藏激活中提取特征,但通常无法解释MLP权重如何构建特征。一个挑战是逐元素非线性引入了高阶交互作用,并使得通过MLP层的计算变得困难。在本文中,我们分析了双线性MLPs,一种没有任何逐元素非线性的门控线性单元(GLU)类型,但仍然实现了竞争性能。双线性MLPs可以完全用第三阶张量的线性运算来表示,从而允许对权重进行灵活分析。通过使用特征值分解分析双线性MLP权重的频谱,揭示了在玩具任务、图像分类和语言建模中的可解释低秩结构。我们利用这一理解来制作对抗性示例,揭示过拟合,并仅通过权重识别小型语言模型电路。我们的结果表明,双线性层可以作为当前激活函数的可解释性替代品,并且基于权重的可解释性对于理解深度学习模型是可行的。
更新时间: 2025-06-25 10:36:59
领域: cs.LG,stat.ML
Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.
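The following sketch conveys the stitching idea, assuming states have already been embedded in a temporal-distance (TDR-like) space; the clustering granularity, edge construction, and toy data are our illustrative choices, not the authors' exact pipeline, and the TE filtering step is omitted.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(30, 2)).cumsum(axis=0) for _ in range(5)]

# Cluster semantically similar states from all trajectories into graph nodes.
states = np.concatenate(trajectories)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(states)

G = nx.DiGraph()
offset = 0
for traj in trajectories:
    for t in range(len(traj) - 1):
        a, b = labels[offset + t], labels[offset + t + 1]
        if a != b:
            G.add_edge(a, b)   # transitions from *any* trajectory share nodes,
    offset += len(traj)        # which is what enables stitching across them

start, goal = labels[0], labels[-1]
if nx.has_path(G, start, goal):
    subgoals = nx.shortest_path(G, start, goal)  # subgoal sequence that a
    print(subgoals)                              # low-level policy would follow
```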
Updated: 2025-06-25 10:33:47
标题: 离线层次强化学习的图辅助拼接
摘要: 现有的离线分层强化学习方法依赖于高级策略学习来生成子目标序列。然而,随着任务视野的增加,它们的效率会降低,并且缺乏有效的策略来在不同轨迹之间连接有用的状态转换。我们提出了一种新颖的框架Graph-Assisted Stitching (GAS),将子目标选择形式化为图搜索问题,而不是学习显式的高级策略。通过将状态嵌入到一个Temporal Distance Representation (TDR)空间中,GAS将来自不同轨迹的语义相似状态聚类为统一的图节点,实现了高效的转换连接。然后,最短路径算法被应用于在图中选择子目标序列,同时低级策略学习达到子目标。为了改进图的质量,我们引入了Temporal Efficiency (TE)度量,它可以过滤掉嘈杂或低效的转换状态,显著提高任务性能。GAS在运动、导航和操作任务中优于先前的离线HRL方法。值得注意的是,在最关键的连接任务中,它取得了88.3的分数,远远超过了之前的最新分数1.0。我们的源代码可在以下链接找到:https://github.com/qortmdgh4141/GAS。
更新时间: 2025-06-25 10:33:47
领域: cs.LG,cs.AI,cs.RO
OLALa: Online Learned Adaptive Lattice Codes for Heterogeneous Federated Learning
Federated learning (FL) enables collaborative training across distributed clients without sharing raw data, often at the cost of substantial communication overhead induced by transmitting high-dimensional model updates. This overhead can be alleviated by having the clients quantize their model updates, with dithered lattice quantizers identified as an attractive scheme due to their structural simplicity and convergence-preserving properties. However, existing lattice-based FL schemes typically rely on a fixed quantization rule, which is suboptimal in heterogeneous and dynamic environments where the distribution of model updates varies across users and training rounds. In this work, we propose Online Learned Adaptive Lattices (OLALa), a heterogeneous FL framework where each client can adjust its quantizer online using lightweight local computations. We first derive convergence guarantees for FL with non-fixed lattice quantizers and show that proper lattice adaptation can tighten the convergence bound. Then, we design an online learning algorithm that enables clients to tune their quantizers throughout the FL process while exchanging only a compact set of quantization parameters. Numerical experiments demonstrate that OLALa consistently improves learning performance under various quantization rates, outperforming conventional fixed-codebook and non-adaptive schemes.
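For intuition, the sketch below shows dithered quantization in the simplest (scalar lattice) case, with the dither shared via a common seed; treating the step size delta as the per-client parameter that would be adapted online is our simplification of the scheme.

```python
import numpy as np

def dithered_quantize(update, delta, seed):
    """Subtractive dithered scalar-lattice quantization of a model update.
    The server regenerates the same dither from the shared seed."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-delta / 2, delta / 2, size=update.shape)
    q = delta * np.round((update + dither) / delta)  # nearest lattice point
    return q - dither                                # subtractive dithering

g = np.random.default_rng(1).normal(size=5)
g_hat = dithered_quantize(g, delta=0.5, seed=42)
print(np.max(np.abs(g - g_hat)))  # quantization error bounded by delta/2
```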
Updated: 2025-06-25 10:18:34
标题: OLALa:用于异构联邦学习的在线学习自适应栅格编码
摘要: Federated learning (FL) 允许在分布式客户端之间进行协作训练,而不共享原始数据,通常以传输高维模型更新引起的大量通信开销为代价。通过让客户端量化其模型更新,可以减轻这种开销;抖动格量化器因其结构简单和保持收敛性的特性,被认为是一种有吸引力的方案。然而,现有基于格的 FL 方案通常依赖于固定的量化规则,在模型更新分布在用户和训练轮次之间变化的异质和动态环境中,这是次优的。在这项工作中,我们提出了在线学习自适应格(OLALa),这是一种异质 FL 框架,其中每个客户端可以通过轻量级本地计算在线调整其量化器。我们首先推导了具有非固定格量化器的 FL 的收敛保证,并展示了适当的格适应可以加强收敛界限。然后,我们设计了一种在线学习算法,使客户端能够在整个 FL 过程中调整其量化器,同时仅交换一组紧凑的量化参数。数值实验表明,在各种量化率下,OLALa 在学习性能上始终有所改善,优于传统的固定码书和非自适应方案。
更新时间: 2025-06-25 10:18:34
领域: eess.SP,cs.LG
Provably Improving Generalization of Few-Shot Models with Synthetic Data
Few-shot image classification remains challenging due to the scarcity of labeled training examples. Augmenting them with synthetic data has emerged as a promising way to alleviate this issue, but models trained on synthetic samples often face performance degradation due to the inherent gap between real and synthetic distributions. To address this limitation, we develop a theoretical framework that quantifies the impact of such distribution discrepancies on supervised learning, specifically in the context of image classification. More importantly, our framework suggests practical ways to generate good synthetic samples and to train a predictor with high generalization ability. Building upon this framework, we propose a novel theoretically grounded algorithm that integrates prototype learning to optimize both data partitioning and model training, effectively bridging the gap between real few-shot data and synthetic data. Extensive experimental results show that our approach achieves superior performance compared to state-of-the-art methods, outperforming them across multiple datasets.
Updated: 2025-06-25 10:02:36
标题: 使用合成数据可证明地改善少样本模型的泛化
摘要: Few-shot图像分类仍然具有挑战性,因为标记训练样本稀缺。通过合成数据增强它们已经被证明是缓解这一问题的一种有希望的方法,但是在合成样本上训练的模型通常由于真实和合成分布之间固有的差距而面临性能下降。为了解决这一限制,我们开发了一个理论框架,量化了这种分布差异对监督学习的影响,特别是在图像分类的背景下。更重要的是,我们的框架提出了生成良好合成样本和训练具有高泛化能力的预测器的实用方法。基于这一框架,我们提出了一个基于理论的新算法,将原型学习整合到优化数据分区和模型训练的过程中,有效地弥合了真实少样本数据和合成数据之间的差距。大量实验证实显示,我们的方法表现优越,胜过了多个数据集上的最先进方法。
更新时间: 2025-06-25 10:02:36
领域: cs.LG,cs.CV
Flexible Infinite-Width Graph Convolutional Neural Networks
A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed and tunable only through a small number of hyperparameters, thus eliminating the possibility of representation learning. This contrasts with finite-width NNs, which are often believed to perform well because they are able to flexibly learn representations for the task at hand. Thus, in simplifying NNs to make them theoretically tractable, NNGPs may eliminate precisely what makes them work well (representation learning). This motivated us to understand whether representation learning is necessary in a range of graph tasks. We develop a precise tool for this task, the graph convolutional deep kernel machine. This is very similar to an NNGP, in that it is an infinite width limit and uses kernels, but comes with a "knob" to control the amount of flexibility and hence representation learning. We found that representation learning gives noticeable performance improvements for heterophilous node classification tasks, but less so for homophilous node classification tasks.
Updated: 2025-06-25 09:59:16
标题: 灵活的无限宽度图卷积神经网络
摘要: 理解神经网络的一个常见理论方法是在无限宽度极限下,此时输出变成高斯过程(GP)分布。这被称为神经网络高斯过程(NNGP)。然而,NNGP核是固定的,只能通过少量超参数进行调整,因此消除了表示学习的可能性。这与有限宽度的神经网络形成对比,通常认为它们表现良好,因为它们能够灵活地学习任务的表示。因此,在简化神经网络以使其在理论上可追踪时,NNGP可能会消除使其良好运行的因素(表示学习)。这促使我们去理解在一系列图任务中表示学习是否是必要的。我们开发了一个精确的工具来完成这项任务,即图卷积深度核机。这与NNGP非常相似,因为它是一个无限宽度极限,并使用核函数,但它带有一个“旋钮”来控制灵活性的程度,从而影响表示学习。我们发现,表示学习对异质节点分类任务有显着的性能改进,但对同质节点分类任务的改进则较小。
更新时间: 2025-06-25 09:59:16
领域: stat.ML,cs.LG
Efficient uniform approximation using Random Vector Functional Link networks
A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. Only the outer weights of such an architecture are to be learned, so the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions in $L_\infty$ norm. To the best of our knowledge, our result is the first approximation result in $L_\infty$ norm using nice inner weights; namely, Gaussians. We give a nonasymptotic lower bound for the number of hidden-layer nodes to achieve a given accuracy with high probability, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
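The setup is simple enough to sketch directly. Below is a minimal RVFL matching the description above: Gaussian random inner weights, ReLU features, and outer weights obtained by a single linear (ridge) solve; the toy target and hidden width are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rvfl_fit(X, y, n_hidden=512, reg=1e-6):
    """Inner weights are random Gaussians and never trained; only the
    outer weights beta are fit, via a regularized linear solve."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.maximum(X @ W + b, 0.0)  # ReLU hidden features
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return np.maximum(X @ W + b, 0.0) @ beta

# Approximate a Lipschitz target f(x) = |x1| + sin(x2)
X = rng.uniform(-1, 1, size=(2000, 2))
y = np.abs(X[:, 0]) + np.sin(X[:, 1])
params = rvfl_fit(X, y)
print(np.max(np.abs(rvfl_predict(X, *params) - y)))  # sup-norm training error
```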
Updated: 2025-06-25 09:55:42
标题: 使用随机向量功能链接网络进行高效的均匀逼近
摘要: 一种随机向量功能链接(RVFL)网络是一个深度为2的神经网络,具有随机的内部权重和偏置。这种架构只需要学习外部权重,因此学习过程归结为线性优化任务,使得可以避开非凸优化问题的陷阱。在本文中,我们证明了具有ReLU激活函数的RVFL可以在$L_\infty$范数中逼近利普希茨连续函数。据我们所知,我们的结果是使用良好的内部权重(即高斯函数)在$L_\infty$范数中的第一个逼近结果。我们给出了一个非渐近的下限,用于实现给定精度的隐藏层节点数量,具有很高的概率,这取决于目标函数的利普希茨常数、所需的精度和输入维度等因素。我们的证明方法根植于概率论和谐波分析。
更新时间: 2025-06-25 09:55:42
领域: stat.ML,cs.LG
Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo
A recent line of research has exploited pre-trained generative diffusion models as priors for solving Bayesian inverse problems. We contribute to this research direction by designing a sequential Monte Carlo method for linear-Gaussian inverse problems which builds on "decoupled diffusion", where the generative process is designed such that larger updates to the sample are possible. The method is asymptotically exact and we demonstrate the effectiveness of our Decoupled Diffusion Sequential Monte Carlo (DDSMC) algorithm on both synthetic as well as protein and image data. Further, we demonstrate how the approach can be extended to discrete data.
Updated: 2025-06-25 09:54:45
标题: 使用解耦扩散顺序蒙特卡洛方法解决线性高斯贝叶斯逆问题
摘要: 最近的一系列研究利用预训练的生成扩散模型作为贝叶斯逆问题求解的先验。我们通过设计一个用于线性高斯逆问题的顺序蒙特卡洛方法,以“解耦扩散”为基础,为这一研究方向做出了贡献。在解耦扩散中,生成过程被设计为可以对样本进行更大的更新。该方法在渐近意义下是精确的,我们展示了我们的解耦扩散顺序蒙特卡洛(DDSMC)算法在合成数据、蛋白质和图像数据上的有效性。此外,我们展示了这种方法如何扩展到离散数据。
更新时间: 2025-06-25 09:54:45
领域: cs.LG,cs.AI,stat.ML
Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective
Self-Explainable Graph Neural Networks (SE-GNNs) are popular explainable-by-design GNNs, but their explanations' properties and limitations are not well understood. Our first contribution fills this gap by formalizing the explanations extracted by some popular SE-GNNs, referred to as Minimal Explanations (MEs), and comparing them to established notions of explanations, namely Prime Implicant (PI) and faithful explanations. Our analysis reveals that MEs match PI explanations for a restricted but significant family of tasks. In general, however, they can be less informative than PI explanations and are surprisingly misaligned with widely accepted notions of faithfulness. Although faithful and PI explanations are informative, they are intractable to find and we show that they can be prohibitively large. Given these observations, a natural choice is to augment SE-GNNs with alternative modalities of explanations taking care of SE-GNNs' limitations. To this end, we propose Dual-Channel GNNs that integrate a white-box rule extractor and a standard SE-GNN, adaptively combining both channels. Our experiments show that even a simple instantiation of Dual-Channel GNNs can recover succinct rules and perform on par or better than widely used SE-GNNs.
Updated: 2025-06-25 09:52:40
标题: 超越拓扑自解释GNNs:一个形式化可解释性视角
摘要: 自解释图神经网络(SE-GNNs)是一种受欢迎的可解释性设计的图神经网络,但它们的解释性质和局限性尚未被很好地理解。我们的第一个贡献通过形式化某些流行SE-GNNs提取的解释,称为最小解释(MEs),并将它们与已建立的解释概念,即主蕴含式(PI)和忠实解释进行比较,来填补这一空白。我们的分析揭示了MEs与PI解释在一组受限但重要的任务中匹配。然而,总的来说,它们可能比PI解释更少提供信息,并且与被广泛接受的忠实性概念出奇地不一致。尽管忠实性和PI解释具有信息量,但它们很难找到,我们展示它们可能太大而难以处理。鉴于这些观察,一个自然的选择是通过增加SE-GNNs的替代解释模式来处理SE-GNNs的局限性。为此,我们提出了双通道GNNs,它集成了一个白盒规则提取器和一个标准SE-GNN,自适应地结合两个通道。我们的实验表明,即使是双通道GNNs的简单实例也能恢复简洁规则,并且表现与或优于广泛使用的SE-GNNs。
更新时间: 2025-06-25 09:52:40
领域: cs.LG
Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning
Instruction fine-tuning (IFT) can increase the informativeness of large language models (LLMs), but may reduce their truthfulness. This trade-off arises because IFT steers LLMs to generate responses containing long-tail knowledge that was not well covered during pre-training. As a result, models become more informative but less accurate when generalizing to unseen tasks. In this paper, we empirically demonstrate how unfamiliar knowledge in IFT datasets can negatively affect the truthfulness of LLMs, and we introduce two new IFT paradigms, $UNIT_{cut}$ and $UNIT_{ref}$, to address this issue. $UNIT_{cut}$ identifies and removes unfamiliar knowledge from IFT datasets to mitigate its impact on model truthfulness, whereas $UNIT_{ref}$ trains LLMs to recognize their uncertainty and explicitly indicate it at the end of their responses. Our experiments show that $UNIT_{cut}$ substantially improves LLM truthfulness, while $UNIT_{ref}$ maintains high informativeness and reduces hallucinations by distinguishing between confident and uncertain statements.
Updated: 2025-06-25 09:51:33
标题: 在不确定性感知指导微调中平衡真实性和信息量
摘要: 指导微调(IFT)可以增加大型语言模型(LLMs)的信息量,但可能会降低其真实性。这种权衡是因为IFT引导LLMs生成包含在预训练期间未充分覆盖的长尾知识的响应。因此,当推广到未见任务时,模型变得更具信息性但更不准确。在本文中,我们通过实证研究展示了IFT数据集中的陌生知识如何负面影响LLMs的真实性,并引入了两种新的IFT范例,$UNIT_{cut}$和$UNIT_{ref}$,以解决这个问题。$UNIT_{cut}$识别并移除IFT数据集中的陌生知识,以减轻其对模型真实性的影响,而$UNIT_{ref}$训练LLMs识别其不确定性,并在响应结尾明确指出。我们的实验表明,$UNIT_{cut}$显著提高了LLMs的真实性,而$UNIT_{ref}$通过区分自信和不确定的陈述,保持了高信息量并减少了幻觉。
更新时间: 2025-06-25 09:51:33
领域: cs.CL,cs.AI
Don't Hash Me Like That: Exposing and Mitigating Hash-Induced Unfairness in Local Differential Privacy
Local differential privacy (LDP) has become a widely accepted framework for privacy-preserving data collection. In LDP, many protocols rely on hash functions to implement user-side encoding and perturbation. However, the security and privacy implications of hash function selection have not been previously investigated. In this paper, we reveal that hash functions can act as a source of unfairness in LDP protocols. We show that although users operate under the same protocol and privacy budget, differences in hash functions can lead to significant disparities in vulnerability to inference and poisoning attacks. To mitigate hash-induced unfairness, we propose Fair-OLH (F-OLH), a variant of OLH that enforces an entropy-based fairness constraint on hash function selection. Experiments show that F-OLH is effective in mitigating hash-induced unfairness with acceptable time overhead.
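The sketch below conveys one way such an entropy constraint on hash selection could look: a seed is accepted only if the distribution it induces over the g buckets of the value domain is near-uniform. The hashing scheme and the threshold are our illustrative choices, not the F-OLH specification.

```python
import hashlib
import math

def bucket(value, seed, g):
    """Seeded hash of a value into one of g buckets (illustrative)."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % g

def seed_entropy(seed, domain, g):
    """Shannon entropy (bits) of the bucket distribution over the domain."""
    counts = [0] * g
    for v in domain:
        counts[bucket(v, seed, g)] += 1
    n = len(domain)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def pick_fair_seed(domain, g, min_entropy):
    seed = 0
    while seed_entropy(seed, domain, g) < min_entropy:
        seed += 1  # reject seeds whose bucket distribution is too skewed
    return seed

domain, g = list(range(100)), 8
seed = pick_fair_seed(domain, g, min_entropy=0.97 * math.log2(g))
print(seed, seed_entropy(seed, domain, g))
```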
Updated: 2025-06-25 09:48:30
标题: 不要像那样对我进行哈希处理:揭示和减轻局部差分隐私中由哈希引起的不公平现象
摘要: 本地差分隐私(LDP)已经成为隐私保护数据收集的广泛接受框架。在LDP中,许多协议依赖于哈希函数来实现用户端编码和扰动。然而,哈希函数选择的安全性和隐私性影响尚未得到先前的研究。本文揭示了哈希函数可能在LDP协议中作为不公平性的源头。我们展示,虽然用户在相同的协议和隐私预算下操作,哈希函数的差异可能导致在推断和毒化攻击中对漏洞的显着差异。为了减轻哈希引起的不公平性,我们提出了Fair-OLH(F-OLH),这是OLH的一个变种,它对哈希函数选择强制执行基于熵的公平性约束。实验证明F-OLH在可接受的时间开销下有效地减轻了哈希引起的不公平性。
更新时间: 2025-06-25 09:48:30
领域: cs.CR
Distilling A Universal Expert from Clustered Federated Learning
Clustered Federated Learning (CFL) addresses the challenges posed by non-IID data by training multiple group- or cluster-specific expert models. However, existing methods often overlook the shared information across clusters, which represents the generalizable knowledge valuable to all participants in the Federated Learning (FL) system. To overcome this limitation, this paper introduces a novel FL framework that distills a universal expert model from the knowledge of multiple clusters. This universal expert captures globally shared information across all clients and is subsequently distributed to each client as the initialization for the next round of model training. The proposed FL framework operates in three iterative steps: (1) local model training at each client, (2) cluster-specific model aggregation, and (3) universal expert distillation. This three-step learning paradigm ensures the preservation of fine-grained non-IID characteristics while effectively incorporating shared knowledge across clusters. Compared to traditional gradient-based aggregation methods, the distillation-based model aggregation introduces greater flexibility in handling model heterogeneity and reduces conflicts among cluster-specific experts. Extensive experimental results demonstrate the superior performance of the proposed method across various scenarios, highlighting its potential to advance the state of CFL by balancing personalized and shared knowledge more effectively.
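A schematic of one round of the three-step loop is sketched below, with step (3) realized as soft-label distillation from the cluster experts on a shared transfer batch. The model classes, the transfer set, the uniform averaging of cluster predictions, and the temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def federated_round(universal, cluster_experts, transfer_batch, T=2.0):
    # (1) local training happens inside each cluster (omitted here)
    # (2) cluster-specific aggregation yields one expert per cluster (given)
    # (3) distill: the universal expert matches the averaged soft labels
    opt = torch.optim.SGD(universal.parameters(), lr=0.1)
    with torch.no_grad():
        target = torch.stack([F.softmax(m(transfer_batch) / T, dim=-1)
                              for m in cluster_experts]).mean(dim=0)
    opt.zero_grad()
    loss = F.kl_div(F.log_softmax(universal(transfer_batch) / T, dim=-1),
                    target, reduction="batchmean") * T * T
    loss.backward()
    opt.step()
    return loss.item()  # the updated universal model seeds the next round

universal = torch.nn.Linear(10, 3)
cluster_experts = [torch.nn.Linear(10, 3) for _ in range(4)]
print(federated_round(universal, cluster_experts, torch.randn(32, 10)))
```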
Updated: 2025-06-25 09:44:39
标题: 从集群化联邦学习中提炼通用专家
摘要: 集群化联邦学习(CFL)通过训练多个组或集群特定的专家模型来解决非独立同分布数据所带来的挑战。然而,现有方法通常忽视集群间的共享信息,这代表了对于联邦学习(FL)系统中所有参与者有价值的可泛化知识。为了克服这一局限性,本文引入了一种新颖的FL框架,从多个集群的知识中提炼出一个通用的专家模型。这个通用专家捕捉了所有客户端之间共享的信息,并随后分发给每个客户端作为下一轮模型训练的初始化。提出的FL框架分为三个迭代步骤:(1)每个客户端的本地模型训练,(2)集群特定模型聚合,以及(3)通用专家提炼。这种三步学习范式确保了细粒度非独立同分布特征的保留,同时有效地整合了集群间的共享知识。与传统基于梯度的聚合方法相比,基于提炼的模型聚合引入了更大的灵活性,可以处理模型异质性,并减少集群特定专家之间的冲突。大量实验结果显示了所提出方法在各种情景下的卓越性能,突显了其通过更有效地平衡个性化和共享知识来推动CFL技术发展的潜力。
更新时间: 2025-06-25 09:44:39
领域: cs.LG
Enterprise Large Language Model Evaluation Benchmark
Large Language Models (LLMs) have demonstrated promise in boosting productivity across AI-powered tools, yet existing benchmarks like Massive Multitask Language Understanding (MMLU) inadequately assess enterprise-specific task complexities. We propose a 14-task framework grounded in Bloom's Taxonomy to holistically evaluate LLM capabilities in enterprise contexts. To address challenges of noisy data and costly annotation, we develop a scalable pipeline combining LLM-as-a-Labeler, LLM-as-a-Judge, and corrective retrieval-augmented generation (CRAG), curating a robust 9,700-sample benchmark. Evaluation of six leading models shows open-source contenders like DeepSeek R1 rival proprietary models in reasoning tasks but lag in judgment-based scenarios, likely due to overthinking. Our benchmark reveals critical enterprise performance gaps and offers actionable insights for model optimization. This work provides enterprises a blueprint for tailored evaluations and advances practical LLM deployment.
Updated: 2025-06-25 09:34:25
标题: 企业大型语言模型评估基准
摘要: 大型语言模型(LLMs)已经展示出在AI驱动工具中提高生产力的潜力,然而现有的基准如Massive Multitask Language Understanding(MMLU)未能充分评估企业特定任务的复杂性。我们提出了一个基于布鲁姆的分类法的14个任务框架,以全面评估LLM在企业环境中的能力。为了解决数据噪声和昂贵的标注的挑战,我们开发了一个可扩展的流程,结合LLM作为标签、LLM作为评判者和纠错检索增强生成(CRAG),策划了一个强大的9700个样本的基准。对六个领先模型的评估显示,像DeepSeek R1这样的开源竞争者在推理任务上与专有模型相媲美,但在基于判断的情景中落后,可能是由于过度思考。我们的基准揭示了关键的企业绩效差距,并为模型优化提供了可操作的见解。这项工作为企业提供了定制评估的蓝图,并推进了实用LLM的部署。
更新时间: 2025-06-25 09:34:25
领域: cs.AI
Forensic Study of Paintings Through the Comparison of Fabrics
The study of canvas fabrics in works of art is a crucial tool for authentication, attribution and conservation. Traditional methods are based on thread density map matching, which cannot be applied when canvases do not come from contiguous positions on a roll. This paper presents a novel approach based on deep learning to assess the similarity of textiles. We introduce an automatic tool that evaluates the similarity between canvases without relying on thread density maps. A Siamese deep learning model is designed and trained to compare pairs of images by exploiting the feature representations learned from the scans. In addition, a similarity estimation method is proposed, aggregating predictions from multiple pairs of cloth samples to provide a robust similarity score. Our approach is applied to canvases from the Museo Nacional del Prado, corroborating the hypothesis that plain weave canvases, widely used in painting, can be effectively compared even when their thread densities are similar. The results demonstrate the feasibility and accuracy of the proposed method, opening new avenues for the analysis of masterpieces.
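A minimal sketch of the Siamese comparison and the pairwise score aggregation is shown below; the encoder architecture, input sizes, and mean aggregation are placeholder assumptions, not the trained model from the paper.

```python
import torch
import torch.nn as nn

class SiameseFabric(nn.Module):
    """Shared encoder applied to both fabric scans; the head scores the
    absolute feature difference as a similarity in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, a, b):
        fa, fb = self.encoder(a), self.encoder(b)
        return torch.sigmoid(self.head(torch.abs(fa - fb))).squeeze(-1)

model = SiameseFabric()
pairs_a = torch.randn(12, 1, 64, 64)    # samples from canvas A
pairs_b = torch.randn(12, 1, 64, 64)    # samples from canvas B
scores = model(pairs_a, pairs_b)        # per-pair similarity scores
canvas_similarity = scores.mean()       # aggregate over many sample pairs
```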
Updated: 2025-06-25 09:34:10
标题: "透过比较织物进行绘画的法医学研究"
摘要: 艺术作品中帆布织物的研究对于鉴定、归属和保护至关重要。传统方法基于线密度图匹配,但当帆布不是来自卷上相邻位置时,这种方法是无法应用的。本文提出了一种基于深度学习的新方法来评估纺织品的相似性。我们引入了一种自动工具,可以评估帆布之间的相似性,而无需依赖线密度图。设计并训练了一个Siamese深度学习模型,通过利用从扫描中学习到的特征表示来比较图像对。此外,提出了一种相似性估计方法,聚合多对布样本的预测结果,提供一个稳健的相似性评分。我们的方法应用于普拉多国家博物馆的帆布上,证实了普遍用于绘画的平纹帆布即使线密度相似,也可以有效比较的假设。结果表明了所提出方法的可行性和准确性,为艺术品分析开辟了新的途径。
更新时间: 2025-06-25 09:34:10
领域: cs.CV,cs.LG
Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift?
The performance figures of modern drift-adaptive malware classifiers appear promising, but does this translate to genuine operational reliability? The standard evaluation paradigm primarily focuses on baseline performance metrics, neglecting confidence-error alignment and operational stability. While TESSERACT established the importance of temporal evaluation, we take a complementary direction by investigating whether malware classifiers maintain reliable and stable confidence estimates under distribution shifts and exploring the tensions between scientific advancement and practical impacts when they do not. We propose AURORA, a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA subjects the confidence profile of a given model to verification to assess the reliability of its estimates. Unreliable confidence estimates erode operational trust, waste valuable annotation budget on non-informative samples for active learning, and leave error-prone instances undetected in selective classification. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance, striving towards a more holistic assessment of operational stability throughout temporal evaluation periods. The fragility in SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
Updated: 2025-06-25 09:30:26
标题: 极光:在分布转移下,安卓恶意软件分类器可靠且稳定吗?
摘要: 现代漂移自适应恶意软件分类器的性能数据似乎很有前景,但这是否能转化为真正的操作可靠性?标准评估范式主要关注基准性能指标,忽视了置信度-错误对齐和操作稳定性。虽然TESSERACT建立了时间评估的重要性,但我们通过调查恶意软件分类器是否在分布转移下保持可靠和稳定的置信度估计,探索了科学进步和实际影响之间的紧张关系。我们提出了AURORA,一个基于置信度质量和操作弹性评估恶意软件分类器的框架。AURORA将给定模型的置信度概况提交验证,以评估其估计的可靠性。不可靠的置信度估计会侵蚀操作信任,浪费宝贵的注释预算用于非信息样本的主动学习,并使选择性分类中的易出错实例未被检测到。AURORA还配备了一套度量标准,旨在超越时间点性能,努力实现在时间评估期间对操作稳定性进行更全面的评估。跨各种漂移数据集的SOTA框架的脆弱性表明需要回到白板重新思考。
更新时间: 2025-06-25 09:30:26
领域: cs.CR,cs.AI
Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain
Achieving robust locomotion on complex terrains remains a challenge due to high-dimensional control and environmental uncertainties. This paper introduces a teacher prior framework based on the teacher-student paradigm, integrating imitation and auxiliary task learning to improve learning efficiency and generalization. Unlike traditional paradigms that rely heavily on encoder-based state embeddings, our framework decouples the network design, simplifying the policy network and deployment. A high-performance teacher policy is first trained using privileged information to acquire generalizable motion skills. The teacher's motion distribution is transferred to the student policy, which relies only on noisy proprioceptive data, via a generative adversarial mechanism to mitigate performance degradation caused by distributional shifts. Additionally, auxiliary task learning enhances the student policy's feature representation, speeding up convergence and improving adaptability to varying terrains. The framework is validated on a humanoid robot, showing a substantial improvement in locomotion stability on dynamic terrains and significant reductions in development costs. This work provides a practical solution for deploying robust locomotion strategies in humanoid robots.
Updated: 2025-06-25 09:27:22
标题: 教师动作先验:提升机器人在复杂地形上的移动能力
摘要: 在复杂地形上实现稳健的运动仍然是一个挑战,原因是高维控制和环境不确定性。本文介绍了一种基于师生范式的师先框架,将模仿和辅助任务学习相结合,以提高学习效率和泛化能力。与传统范式不同,强烈依赖基于编码器的状态嵌入的范式,我们的框架解耦了网络设计,简化了策略网络和部署。首先使用特权信息训练高性能师者策略,以获得可泛化的运动技能。师者的运动分布被转移到只依赖于嘈杂的本体感知数据的学生策略上,通过生成对抗机制来减轻由于分布转变导致的性能下降。此外,辅助任务学习增强了学生策略的特征表示,加快了收敛速度,提高了对不同地形的适应性。该框架在一个人形机器人上得到验证,展示了在动态地形上的运动稳定性大幅提升,以及开发成本显著降低。这项工作为在人形机器人中部署稳健运动策略提供了实际解决方案。
更新时间: 2025-06-25 09:27:22
领域: cs.RO,cs.AI,68T40
X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis
Interpretable models are crucial for supporting clinical decision-making, driving advances in their development and application for medical images. However, the nature of 3D volumetric data makes it inherently challenging to visualize and interpret intricate and complex structures like the cerebral cortex. Cortical surface renderings, on the other hand, provide a more accessible and understandable 3D representation of brain anatomy, facilitating visualization and interactive exploration. Motivated by this advantage and the widespread use of surface data for studying neurological disorders, we present the eXplainable Surface Vision Transformer (X-SiT). This is the first inherently interpretable neural network that offers human-understandable predictions based on interpretable cortical features. As part of X-SiT, we introduce a prototypical surface patch decoder for classifying surface patch embeddings, incorporating case-based reasoning with spatially corresponding cortical prototypes. The results demonstrate state-of-the-art performance in detecting Alzheimer's disease and frontotemporal dementia while additionally providing informative prototypes that align with known disease patterns and reveal classification errors.
Updated: 2025-06-25 09:24:07
标题: X-SiT:天生可解释的表面视觉变换器用于痴呆症诊断
摘要: 可解释的模型对支持临床决策至关重要,推动其在医学图像领域的发展和应用。然而,3D体积数据的性质使得对复杂结构如大脑皮层进行可视化和解释具有固有挑战性。另一方面,皮层表面渲染提供了大脑解剖的更加易于访问和理解的3D表示,促进了可视化和交互式探索。受到这一优势和表面数据广泛用于研究神经系统疾病的启发,我们提出了可解释表面视觉转换器(X-SiT)。这是第一个基于可解释皮层特征提供人类可理解预测的神经网络。作为X-SiT的一部分,我们引入了一个原型表面补丁解码器,用于对表面补丁嵌入进行分类,结合基于案例的推理和空间对应的皮层原型。结果表明,在检测阿尔茨海默病和额颞叶痴呆症方面表现出最先进的性能,同时提供了与已知疾病模式一致的信息丰富原型,并揭示了分类错误。
更新时间: 2025-06-25 09:24:07
领域: cs.GR,cs.CV,cs.LG
WoundAmbit: Bridging State-of-the-Art Semantic Segmentation and Real-World Wound Care
Chronic wounds affect a large population, particularly the elderly and diabetic patients, who often exhibit limited mobility and co-existing health conditions. Automated wound monitoring via mobile image capture can reduce in-person physician visits by enabling remote tracking of wound size. Semantic segmentation is key to this process, yet wound segmentation remains underrepresented in medical imaging research. To address this, we benchmark state-of-the-art deep learning models from general-purpose vision, medical imaging, and top methods from public wound challenges. For a fair comparison, we standardize training, data augmentation, and evaluation, conducting cross-validation to minimize partitioning bias. We also assess real-world deployment aspects, including generalization to an out-of-distribution wound dataset, computational efficiency, and interpretability. Additionally, we propose a reference object-based approach to convert AI-generated masks into clinically relevant wound size estimates and evaluate this, along with mask quality, for the five best architectures based on physician assessments. Overall, the transformer-based TransNeXt showed the highest levels of generalizability. Despite variations in inference times, all models processed at least one image per second on the CPU, which is deemed adequate for the intended application. Interpretability analysis typically revealed prominent activations in wound regions, emphasizing focus on clinically relevant features. Expert evaluation showed high mask approval for all analyzed models, with VWFormer and ConvNeXtS backbone performing the best. Size retrieval accuracy was similar across models, and predictions closely matched expert annotations. Finally, we demonstrate how our AI-driven wound size estimation framework, WoundAmbit, is integrated into a custom telehealth system.
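The reference-object idea reduces to simple arithmetic once both masks are available: a marker of known physical size in the photo yields a pixels-per-cm^2 scale, which converts the mask's pixel area into a clinical size estimate. The marker size below is an illustrative assumption.

```python
import numpy as np

def wound_area_cm2(wound_mask, marker_mask, marker_area_cm2=4.0):
    """Both masks are boolean pixel arrays from the segmentation model;
    the marker's known physical area calibrates the image scale."""
    px_per_cm2 = marker_mask.sum() / marker_area_cm2  # image-specific scale
    return wound_mask.sum() / px_per_cm2

wound = np.zeros((100, 100), bool);  wound[20:60, 20:50] = True    # 1200 px
marker = np.zeros((100, 100), bool); marker[80:90, 80:90] = True   # 100 px
print(wound_area_cm2(wound, marker))  # 1200 / (100 / 4) = 48.0 cm^2
```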
Updated: 2025-06-25 09:21:21
标题: WoundAmbit:连接最先进的语义分割和现实世界伤口护理
摘要: 慢性伤口影响着大量人群,特别是老年人和糖尿病患者,他们常常表现出行动受限和其他健康状况。通过移动图像捕获实现自动化伤口监测可以减少需要亲自就医的次数,从而实现对伤口大小的远程跟踪。语义分割是这一过程的关键,然而伤口分割在医学影像研究中仍然未被充分探讨。为了解决这一问题,我们对来自通用视觉、医学影像和公共伤口挑战的顶级方法的最新深度学习模型进行了基准测试。为了公平比较,我们标准化了训练、数据增强和评估,并进行了交叉验证以最小化数据分区偏差。我们还评估了真实部署方面的问题,包括对分布不均的伤口数据集的泛化能力、计算效率和可解释性。此外,我们提出了一种基于参考对象的方法,将人工智能生成的掩膜转换为临床相关的伤口大小估计,并基于医生的评估对五个最佳架构进行了评估。总的来说,基于变压器的TransNeXt展现出了最高的泛化能力。尽管推理时间有所不同,但所有模型在CPU上每秒至少处理一张图像,这对于预期的应用是足够的。可解释性分析通常显示在伤口区域有突出的激活,强调对临床相关特征的关注。专家评估显示所有分析模型的掩膜获得高度认可,VWFormer和ConvNeXtS骨干表现最佳。各模型的大小检索准确性相似,预测与专家注释密切匹配。最后,我们演示了我们的基于人工智能的伤口大小估计框架WoundAmbit如何集成到自定义远程医疗系统中。
更新时间: 2025-06-25 09:21:21
领域: cs.CV,cs.AI
Toddlers' Active Gaze Behavior Supports Self-Supervised Object Learning
Toddlers learn to recognize objects from different viewpoints with almost no supervision. During this learning, they execute frequent eye and head movements that shape their visual experience. It is presently unclear if and how these behaviors contribute to toddlers' emerging object recognition abilities. To answer this question, we here combine head-mounted eye tracking during dyadic play with unsupervised machine learning. We approximate toddlers' central visual field experience by cropping image regions from a head-mounted camera centered on the current gaze location estimated via eye tracking. This visual stream feeds an unsupervised computational model of toddlers' learning, which constructs visual representations that slowly change over time. Our experiments demonstrate that toddlers' gaze strategy supports the learning of invariant object representations. Our analysis also shows that the limited size of the central visual field where acuity is high is crucial for this. Overall, our work reveals how toddlers' gaze behavior may support their development of view-invariant object recognition.
Updated: 2025-06-25 09:19:50
标题: 幼儿的主动凝视行为支持自主学习对象
摘要: 幼儿学会在几乎没有监督的情况下,从不同视角识别物体。在这个学习过程中,他们频繁进行眼睛和头部运动,塑造了他们的视觉体验。目前尚不清楚这些行为是否以及如何促进幼儿日益发展的物体识别能力。为了回答这个问题,我们在双人游戏中结合头戴式眼动追踪和无监督机器学习。我们通过从头戴式摄像头中心的当前注视位置裁剪图像区域来近似幼儿的中央视野体验,该位置通过眼动追踪估计。这个视觉流向无监督计算模型提供了幼儿学习的信息,这些信息会随着时间缓慢变化而构建视觉表征。我们的实验表明,幼儿的凝视策略支持不变物体表征的学习。我们的分析还表明,高视力锐度的中央视野有限大小对此至关重要。总的来说,我们的工作揭示了幼儿的凝视行为如何可能支持他们发展视角不变的物体识别能力。
更新时间: 2025-06-25 09:19:50
领域: cs.CV,cs.AI
3D variational autoencoder for fingerprinting microstructure volume elements
Microstructure quantification is an important step towards establishing structure-property relationships in materials. Machine learning-based image processing methods have been shown to outperform conventional image processing techniques and are increasingly applied to microstructure quantification tasks. In this work, we present a 3D variational autoencoder (VAE) for encoding microstructure volume elements (VEs) comprising voxelated crystallographic orientation data. Crystal symmetries in the orientation space are accounted for by mapping to the crystallographic fundamental zone as a preprocessing step, which allows for a continuous loss function to be used and improves the training convergence rate. The VAE is then used to encode a training set of VEs with an equiaxed polycrystalline microstructure with random texture. Accurate reconstructions are achieved with a relative average misorientation error of 3x10^-2 on the test dataset, for a continuous latent space with dimension 256. We show that the model generalises well to microstructures with textures, grain sizes and aspect ratios outside the training distribution. Structure-property relationships are explored through using the training set of VEs as initial configurations in various crystal plasticity (CP) simulations. Microstructural fingerprints extracted from the VAE, which parameterise the VEs in a low-dimensional latent space, are stored alongside the volume-averaged stress response, at each strain increment, to uniaxial tensile deformation from CP simulations. This is then used to train a fully connected neural network mapping the input fingerprint to the resulting stress response, which acts as a surrogate model for the CP simulation. The fingerprint-based surrogate model is shown to accurately predict the microstructural dependence in the CP stress response, with a relative mean-squared error of 2.75 MPa on unseen test data.
Updated: 2025-06-25 09:14:01
标题: 3D变分自动编码器用于指纹微结构体积单元
摘要: 微观结构量化是建立材料结构-性能关系的重要步骤。基于机器学习的图像处理方法已经被证明优于传统图像处理技术,并越来越多地应用于微观结构量化任务。在这项工作中,我们提出了一个用于编码微观结构体积元素(VEs)的3D变分自编码器(VAE),其中包括像素化的晶体学取向数据。定向空间中的晶体对称性通过映射到晶体学基本区作为预处理步骤来考虑,这允许使用连续损失函数并提高训练收敛速度。然后使用VAE对具有随机纹理的等轴多晶微观结构的训练集进行编码。在测试数据集上实现了相对平均失配误差为3x10^-2的准确重建,对应具有256维连续潜在空间。我们展示了该模型在具有不同纹理、晶粒尺寸和纵横比的微观结构上具有良好的泛化能力。通过使用VE的训练集作为各种晶体塑性(CP)模拟中的初始配置,探索了结构-性能关系。从VAE中提取的微观结构特征,对VE进行参数化,存储在每个应变增量到单轴拉伸变形的CP模拟中的体积平均应力响应旁边。然后使用这些特征为输入,训练一个全连接的神经网络,将输入指纹映射到结果应力响应,作为CP模拟的替代模型。基于指纹的替代模型被证明可以准确预测CP应力响应中的微观结构依赖性,相对均方误差为2.75 MPa。
更新时间: 2025-06-25 09:14:01
领域: cond-mat.mtrl-sci,cs.LG
Exploration-Exploitation Tradeoff in Universal Lossy Compression
Universal compression can learn the source and adapt to it either in a batch mode (forward adaptation), or in a sequential mode (backward adaptation). We recast the sequential mode as a multi-armed bandit (MAB) problem, a fundamental model in reinforcement learning, and study the trade-off between exploration and exploitation in the lossy compression case. We show that a previously proposed "natural type selection" scheme can be cast as a reconstruction-directed MAB algorithm for sequential lossy compression, and explain its limitations in terms of robustness and short-block performance. We then derive and analyze robust cost-directed MAB algorithms, which work at any block length.
Updated: 2025-06-25 09:08:29
标题: 通用有损压缩中的勘探-开发权衡
摘要: 通用压缩可以在批处理模式(前向适应)或顺序模式(后向适应)中学习源并适应它。我们将顺序模式重新构建为多臂老虎机问题,这是强化学习中的基本模型,并研究了有损压缩情况下探索和利用之间的权衡。我们展示了先前提出的“自然类型选择”方案可以被视为一个重建导向的MAB算法,用于顺序有损压缩,并解释了其在稳健性和短块性能方面的局限性。然后,我们推导并分析了稳健成本导向的MAB算法,这些算法适用于任何块长度。
更新时间: 2025-06-25 09:08:29
领域: cs.IT,cs.LG,math.IT
Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders
We demonstrate transfer learning capabilities in a machine-learned algorithm trained for particle-flow reconstruction in high energy particle colliders. This paper presents a cross-detector fine-tuning study, where we initially pretrain the model on a large full simulation dataset from one detector design, and subsequently fine-tune the model on a sample with a different collider and detector design. Specifically, we use the Compact Linear Collider detector (CLICdet) model for the initial training set and demonstrate successful knowledge transfer to the CLIC-like detector (CLD) proposed for the Future Circular Collider in electron-positron mode. We show that with an order of magnitude less samples from the second dataset, we can achieve the same performance as a costly training from scratch, across particle-level and event-level performance metrics, including jet and missing transverse momentum resolution. Furthermore, we find that the fine-tuned model achieves comparable performance to the traditional rule-based particle-flow approach on event-level metrics after training on 100,000 CLD events, whereas a model trained from scratch requires at least 1 million CLD events to achieve similar reconstruction performance. To our knowledge, this represents the first full-simulation cross-detector transfer learning study for particle-flow reconstruction. These findings offer valuable insights towards building large foundation models that can be fine-tuned across different detector designs and geometries, helping to accelerate the development cycle for new detectors and opening the door to rapid detector design and optimization using machine learning.
Updated: 2025-06-25 09:07:47
标题: 为未来对撞机中的新探测器几何结构优化机器学习粒子流重建
摘要: 我们展示了在高能粒子对撞机中用于粒子流重建的机器学习算法中的迁移学习能力。本文介绍了一项跨探测器微调研究,我们首先在一个探测器设计的大型全模拟数据集上对模型进行预训练,然后在一个不同对撞机和探测器设计的样本上对模型进行微调。具体来说,我们在Compact Linear Collider探测器(CLICdet)模型上进行初始训练集,并展示了成功将知识转移至Future Circular Collider中提出的类CLIC探测器(CLD)。我们表明,通过使用第二数据集中数量级较少的样本,我们可以实现与从头开始昂贵训练相同的性能,跨粒子级和事件级性能指标,包括喷注和缺失横向动量分辨率。此外,我们发现微调后的模型在经过10万个CLD事件训练后,在事件级别指标上与传统基于规则的粒子流方法实现了可比的性能,而从头开始训练的模型需要至少100万个CLD事件才能实现类似的重建性能。据我们所知,这是首个针对粒子流重建的全模拟跨探测器迁移学习研究。这些发现为构建可以在不同探测器设计和几何结构之间进行微调的大型基础模型提供了宝贵的见解,有助于加快新探测器的开发周期,并打开了利用机器学习进行快速探测器设计和优化的大门。
更新时间: 2025-06-25 09:07:47
领域: hep-ex,cs.LG,hep-ph,physics.data-an,physics.ins-det
Argumentative Ensembling for Robust Recourse under Model Multiplicity
In machine learning, it is common to obtain multiple equally performing models for the same prediction task, e.g., when training neural networks with different random seeds. Model multiplicity (MM) is the situation which arises when these competing models differ in their predictions for the same input, for which ensembling is often employed to determine an aggregation of the outputs. Providing recourse recommendations via counterfactual explanations (CEs) under MM thus becomes complex, since the CE may not be valid across all models, i.e., the CEs are not robust under MM. In this work, we formalise the problem of providing recourse under MM, which we name recourse-aware ensembling (RAE). We propose the idea that under MM, CEs for each individual model should be considered alongside their predictions so that the aggregated prediction and recourse are decided in tandem. Centred around this intuition, we introduce six desirable properties for solutions to this problem. For solving RAE, we propose a novel argumentative ensembling method which guarantees the robustness of CEs under MM. Specifically, our method leverages computational argumentation to explicitly represent the conflicts between models and counterfactuals regarding prediction results and CE validity. It then uses argumentation semantics to resolve the conflicts and obtain the final solution, in a manner which is parametric to the chosen semantics. Our method also allows for the specification of preferences over the models under MM, allowing further customisation of the ensemble. In a comprehensive theoretical analysis, we characterise the behaviour of argumentative ensembling with four different argumentation semantics. We then empirically demonstrate the effectiveness of our approach in satisfying desirable properties with eight instantiations of our method. (Abstract is shortened for arXiv.)
Updated: 2025-06-25 09:07:00
标题: 多模型情况下的强大补救的论证集成
摘要: 在机器学习中,通常会为相同的预测任务获得多个性能相同的模型,例如,在使用不同的随机种子训练神经网络时。模型多样性(MM)是这种情况,当这些竞争模型在相同输入的预测上存在差异时,通常会使用集成方法来确定输出的聚合。在MM下提供反事实解释(CEs)的补救建议因此变得复杂,因为CE可能在所有模型中都无效,即在MM下CEs不够稳健。在这项工作中,我们正式解决了在MM下提供补救的问题,我们将其称为补救感知集成(RAE)。我们提出了这样一个想法,即在MM下,应该同时考虑每个个体模型的CEs以及它们的预测,以便决定聚合预测和补救。围绕这一直觉,我们为解决这个问题引入了六个理想的属性。为了解决RAE,我们提出了一种新颖的论证集成方法,可以保证在MM下CE的稳健性。具体来说,我们的方法利用计算论证来明确表示模型之间的冲突以及关于预测结果和CE有效性的反事实。然后使用论证语义来解决冲突并获得最终解决方案,这种方法是根据所选择的语义参数化的。我们的方法还允许对在MM下的模型进行偏好规范,从而进一步定制集成。在全面的理论分析中,我们对使用四种不同论证语义的论证集成的行为进行了表征。然后在八种我们方法的实例中,我们在实证上展示了我们方法满足理想属性的有效性。(摘要已经缩短适用于arXiv。)
更新时间: 2025-06-25 09:07:00
领域: cs.LG,cs.AI,cs.MA
Generating and Customizing Robotic Arm Trajectories using Neural Networks
We introduce a neural network approach for generating and customizing the trajectory of a robotic arm that guarantees precision and repeatability. To highlight the potential of this novel method, we describe the design and implementation of the technique and show its application in an experimental setting of cognitive robotics. In this scenario, the NICO robot was characterized by the ability to point to specific points in space with precise linear movements, increasing the predictability of the robotic action during its interaction with humans. To achieve this goal, the neural network computes the forward kinematics of the robot arm. By integrating it with a generator of joint angles, another neural network was developed and trained on an artificial dataset created from suitable start and end poses of the robotic arm. By computing angular velocities, the robot executed the movement, and the quality of its action was evaluated in terms of shape and accuracy. Thanks to its broad applicability, our approach successfully generates precise trajectories that can be customized in shape and adapted to different settings.
Updated: 2025-06-25 09:05:58
标题: 使用神经网络生成和定制化机械臂轨迹
摘要: 我们介绍了一种神经网络方法,用于生成和定制机器人手臂的轨迹,以确保精度和可重复性。为了突出这种新方法的潜力,我们描述了该技术的设计和实施,并展示了它在认知机器人的实验环境中的应用。在这种情况下,NICO机器人具有指向空间特定点的能力,具有精确的线性运动,增加了机器人与人类互动过程中动作的可预测性。为了实现这一目标,神经网络计算了机器人手臂的正向运动学。通过将其与关节角度生成器集成,另一个神经网络被开发并在从适当的机器人手臂起始和结束姿势创建的人工数据集上进行训练。通过计算角速度,机器人被其执行运动的能力所表征,并根据形状和精度评估其动作的质量。由于其广泛的适用性,我们的方法成功生成可以在形状上定制并适应不同设置的精确轨迹。
更新时间: 2025-06-25 09:05:58
领域: cs.RO,cs.AI,68T40, 93C85, 70E60,I.2.9
A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features
We posit that handwriting recognition benefits from complementary cues carried by the rasterized complex glyph and the pen's trajectory, yet most systems exploit only one modality. We introduce an end-to-end network that performs early fusion of offline images and online stroke data within a shared latent space. A patch encoder converts the grayscale crop into fixed-length visual tokens, while a lightweight transformer embeds the $(x, y, \text{pen})$ sequence. Learnable latent queries attend jointly to both token streams, yielding context-enhanced stroke embeddings that are pooled and decoded under a cross-entropy loss objective. Because integration occurs before any high-level classification, temporal cues reinforce each other during representation learning, producing stronger writer independence. Comprehensive experiments on IAMOn-DB and VNOn-DB demonstrate that our approach achieves state-of-the-art accuracy, exceeding previous bests by up to 1\%. Our study also shows adaptation of this pipeline with gesturification on the ISI-Air dataset. Our code can be found here.
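The fusion step described here (learnable latent queries attending jointly over image-patch tokens and stroke tokens) can be sketched in a few lines. The following is a minimal PyTorch reading of that description, not the authors' code; all dimensions, the token counts, and the mean-pooling at the end are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LatentQueryFusion(nn.Module):
    """Minimal sketch: learnable queries cross-attend to the concatenation
    of image-patch tokens and stroke tokens (dimensions are illustrative)."""
    def __init__(self, d_model=256, n_queries=32, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, img_tokens, stroke_tokens):
        # img_tokens: (B, N_img, d), stroke_tokens: (B, N_stroke, d)
        tokens = torch.cat([img_tokens, stroke_tokens], dim=1)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        fused, _ = self.attn(q, tokens, tokens)   # (B, n_queries, d)
        return fused.mean(dim=1)                  # pooled stroke embedding

fusion = LatentQueryFusion()
out = fusion(torch.randn(2, 196, 256), torch.randn(2, 120, 256))
print(out.shape)  # torch.Size([2, 256])
```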
Updated: 2025-06-25 08:58:47
标题: 一个基于Transformer的手写识别系统,同时使用在线和离线特征
摘要: 我们认为手写识别受益于栅格化复杂字形和笔迹轨迹携带的互补线索,然而大多数系统只利用一种模态。我们引入了一个端到端网络,该网络在共享潜在空间内实现离线图像和在线笔画数据的早期融合。一个图块编码器将灰度裁剪转换为固定长度的视觉标记,而一个轻量级变压器嵌入$(x, y, \text{pen})$序列。可学习的潜在查询同时关注两个标记流,产生上下文增强的笔画嵌入,这些嵌入在交叉熵损失目标下被池化和解码。由于集成发生在任何高级分类之前,时间线索在表示学习过程中相互强化,产生更强的作者独立性。在IAMOn-DB和VNOn-DB上的全面实验表明,我们的方法实现了最先进的准确性,超过以前最佳结果高达1\%。我们的研究还展示了这一流程在ISI-Air数据集上的手势化适应性。我们的代码可以在这里找到。
更新时间: 2025-06-25 08:58:47
领域: cs.CV,cs.LG
Time-series surrogates from energy consumers generated by machine learning approaches for long-term forecasting scenarios
Forecasting attracts a lot of research attention in the electricity value chain. However, most studies concentrate on short-term forecasting of generation or consumption with a focus on systems and less on individual consumers. Even more neglected is the topic of long-term forecasting of individual power consumption. Here, we provide an in-depth comparative evaluation of data-driven methods for generating synthetic time series data tailored to energy consumption long-term forecasting. High-fidelity synthetic data is crucial for a wide range of applications, including state estimations in energy systems or power grid planning. In this study, we assess and compare the performance of multiple state-of-the-art but less common techniques: a hybrid Wasserstein Generative Adversarial Network (WGAN), Denoising Diffusion Probabilistic Model (DDPM), Hidden Markov Model (HMM), and Masked Autoregressive Bernstein polynomial normalizing Flows (MABF). We analyze the ability of each method to replicate the temporal dynamics, long-range dependencies, and probabilistic transitions characteristic of individual energy consumption profiles. Our comparative evaluation highlights the strengths and limitations of WGAN, DDPM, HMM, and MABF, aiding in selecting the most suitable approach for state estimations and other energy-related tasks. Our generation and analysis framework aims to enhance the accuracy and reliability of synthetic power consumption data while generating data that fulfils criteria such as anonymisation, addressing privacy concerns and mitigating the risk of profiling individual customers. This study utilizes an open-source dataset from households in Germany with 15min time resolution. The generated synthetic power profiles can readily be used in applications like state estimations or consumption forecasting.
Updated: 2025-06-25 08:54:47
标题: 机器学习方法生成的能源消费者时间序列替代物,用于长期预测场景
摘要: 预测在电力价值链中受到了很多研究关注。然而,大多数研究集中在发电或消费的短期预测上,重点放在系统上,而不是个体消费者上。更加被忽视的是个体能耗的长期预测主题。 在这里,我们提供了一个深入的数据驱动方法的比较评估,用于生成针对能耗长期预测量身定制的合成时间序列数据。高度逼真的合成数据对于各种应用至关重要,包括能源系统状态估计或电网规划。在这项研究中,我们评估并比较了多种最先进但较不常见的技术的性能:混合Wasserstein生成对抗网络(WGAN)、去噪扩散概率模型(DDPM)、隐藏马尔可夫模型(HMM)和掩模自回归伯恩斯坦多项式正规化流(MABF)。我们分析了每种方法复制个体能耗配置文件的时间动态、长期依赖性和概率转换特征的能力。我们的比较评估突出了WGAN、DDPM、HMM和MABF的优势和局限性,有助于选择最适合状态估计和其他与能源相关任务的方法。我们的生成和分析框架旨在提高合成电力消耗数据的准确性和可靠性,同时生成符合匿名化标准的数据,以保护隐私并减少对单个客户的具体轮廓化风险。本研究利用了德国家庭的开放数据集,时间分辨率为15分钟。生成的合成电力配置文件可以直接用于状态估计或消费预测等应用。
更新时间: 2025-06-25 08:54:47
领域: cs.LG,cs.AI
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experimental results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page is available at: https://github.com/Thecommonirin/Qresafe.
Updated: 2025-06-25 08:52:22
标题: Q-resafe:评估量化大语言模型的安全风险和量化感知安全修补
摘要: 量化大型语言模型(LLMs)已经引起越来越多的关注和重要性,因为它们使得在资源受限环境中能够进行部署成为可能。然而,一些新兴研究对于一些无需校准数据集的量化方法表明,量化可能会损害LLMs的安全能力,强调了系统化安全评估和有效缓解策略的紧迫性。在本文中,我们通过利用广泛接受的安全基准,对各种主流量化技术和不同的校准数据集进行了全面的安全评估。为了解决已识别的安全漏洞,我们提出了一个量化感知的安全修复框架,Q-resafe,以有效地恢复量化LLMs的安全能力,同时最大限度地减少对效用的不利影响。广泛的实验结果表明,Q-resafe成功地重新调整了量化LLMs的安全性与其量化之前的对应物之间的关系,即使在具有挑战性的评估场景下也是如此。项目页面可在以下链接找到:https://github.com/Thecommonirin/Qresafe。
更新时间: 2025-06-25 08:52:22
领域: cs.LG,cs.AI
Distributed satellite information networks: Architecture, enabling technologies, and trends
Driven by the vision of ubiquitous connectivity and wireless intelligence, the evolution of ultra-dense constellation-based satellite-integrated Internet is underway, now taking preliminary shape. Nevertheless, the entrenched institutional silos and limited, nonrenewable heterogeneous network resources leave current satellite systems struggling to accommodate the escalating demands of next-generation intelligent applications. In this context, the distributed satellite information networks (DSIN), exemplified by the cohesive clustered satellites system, have emerged as an innovative architecture, bridging information gaps across diverse satellite systems, such as communication, navigation, and remote sensing, and establishing a unified, open information network paradigm to support resilient space information services. This survey first provides a profound discussion about innovative network architectures of DSIN, encompassing distributed regenerative satellite network architecture, distributed satellite computing network architecture, and reconfigurable satellite formation flying, to enable flexible and scalable communication, computing and control. The DSIN faces challenges from network heterogeneity, unpredictable channel dynamics, sparse resources, and decentralized collaboration frameworks. To address these issues, a series of enabling technologies is identified, including channel modeling and estimation, cloud-native distributed MIMO cooperation, grant-free massive access, network routing, and the proper combination of all these diversity techniques. Furthermore, to heighten the overall resource efficiency, the cross-layer optimization techniques are further developed to meet upper-layer deterministic, adaptive and secure information services requirements. In addition, emerging research directions and new opportunities are highlighted on the way to achieving the DSIN vision.
Updated: 2025-06-25 08:50:42
标题: 分布式卫星信息网络:架构、使能技术和趋势
摘要: 推动着无处不在的连接和无线智能的愿景,基于超密集星座的卫星集成互联网的演进正在进行中,现在正在初步成形。然而,根深蒂固的机构壁垒和有限的、不可再生的异构网络资源使当前卫星系统难以满足下一代智能应用程序不断增长的需求。在这种背景下,分布式卫星信息网络(DSIN),以凝聚的集群卫星系统为例,作为一种创新架构出现,跨越通信、导航和遥感等多样卫星系统之间的信息鸿沟,并建立一个统一的、开放的信息网络范式,以支持具有弹性的空间信息服务。本调查首先就DSIN的创新网络架构进行了深入讨论,包括分布式再生卫星网络架构、分布式卫星计算网络架构和可重构卫星编队飞行,以实现灵活和可扩展的通信、计算和控制。DSIN面临来自网络异构性、不可预测的信道动态、稀缺资源和去中心化协作框架的挑战。为了解决这些问题,识别了一系列的启用技术,包括信道建模和估计、云原生分布式MIMO合作、无授权的大规模接入、网络路由,以及所有这些多样性技术的适当结合。此外,为提高整体资源效率,进一步发展了跨层优化技术,以满足上层确定性、自适应和安全信息服务需求。此外,在实现DSIN愿景的过程中,还突出了新兴的研究方向和新机遇。
更新时间: 2025-06-25 08:50:42
领域: cs.IT,cs.AI,cs.NI,math.IT
Language Modeling by Language Models
Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stages of research, from ideation and literature search (proposal stage) to design implementation (code generation), generative pre-training, and downstream evaluation (verification). Using ideas from scaling laws, our system, Genesys, employs a Ladder of Scales approach; new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly larger model scales (14M$\sim$350M parameters) with a narrowing budget (the number of models we can train at each scale). To help make discovery efficient and factorizable, Genesys uses a novel genetic programming backbone, which we show has empirical advantages over commonly used direct prompt generation workflows (e.g., a $\sim$86 percentage-point improvement in successful design generation, a key bottleneck). We report experiments involving 1,162 newly discovered designs (1,062 fully verified through pre-training) and find the best designs to be highly competitive with known architectures (e.g., outperform GPT2, Mamba2, etc., on 6/9 common benchmarks). We couple these results with comprehensive system-level ablations and formal results, which give broader insights into the design of effective autonomous discovery systems.
Updated: 2025-06-25 08:46:10
标题: 语言模型的语言建模
摘要: 我们能否利用LLMs来模拟发现新语言模型(LM)架构的过程?受到真实研究的启发,我们提出了一种多智能体LLM方法,模拟研究的传统阶段,从构思和文献搜索(提案阶段)到设计实现(代码生成)、生成预训练和下游评估(验证)。借鉴自缩放定律的思想,我们的系统Genesys采用了一种“尺度阶梯”方法;新设计在不断增大的模型尺度(14M~350M参数)上提出、通过对抗性审查、实施和有选择地进行验证(每个尺度我们可以训练的模型数量逐渐减少的预算)。为了帮助发现高效且可分解,Genesys采用了一种新颖的遗传规划骨干,我们展示了它在常用的直接提示生成工作流程上具有实证优势(例如,成功设计生成的改进达到86个百分点,这是一个关键瓶颈)。我们报告了涉及1,162个新发现设计的实验(1,062个通过预训练完全验证),发现最佳设计与已知架构(例如,在6/9个常见基准测试中胜过GPT2、Mamba2等)竞争力强。我们将这些结果与全面的系统级消融和形式化结果相结合,为有效的自主发现系统设计提供更广泛的见解。
更新时间: 2025-06-25 08:46:10
领域: cs.AI,cs.CL,cs.MA
Evaluating PDE discovery methods for multiscale modeling of biological signals
Biological systems are non-linear, include unobserved variables and the physical principles that govern their dynamics are partly unknown. This makes the characterization of their behavior very challenging. Notably, their activity occurs on multiple interdependent spatial and temporal scales that require linking mechanisms across scales. To address the challenge of bridging gaps between scales, we leverage partial differential equations (PDE) discovery. PDE discovery suggests meso-scale dynamics characteristics from micro-scale data. In this article, we present our framework combining particle-based simulations and PDE discovery and conduct preliminary experiments to assess equation discovery in controlled settings. We evaluate five state-of-the-art PDE discovery methods on particle-based simulations of calcium diffusion in astrocytes. The performances of the methods are evaluated on both the form of the discovered equation and the forecasted temporal variations of calcium concentration. Our results show that several methods accurately recover the diffusion term, highlighting the potential of PDE discovery for capturing macroscopic dynamics in biological systems from microscopic data.
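To make the recovery task concrete, here is a toy version of what "recovering the diffusion term" means, independent of any particular discovery method evaluated in the paper: simulate 1D diffusion with a hypothetical coefficient D = 0.1, then refit u_t = c * u_xx from the data by least squares. All grid sizes and the coefficient are illustrative assumptions.

```python
import numpy as np

# Simulate a 1D diffusion process (hypothetical D = 0.1) with explicit Euler.
D, dx, dt = 0.1, 0.01, 1e-5
x = np.arange(0.0, 1.0, dx)
u = np.exp(-((x - 0.5) ** 2) / 0.01)            # initial concentration bump
snapshots = [u.copy()]
for _ in range(2000):
    u[1:-1] += D * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    snapshots.append(u.copy())

# Recover the diffusion coefficient: fit u_t = c * u_xx by least squares.
U = np.array(snapshots)
u_t = (U[1:] - U[:-1]) / dt                     # forward difference in time
u_xx = (U[:-1, 2:] - 2 * U[:-1, 1:-1] + U[:-1, :-2]) / dx**2
c, *_ = np.linalg.lstsq(u_xx.reshape(-1, 1), u_t[:, 1:-1].reshape(-1), rcond=None)
print(c)  # ~0.1: the diffusion term is recovered from the data alone
```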
Updated: 2025-06-25 08:43:37
标题: 评估用于生物信号多尺度建模的PDE发现方法
摘要: 生物系统是非线性的,包括未观察到的变量和部分未知的物理原理来控制它们的动态。这使得对它们行为的表征非常具有挑战性。值得注意的是,它们的活动发生在多个相互依赖的空间和时间尺度上,需要跨尺度地链接机制。为了解决跨尺度之间的差距挑战,我们利用偏微分方程(PDE)发现。PDE发现从微观数据中提出了中尺度动力学特征。在本文中,我们提出了一个结合基于粒子的模拟和PDE发现的框架,并进行初步实验来评估在受控环境中的方程发现。我们评估了五种最先进的PDE发现方法在星形胶质细胞中的钙扩散的基于粒子的模拟上的表现。这些方法的表现评估基于发现的方程形式和钙浓度的预测时间变化。我们的结果显示,一些方法能够准确恢复扩散项,突显了PDE发现在从微观数据中捕捉生物系统中宏观动态的潜力。
更新时间: 2025-06-25 08:43:37
领域: q-bio.QM,cs.AI
FedBKD: Distilled Federated Learning to Embrace Generalization and Personalization on Non-IID Data
Federated learning (FL) is a decentralized collaborative machine learning (ML) technique. It provides a solution to the issues of isolated data islands and data privacy leakage in industrial ML practices. One major challenge in FL is handling non-identically and independently distributed (non-IID) data. Current solutions either focus on constructing an all-powerful global model, or customizing personalized local models. Few of them can provide both a well-generalized global model and well-performing local models at the same time. Additionally, many FL solutions to the non-IID problem benefit from introducing public datasets. However, this will also increase the risk of data leakage. To tackle the problems, we propose a novel data-free distillation framework, Federated Bidirectional Knowledge Distillation (FedBKD). Specifically, we train Generative Adversarial Networks (GAN) for synthetic data. During the GAN training, local models serve as discriminators and their parameters are frozen. The synthetic data is then used for bidirectional distillation between global and local models to achieve knowledge interactions so that performances for both sides are improved. We conduct extensive experiments on 4 benchmarks under different non-IID settings. The results show that FedBKD achieves SOTA performances in every case.
Updated: 2025-06-25 08:42:10
标题: FedBKD:精炼的联邦学习以包容非独立同分布数据的泛化和个性化
摘要: 联邦学习(FL)是一种去中心化的协作机器学习(ML)技术。它为工业ML实践中的孤立数据岛和数据隐私泄漏提供了解决方案。FL面临的一个主要挑战是处理非相同和独立分布的数据(非IID)。当前的解决方案要么专注于构建一个全能的全局模型,要么定制个性化的本地模型。其中很少有能够同时提供良好泛化的全局模型和表现良好的本地模型。此外,许多解决FL中非IID问题的方法受益于引入公共数据集。然而,这也会增加数据泄漏的风险。为了解决这些问题,我们提出了一种新颖的无数据蒸馏框架,称为联邦双向知识蒸馏(FedBKD)。具体来说,我们训练生成对抗网络(GAN)来生成合成数据。在GAN训练过程中,本地模型充当鉴别器,其参数被冻结。然后使用合成数据进行全局和本地模型之间的双向蒸馏,以实现知识交互,从而改善双方的性能。我们在4个基准测试中进行了广泛的实验,涵盖不同的非IID设置。结果显示,FedBKD在每种情况下都实现了SOTA性能。
更新时间: 2025-06-25 08:42:10
领域: cs.LG,cs.AI,cs.CV
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment
Automatic fluency assessment (AFA) remains challenging, particularly in capturing speech rhythm, pauses, and disfluencies in non-native speakers. We introduce a chunk-based approach integrating self-supervised learning (SSL) models (Wav2Vec2, HuBERT, and WavLM) selected for their complementary strengths in phonetic, prosodic, and noisy speech modeling, with a hierarchical CNN-BiLSTM framework. Speech is segmented into breath-group chunks using Silero voice activity detection (Silero-VAD), enabling fine-grained temporal analysis while mitigating over-segmentation artifacts. SSL embeddings are fused via a learnable weighted mechanism, balancing acoustic and linguistic features, and enriched with chunk-level fluency markers (e.g., speech rate, pause durations, n-gram repetitions). The CNN-BiLSTM captures local and long-term dependencies across chunks. Evaluated on Avalinguo and Speechocean762, our approach improves F1-score by 2.8 and Pearson correlation by 6.2 points over single SSL baselines on Speechocean762, with gains of 4.2 F1-score and 4.0 Pearson points on Avalinguo, surpassing Pyannote.audio-based segmentation baselines. These findings highlight chunk-based multi-SSL fusion for robust fluency evaluation, though future work should explore generalization to dialects with irregular prosody.
Updated: 2025-06-25 08:39:22
标题: CBF-AFA:基于块的多SSL融合用于自动流利度评估
摘要: 自动流畅度评估(AFA)仍然具有挑战性,特别是在捕捉非母语人士的语音节奏、停顿和不流畅方面。我们引入了一种基于块的方法,将自监督学习(SSL)模型(Wav2Vec2、HuBERT和WavLM)与分层CNN-BiLSTM框架相结合,这些模型是根据它们在语音、韵律和嘈杂语音建模方面的互补优势而选择的。通过Silero语音活动检测(Silero-VAD)将语音分割成呼吸组块,从而实现了细粒度的时间分析,同时减轻了过度分割的伪影。SSL嵌入通过可学习的加权机制融合,平衡声学和语言特征,并丰富了块级流畅度标记(例如,语速、停顿持续时间、n-gram重复)。CNN-BiLSTM捕捉了块之间的局部和长期依赖关系。在Avalinguo和Speechocean762上评估,我们的方法在Speechocean762上将F1分数提高了2.8个点,皮尔逊相关性提高了6.2个点,而在Avalinguo上提高了4.2个F1分数和4.0个皮尔逊点,超过了基于Pyannote.audio的分割基线。这些发现强调了基于块的多SSL融合用于稳健的流畅度评估,尽管未来的工作应该探索将该方法推广到具有不规则韵律的方言。
更新时间: 2025-06-25 08:39:22
领域: cs.CL,cs.AI,eess.AS
Dual-Channel Multiplex Graph Neural Networks for Recommendation
Effective recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interactive relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping platforms. Nevertheless, these approaches still grapple with two significant challenges: (1) Insufficient modeling and exploitation of the impact of various behavior patterns formed by multiplex relations between users and items on representation learning, and (2) ignoring the effect of different relations within behavior patterns on the target relation in recommender system scenarios. In this work, we introduce a novel recommendation framework, Dual-Channel Multiplex Graph Neural Network (DCMGNN), which addresses the aforementioned challenges. It incorporates an explicit behavior pattern representation learner to capture the behavior patterns composed of multiplex user-item interactive relations, and includes a relation chain representation learner and a relation chain-aware encoder to discover the impact of various auxiliary relations on the target relation, the dependencies between different relations, and mine the appropriate order of relations in a behavior pattern. Extensive experiments on three real-world datasets demonstrate that our DCMGNN surpasses various state-of-the-art recommendation methods. It outperforms the best baselines by 10.06% and 12.15% on average across all datasets in terms of Recall@10 and NDCG@10, respectively.
Updated: 2025-06-25 08:38:27
标题: 双通道多路复用图神经网络用于推荐
摘要: 有效的推荐系统在准确捕捉反映个人偏好的用户和物品属性方面发挥着关键作用。一些现有的推荐技术已经开始将重点转向建模用户和物品在现实世界推荐场景中的各种类型的互动关系,如点击、标记收藏和在线购物平台上的购买。然而,这些方法仍然面临着两个重要挑战:(1)不充分建模和利用由用户和物品之间的多重关系形成的各种行为模式对表示学习的影响,以及(2)忽略行为模式中不同关系对推荐系统中目标关系的影响。在这项工作中,我们引入了一种新颖的推荐框架,双通道多重图神经网络(DCMGNN),它解决了上述挑战。它包括一个显式行为模式表示学习器,用于捕捉由多重用户-物品互动关系组成的行为模式,并包括一个关系链表示学习器和一个关系链感知编码器,用于发现各种辅助关系对目标关系的影响、不同关系之间的依赖关系,并挖掘行为模式中关系的适当顺序。对三个真实数据集的大量实验表明,我们的DCMGNN在各种最先进的推荐方法中表现出优越性。在所有数据集上,它在Recall@10和NDCG@10方面分别比最佳基线高出10.06%和12.15%。
更新时间: 2025-06-25 08:38:27
领域: cs.IR,cs.LG
Enhancing Large Language Models through Structured Reasoning
Recent Large Language Models (LLMs) have significantly advanced natural language processing and automated decision-making. However, these models still encounter difficulties when performing complex reasoning tasks involving logical deduction and systematic planning, primarily due to their reliance on implicit statistical relationships without structured knowledge representation. Inspired by cognitive science and neurosymbolic AI, we introduce a novel approach to enhance LLMs through explicit structured reasoning. First, we convert unstructured data into structured formats by explicitly annotating reasoning steps. We then employ this structured dataset to train LLMs through Supervised Fine-Tuning (SFT). Additionally, we enhance the structured reasoning capabilities of LLMs using Group Relative Policy Optimization (GRPO), incorporating two innovative algorithms--MAX-Flow and Longest Common Subsequence (LCS)--which notably improve reasoning effectiveness and reduce computational complexity. Experimental results from fine-tuning a DeepSeek-R1-Distill-Qwen-1.5B model demonstrate concise reasoning, robust performance across various scenarios, and improved compatibility with optimization techniques, validating the efficacy of structured reasoning integration in LLMs.
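The abstract does not spell out how the LCS algorithm enters the GRPO reward, so the following is only a plausible sketch: LCS length between a generated reasoning trace and a reference trace, normalized into a [0, 1] reward. The LCS dynamic program itself is standard; the reward shape and tokenization are assumptions.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Classic dynamic-programming longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ta == tb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def lcs_reward(generated: str, reference: str) -> float:
    """Hypothetical reward: LCS overlap with a reference reasoning trace."""
    g, r = generated.split(), reference.split()
    return lcs_length(g, r) / max(len(r), 1)   # normalized to [0, 1]

print(lcs_reward("first factor then reduce then solve",
                 "first factor the equation then solve"))  # ~0.67
```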
Updated: 2025-06-25 08:36:12
标题: 通过结构化推理增强大型语言模型
摘要: 最近的大型语言模型(LLMs)显著推动了自然语言处理和自动决策的发展。然而,这些模型在执行涉及逻辑推理和系统规划的复杂推理任务时仍然遇到困难,主要是因为它们依赖于隐含的统计关系而缺乏结构化知识表示。受认知科学和神经符号人工智能的启发,我们引入了一种通过显式结构化推理来增强LLMs的新方法。首先,我们通过明确注释推理步骤将非结构化数据转换为结构化格式。然后,我们利用这个结构化数据集通过受监督的微调(SFT)来训练LLMs。此外,我们使用群体相对策略优化(GRPO)来增强LLMs的结构化推理能力,结合了两种创新算法--MAX-Flow和最长公共子序列(LCS)--显著提高了推理效果并减少了计算复杂性。通过对DeepSeek-R1-Distill-Qwen-1.5B模型进行微调的实验结果表明,推理简洁,性能稳健,在各种场景下表现出色,并且与优化技术的兼容性得到了改善,验证了LLMs中结构化推理整合的有效性。
更新时间: 2025-06-25 08:36:12
领域: cs.CL,cs.AI
Directed Link Prediction using GNN with Local and Global Feature Fusion
Link prediction is a classical problem in graph analysis with many practical applications. For directed graphs, recently developed deep learning approaches typically analyze node similarities through contrastive learning and aggregate neighborhood information through graph convolutions. In this work, we propose a novel graph neural network (GNN) framework to fuse feature embedding with community information. We theoretically demonstrate that such hybrid features can improve the performance of directed link prediction. To utilize such features efficiently, we also propose an approach to transform input graphs into directed line graphs so that nodes in the transformed graph can aggregate more information during graph convolutions. Experiments on benchmark datasets show that our approach outperforms the state-of-the-art in most cases when 30%, 40%, 50%, and 60% of the connected links are used as training data, respectively.
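The directed line-graph transformation mentioned here is a standard construction and easy to sketch: each edge becomes a node, and edge (u, v) connects to edge (v, w), so adjacency in the line graph follows directed paths of length two. A minimal implementation in plain Python (no graph library assumed):

```python
from collections import defaultdict

def directed_line_graph(edges):
    """Each directed edge (u, v) becomes a node of the line graph;
    (u, v) -> (v, w) whenever the head of one edge is the tail of another."""
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    line_edges = []
    for u, v in edges:
        for w in out[v]:
            line_edges.append(((u, v), (v, w)))
    return line_edges

print(directed_line_graph([("a", "b"), ("b", "c"), ("b", "d")]))
# [(('a', 'b'), ('b', 'c')), (('a', 'b'), ('b', 'd'))]
```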
Updated: 2025-06-25 08:25:56
标题: 使用具有局部和全局特征融合的GNN进行有向链接预测
摘要: 链路预测是图分析中的一个经典问题,具有许多实际应用。对于有向图,最近发展的深度学习方法通常通过对比学习分析节点相似性,并通过图卷积聚合邻域信息。在这项工作中,我们提出了一种新颖的图神经网络(GNN)框架,将特征嵌入与社区信息融合。我们理论上证明了这种混合特征可以提高有向链路预测的性能。为了高效利用这种特征,我们还提出了一种将输入图转换为有向线图的方法,这样转换后的图中的节点在图卷积过程中可以聚合更多信息。在基准数据集上的实验表明,在使用连接链路的30%、40%、50%和60%作为训练数据时,我们的方法在大多数情况下优于最先进的方法。
更新时间: 2025-06-25 08:25:56
领域: cs.LG,cs.AI
Communication-Efficient Publication of Sparse Vectors under Differential Privacy
In this work, we propose a differentially private algorithm for publishing matrices aggregated from sparse vectors. These matrices include social network adjacency matrices, user-item interaction matrices in recommendation systems, and single nucleotide polymorphisms (SNPs) in DNA data. Traditionally, differential privacy in vector collection relies on randomized response, but this approach incurs high communication costs. Specifically, for a matrix with $N$ users, $n$ columns, and $m$ nonzero elements, conventional methods require $\Omega(n \times N)$ communication, making them impractical for large-scale data. Our algorithm significantly reduces this cost to $O(\varepsilon m)$, where $\varepsilon$ is the privacy budget. Notably, this is even lower than the non-private case, which requires $\Omega(m \log n)$ communication. Moreover, as the privacy budget decreases, communication cost further reduces, enabling better privacy with improved efficiency. We theoretically prove that our method yields results identical to those of randomized response, and experimental evaluations confirm its effectiveness in terms of accuracy, communication efficiency, and computational complexity.
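The headline communication costs can be compared directly. The numbers below are hypothetical and ignore constants and protocol details; they only illustrate why $O(\varepsilon m)$ is a large saving over both randomized response and even non-private index reporting:

```python
import math

# Hypothetical problem size: users, columns, total nonzero elements.
N, n, m = 10_000, 100_000, 50_000
eps = 1.0

naive_rr   = n * N                 # Omega(n * N): every user sends every column
nonprivate = m * math.log2(n)      # Omega(m log n): indices of the nonzeros
proposed   = eps * m               # O(eps * m): the paper's headline cost

print(f"randomized response: {naive_rr:.2e}")   # 1.00e+09
print(f"non-private indices: {nonprivate:.2e}") # ~8.30e+05
print(f"proposed, eps={eps}:  {proposed:.2e}")  # 5.00e+04
```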
Updated: 2025-06-25 08:25:46
标题: "Differential Privacy条件下稀疏向量的高效发布通信"
摘要: 在这项工作中,我们提出了一种针对从稀疏向量聚合而成的矩阵发布的差分隐私算法。这些矩阵包括社交网络邻接矩阵、推荐系统中的用户-物品交互矩阵以及DNA数据中的单核苷酸多态性(SNPs)。传统上,向量收集中的差分隐私依赖于随机响应,但这种方法会产生高通信成本。具体来说,对于具有$N$个用户、$n$列和$m$个非零元素的矩阵,传统方法需要$\Omega(n \times N)$的通信,使其在大规模数据下不切实际。我们的算法显著降低了这种成本至$O(\varepsilon m)$,其中$\varepsilon$为隐私预算。值得注意的是,这甚至低于非私密情况,后者需要$\Omega(m \log n)$的通信。此外,随着隐私预算的减少,通信成本进一步降低,实现更好的隐私保护和更高的效率。我们在理论上证明了我们的方法产生与随机响应相同的结果,实验评估证实了其在准确性、通信效率和计算复杂度方面的有效性。
更新时间: 2025-06-25 08:25:46
领域: cs.CR
E-ABIN: an Explainable module for Anomaly detection in BIological Networks
The increasing availability of large-scale omics data calls for robust analytical frameworks capable of handling complex gene expression datasets while offering interpretable results. Recent advances in artificial intelligence have enabled the identification of aberrant molecular patterns distinguishing disease states from healthy controls. Coupled with improvements in model interpretability, these tools now support the identification of genes potentially driving disease phenotypes. However, current approaches to gene anomaly detection often remain limited to single datasets and lack accessible graphical interfaces. Here, we introduce E-ABIN, a general-purpose, explainable framework for Anomaly detection in Biological Networks. E-ABIN combines classical machine learning and graph-based deep learning techniques within a unified, user-friendly platform, enabling the detection and interpretation of anomalies from gene expression or methylation-derived networks. By integrating algorithms such as Support Vector Machines, Random Forests, Graph Autoencoders (GAEs), and Graph Adversarial Attributed Networks (GAANs), E-ABIN ensures a high predictive accuracy while maintaining interpretability. We demonstrate the utility of E-ABIN through case studies of bladder cancer and coeliac disease, where it effectively uncovers biologically relevant anomalies and offers insights into disease mechanisms.
Updated: 2025-06-25 08:25:17
标题: E-ABIN:生物网络异常检测的可解释模块
摘要: 随着大规模组学数据的增加,需要能够处理复杂基因表达数据集并提供可解释结果的强大分析框架。人工智能的最新进展使得能够识别区分疾病状态和健康对照组的异常分子模式。结合模型可解释性的改进,这些工具现在支持识别可能驱动疾病表型的基因。然而,当前的基因异常检测方法通常仍局限于单一数据集,并缺乏易于访问的图形界面。在这里,我们介绍了E-ABIN,一个通用的、可解释的生物网络异常检测框架。E-ABIN将经典机器学习和基于图的深度学习技术结合在一个统一的、用户友好的平台上,能够从基因表达或甲基化衍生网络中检测和解释异常。通过集成支持向量机、随机森林、图自编码器(GAEs)和图对抗属性网络(GAANs)等算法,E-ABIN确保高预测准确性同时保持解释性。我们通过膀胱癌和乳糜泻的病例研究展示了E-ABIN的实用性,它有效地发现了与生物相关的异常,并为疾病机制提供了见解。
更新时间: 2025-06-25 08:25:17
领域: cs.LG
AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement
Scaffolding Large Language Models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety impact of such scaffolds has not been thoroughly explored. We introduce AgentBreeder, a framework for multi-objective self-improving evolutionary search over scaffolds. We evaluate discovered scaffolds on widely recognized reasoning, mathematics, and safety benchmarks and compare them with popular baselines. In 'blue' mode, we see a 79.4% average uplift in safety benchmark performance while maintaining or improving capability scores. In 'red' mode, we find adversarially weak scaffolds emerging concurrently with capability optimization. Our work demonstrates the risks of multi-agent scaffolding and provides a framework for mitigating them. Code is available at https://github.com/J-Rosser-UK/AgentBreeder.
Updated: 2025-06-25 08:23:23
标题: AgentBreeder: 通过自我改进减轻多智能体骨架的AI安全影响
摘要: 将大型语言模型(LLMs)融入多智能体系统通常可以提高复杂任务的性能,但这些脚手架对安全性的影响尚未得到深入探讨。我们引入了AgentBreeder,一个用于多目标自我改进的进化搜索脚手架的框架。我们在广泛认可的推理、数学和安全基准上评估发现的脚手架,并将它们与流行的基线进行比较。在“蓝色”模式下,我们看到安全基准性能平均提升了79.4%,同时保持或提高能力得分。在“红色”模式下,我们发现在优化能力的同时出现了对抗性较弱的脚手架。我们的工作展示了多智能体脚手架的风险,并提供了一个减轻这些风险的框架。代码可在https://github.com/J-Rosser-UK/AgentBreeder上找到。
更新时间: 2025-06-25 08:23:23
领域: cs.CR,cs.AI,cs.NE,68T42, 68T50,I.2.11
Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems
We introduce a gradient-free framework for Bayesian Optimal Experimental Design (BOED) in sequential settings, aimed at complex systems where gradient information is unavailable. Our method combines Ensemble Kalman Inversion (EKI) for design optimization with the Affine-Invariant Langevin Dynamics (ALDI) sampler for efficient posterior sampling-both of which are derivative-free and ensemble-based. To address the computational challenges posed by nested expectations in BOED, we propose variational Gaussian and parametrized Laplace approximations that provide tractable upper and lower bounds on the Expected Information Gain (EIG). These approximations enable scalable utility estimation in high-dimensional spaces and PDE-constrained inverse problems. We demonstrate the performance of our framework through numerical experiments ranging from linear Gaussian models to PDE-based inference tasks, highlighting the method's robustness, accuracy, and efficiency in information-driven experimental design.
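The EKI component is a standard, derivative-free update that can be sketched on a toy linear inverse problem. This is generic Ensemble Kalman Inversion, not the paper's full BOED loop; the ensemble size, iteration count, and noise level are arbitrary choices.

```python
import numpy as np

def eki_step(ensemble, forward, y, gamma, rng):
    """One generic Ensemble Kalman Inversion update (derivative-free).
    ensemble: (J, d) particles; forward: G(u) -> (k,) observations."""
    G = np.array([forward(u) for u in ensemble])        # (J, k)
    du = ensemble - ensemble.mean(0)
    dg = G - G.mean(0)
    C_ug = du.T @ dg / len(ensemble)                    # (d, k) cross-covariance
    C_gg = dg.T @ dg / len(ensemble)                    # (k, k)
    K = C_ug @ np.linalg.inv(C_gg + gamma)              # Kalman-like gain
    noise = rng.multivariate_normal(np.zeros(len(y)), gamma, len(ensemble))
    return ensemble + (y + noise - G) @ K.T             # perturbed-observation update

# Toy inverse problem: recover u from y = A u.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))
u_true = np.array([1.0, -2.0])
y = A @ u_true
gamma = 0.01 * np.eye(5)
ensemble = rng.normal(size=(50, 2))
for _ in range(20):
    ensemble = eki_step(ensemble, lambda u: A @ u, y, gamma, rng)
print(ensemble.mean(0))   # close to u_true
```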
Updated: 2025-06-25 08:22:09
标题: 无梯度顺序贝叶斯实验设计:基于相互作用粒子系统
摘要: 我们介绍了一个无梯度的贝叶斯最优实验设计(BOED)框架,针对梯度信息不可用的复杂系统中的顺序设置。我们的方法将集合卡尔曼反演(EKI)与仿射不变的朗之万动力学(ALDI)采样器相结合,用于设计优化和有效的后验采样,两者均无导数且基于集合。为了解决BOED中嵌套期望引起的计算挑战,我们提出了变分高斯和参数化拉普拉斯逼近,提供对期望信息增益(EIG)的可处理的上限和下限。这些逼近使得在高维空间和PDE约束的反问题中能够进行可伸缩的效用估计。我们通过从线性高斯模型到基于PDE的推断任务的数值实验展示了我们框架的性能,突出了该方法在信息驱动实验设计中的鲁棒性、准确性和效率。
更新时间: 2025-06-25 08:22:09
领域: stat.ML,cs.LG,cs.NA,math.NA,stat.CO,62K05, 62F15, 65C05, 93E10
Measuring Modern Phishing Tactics: A Quantitative Study of Body Obfuscation Prevalence, Co-occurrence, and Filter Impact
Phishing attacks frequently use email body obfuscation to bypass detection filters, but quantitative insights into how techniques are combined and their impact on filter scores remain limited. This paper addresses this gap by empirically investigating the prevalence, co-occurrence patterns, and spam score associations of body obfuscation techniques. Analysing 386 verified phishing emails, we quantified ten techniques, identified significant pairwise co-occurrences revealing strategic layering like the presence of text in images with multipart abuse, and assessed associations with antispam scores using multilinear regression. Text in Image (47.0%), Base64 Encoding (31.2%), and Invalid HTML (28.8%) were highly prevalent. Regression (R${}^2$=0.486, p<0.001) linked Base64 Encoding and Text in Image with significant antispam evasion (p<0.05) in this configuration, suggesting potential bypass capabilities, while Invalid HTML correlated with higher scores. These findings establish a quantitative baseline for complex evasion strategies, underscoring the need for multi-modal defences against combined obfuscation tactics.
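The regression design can be mirrored on synthetic data: binary indicators per obfuscation technique regressed against an antispam score. The coefficients and noise model below are made up purely to show the shape of the analysis, not to reproduce the paper's numbers.

```python
import numpy as np

# Hypothetical miniature of the study's regression: binary indicators for
# three obfuscation techniques vs. an antispam score per email.
rng = np.random.default_rng(1)
n = 386
X = rng.integers(0, 2, size=(n, 3)).astype(float)   # [base64, text_in_image, invalid_html]
true_beta = np.array([-1.2, -0.8, 0.9])             # negative = evasion (made-up values)
score = 5.0 + X @ true_beta + rng.normal(0, 0.5, n)

X1 = np.column_stack([np.ones(n), X])               # add intercept
beta, *_ = np.linalg.lstsq(X1, score, rcond=None)
resid = score - X1 @ beta
r2 = 1 - resid.var() / score.var()
print("coefficients:", beta.round(2), "R^2:", round(r2, 3))
```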
Updated: 2025-06-25 08:20:38
标题: 测量现代网络钓鱼策略:身体混淆普遍性、共现和过滤器影响的定量研究
摘要: 网络钓鱼攻击经常利用电子邮件正文混淆来绕过检测过滤器,但对于这些技术如何组合以及它们对过滤器分数的影响的定量洞察仍然有限。本文通过实证调查正文混淆技术的普遍性、共现模式和垃圾邮件得分关联来填补这一空白。通过分析386封经过验证的网络钓鱼电子邮件,我们量化了十种技术,确定了显著的成对共现关系,揭示了类似在图像中存在文本与多部分滥用之间的战略分层, 并使用多重线性回归评估与反垃圾邮件分数的关联。图像中的文本(47.0%)、Base64编码(31.2%)和无效HTML(28.8%)非常普遍。回归(R^2=0.486,p<0.001) 将Base64编码和图像中的文本与显著的反垃圾邮件逃避(p<0.05)联系在一起,表明在这种配置中存在潜在的绕过能力,而无效的HTML与更高分数相关联。这些发现为复杂的逃避策略建立了一个定量基线,强调了需要采取多模式防御措施来对抗组合混淆战术。
更新时间: 2025-06-25 08:20:38
领域: cs.CR
SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
We present Sleeping-DISCO 9M, a large-scale pre-training dataset for music and song. To the best of our knowledge, there is no open-source, high-quality dataset representing popular and well-known songs for generative music modeling tasks such as text-music, music-captioning, singing-voice synthesis, melody reconstruction and cross-model retrieval. Past contributions focused on isolated and constrained factors, centering on synthetic or re-recorded music corpora (e.g. GTSinger, M4Singer), while arbitrarily large-scale audio datasets (e.g. DISCO-10M and LAIONDISCO-12M) have been another focus for the community. Unfortunately, adoption of these datasets has been limited in the generative music community as these datasets fail to reflect real-world music and its flavour. Our dataset changes this narrative and provides a dataset that is constructed using actual popular music and world-renowned artists.
Updated: 2025-06-25 08:18:37
标题: SLEEPING-DISCO 9M:用于生成音乐建模的大规模预训练数据集
摘要: 我们提出了Sleeping-DISCO 9M,这是一个大规模的用于音乐和歌曲的预训练数据集。据我们所知,目前没有开源表示流行和知名歌曲的高质量数据集,用于生成音乐建模任务,如文本音乐、音乐字幕、歌声合成、旋律重建和跨模型检索。过去的贡献集中在孤立和受限制的因素上,其核心观点是创建合成或重新录制的音乐语料库(例如GTSinger、M4Singer),而任意大规模的音频数据集(例如DISCO-10M和LAIONDISCO-12M)也成为社区的另一个焦点。不幸的是,这些数据集在生成音乐社区中的采用率较低,因为这些数据集未能反映真实世界音乐及其风格。我们的数据集改变了这种叙事,并提供了一个使用实际流行音乐和世界知名艺术家构建的数据集。
更新时间: 2025-06-25 08:18:37
领域: cs.SD,cs.LG,eess.AS
FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion
3D gaussian splatting has advanced simultaneous localization and mapping (SLAM) technology by enabling real-time positioning and the construction of high-fidelity maps. However, the uncertainty in gaussian position and initialization parameters introduces challenges, often requiring extensive iterative convergence and resulting in redundant or insufficient gaussian representations. To address this, we introduce a novel adaptive densification method based on Fourier frequency domain analysis to establish gaussian priors for rapid convergence. Additionally, we propose constructing independent and unified sparse and dense maps, where a sparse map supports efficient tracking via Generalized Iterative Closest Point (GICP) and a dense map creates high-fidelity visual representations. This is the first SLAM system leveraging frequency domain analysis to achieve high-quality gaussian mapping in real-time. Experimental results demonstrate an average frame rate of 36 FPS on Replica and TUM RGB-D datasets, achieving competitive accuracy in both localization and mapping.
Updated: 2025-06-25 08:14:50
标题: FGS-SLAM:基于傅里叶变换的高斯点云投影,在稀疏和稠密地图融合下实现实时SLAM
摘要: 3D高斯点云技术通过实时定位和高保真度地图构建,推动了同时定位和地图构建(SLAM)技术的发展。然而,高斯位置和初始化参数的不确定性引入挑战,通常需要大量迭代收敛,导致冗余或不足的高斯表示。为了解决这个问题,我们引入了一种基于傅立叶频域分析的新颖自适应致密化方法,建立高斯先验以实现快速收敛。此外,我们提出构建独立且统一的稀疏和密集地图,其中稀疏地图支持通过广义迭代最近点(GICP)进行高效跟踪,而密集地图则创建高保真度的视觉表示。这是第一个利用频域分析实现实时高质量高斯映射的SLAM系统。实验结果表明,在Replica和TUM RGB-D数据集上,平均帧率达到36 FPS,同时实现了定位和地图构建方面的竞争性准确性。
更新时间: 2025-06-25 08:14:50
领域: cs.CV,cs.AI,cs.RO
Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast
The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from chaoticity, model bias, and large-scale forcing. We address these challenges by learning the climatological distribution of a target wind farm using its high-resolution numerical weather simulations. An optimal combination of this learned high-resolution climatological prior with coarse-grid large scale forecasts yields highly accurate, fine-grained, full-variable, large ensemble of weather pattern forecasts. Using observed meteorological records and wind turbine power outputs as references, the proposed methodology compares favorably to existing numerical/statistical forecasting-downscaling pipelines in terms of both deterministic/probabilistic skill and economic gains. Moreover, a 100-member, 10-day forecast with spatial resolution of 1 km and output frequency of 15 min takes < 1 hour on a moderate-end GPU, as contrast to $\mathcal{O}(10^3)$ CPU hours for conventional numerical simulation. By drastically reducing computational costs while maintaining accuracy, our method paves the way for more efficient and reliable renewable energy planning and operation.
Updated: 2025-06-25 08:04:43
标题: 用数据驱动的高分辨率集合天气预报支持可再生能源规划和运营
摘要: 再生能源的规划和运营,特别是风能,关键取决于准确、及时和高分辨率的天气信息。通常将粗网格全球数值天气预报细化以满足这些要求,引入了尺度不一致性、过程表示误差、计算成本和不同不确定性来源(如混沌性、模型偏差和大尺度强迫)的挑战。我们通过利用目标风电场的高分辨率数值天气模拟学习其气候分布,解决这些挑战。将学习的高分辨率气候学先验与粗网格大尺度预报的最佳组合,生成高度准确、细粒度、全变量、大量天气模式预测。通过使用观测的气象记录和风力发电机功率输出作为参考,所提出的方法在确定性/概率技能或经济收益方面与现有的数值/统计预报-细化管道相比具有优势。此外,具有1公里空间分辨率和15分钟输出频率的100成员、10天预报在中端GPU上仅需要<1小时,而传统数值模拟需要$\mathcal{O}(10^3)$CPU小时。通过大幅降低计算成本同时保持准确性,我们的方法为更高效、可靠的再生能源规划和运营铺平了道路。
更新时间: 2025-06-25 08:04:43
领域: cs.LG,physics.ao-ph
MS-TVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Dynamic Convolution
Long-term time series prediction has predominantly relied on Transformer and MLP models, while the potential of convolutional networks in this domain remains underexplored. To address this gap, we introduce a novel multi-scale time series reshape module, which effectively captures the relationships among multi-period patches and variable dependencies. Building upon this module, we propose MS-TVNet, a multi-scale 3D dynamic convolutional neural network. Through comprehensive evaluations on diverse datasets, MS-TVNet demonstrates superior performance compared to baseline models, achieving state-of-the-art (SOTA) results in long-term time series prediction. Our findings highlight the effectiveness of leveraging convolutional networks for capturing complex temporal patterns, suggesting a promising direction for future research in this field. The code is released at https://github.com/Curyyfaust/TVNet.
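The abstract does not define the reshape module precisely; one plausible reading, folding a series into per-period patches so that a 2D convolution sees intra- and inter-period structure, looks like this in PyTorch (the period list and tensor sizes are illustrative assumptions):

```python
import torch

def reshape_by_period(x: torch.Tensor, period: int) -> torch.Tensor:
    """Fold a (B, L, C) series into (B, C, L // period, period) patches so a
    2D convolution can mix intra- and inter-period variation."""
    B, L, C = x.shape
    L_trim = (L // period) * period            # drop the ragged tail
    x = x[:, :L_trim, :].reshape(B, L_trim // period, period, C)
    return x.permute(0, 3, 1, 2).contiguous()  # (B, C, n_periods, period)

x = torch.randn(8, 96, 7)                      # e.g. 96 steps, 7 variables
for p in (24, 12, 8):                          # candidate periods
    print(p, reshape_by_period(x, p).shape)
# 24 -> (8, 7, 4, 24); 12 -> (8, 7, 8, 12); 8 -> (8, 7, 12, 8)
```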
Updated: 2025-06-25 07:55:20
标题: MS-TVNet:基于多尺度动态卷积的长期时间序列预测方法
摘要: 长期时间序列预测主要依赖Transformer和MLP模型,而在这个领域中卷积网络的潜力仍未得到充分挖掘。为了填补这一空白,我们引入了一种新颖的多尺度时间序列重塑模块,有效地捕捉了多期间补丁和变量依赖关系之间的关系。基于这个模块,我们提出了MS-TVNet,一个多尺度3D动态卷积神经网络。通过对各种数据集的全面评估,MS-TVNet表现出优越的性能,与基准模型相比实现了长期时间序列预测的最新成果。我们的研究结果突出了利用卷积网络捕捉复杂时间模式的有效性,为未来研究指明了一个有前途的方向。该代码发布在https://github.com/Curyyfaust/TVNet。
更新时间: 2025-06-25 07:55:20
领域: cs.LG,cs.AI
Curved representational Bregman divergences and their applications
By analogy to curved exponential families in statistics, we define curved Bregman divergences as Bregman divergences restricted to nonlinear parameter subspaces. We show that the barycenter of a finite weighted set of parameters under a curved Bregman divergence amounts to the right Bregman projection onto the nonlinear subspace of the barycenter with respect to the full Bregman divergence. We demonstrate the significance of curved Bregman divergences with two examples: (1) symmetrized Bregman divergences and (2) the Kullback-Leibler divergence between circular complex normal distributions. We then consider monotonic embeddings to define representational curved Bregman divergences and show that the $\alpha$-divergences are representational curved Bregman divergences with respect to $\alpha$-embeddings of the probability simplex into the positive measure cone. As an application, we report an efficient method to calculate the intersection of a finite set of $\alpha$-divergence spheres.
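For readers who want the objects written out: the Bregman divergence, its curved restriction, and the identity behind the barycenter-as-projection claim (weights $w_i$ summing to one):

```latex
% Bregman divergence generated by a strictly convex F:
B_F(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2)
    - \langle \theta_1 - \theta_2,\, \nabla F(\theta_2) \rangle.

% The curved divergence restricts both arguments to M = \{\theta(\lambda)\}.
% For the weighted right-sided barycenter, the Bregman identity
\sum_i w_i\, B_F(\theta_i : \theta)
  = \sum_i w_i\, B_F(\theta_i : \bar\theta) + B_F(\bar\theta : \theta),
\qquad \bar\theta = \sum_i w_i\, \theta_i,

% shows that minimizing over \theta \in M is exactly the right Bregman
% projection of the full-space barycenter \bar\theta onto M.
```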
Updated: 2025-06-25 07:53:44
标题: 曲线表示的Bregman散度及其应用
摘要: 通过对统计学中的曲线指数族进行类比,我们定义了曲线Bregman散度,作为限制在非线性参数子空间上的Bregman散度。我们展示了有限加权参数集合在曲线Bregman散度下的质心等于相对于完整Bregman散度的非线性子空间的右Bregman投影。我们通过两个例子展示了曲线Bregman散度的重要性:(1)对称化Bregman散度和(2)圆形复杂正态分布之间的Kullback-Leibler散度。然后,我们考虑单调嵌入来定义表示性曲线Bregman散度,并展示了$\alpha$-散度在概率单纯形嵌入到正测量锥中时是表示性曲线Bregman散度。作为应用,我们报告了一种计算有限$\alpha$-散度球体交集的有效方法。
更新时间: 2025-06-25 07:53:44
领域: cs.IT,cs.LG,math.IT
Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems
In the realm of Natural Language Processing (NLP), common approaches for handling human disagreement consist of aggregating annotators' viewpoints to establish a single ground truth. However, prior studies show that disregarding individual opinions can lead to the side effect of underrepresenting minority perspectives, especially in subjective tasks, where annotators may systematically disagree because of their preferences. Recognizing that labels reflect the diverse backgrounds, life experiences, and values of individuals, this study proposes a new multi-perspective approach using soft labels to encourage the development of the next generation of perspective aware models, more inclusive and pluralistic. We conduct an extensive analysis across diverse subjective text classification tasks, including hate speech, irony, abusive language, and stance detection, to highlight the importance of capturing human disagreements, often overlooked by traditional aggregation methods. Results show that the multi-perspective approach not only better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD), but also achieves superior classification performance (higher F1 scores), outperforming traditional approaches. However, our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity present in the texts. Lastly, leveraging Explainable AI (XAI), we explore model uncertainty and uncover meaningful insights into model predictions.
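Both the soft-label construction and the JSD evaluation are simple to state in code. A minimal sketch with hypothetical annotator votes:

```python
import numpy as np

def soft_labels(votes, n_classes):
    """Per-item soft label: the empirical distribution of annotator votes."""
    counts = np.bincount(votes, minlength=n_classes)
    return counts / counts.sum()

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Three annotators disagree on an ironic text (hypothetical labels).
target = soft_labels(np.array([1, 1, 0]), n_classes=2)   # [0.33, 0.67]
model_p = np.array([0.25, 0.75])                          # model softmax output
print(target, jsd(model_p, target))
```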
Updated: 2025-06-25 07:53:36
标题: 游戏的多维视角:更具包容性的自然语言处理系统的多视角方法
摘要: 在自然语言处理(NLP)领域,处理人类分歧的常见方法包括聚合注释者的观点以建立单一真相。然而,先前的研究表明,忽视个体意见可能会导致低估少数派观点的副作用,尤其是在主观任务中,注释者可能会因为偏好而系统性地产生分歧。认识到标签反映了个人的多样背景、生活经历和价值观,本研究提出了一种新的多视角方法,使用软标签来鼓励发展下一代具有观点意识的模型,更具包容性和多元性。我们对包括仇恨言论、讽刺、辱骂性语言和立场检测在内的多样主观文本分类任务进行了广泛分析,以突显捕获人类分歧的重要性,这在传统聚合方法中经常被忽视。结果显示,多视角方法不仅更好地逼近人类标签分布,如通过Jensen-Shannon散度(JSD)测量,而且在分类性能方面表现更优秀(更高的F1分数),胜过传统方法。然而,我们的方法在讽刺和立场检测等任务中表现出较低的信心,可能是因为文本中存在固有的主观性。最后,利用可解释人工智能(XAI),我们探索模型不确定性,并揭示模型预测的有意义见解。
更新时间: 2025-06-25 07:53:36
领域: cs.CL,cs.AI
Affective Priming Score: A Data-Driven Method to Detect Priming in Sequential Datasets
Affective priming exemplifies the challenge of ambiguity in affective computing. While the community has largely addressed this issue from a label-based perspective, identifying data points in the sequence affected by the priming effect, the impact of priming on data itself, particularly in physiological signals, remains underexplored. Data affected by priming can lead to misclassifications when used in learning models. This study proposes the Affective Priming Score (APS), a data-driven method to detect data points influenced by the priming effect. The APS assigns a score to each data point, quantifying the extent to which it is affected by priming. To validate this method, we apply it to the SEED and SEED-VII datasets, which contain sufficient transitions between emotional events to exhibit priming effects. We train models with the same configuration using both the original data and priming-free sequences. The misclassification rate is significantly reduced when using priming-free sequences compared to the original data. This work contributes to the broader challenge of ambiguity by identifying and mitigating priming effects at the data level, enhancing model robustness, and offering valuable insights for the design and collection of affective computing datasets.
Updated: 2025-06-25 07:48:22
标题: 情感启动分数:一种数据驱动方法,用于检测序列数据集中的启动
摘要: 情感启动展示了情感计算中歧义性的挑战。虽然社区主要从基于标签的角度解决了这个问题,识别序列中受到启动效应影响的数据点,但是启动对数据本身,特别是生理信号的影响仍未得到充分探讨。受启动影响的数据可能导致在学习模型中的误分类。本研究提出了情感启动得分(APS),这是一种数据驱动的方法,用于检测受到启动效应影响的数据点。APS为每个数据点分配一个分数,量化其受启动影响的程度。为了验证这一方法,我们将其应用于SEED和SEED-VII数据集,这些数据集包含足够的情感事件转换以展现启动效应。我们使用相同配置的模型分别使用原始数据和无启动序列进行训练。与使用原始数据相比,使用无启动序列时误分类率显著降低。这项工作通过在数据级别识别和减轻启动效应,增强模型的稳健性,并为情感计算数据集的设计和收集提供宝贵的见解,从而为更广泛的歧义性挑战做出贡献。
更新时间: 2025-06-25 07:48:22
领域: cs.LG,cs.AI
How to Retrieve Examples in In-context Learning to Improve Conversational Emotion Recognition using Large Language Models?
Large language models (LLMs) have enabled a wide variety of real-world applications in various domains. However, creating a high-performing application with high accuracy remains challenging, particularly for subjective tasks like emotion recognition. Inspired by the SLT 2024 GenSER Challenge, this study investigates approaches to improving conversational emotion recognition (CER) by LLMs. Specifically, we explore how to retrieve high-quality examples in in-context learning (ICL) to enhance CER. We propose various strategies based on random and augmented example retrieval and also analyze the impact of conversational context on CER accuracy. Experiments were conducted on the three datasets including IEMOCAP, MELD and EmoryNLP. The results show that augmented example retrieval consistently outperforms other techniques under investigation across all datasets, highlighting the importance of retrieving coherent targeted examples and enhancing them through paraphrasing.
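A generic version of example retrieval for ICL (embed the pool, rank by cosine similarity, paste the top-k into the prompt) is sketched below; the augmentation step the paper favors, paraphrasing retrieved examples, would sit between retrieval and prompt building. All texts and embeddings here are hypothetical stand-ins.

```python
import numpy as np

def retrieve_examples(query_emb, pool_embs, pool_texts, k=4):
    """Pick the k most similar labeled examples for the ICL prompt
    (embeddings assumed precomputed by any sentence encoder)."""
    sims = pool_embs @ query_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [pool_texts[i] for i in top]

def build_prompt(examples, dialogue):
    shots = "\n\n".join(examples)
    return f"{shots}\n\nDialogue: {dialogue}\nEmotion:"

# Hypothetical pool of annotated (possibly paraphrase-augmented) examples.
pool_texts = ["Dialogue: I can't believe it!\nEmotion: surprise",
              "Dialogue: Leave me alone.\nEmotion: anger"]
pool_embs = np.random.default_rng(0).normal(size=(2, 16))
print(build_prompt(retrieve_examples(pool_embs[0], pool_embs, pool_texts),
                   "Wow, really?"))
```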
Updated: 2025-06-25 07:39:19
标题: 如何利用大型语言模型在上下文学习中检索示例以改进对话情绪识别?
摘要: 大型语言模型(LLMs)已经在各个领域实现了各种真实世界的应用。然而,创建一个高性能的、高准确度的应用仍然具有挑战性,特别是对于主观任务如情感识别。受SLT 2024 GenSER挑战的启发,本研究探讨了通过LLMs改进对话情感识别(CER)的方法。具体来说,我们探讨了如何在上下文学习(ICL)中检索高质量的例子以增强CER。我们提出了基于随机和增强例子检索的各种策略,并分析了对话上下文对CER准确度的影响。实验在包括IEMOCAP、MELD和EmoryNLP在内的三个数据集上进行。结果显示,增强例子检索在所有数据集上持续优于其他技术,突显了检索连贯的目标例子并通过改写加以增强的重要性。
更新时间: 2025-06-25 07:39:19
领域: cs.CL,cs.AI
Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach
A growing fraction of all code is sampled from Large Language Models (LLMs). We investigate the problem of attributing code generated by language models using hypothesis testing to leverage established techniques and guarantees. Given a set of samples $S$ and a suspect model $\mathcal{L}^*$, our goal is to assess the likelihood of $S$ originating from $\mathcal{L}^*$. Due to the curse of dimensionality, this is intractable when only samples from the LLM are given: to circumvent this, we use both samples and density estimates from the LLM, a form of access commonly available. We introduce $\mathsf{Anubis}$, a zero-shot attribution tool that frames attribution as a distribution testing problem. Our experiments on a benchmark of code samples show that $\mathsf{Anubis}$ achieves high AUROC scores ( $\ge0.9$) when distinguishing between LLMs like DeepSeek-Coder, CodeGemma, and Stable-Code using only $\approx 2000$ samples.
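The abstract does not reveal $\mathsf{Anubis}$'s exact test statistic, so the sketch below only shows the kind of quantity that density access enables and sampling alone does not: an average log-likelihood gap between a suspect and a reference model, in the spirit of a likelihood-ratio test. The scoring functions are toy stand-ins.

```python
def attribution_score(samples, logprob_suspect, logprob_reference):
    """Average log-likelihood gap of the observed code samples under the
    suspect model vs. a reference model; a large positive gap supports
    attribution to the suspect. logprob_* are density-access callbacks."""
    gaps = [logprob_suspect(s) - logprob_reference(s) for s in samples]
    return sum(gaps) / len(gaps)

# Toy stand-ins for model density access (real use: per-token logprobs).
lp_a = lambda s: -0.5 * len(s)     # "suspect" assigns these samples more mass
lp_b = lambda s: -0.7 * len(s)
print(attribution_score(["def f(): pass", "x = 1"], lp_a, lp_b))  # > 0
```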
Updated: 2025-06-25 07:37:16
标题: 大型语言模型的零样本归因:一种分布测试方法
摘要: 越来越大比例的代码是从大型语言模型(LLM)中采样的。我们研究了使用假设检验来归因语言模型生成的代码的问题,以利用已建立的技术和保证。给定一组样本$S$和一个可疑模型$\mathcal{L}^*$,我们的目标是评估$S$来自$\mathcal{L}^*$的可能性。由于维度灾难,当只有LLM生成的样本时,这是不可行的:为了解决这个问题,我们同时使用LLM的样本和密度估计,这是一种常见的访问形式。我们介绍了$\mathsf{Anubis}$,一个零样本归因工具,将归因构建为分布测试问题。我们在一个代码样本基准上的实验表明,$\mathsf{Anubis}$在仅使用约2000个样本时,在区分DeepSeek-Coder、CodeGemma和Stable-Code等LLM时,实现了高AUROC分数($\ge0.9$)。
更新时间: 2025-06-25 07:37:16
领域: cs.LG,cs.AI,cs.SE
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Large language models (LLMs) deliver strong performance but are difficult to deploy due to high memory and compute costs. While pruning reduces these demands, most methods ignore activation sparsity observed at runtime. We reinterpret activation sparsity as dynamic structured weight sparsity and propose DuoGPT, a unified framework that constructs dual-sparse (spMspV) workloads by combining unstructured weight pruning with activation sparsity. To preserve accuracy, we extend the Optimal Brain Compression (OBC) framework with activation-aware calibration and introduce output residuals from the dense model as correction terms. We further optimize the solution for efficient GPU execution, enabling scalability to billion-parameter LLMs. Evaluations on LLaMA-2 and LLaMA-3 show that DuoGPT outperforms state-of-the-art structured pruning methods by up to 9.17% accuracy at an iso-speedup of 1.39$\times$ compared to the baseline dense model.
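The dual-sparse (spMspV) workload can be illustrated with NumPy/SciPy: prune weights by magnitude, drop small activations at "runtime", and compute the product touching only the active columns. The thresholds and sizes here are arbitrary; this illustrates the workload, not DuoGPT's calibration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Dual sparsity in one matmul: unstructured weight pruning (sparse W)
# combined with runtime activation sparsity (sparse x) -> spMspV.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
W[np.abs(W) < 1.0] = 0.0                  # magnitude pruning (illustrative)
x = rng.normal(size=16)
x[np.abs(x) < 0.8] = 0.0                  # "dead" activations at runtime

W_sp = csr_matrix(W)
nz = np.flatnonzero(x)                    # only touch active columns
y = W_sp[:, nz] @ x[nz]                   # sparse-matrix x sparse-vector
assert np.allclose(y, W @ x)
print(f"weight density {W_sp.nnz / W.size:.2f}, "
      f"activation density {len(nz) / x.size:.2f}")
```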
Updated: 2025-06-25 07:35:12
标题: DuoGPT: 在LLMs中通过激活感知剪枝实现训练无需的双重稀疏化
摘要: 大型语言模型(LLMs)表现出色,但由于高内存和计算成本而难以部署。虽然剪枝可以减少这些要求,但大多数方法忽略了运行时观察到的激活稀疏性。我们将激活稀疏性重新解释为动态结构化权重稀疏性,并提出了DuoGPT,这是一个统一的框架,通过将无结构权重剪枝与激活稀疏性相结合,构建双稀疏(spMspV)工作负载。为了保持准确性,我们扩展了Optimal Brain Compression(OBC)框架,引入了激活感知校准,并将密集模型的输出残差作为修正项。我们进一步优化解决方案,以实现高效的GPU执行,使其能够扩展到数十亿参数的LLMs。对LLaMA-2和LLaMA-3的评估表明,与基线密集模型相比,DuoGPT在等速提升1.39倍的情况下,准确度高出最先进的结构化剪枝方法高达9.17%。
更新时间: 2025-06-25 07:35:12
领域: cs.LG
IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model
Solving Inverse Kinematics (IK) problems is fundamental to robotics, but has primarily been successful with single serial manipulators. For multi-arm robotic systems, IK remains challenging due to complex self-collisions, coupled joints, and high-dimensional redundancy. These complexities make traditional IK solvers slow, prone to failure, and lacking in solution diversity. In this paper, we present IKDiffuser, a diffusion-based model designed for fast and diverse IK solution generation for multi-arm robotic systems. IKDiffuser learns the joint distribution over the configuration space, capturing complex dependencies and enabling seamless generalization to multi-arm robotic systems of different structures. In addition, IKDiffuser can incorporate additional objectives during inference without retraining, offering versatility and adaptability for task-specific requirements. In experiments on 6 different multi-arm systems, the proposed IKDiffuser achieves superior solution accuracy, precision, diversity, and computational efficiency compared to existing solvers. The proposed IKDiffuser framework offers a scalable, unified approach to solving multi-arm IK problems, facilitating the potential of multi-arm robotic systems in real-time manipulation tasks.
Updated: 2025-06-25 07:27:44
标题: IKDiffuser:一种多臂机器人的生成逆运动学求解器,通过扩散模型
摘要: 解决逆运动学(IK)问题对机器人技术至关重要,但主要成功应用于单个串联机械手。对于多臂机器人系统,由于复杂的自碰撞、耦合关节和高维冗余性,IK仍然具有挑战性。这些复杂性使传统的IK求解器缓慢、易于失败,并且缺乏解决方案的多样性。在本文中,我们提出了IKDiffuser,这是一种基于扩散的模型,专为多臂机器人系统快速生成多样化的IK解决方案而设计。IKDiffuser学习了配置空间上的关节分布,捕捉了复杂的依赖关系,并使得对不同结构的多臂机器人系统的无缝泛化成为可能。此外,IKDiffuser可以在推断过程中加入额外的目标而无需重新训练,提供了灵活性和适应性,以满足特定任务需求。在对6种不同多臂系统的实验中,所提出的IKDiffuser相对于现有的求解器实现了更高的解决方案准确性、精度、多样性和计算效率。所提出的IKDiffuser框架提供了一种可扩展的、统一的方法来解决多臂IK问题,促进了多臂机器人系统在实时操作任务中的潜力。
更新时间: 2025-06-25 07:27:44
领域: cs.RO,cs.AI,cs.LG
Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU
Advanced Large Language Models (LLMs) have achieved impressive performance across a wide range of complex and long-context natural language tasks. However, performing long-context LLM inference locally on a commodity GPU (a PC) with privacy concerns remains challenging due to the increasing memory demands of the key-value (KV) cache. Existing systems typically identify important tokens and selectively offload their KV data to GPU and CPU memory. The KV data needs to be offloaded to disk due to the limited memory on a commodity GPU, but the process is bottlenecked by token importance evaluation overhead and the disk's low bandwidth. In this paper, we present LeoAM, the first efficient importance-aware long-context LLM inference system for a single commodity GPU with adaptive hierarchical GPU-CPU-Disk KV management. Our system employs an adaptive KV management strategy that partitions KV data into variable-sized chunks based on the skewed distribution of attention weights across different layers to reduce computational and additional transmission overheads. Moreover, we propose a lightweight KV abstract method, which minimizes transmission latency by storing and extracting the KV abstract of each chunk on disk instead of the full KV data. LeoAM also leverages the dynamic compression and pipeline techniques to further accelerate inference. Experimental results demonstrate that LeoAM achieves an average inference latency speedup of 3.46x, while maintaining comparable LLM response quality. In scenarios with larger batch sizes, it achieves up to a 5.47x speedup.
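One plausible reading of "variable-sized chunks based on the skewed distribution of attention weights" is equal-attention-mass chunking: hot regions get small chunks, flat regions get large ones. This is a hypothetical sketch, not LeoAM's actual partitioning rule.

```python
import numpy as np

def chunk_by_attention_mass(attn, target_mass=0.1):
    """Cut a token sequence into variable-sized chunks so each chunk
    carries roughly equal cumulative attention mass: skewed layers get
    small chunks around hot tokens, flat layers get large ones."""
    w = attn / attn.sum()
    bounds, acc, start = [], 0.0, 0
    for i, wi in enumerate(w):
        acc += wi
        if acc >= target_mass:
            bounds.append((start, i + 1))   # half-open token range
            start, acc = i + 1, 0.0
    if start < len(w):
        bounds.append((start, len(w)))
    return bounds

attn = np.random.default_rng(0).pareto(2.0, size=64)   # skewed scores
print(chunk_by_attention_mass(attn))
```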
Updated: 2025-06-25 07:26:42
标题: 突破长文本LLM推理的边界:在单一商品GPU上进行自适应KV管理
摘要: 先进的大型语言模型(LLMs)在各种复杂和长文本自然语言任务中取得了令人印象深刻的表现。然而,在具有隐私问题的商品GPU(个人电脑)上进行长文本LLM推理仍然具有挑战性,这是由于关键-值(KV)缓存的内存需求不断增加。现有系统通常识别重要的标记,并选择性地将它们的KV数据卸载到GPU和CPU内存。由于商品GPU上的内存有限,KV数据需要卸载到磁盘,但这一过程受到标记重要性评估开销和磁盘低带宽的瓶颈影响。在本文中,我们提出了LeoAM,这是第一个针对单个商品GPU的高效的重要性感知长文本LLM推理系统,具有自适应的分层GPU-CPU-磁盘KV管理。我们的系统采用自适应的KV管理策略,根据不同层之间注意力权重的倾斜分布将KV数据分区为不同大小的块,以减少计算和额外传输开销。此外,我们提出了一种轻量级的KV抽象方法,通过在磁盘上存储和提取每个块的KV抽象而不是完整的KV数据来最小化传输延迟。LeoAM还利用动态压缩和流水线技术进一步加速推理。实验结果表明,LeoAM实现了平均推理延迟加速3.46倍,同时保持可比较的LLM响应质量。在批处理大小较大的情况下,它可以实现高达5.47倍的加速。
更新时间: 2025-06-25 07:26:42
领域: cs.OS,cs.CR,68M20,C.4
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency struggles to be accurately preserved in directly generated video frames from pre-trained models. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of our ReconX over state-of-the-art methods in terms of quality and generalizability.
Updated: 2025-06-25 07:19:44
标题: ReconX:使用视频扩散模型从稀疏视图中重建任何场景
摘要: 三维场景重建的进展已经将现实世界的二维图像转化为三维模型,从数百张输入照片中产生逼真的三维结果。尽管在密集视图重建场景中取得了巨大成功,但从不足的捕获视图中渲染详细场景仍然是一个未明确定的优化问题,往往导致未见区域出现伪影和失真。在本文中,我们提出了ReconX,一种新颖的三维场景重建范式,将模糊的重建挑战重新构想为一个时间生成任务。关键的洞察是释放大规模预训练视频扩散模型的强大生成先验,用于稀疏视图重建。然而,从预训练模型直接生成的视频帧存在3D视图一致性难以准确保留的问题。为了解决这个问题,根据有限的输入视图,提出的ReconX首先构建全局点云,并将其编码为上下文空间作为3D结构条件。在条件的指导下,视频扩散模型然后合成既保留细节又展现高度3D一致性的视频帧,确保从各种角度看到的场景的连贯性。最后,通过一种自信感知的3D高斯分层优化方案,我们从生成的视频中恢复3D场景。在各种真实世界数据集上进行的大量实验显示,我们的ReconX在质量和泛化能力方面优于现有方法。
更新时间: 2025-06-25 07:19:44
领域: cs.CV,cs.AI,cs.GR
Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation
Retrieval-Augmented Generation (RAG) systems and large language model (LLM)-powered chatbots have significantly advanced conversational AI by combining generative capabilities with external knowledge retrieval. Despite their success, enterprise-scale deployments face critical challenges, including diverse user queries, high latency, hallucinations, and difficulty integrating frequently updated domain-specific knowledge. This paper introduces a novel hybrid framework that integrates RAG with intent-based canned responses, leveraging predefined high-confidence responses for efficiency while dynamically routing complex or ambiguous queries to the RAG pipeline. Our framework employs a dialogue context manager to ensure coherence in multi-turn interactions and incorporates a feedback loop to refine intents, dynamically adjust confidence thresholds, and expand response coverage over time. Experimental results demonstrate that the proposed framework achieves a balance of high accuracy (95\%) and low latency (180ms), outperforming RAG and intent-based systems across diverse query types, positioning it as a scalable and adaptive solution for enterprise conversational AI applications.
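The core routing logic is easy to sketch. Everything below (the threshold value, the classifier, the canned-response table) is a hypothetical stand-in; the paper's feedback loop would additionally refine intents and adjust the confidence threshold over time.

```python
def route(query, intent_classifier, canned, rag_pipeline, threshold=0.85):
    """Confidence-gated routing: serve a canned response when the intent
    classifier is sure, otherwise fall back to the RAG pipeline."""
    intent, confidence = intent_classifier(query)
    if confidence >= threshold and intent in canned:
        return canned[intent], "canned"
    return rag_pipeline(query), "rag"

# Hypothetical stand-ins for the real components.
canned = {"reset_password": "Use 'Forgot password' on the sign-in page."}
clf = lambda q: ("reset_password", 0.95) if "password" in q else ("other", 0.3)
rag = lambda q: f"[RAG answer for: {q}]"
print(route("How do I reset my password?", clf, canned, rag))
print(route("Compare plan A and plan B for my team", clf, canned, rag))
```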
Updated: 2025-06-25 07:18:47
标题: 混合智能技术用于具有新的动态路由和反馈适应性的响应式多轮在线对话
摘要: 检索增强生成(RAG)系统和大型语言模型(LLM)驱动的聊天机器人通过将生成能力与外部知识检索相结合,显著推进了对话型人工智能。尽管取得了成功,但企业规模的部署面临着重要挑战,包括多样化用户查询、高延迟、幻觉以及难以整合频繁更新的领域特定知识。本文介绍了一种新颖的混合框架,将RAG与基于意图的预定义高可信度响应集成在一起,利用高置信度响应实现效率,同时将复杂或模糊的查询动态路由到RAG管道。我们的框架采用对话上下文管理器,确保多轮交互中的连贯性,并引入反馈循环以细化意图,动态调整置信度阈值,并随时间扩展响应覆盖范围。实验结果表明,所提出的框架在高准确性(95\%)和低延迟(180毫秒)之间取得了平衡,在各种查询类型上优于RAG和基于意图的系统,使其成为企业对话型人工智能应用的可扩展且适应性强的解决方案。
更新时间: 2025-06-25 07:18:47
领域: cs.AI
Causal Operator Discovery in Partial Differential Equations via Counterfactual Physics-Informed Neural Networks
We develop a principled framework for discovering causal structure in partial differential equations (PDEs) using physics-informed neural networks and counterfactual perturbations. Unlike classical residual minimization or sparse regression methods, our approach quantifies operator-level necessity through functional interventions on the governing dynamics. We introduce causal sensitivity indices and structural deviation metrics to assess the influence of candidate differential operators within neural surrogates. Theoretically, we prove exact recovery of the causal operator support under restricted isometry or mutual coherence conditions, with residual bounds guaranteeing identifiability. Empirically, we validate the framework on both synthetic and real-world datasets across climate dynamics, tumor diffusion, and ocean flows. Our method consistently recovers governing operators even under noise, redundancy, and data scarcity, outperforming standard PINNs and DeepONets in structural fidelity. This work positions causal PDE discovery as a tractable and interpretable inference task grounded in structural causal models and variational residual analysis.
Updated: 2025-06-25 07:15:42
标题: 通过反事实物理信息神经网络在偏微分方程中发现因果操作符
摘要: 我们提出了一个有原则的框架,利用物理信息神经网络和反事实干扰来发现偏微分方程(PDEs)中的因果结构。与传统的残差最小化或稀疏回归方法不同,我们的方法通过对主导动力学进行功能干预来量化操作级必要性。我们引入因果敏感性指数和结构偏差度量来评估神经替代物中候选微分算子的影响。理论上,我们证明了在受限等距或互补条件下因果操作支持的确切恢复,残差界限保证了可识别性。在实证方面,我们在气候动力学、肿瘤扩散和海洋流动等合成和真实数据集上验证了该框架。我们的方法即使在噪声、冗余和数据稀缺的情况下也能始终恢复主导算子,在结构保真度方面优于标准的PINNs和DeepONets。这项工作将因果PDE发现定位为一个有解的、可解释的推断任务,基于结构因果模型和变分残差分析。
更新时间: 2025-06-25 07:15:42
领域: cs.LG,cs.NA,math.NA
Progressive Alignment Degradation Learning for Pansharpening
Deep learning-based pansharpening has been shown to effectively generate high-resolution multispectral (HRMS) images. To create supervised ground-truth HRMS images, synthetic data generated using the Wald protocol is commonly employed. This protocol assumes that networks trained on artificial low-resolution data will perform equally well on high-resolution data. However, well-trained models typically exhibit a trade-off in performance between reduced-resolution and full-resolution datasets. In this paper, we delve into the Wald protocol and find that its inaccurate approximation of real-world degradation patterns limits the generalization of deep pansharpening models. To address this issue, we propose the Progressive Alignment Degradation Module (PADM), which uses mutual iteration between two sub-networks, PAlignNet and PDegradeNet, to adaptively learn accurate degradation processes without relying on predefined operators. Building on this, we introduce HFreqdiff, which embeds high-frequency details into a diffusion framework and incorporates CFB and BACM modules for frequency-selective detail extraction and precise reverse process learning. These innovations enable effective integration of high-resolution panchromatic and multispectral images, significantly enhancing spatial sharpness and quality. Experiments and ablation studies demonstrate the proposed method's superior performance compared to state-of-the-art techniques.
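For reference, the Wald protocol itself amounts to degrading both inputs by the sensor resolution ratio so that the original MS image becomes supervised ground truth. A sketch with average pooling standing in for the protocol's low-pass filtering (sizes and ratio are illustrative):

```python
import numpy as np

def wald_reduced_pair(ms, pan, ratio=4):
    """Wald-protocol training pair: degrade both inputs by the sensor
    ratio so the original MS image can serve as ground truth.
    (Average pooling stands in for the protocol's low-pass filter.)"""
    def downsample(img, r):
        h, w = img.shape[0] // r * r, img.shape[1] // r * r
        img = img[:h, :w]
        return img.reshape(h // r, r, w // r, r, -1).mean(axis=(1, 3)).squeeze()
    return downsample(ms, ratio), downsample(pan, ratio), ms  # inputs..., target

ms = np.random.rand(64, 64, 4)     # low-res multispectral, 4 bands
pan = np.random.rand(256, 256, 1)  # high-res panchromatic
lr_ms, lr_pan, target = wald_reduced_pair(ms, pan)
print(lr_ms.shape, lr_pan.shape, target.shape)  # (16,16,4) (64,64) (64,64,4)
```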
Updated: 2025-06-25 07:07:32
标题: 渐进对准降解学习用于全色融合
摘要: 基于深度学习的全色融合技术已被证明能够有效生成高分辨率多光谱(HRMS)图像。为了创建有监督的真值HRMS图像,通常采用按照Wald协议生成的合成数据。该协议假设在人工低分辨率数据上训练的网络将在高分辨率数据上表现同样良好。然而,训练良好的模型通常在降分辨率和全分辨率数据集之间存在性能权衡。本文深入研究了Wald协议,发现其对真实世界退化模式的不准确近似限制了深度全色融合模型的泛化能力。为了解决这一问题,我们提出了渐进对齐退化模块(PADM),该模块利用PAlignNet和PDegradeNet两个子网络之间的相互迭代,自适应地学习准确的退化过程,而不依赖于预定义的算子。在此基础上,我们引入了HFreqdiff,将高频细节嵌入扩散框架,并结合CFB和BACM模块进行频率选择性细节提取和精确的反向过程学习。这些创新使高分辨率全色和多光谱图像有效集成,显著提高空间锐度和质量。实验和消融研究表明,与最先进的技术相比,所提出的方法表现出卓越的性能。
更新时间: 2025-06-25 07:07:32
领域: cs.CV,cs.AI,eess.IV
Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models
Surgical risk identification is critical for patient safety and reducing preventable medical errors. While multimodal large language models (MLLMs) show promise for automated operating room (OR) risk detection, they often exhibit visual-semantic knowledge conflicts (VS-KC), failing to identify visual safety violations despite understanding textual rules. To address this, we introduce a dataset comprising over 34,000 synthetic images generated by diffusion models, depicting operating room scenes containing entities that violate established safety rules. These images were created to alleviate data scarcity and examine MLLMs' vulnerabilities. In addition, the dataset includes 214 human-annotated images that serve as a gold-standard reference for validation. This comprehensive dataset, spanning diverse perspectives, stages, and configurations, is designed to expose and study VS-KC. Fine-tuning on OR-VSKC significantly improves MLLMs' detection of trained conflict entities and generalizes well to new viewpoints for these entities, but performance on untrained entity types remains poor, highlighting learning specificity and the need for comprehensive training. The main contributions of this work include: (1) a data generation methodology tailored for rule-violation scenarios; (2) the release of the OR-VSKC dataset and its associated benchmark as open-source resources; and (3) an empirical analysis of violation-sensitive knowledge consistency in representative MLLMs. The dataset and appendix are available at https://github.com/zgg2577/VS-KC.
Updated: 2025-06-25 07:06:29
标题: 手术室中的视觉-语义知识冲突:多模态大型语言模型中手术风险感知的合成数据策划
摘要: 手术风险识别对于患者安全和减少可预防的医疗错误至关重要。虽然多模态大语言模型(MLLMs)显示出自动化手术室(OR)风险检测的潜力,但它们经常表现出视觉-语义知识冲突(VS-KC):尽管理解文本规则,却无法识别视觉安全违规行为。为解决这一问题,我们引入了一个数据集,包括由扩散模型生成的超过34,000张合成图像,描绘包含违反既定安全规则的实体的手术室场景。这些图像旨在缓解数据稀缺性并检查MLLMs的脆弱性。此外,数据集包括214张人工注释的图像,作为验证的黄金标准参考。这个全面的数据集涵盖了多种视角、阶段和配置,旨在暴露和研究VS-KC。在OR-VSKC上微调显著改善了MLLMs对经过训练的冲突实体的检测,并且对这些实体的新视角具有很好的泛化能力,但对未经训练的实体类型的表现仍然很差,突出了学习特异性和全面训练的必要性。这项工作的主要贡献包括:(1)为违规情景量身定制的数据生成方法;(2)发布OR-VSKC数据集及其相关基准作为开源资源;(3)对代表性MLLMs中违规敏感知识一致性的实证分析。数据集和附录可在https://github.com/zgg2577/VS-KC获取。
更新时间: 2025-06-25 07:06:29
领域: cs.CV,cs.AI,68T07, 68U10, 92C55,I.2.10; I.2.7; J.3; I.2.6
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
Uncertainty quantification (UQ) for foundation models is essential to identify and mitigate potential hallucinations in automatically generated text. However, heuristic UQ approaches lack formal guarantees for key metrics such as the false discovery rate (FDR) in selective prediction. Previous work adopts the split conformal prediction (SCP) framework to ensure desired coverage of admissible answers by constructing prediction sets, but these sets often contain incorrect candidates, limiting their practical utility. To address this, we propose COIN, an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question under user-specified FDR constraints. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods such as Clopper-Pearson to establish a high-probability upper bound on the true error rate (i.e., FDR). This enables the selection of the largest uncertainty threshold that ensures FDR control on test data while significantly increasing sample retention. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data across both general and multimodal text generation tasks. Furthermore, we show that employing alternative upper bound constructions and UQ strategies can further boost COIN's power performance, which underscores its extensibility and adaptability to diverse application scenarios.
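The threshold calibration at the heart of COIN is concrete enough to sketch. The following is a minimal illustration, not the authors' code, and the variable names are assumptions: given calibration-set uncertainty scores and boolean correctness labels, it scans candidate thresholds and keeps the largest one whose Clopper-Pearson upper bound on the selected-answer error rate stays below the target FDR.

```python
import numpy as np
from scipy.stats import beta

def calibrate_threshold(u, correct, alpha=0.1, delta=0.05):
    """u: uncertainty scores, correct: boolean labels (numpy arrays).
    Return the largest tau such that, with confidence 1 - delta, the
    error rate among selected answers (u <= tau) is at most alpha."""
    best = None
    for tau in np.sort(np.unique(u)):
        keep = u <= tau
        n = int(keep.sum())
        if n == 0:
            continue
        k = int((keep & ~correct).sum())          # kept but wrong
        # one-sided Clopper-Pearson upper bound on the true error rate
        ub = 1.0 if k == n else beta.ppf(1 - delta, k + 1, n - k)
        if ub <= alpha:
            best = tau        # taus scanned in ascending order: keep largest
    return best
```

At test time, an answer is returned only when its uncertainty falls at or below the calibrated threshold; otherwise the question is abstained on.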
Updated: 2025-06-25 07:04:49
标题: COIN:具有可证明风险保障的基于基础模型的不确定性防护选择性问答
摘要: 基础模型的不确定性量化(UQ)对于识别和减轻自动生成文本中潜在幻觉至关重要。然而,启发式UQ方法缺乏对关键指标(如选择性预测中的虚假发现率(FDR))的正式保证。先前的工作采用分割共形预测(SCP)框架,通过构建预测集来确保可接受答案的所需覆盖范围,但这些集合通常包含不正确的候选项,限制了它们的实际效用。为了解决这个问题,我们提出了COIN,一个不确定性保护选择框架,根据用户指定的FDR约束校准统计有效的阈值,以过滤每个问题下生成的单个答案。COIN在校准集上估计经验误差率,并应用诸如Clopper-Pearson之类的置信区间方法,建立真实误差率(即FDR)的高概率上限。这使得可以选择确保在测试数据上控制FDR的最大不确定性阈值,同时显著增加样本保留。我们展示了COIN在风险控制方面的稳健性,在保留可接受答案方面测试时的强大功效,以及在有限校准数据下跨一般和多模态文本生成任务中的预测效率。此外,我们展示了采用替代上限构造和UQ策略可以进一步提升COIN的功效,这突显了其对各种应用场景的可扩展性和适应性。
更新时间: 2025-06-25 07:04:49
领域: cs.CL,cs.AI,cs.LG
Valid Selection among Conformal Sets
Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies. However, selecting the most desirable set, such as the smallest, can invalidate the coverage guarantees. To address this challenge, we propose a stability-based approach that ensures coverage for the selected prediction set. We extend our results to the online conformal setting, propose several refinements in settings where additional structure is available, and demonstrate its effectiveness through experiments.
Updated: 2025-06-25 06:59:55
标题: 共形集合间的有效选择
摘要: 共形预测提供了一个无分布框架,用于构建具有覆盖保证的预测集。在实践中,可能有多个有效的共形预测集可供选择,这些集合可能来自不同的模型或方法。然而,选择最理想的集合,比如最小的集合,可能会使覆盖保证失效。为了解决这一挑战,我们提出了一种基于稳定性的方法,确保所选的预测集有覆盖。我们将我们的结果扩展到在线共形设置,提出了在具有额外结构的情境中的几种改进,并通过实验展示了其有效性。
更新时间: 2025-06-25 06:59:55
领域: stat.ML,cs.AI,cs.LG,stat.ME,stat.OT
JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
Deobfuscating JavaScript (JS) code poses a significant challenge in web security, particularly as obfuscation techniques are frequently used to conceal malicious activities within scripts. While Large Language Models (LLMs) have recently shown promise in automating the deobfuscation process, transforming detection and mitigation strategies against these obfuscated threats, a systematic benchmark to quantify their effectiveness and limitations has been notably absent. To address this gap, we present JsDeObsBench, a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We detail our benchmarking methodology, which includes a wide range of obfuscation techniques ranging from basic variable renaming to sophisticated structure transformations, providing a robust framework for assessing LLM performance in real-world scenarios. Our extensive experimental analysis investigates the proficiency of cutting-edge LLMs, e.g., GPT-4o, Mixtral, Llama, and DeepSeek-Coder, revealing superior performance in code simplification despite challenges in maintaining syntax accuracy and execution reliability compared to baseline methods. We further evaluate the deobfuscation of JS malware to exhibit the potential of LLMs in security scenarios. The findings highlight the utility of LLMs in deobfuscation applications and pinpoint crucial areas for further improvement.
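One ingredient of such a benchmark, execution equivalence, is easy to sketch. The snippet below is an illustration, not the JsDeObsBench harness: it runs the original and deobfuscated scripts under Node.js and compares observable behavior. A real harness would also verify syntax validity and, for malware samples, execute inside a sandbox.

```python
import subprocess

def behavior_preserved(original_js, deobfuscated_js, timeout=10):
    """Execution-equivalence check (sketch): compare exit code and
    stdout of both scripts when run under node."""
    def run(code):
        try:
            r = subprocess.run(["node", "-e", code], capture_output=True,
                               text=True, timeout=timeout)
            return r.returncode, r.stdout
        except subprocess.TimeoutExpired:
            return "timeout", ""
    return run(original_js) == run(deobfuscated_js)
```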
Updated: 2025-06-25 06:50:13
标题: JsDeObsBench:测量和基准测试用于JavaScript去混淆的LLMs
摘要: 解混淆JavaScript(JS)代码在网络安全中是一个重大挑战,尤其是由于混淆技术经常被用来掩盖脚本中的恶意活动。尽管大型语言模型(LLMs)最近展示了在自动化解混淆过程中的潜力,转变了检测和缓解这些混淆威胁的策略,但缺乏系统性的基准来量化它们的有效性和局限性。为了弥补这一空白,我们提出了JsDeObsBench,一个专门设计用于严格评估LLMs在JS解混淆环境中的有效性的基准。我们详细介绍了我们的基准方法论,其中包括从基本变量重命名到复杂结构转换的广泛范围的混淆技术,为评估LLMs在实际场景中的表现提供了一个健壮的框架。我们的广泛实验分析研究了最新的LLMs的能力,例如GPT-4o,Mixtral,Llama和DeepSeek-Coder,在代码简化方面表现出优越性,尽管在与基准方法相比保持语法准确性和执行可靠性方面存在挑战。我们进一步评估了JS恶意软件的解混淆,展示了LLMs在安全场景中的潜力。研究结果突出了LLMs在解混淆应用中的实用性,并指出了进一步改进的关键领域。
更新时间: 2025-06-25 06:50:13
领域: cs.CR
Causal discovery in deterministic discrete LTI-DAE systems
Discovering pure causes or driver variables in deterministic LTI systems is of vital importance in the data-driven reconstruction of causal networks. A recent work by Kathari and Tangirala, proposed in 2022, formulated the causal discovery method as a constraint identification problem. The constraints are identified using a dynamic iterative PCA (DIPCA)-based approach for dynamical systems corrupted with Gaussian measurement errors. The DIPCA-based method works efficiently for dynamical systems devoid of any algebraic relations. However, several dynamical systems operate under feedback control and/or are coupled with conservation laws, leading to differential-algebraic (DAE) or mixed causal systems. In this work, a method, namely the partition of variables (PoV), for causal discovery in LTI-DAE systems is proposed. This method is superior to the method that was presented by Kathari and Tangirala (2022), as PoV also works for pure dynamical systems, which are devoid of algebraic equations. The proposed method identifies the causal drivers up to a minimal subset. PoV deploys DIPCA to first determine the number of algebraic relations ($n_a$), the number of dynamical relations ($n_d$) and the constraint matrix. Subsequently, the subsets are identified through an admissible partitioning of the constraint matrix by finding the condition number of it. Case studies are presented to demonstrate the effectiveness of the proposed method.
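The partition search itself is easy to illustrate. The sketch below is a simplified reading of PoV, not the authors' exact procedure: given the DIPCA-estimated constraint matrix, it exhaustively scores candidate dependent-variable blocks by condition number and returns the best-conditioned partition, whose complement is the driver subset (feasible for small systems only).

```python
import numpy as np
from itertools import combinations

def admissible_partition(C):
    """C: (n_constraints, n_vars) constraint matrix from DIPCA.
    A candidate dependent block is admissible when its square column
    block is well-conditioned, so those variables are uniquely
    determined by the remaining (driver) variables."""
    n_c, n_v = C.shape
    scored = []
    for dependents in combinations(range(n_v), n_c):
        cond = np.linalg.cond(C[:, dependents])   # square block to invert
        drivers = sorted(set(range(n_v)) - set(dependents))
        scored.append((cond, drivers))
    scored.sort(key=lambda t: t[0])
    return scored[0]            # (condition number, minimal driver subset)
```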
Updated: 2025-06-25 06:47:22
标题: 确定性离散LTI-DAE系统中的因果发现
摘要: 在确定性LTI系统中发现纯因果或驱动变量对于数据驱动的因果网络重建至关重要。Kathari和Tangirala在2022年提出的一项近期工作将因果发现表述为约束识别问题。这些约束是通过基于动态迭代PCA(DIPCA)的方法识别的,适用于受高斯测量误差污染的动态系统。基于DIPCA的方法对不含任何代数关系的动态系统非常有效。然而,一些动态系统在反馈控制下运行和/或与守恒定律耦合,导致微分代数(DAE)或混合因果系统。在这项工作中,提出了一种名为变量分区(PoV)的方法,用于LTI-DAE系统中的因果发现。这种方法优于Kathari和Tangirala(2022年)提出的方法,因为PoV也适用于不包含代数方程的纯动态系统。该方法可将因果驱动变量确定到一个最小子集。PoV首先利用DIPCA确定代数关系的数量($n_a$)、动态关系的数量($n_d$)和约束矩阵。随后,通过计算约束矩阵的条件数,对其进行可容许的分区来识别子集。通过案例研究来展示所提方法的有效性。
更新时间: 2025-06-25 06:47:22
领域: cs.LG,cs.SY,eess.SP,eess.SY,stat.ME
Scalable Dynamic Origin-Destination Demand Estimation Enhanced by High-Resolution Satellite Imagery Data
This study presents a novel integrated framework for dynamic origin-destination demand estimation (DODE) in multi-class mesoscopic network models, leveraging high-resolution satellite imagery together with conventional traffic data from local sensors. Unlike sparse local detectors, satellite imagery offers consistent, city-wide road and traffic information of both parking and moving vehicles, overcoming data availability limitations. To extract information from imagery data, we design a computer vision pipeline for class-specific vehicle detection and map matching, generating link-level traffic density observations by vehicle class. Building upon this information, we formulate a computational graph-based DODE model that calibrates dynamic network states by jointly matching observed traffic counts and travel times from local sensors with density measurements derived from satellite imagery. To assess the accuracy and scalability of the proposed framework, we conduct a series of numerical experiments using both synthetic and real-world data. The results of out-of-sample tests demonstrate that supplementing traditional data with satellite-derived density significantly improves estimation performance, especially for links without local sensors. Real-world experiments also confirm the framework's capability to handle large-scale networks, supporting its potential for practical deployment in cities of varying sizes. Sensitivity analysis further evaluates the impact of data quality related to satellite imagery data.
Updated: 2025-06-25 06:47:06
标题: 可扩展的动态出发地-目的地需求估计,通过高分辨率卫星图像数据增强
摘要: 这项研究提出了一个新颖的集成框架,用于在多类中观网络模型中进行动态起点-终点需求估计(DODE),利用高分辨率卫星图像与来自本地传感器的传统交通数据。与稀疏的本地探测器不同,卫星图像提供了一致的城市范围内道路和交通信息,包括停车和行驶车辆,克服了数据可用性限制。为了从图像数据中提取信息,我们设计了一个计算机视觉管线用于特定类别的车辆检测和地图匹配,按车辆类别生成链路级别的交通密度观测。基于这些信息,我们制定了一个基于计算图的DODE模型,通过同时匹配来自本地传感器的观测交通量和旅行时间与从卫星图像中导出的密度测量来校准动态网络状态。为了评估所提出框架的准确性和可扩展性,我们使用合成和真实世界数据进行了一系列数值实验。样本外测试结果表明,用卫星导出的密度补充传统数据可显著提高估计性能,尤其是对于没有本地传感器的链路。真实世界实验也证实了该框架处理大规模网络的能力,支持其在不同规模城市中的实际部署潜力。敏感性分析进一步评估了与卫星图像数据相关的数据质量的影响。
更新时间: 2025-06-25 06:47:06
领域: cs.CV,cs.AI,stat.AP
SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs
Multivariate time series forecasting requires models to simultaneously capture variable-wise structural dependencies and generalize across diverse tasks. While structural encoders are effective in modeling feature interactions, they lack the capacity to support semantic-level reasoning or task adaptation. Conversely, large language models (LLMs) possess strong generalization capabilities but remain incompatible with raw time series inputs. This gap limits the development of unified, transferable prediction systems. Therefore, we introduce SEED, a structural encoder for embedding-driven decoding, which integrates four stages: a token-aware encoder for patch extraction, a projection module that aligns patches with language model embeddings, a semantic reprogramming mechanism that maps patches to task-aware prototypes, and a frozen language model for prediction. This modular architecture decouples representation learning from inference, enabling efficient alignment between numerical patterns and semantic reasoning. Empirical results demonstrate that the proposed method achieves consistent improvements over strong baselines, and comparative studies on various datasets confirm SEED's role in addressing the structural-semantic modeling gap.
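The first two stages, patching and projection into a frozen LM's embedding space, can be sketched in a few lines. The module below is illustrative only; dimensions and names are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class PatchProjector(nn.Module):
    """Token-aware patching plus projection into a frozen LM's
    embedding width (a sketch of SEED's stages 1-2)."""
    def __init__(self, patch_len=16, d_model=128, d_llm=4096):
        super().__init__()
        self.patch_len = patch_len
        self.encode = nn.Linear(patch_len, d_model)   # per-patch encoder
        self.project = nn.Linear(d_model, d_llm)      # align with LM embeddings

    def forward(self, x):              # x: (batch, T), T divisible by patch_len
        b, t = x.shape
        patches = x.view(b, t // self.patch_len, self.patch_len)
        return self.project(self.encode(patches))    # (batch, n_patches, d_llm)
```

The resulting patch embeddings can then be mapped to task-aware prototypes and fed to the frozen language model for prediction, as the abstract describes.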
Updated: 2025-06-25 06:40:14
标题: SEED:用于基于LLMs的时间序列预测中嵌入驱动解码的结构编码器
摘要: 多元时间序列预测需要模型同时捕捉变量间的结构依赖关系,并在不同任务之间进行泛化。虽然结构编码器在建模特征交互方面很有效,但缺乏支持语义级推理或任务适应的能力。相反,大型语言模型(LLMs)具有强大的泛化能力,但仍与原始时间序列输入不兼容。这种差距限制了统一、可转移的预测系统的发展。因此,我们介绍了SEED,一种用于嵌入驱动解码的结构编码器,它集成了四个阶段:用于补丁提取的标记感知编码器,将补丁与语言模型嵌入对齐的投影模块,将补丁映射到任务感知原型的语义重新编程机制,以及用于预测的冻结语言模型。这种模块化架构将表示学习与推理分离,实现了数字模式与语义推理之间的高效对齐。实证结果表明,所提出的方法在强基线上取得了一致的改进,并在各种数据集上的比较研究证实了SEED在解决结构-语义建模差距方面的作用。
更新时间: 2025-06-25 06:40:14
领域: cs.CL,cs.AI
Do psychic cells generate consciousness?
Technological advances in the past decades have begun to enable neuroscientists to address fundamental questions about consciousness in an unprecedented way. Here we review remarkable recent progress in our understanding of cellular-level mechanisms of conscious processing in the brain. Of particular interest are the cortical pyramidal neurons -- or "psychic cells" called by Ram\'on y Cajal more than 100 years ago -- which have an intriguing cellular mechanism that accounts for selective disruption of feedback signaling in the brain upon anesthetic-induced loss of consciousness. Importantly, a particular class of metabotropic receptors distributed over the dendrites of pyramidal cells are highlighted as the key cellular mechanism. After all, Cajal's instinct over a century ago may turn out to be correct -- we may have just begun to understand whether and how psychic cells indeed generate and control our consciousness.
Updated: 2025-06-25 06:38:13
标题: 心灵细胞产生意识吗?
摘要: 在过去几十年里,技术的进步已经开始使神经科学家以前所未有的方式探讨有关意识的基本问题。在这里,我们回顾了最近在理解大脑中意识加工的细胞水平机制方面取得的显著进展。特别令人感兴趣的是皮层锥体神经元——即拉蒙·伊·卡哈尔在100多年前所称的“心灵细胞”——它们具有一种引人入胜的细胞机制,可以解释麻醉引起意识丧失时大脑反馈信号的选择性中断。重要的是,一类分布在锥体细胞树突上的代谢型受体被强调为关键的细胞机制。毕竟,一个多世纪前卡哈尔的直觉可能会被证明是正确的——我们也许刚刚开始了解心灵细胞是否真的产生并控制我们的意识。
更新时间: 2025-06-25 06:38:13
领域: q-bio.NC,cs.AI
ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction
Bed-related falls remain a leading source of injury in hospitals and long-term-care facilities, yet many commercial alarms trigger only after a patient has already left the bed. We show that early bed-exit intent can be predicted using only four low-cost load cells mounted under the bed legs. The resulting load signals are first converted into a compact set of complementary images: an RGB line plot that preserves raw waveforms and three texture maps - recurrence plot, Markov transition field, and Gramian angular field - that expose higher-order dynamics. We introduce ViFusionTST, a dual-stream Swin Transformer that processes the line plot and texture maps in parallel and fuses them through cross-attention to learn data-driven modality weights. To provide a realistic benchmark, we collected six months of continuous data from 95 beds in a long-term-care facility. On this real-world dataset ViFusionTST reaches an accuracy of 0.885 and an F1 score of 0.794, surpassing recent 1D and 2D time-series baselines across F1, recall, accuracy, and AUPRC. The results demonstrate that image-based fusion of load-sensor signals for time series classification is a practical and effective solution for real-time, privacy-preserving fall prevention.
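The three texture maps are standard time-series imaging transforms. A minimal sketch using the pyts library follows (an assumed implementation choice; the paper does not name its tooling, and `n_bins` is illustrative):

```python
from pyts.image import (RecurrencePlot, MarkovTransitionField,
                        GramianAngularField)

def signal_to_images(x):
    """Encode one load-cell signal x of shape (T,) as the three
    texture maps used alongside the raw line plot. pyts expects
    input of shape (n_samples, n_timestamps)."""
    X = x[None, :]
    rp  = RecurrencePlot().fit_transform(X)[0]
    mtf = MarkovTransitionField(n_bins=8).fit_transform(X)[0]
    gaf = GramianAngularField(method="summation").fit_transform(X)[0]
    return rp, mtf, gaf          # each (T, T), fused by the dual-stream model
```

Each transform exposes different higher-order structure: recurrence of states, quantized transition statistics, and angular correlations, which is why the model fuses them with the raw waveform plot.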
Updated: 2025-06-25 06:30:59
标题: ViFusionTST: 从负载信号中深度融合时间序列图像表示以实现早期离床预测
摘要: 与床相关的跌倒仍然是医院和长期护理设施中伤害的主要来源,然而许多商业警报系统只有在患者已经离开床铺后才触发。我们展示了仅使用安装在床腿下的四个低成本负载传感器即可预测早期离床意图。所得到的负载信号首先被转换为一组紧凑的互补图像:一个保留原始波形的RGB线图和三个纹理图 - 递归图、马尔可夫转移场和格拉姆角场 - 揭示了更高阶的动态。我们介绍了ViFusionTST,这是一个双流Swin Transformer,可以并行处理线图和纹理图,通过交叉注意力将它们融合,学习数据驱动的模态权重。 为了提供一个真实的基准,我们在一个长期护理设施中收集了95张床六个月的连续数据。在这个真实世界的数据集上,ViFusionTST达到了0.885的准确率和0.794的F1分数,在F1、召回率、准确率和AUPRC方面超过了近期的1D和2D时间序列基线。结果表明,基于图像的负载传感器信号融合对于时间序列分类是一个实用且有效的解决方案,可用于实时、保护隐私的跌倒预防。
更新时间: 2025-06-25 06:30:59
领域: cs.CV,cs.AI
AI and Agile Software Development: From Frustration to Success -- XP2025 Workshop Summary
The full-day workshop on AI and Agile at XP 2025 convened a diverse group of researchers and industry practitioners to address the practical challenges and opportunities of integrating Artificial Intelligence into Agile software development. Through interactive sessions, participants identified shared frustrations related to integrating AI into Agile Software Development practices, including challenges with tooling, governance, data quality, and critical skill gaps. These challenges were systematically prioritized and analyzed to uncover root causes. The workshop culminated in the collaborative development of a research roadmap that pinpoints actionable directions for future work, including both immediate solutions and ambitious long-term goals. The key outcome is a structured agenda designed to foster joint industry-academic efforts to move from identified frustrations to successful implementation.
Updated: 2025-06-25 06:29:03
标题: 人工智能与敏捷软件开发:从挫败到成功--XP2025研讨会总结
摘要: 在XP 2025年举行的AI和敏捷全天工作坊召集了一群多元化的研究人员和行业从业者,以解决将人工智能整合到敏捷软件开发中所面临的实际挑战和机遇。通过互动会议,参与者们确定了与将人工智能整合到敏捷软件开发实践中相关的共同挫败感,包括工具、治理、数据质量和关键技能差距方面的挑战。这些挑战被系统地排定优先级并加以分析,以找出根本原因。工作坊以共同制定研究路线图收尾,该路线图指出了未来工作的可行方向,包括即时解决方案和雄心勃勃的长期目标。关键成果是一个结构化议程,旨在促进行业和学术界的共同努力,从已确定的挫败感走向成功实施。
更新时间: 2025-06-25 06:29:03
领域: cs.SE,cs.AI
Irec: A Metacognitive Scaffolding for Self-Regulated Learning through Just-in-Time Insight Recall: A Conceptual Framework and System Prototype
The core challenge in learning has shifted from knowledge acquisition to effective Self-Regulated Learning (SRL): planning, monitoring, and reflecting on one's learning. Existing digital tools, however, inadequately support metacognitive reflection. Spaced Repetition Systems (SRS) use de-contextualized review, overlooking the role of context, while Personal Knowledge Management (PKM) tools require high manual maintenance. To address these challenges, this paper introduces "Insight Recall," a novel paradigm that conceptualizes the context-triggered retrieval of personal past insights as a metacognitive scaffold to promote SRL. We formalize this paradigm using the Just-in-Time Adaptive Intervention (JITAI) framework and implement a prototype system, Irec, to demonstrate its feasibility. At its core, Irec uses a dynamic knowledge graph of the user's learning history. When a user faces a new problem, a hybrid retrieval engine recalls relevant personal "insights." Subsequently, a large language model (LLM) performs a deep similarity assessment to filter and present the most relevant scaffold in a just-in-time manner. To reduce cognitive load, Irec features a human-in-the-loop pipeline for LLM-based knowledge graph construction. We also propose an optional "Guided Inquiry" module, where users can engage in a Socratic dialogue with an expert LLM, using the current problem and recalled insights as context. The contribution of this paper is a solid theoretical framework and a usable system platform for designing next-generation intelligent learning systems that enhance metacognition and self-regulation.
Updated: 2025-06-25 06:23:39
标题: Irec:通过即时洞察回忆支持自我调节学习的元认知支架:概念框架与系统原型
摘要: 学习中的核心挑战已经从知识获取转变为有效的自我调节学习(SRL):规划、监控和反思自己的学习。然而,现有的数字工具不足以支持元认知反思。间隔重复系统(SRS)使用去上下文化的复习,忽视了上下文的作用,而个人知识管理(PKM)工具则需要高度的手动维护。 为了解决这些挑战,本文介绍了一种新型范式“洞察召回”,该范式将个人过去洞察的上下文触发检索概念化为元认知支架,以促进SRL。我们使用即时自适应干预(JITAI)框架形式化了这一范式,并实现了一个原型系统Irec,以展示其可行性。在其核心,Irec使用用户学习历史的动态知识图谱。当用户面临新问题时,混合检索引擎会召回相关的个人“洞察”。随后,一个大型语言模型(LLM)执行深度相似性评估,以及时过滤并呈现最相关的支架。为了减少认知负荷,Irec采用了一条人在回路管道,用于基于LLM的知识图谱构建。我们还提出了一个可选的“引导探究”模块,用户可以利用当前问题和召回的洞察作为上下文,与专家LLM进行苏格拉底式对话。本文的贡献是一个坚实的理论框架和一个可用的系统平台,用于设计下一代智能学习系统,以增强元认知和自我调节能力。
更新时间: 2025-06-25 06:23:39
领域: cs.HC,cs.AI,cs.IR,H.5.2; I.2.7; H.3.3
Mapping the Evolution of Research Contributions using KnoVo
This paper presents KnoVo (Knowledge Evolution), an intelligent framework designed for quantifying and analyzing the evolution of research novelty in the scientific literature. Moving beyond traditional citation analysis, which primarily measures impact, KnoVo determines a paper's novelty relative to both prior and subsequent work within its multilayered citation network. Given a target paper's abstract, KnoVo utilizes Large Language Models (LLMs) to dynamically extract dimensions of comparison (e.g., methodology, application, dataset). The target paper is then compared to related publications along these same extracted dimensions. This comparative analysis, inspired by tournament selection, yields quantitative novelty scores reflecting the relative improvement, equivalence, or inferiority of the target paper in specific aspects. By aggregating these scores and visualizing their progression, for instance, through dynamic evolution graphs and comparative radar charts, KnoVo facilitates researchers not only to assess originality and identify similar work, but also to track knowledge evolution along specific research dimensions, uncover research gaps, and explore cross-disciplinary connections. We demonstrate these capabilities through a detailed analysis of 20 diverse papers from multiple scientific fields and report on the performance of various open-source LLMs within the KnoVo framework.
Updated: 2025-06-25 06:22:45
标题: 使用KnoVo映射研究贡献的演变
摘要: 本文介绍了KnoVo(Knowledge Evolution),这是一个智能框架,旨在量化和分析科学文献中研究新颖性的演变。与主要衡量影响力的传统引文分析不同,KnoVo确定一篇论文相对于其多层引用网络中先前和后续工作的新颖性。给定目标论文的摘要,KnoVo利用大语言模型(LLMs)动态提取比较维度(例如方法论、应用、数据集)。然后,沿着这些提取出的维度将目标论文与相关出版物进行比较。这种受锦标赛选择启发的比较分析,产生了反映目标论文在特定方面相对改进、等同或劣势的定量新颖性评分。通过汇总这些评分并可视化其进展(例如通过动态演化图和比较雷达图),KnoVo不仅有助于研究人员评估原创性和识别类似工作,还有助于沿着特定研究维度跟踪知识演变,发现研究空白,并探索跨学科联系。我们通过对来自多个科学领域的20篇不同论文进行详细分析来展示这些能力,并报告KnoVo框架内各种开源LLMs的性能。
更新时间: 2025-06-25 06:22:45
领域: cs.DL,cs.AI,cs.DB,cs.ET,cs.IR
Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration
Structured pruning is a well-established technique for compressing neural networks, making it suitable for deployment in resource-limited edge devices. This paper presents an efficient Loss-Aware Automatic Selection of Structured Pruning Criteria (LAASP) for slimming and accelerating deep neural networks. The majority of pruning methodologies employ a sequential process consisting of three stages: 1) training, 2) pruning, and 3) fine-tuning, whereas the proposed pruning technique adopts a pruning-while-training approach that eliminates the first stage and integrates the second and third stages into a single cycle. The automatic selection of magnitude or similarity-based filter pruning criteria from a specified pool of criteria and the specific pruning layer at each pruning iteration is guided by the network's overall loss on a small subset of the training data. To mitigate the abrupt accuracy drop due to pruning, the network is retrained briefly after each reduction of a predefined number of floating-point operations (FLOPs). The optimal pruning rates for each layer in the network are automatically determined, eliminating the need for manual allocation of fixed or variable pruning rates for each layer. Experiments on the VGGNet and ResNet models on the CIFAR-10 and ImageNet benchmark datasets demonstrate the effectiveness of the proposed method. In particular, the ResNet56 and ResNet110 models on the CIFAR-10 dataset significantly improve the top-1 accuracy compared to state-of-the-art methods while reducing the network FLOPs by 52\%. Furthermore, the ResNet50 model on the ImageNet dataset reduces FLOPs by more than 42\% with a negligible 0.33\% drop in top-5 accuracy. The source code of this paper is publicly available online - https://github.com/ghimiredhikura/laasp.
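The loss-guided selection step can be sketched as an exhaustive trial over (layer, criterion) pairs on a small batch. This is a schematic of the idea, not the released code; `criteria` maps names to functions that score per-filter importance from a conv weight tensor, and only a single filter is trialed per step for brevity.

```python
import torch

@torch.no_grad()
def select_criterion_and_layer(model, conv_layers, criteria, loss_fn, batch):
    """One LAASP-style step: for each (layer, criterion), tentatively
    zero the filter the criterion ranks least important, and keep the
    choice with the smallest loss on a small training subset."""
    x, y = batch
    best_loss, best_choice = float("inf"), None
    for layer in conv_layers:
        w = layer.weight.data            # (out_ch, in_ch, kh, kw)
        for name, crit in criteria.items():
            f = int(crit(w).argmin())    # least important filter index
            saved = w[f].clone()
            w[f].zero_()                 # tentative prune
            loss = loss_fn(model(x), y).item()
            w[f] = saved                 # restore
            if loss < best_loss:
                best_loss, best_choice = loss, (layer, name, f)
    return best_choice

# example magnitude criterion: L1 norm per output filter
l1_criterion = lambda w: w.abs().sum(dim=(1, 2, 3))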
Updated: 2025-06-25 06:18:46
标题: 面向深度神经网络加速的损失感知结构化剪枝准则自动选择
摘要: 结构化剪枝是一种用于压缩神经网络的成熟技术,使其适用于资源有限的边缘设备部署。本文提出了一种高效的Loss-Aware Automatic Selection of Structured Pruning Criteria (LAASP)方法,用于精简和加速深度神经网络。大多数剪枝方法采用由三个阶段组成的顺序过程:1)训练,2)剪枝,和3)微调,而所提出的剪枝技术采用了一种在训练过程中进行剪枝的方法,消除了第一阶段,并将第二和第三阶段整合成一个单一循环。根据训练数据的一个小子集上的网络整体损失,自动选择基于大小或相似性的滤波器剪枝标准以及每次剪枝迭代中的特定剪枝层。为了减轻由于剪枝而导致的精度突然下降,网络在减少预定义数量的浮点操作(FLOPs)后会被简要重新训练。网络中每个层的最佳剪枝率都会自动确定,消除了对每个层手动分配固定或可变剪枝率的需求。在CIFAR-10和ImageNet基准数据集上对VGGNet和ResNet模型进行的实验显示了所提出方法的有效性。特别是,在CIFAR-10数据集上,ResNet56和ResNet110模型相比于最先进方法显著提高了top-1准确率,同时将网络FLOPs减少了52%。此外,在ImageNet数据集上,ResNet50模型将FLOPs减少了超过42%,而top-5准确率仅下降了0.33%。本文的源代码已在网上公开可获取 - https://github.com/ghimiredhikura/laasp。
更新时间: 2025-06-25 06:18:46
领域: cs.CV,cs.AI,eess.IV
EAR: Erasing Concepts from Unified Autoregressive Models
Autoregressive (AR) models have achieved unified and strong performance across both visual understanding and image generation tasks. However, removing undesired concepts from AR models while maintaining overall generation quality remains an open challenge. In this paper, we propose Erasure Autoregressive Model (EAR), a fine-tuning method for effective and utility-preserving concept erasure in AR models. Specifically, we introduce a Windowed Gradient Accumulation (WGA) strategy to align patch-level decoding with erasure objectives, and a Thresholded Loss Masking (TLM) strategy to protect content unrelated to the target concept during fine-tuning. Furthermore, we propose a novel benchmark, Erase Concept Generator and Visual Filter (ECGVF), aimed at providing a more rigorous and comprehensive foundation for evaluating concept erasure in AR models. Specifically, we first employ structured templates across diverse large language models (LLMs) to pre-generate a large-scale corpus of target-replacement concept prompt pairs. Subsequently, we generate images from these prompts and subject them to rigorous filtering via a visual classifier to ensure concept fidelity and alignment. Extensive experimental results conducted on the ECGVF benchmark with the AR model Janus-Pro demonstrate that EAR achieves marked improvements in both erasure effectiveness and model utility preservation. Code is available at: https://github.com/immc-lab/ear/
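The loss-masking idea can be made concrete. Below is one plausible reading of Thresholded Loss Masking, not the released implementation: per-token losses under the erasure objective contribute to the gradient only where they exceed a threshold `tau` (an illustrative hyperparameter), so tokens the model already fits, which are likely unrelated to the target concept, are left untouched.

```python
import torch
import torch.nn.functional as F

def tlm_loss(logits, targets, tau):
    """Thresholded Loss Masking (sketch). logits: (b, t, vocab),
    targets: (b, t). Losses at or below tau are masked out."""
    per_tok = F.cross_entropy(logits.flatten(0, 1), targets.flatten(),
                              reduction="none")
    mask = (per_tok > tau).float()                 # keep only large losses
    return (per_tok * mask).sum() / mask.sum().clamp(min=1.0)
```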
Updated: 2025-06-25 06:15:07
标题: EAR:从统一自回归模型中擦除概念
摘要: Autoregressive (AR) 模型在视觉理解和图像生成任务中取得了统一和强大的性能。然而,在保持整体生成质量的同时,从 AR 模型中消除不需要的概念仍然是一个开放性挑战。在本文中,我们提出了 Erasure Autoregressive Model (EAR),这是一种用于在 AR 模型中有效和保留实用性的概念擦除的微调方法。具体来说,我们引入了 Windowed Gradient Accumulation (WGA) 策略,以将补丁级别的解码与擦除目标对齐,并引入了 Thresholded Loss Masking (TLM) 策略,在微调过程中保护与目标概念无关的内容。此外,我们提出了一个新颖的基准,Erase Concept Generator and Visual Filter (ECGVF),旨在为评估 AR 模型中概念擦除提供更严格和全面的基础。具体地,我们首先使用结构化模板跨越多样的大型语言模型 (LLMs) 预生成一个大规模的目标替换概念提示对。随后,我们从这些提示生成图像,并通过视觉分类器进行严格过滤,以确保概念的真实性和对齐性。在 ECGVF 基准上进行的广泛实验结果表明,EAR 在擦除效果和模型实用性保留方面取得了显著的改进。代码可在以下链接找到:https://github.com/immc-lab/ear/
更新时间: 2025-06-25 06:15:07
领域: cs.CV,cs.AI
Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
Active learning methods aim to improve sample complexity in machine learning. In this work, we investigate an active learning scheme via a novel gradient-free cutting-plane training method for ReLU networks of arbitrary depth and develop a convergence theory. We demonstrate, for the first time, that cutting-plane algorithms, traditionally used in linear models, can be extended to deep neural networks despite their nonconvexity and nonlinear decision boundaries. Moreover, this training method induces the first deep active learning scheme known to achieve convergence guarantees, revealing a geometric contraction rate of the feasible set. We exemplify the effectiveness of our proposed active learning method against popular deep active learning baselines via both synthetic data experiments and sentimental classification task on real datasets.
Updated: 2025-06-25 06:11:27
标题: 通过无梯度切平面的主动学习深度神经网络
摘要: 主动学习方法旨在改善机器学习中的样本复杂性。在这项工作中,我们通过一种新颖的无梯度切平面训练方法研究了一种主动学习方案,用于任意深度的ReLU网络,并开发了一个收敛理论。我们首次证明,尽管深度神经网络具有非凸性和非线性决策边界,但传统上用于线性模型的切平面算法可以扩展到深度神经网络中。此外,这种训练方法引发了第一个已知能够实现收敛保证的深度主动学习方案,揭示了可行集的几何收缩率。我们通过合成数据实验和对真实数据集进行的情感分类任务,展示了我们提出的主动学习方法相对于流行的深度主动学习基线的有效性。
更新时间: 2025-06-25 06:11:27
领域: cs.LG,math.OC
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models
Physics problem-solving is a challenging domain for large AI models, requiring integration of conceptual understanding, mathematical reasoning, and interpretation of physical diagrams. Current evaluation methodologies show notable limitations in capturing the breadth and complexity of undergraduate-level physics, underscoring the need for more rigorous assessments. To this end, we present PhysUniBench, a large-scale multimodal benchmark designed to evaluate and improve the reasoning capabilities of multimodal large language models (MLLMs) specifically on undergraduate-level physics problems. PhysUniBench consists of 3,304 physics questions spanning 8 major sub-disciplines of physics, each accompanied by one visual diagrams. The benchmark includes both open-ended and multiple-choice questions, systematically curated and difficulty-rated through an iterative model-in-the-loop process. The benchmark's construction involved a rigorous multi-stage process, including multiple roll-outs, expert-level evaluation, automated filtering of easily solved problems, and a nuanced difficulty grading system with five levels. Through extensive experiments, we observe that current state-of-the-art models encounter substantial challenges in physics reasoning. For example, GPT-4o mini achieves only about 34.2% accuracy in the proposed PhysUniBench. These results highlight that current MLLMs struggle with advanced physics reasoning, especially on multi-step problems and those requiring precise diagram interpretation. By providing a broad and rigorous assessment tool, PhysUniBench aims to drive progress in AI for Science, encouraging the development of models with stronger physical reasoning, problem-solving skills, and multimodal understanding. The benchmark and evaluation scripts are available at https://prismax-team.github.io/PhysUniBenchmark/.
Updated: 2025-06-25 06:09:22
标题: PhysUniBench:面向多模态模型的本科物理推理基准
摘要: 物理问题解决是一个具有挑战性的领域,对于大型人工智能模型来说,需要整合概念理解、数学推理和对物理图表的解释。当前的评估方法在捕捉本科物理的广度和复杂性方面显示出明显的局限性,强调了对更严格评估的需求。为此,我们提出了PhysUniBench,这是一个大规模多模态基准,旨在评估和提高多模态大型语言模型(MLLMs)在本科物理问题上的推理能力。PhysUniBench包括3,304个涵盖物理学8个主要子学科的问题,每个问题都附带一个视觉图表。该基准包括开放式和多项选择题,通过迭代的模型在环(model-in-the-loop)过程系统地进行策划和难度评定。基准的构建涉及一个严格的多阶段过程,包括多轮推演(roll-outs)、专家级评估、自动筛选易于解决的问题,以及一个具有五个级别的细致难度评分系统。通过广泛的实验,我们观察到当前最先进的模型在物理推理方面面临着重大挑战。例如,GPT-4o mini在提出的PhysUniBench中的准确率仅约为34.2%。这些结果突显出当前MLLMs在高级物理推理方面遇到困难,特别是在多步问题和需要精确图表解释的问题上。通过提供一个广泛和严格的评估工具,PhysUniBench旨在推动科学人工智能的进步,鼓励开发具有更强物理推理、问题解决能力和多模态理解的模型。该基准和评估脚本可在https://prismax-team.github.io/PhysUniBenchmark/ 上找到。
更新时间: 2025-06-25 06:09:22
领域: cs.AI
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
Spike cameras, as innovative neuromorphic cameras that capture scenes with the 0-1 bit stream at 40 kHz, are increasingly employed for the 3D reconstruction task via Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS). Previous spike-based 3D reconstruction approaches often employ a cascaded pipeline: starting with high-quality image reconstruction from spike streams based on established spike-to-image reconstruction algorithms, then progressing to camera pose estimation and 3D reconstruction. However, this cascaded approach suffers from substantial cumulative errors, where quality limitations of initial image reconstructions negatively impact pose estimation, ultimately degrading the fidelity of the 3D reconstruction. To address these issues, we propose a synergistic optimization framework, \textbf{USP-Gaussian}, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework. Leveraging the multi-view consistency afforded by 3DGS and the motion capture capability of the spike camera, our framework enables a joint iterative optimization that seamlessly integrates information between the spike-to-image network and 3DGS. Experiments on synthetic datasets with accurate poses demonstrate that our method surpasses previous approaches by effectively eliminating cascading errors. Moreover, we integrate pose optimization to achieve robust 3D reconstruction in real-world scenarios with inaccurate initial poses, outperforming alternative methods by effectively reducing noise and preserving fine texture details. Our code, data and trained models will be available at https://github.com/chenkang455/USP-Gaussian.
Updated: 2025-06-25 06:07:36
标题: USP-Gaussian:统一基于脉冲的图像重建、姿态校正和高斯喷溅
摘要: 尖峰摄像机作为一种创新的神经形态摄像机,以40 kHz的0-1位流捕捉场景,越来越多地被用于通过神经辐射场(NeRF)或3D高斯喷溅(3DGS)进行3D重建任务。先前基于尖峰的3D重建方法通常采用级联管道:从基于已建立的尖峰到图像重建算法的尖峰流的高质量图像重建开始,然后进行相机姿态估计和3D重建。然而,这种级联方法存在实质性的累积误差,其中初始图像重建的质量限制对姿态估计产生负面影响,最终降低了3D重建的保真度。为了解决这些问题,我们提出了一个协同优化框架,\textbf{USP-Gaussian},将基于尖峰的图像重建、姿态校正和高斯喷溅统一为一个端到端框架。利用3DGS提供的多视图一致性和尖峰相机的运动捕捉能力,我们的框架实现了一个联合迭代优化,无缝地整合了尖峰到图像网络和3DGS之间的信息。在具有准确姿态的合成数据集上的实验表明,我们的方法通过有效消除级联错误超越了先前的方法。此外,我们整合了姿态优化,以在具有不准确初始姿态的实际场景中实现强大的3D重建,通过有效减少噪音并保留精细纹理细节,优于替代方法。我们的代码、数据和训练模型将在https://github.com/chenkang455/USP-Gaussian 上提供。
更新时间: 2025-06-25 06:07:36
领域: cs.CV,cs.AI
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Despite significant advancements in Large Language Models (LLMs), developing advanced reasoning capabilities in LLMs remains a key challenge. Process Reward Models (PRMs) have demonstrated exceptional promise in enhancing reasoning by providing step-wise feedback, particularly in the context of mathematical reasoning. However, their application to broader reasoning domains remains understudied, largely due to the high costs associated with manually creating step-level supervision. In this work, we explore the potential of PRMs in graph reasoning problems - a domain that demands sophisticated multi-step reasoning and offers opportunities for automated step-level data generation using established graph algorithms. We introduce GraphSILO, the largest dataset for graph reasoning problems with fine-grained step-wise labels, built using automated Task-oriented Trajectories and Monte Carlo Tree Search (MCTS) to generate detailed reasoning steps with step-wise labels. Building upon this dataset, we train GraphPRM, the first PRM designed for graph reasoning problems, and evaluate its effectiveness in two key settings: inference-time scaling and reinforcement learning via Direct Preference Optimization (DPO). Experimental results show that GraphPRM significantly improves LLM performance across 13 graph reasoning tasks, delivering a 9% gain for Qwen2.5-7B and demonstrating transferability to new graph reasoning datasets and new reasoning domains like mathematical problem-solving. Notably, GraphPRM enhances LLM performance on GSM8K and Math500, underscoring the cross-domain applicability of graph-based reasoning rewards. Our findings highlight the potential of PRMs in advancing reasoning across diverse domains, paving the way for more versatile and effective LLMs.
Updated: 2025-06-25 06:00:08
标题: 奖励图推理过程使LLMs成为更具一般化推理能力的模型
摘要: 尽管大型语言模型(LLMs)取得了重大进展,但在LLMs中开发先进的推理能力仍然是一个关键挑战。过程奖励模型(PRMs)通过提供逐步反馈,在增强推理能力方面表现出卓越的潜力,尤其是在数学推理的背景下。然而,由于手动创建步骤级监督所带来的高成本,它们在更广泛的推理领域的应用尚未得到充分研究。在这项工作中,我们探讨了PRMs在图推理问题中的潜力——这是一个既需要复杂多步推理、又可利用成熟图算法自动生成步骤级数据的领域。我们引入GraphSILO,这是一个具有细粒度步骤标签的图推理问题的最大数据集,使用自动化的任务导向轨迹和蒙特卡罗树搜索(MCTS)来生成具有步骤标签的详细推理步骤。在这个数据集的基础上,我们训练了GraphPRM,这是第一个专为图推理问题设计的PRM,并在两个关键设置中评估了它的有效性:推理时扩展(inference-time scaling)和通过直接偏好优化(DPO)进行强化学习。实验结果表明,GraphPRM显著提高了LLM在13个图推理任务中的性能,为Qwen2.5-7B带来了9%的收益,并展示了对新的图推理数据集和新的推理领域如数学问题解决的可转移性。值得注意的是,GraphPRM提高了LLM在GSM8K和Math500上的性能,强调了基于图的推理奖励的跨领域适用性。我们的发现突出了PRMs在推进各种领域的推理方面的潜力,为更多多功能且有效的LLMs铺平了道路。
更新时间: 2025-06-25 06:00:08
领域: cs.CL,cs.AI,cs.LG
Counterfactual Fairness through Transforming Data Orthogonal to Bias
Machine learning models have shown exceptional prowess in solving complex issues across various domains. However, these models can sometimes exhibit biased decision-making, resulting in unequal treatment of different groups. Despite substantial research on counterfactual fairness, methods to reduce the impact of multivariate and continuous sensitive variables on decision-making outcomes are still underdeveloped. We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB), which is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications. Our approach, based on the assumption of a jointly normal distribution within a structural causal model (SCM), demonstrates that counterfactual fairness can be achieved by ensuring the data is orthogonal to the observed sensitive variables. The OB algorithm is model-agnostic, making it applicable to a wide range of machine learning models and tasks. Additionally, it includes a sparse variant to improve numerical stability through regularization. Empirical evaluations on both simulated and real-world datasets, encompassing settings with both discrete and continuous sensitive variables, show that our methodology effectively promotes fairer outcomes without compromising accuracy.
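Under the joint-normality assumption, the core operation reduces to residualizing the features on the sensitive variables. A minimal dense-case sketch follows (the paper additionally offers a sparse, regularized variant for numerical stability; names are illustrative):

```python
import numpy as np

def orthogonal_to_bias(X, S):
    """Project features X (n, d) onto the orthogonal complement of the
    centered sensitive variables S (n, k): remove the best linear
    predictor of X from S, so the result is uncorrelated with S."""
    Xc = X - X.mean(axis=0)
    Sc = S - S.mean(axis=0)
    # least-squares coefficients of X on S, then residualize
    B, *_ = np.linalg.lstsq(Sc, Xc, rcond=None)
    return Xc - Sc @ B
```

Because the transformed features carry no linear information about S, any downstream model trained on them inherits the counterfactual-fairness property argued for in the paper's SCM setting.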
Updated: 2025-06-25 05:35:44
标题: 通过将数据变换至与偏差正交来实现反事实公平
摘要: 机器学习模型在解决各个领域复杂问题方面展示出了卓越的能力。然而,这些模型有时会表现出有偏见的决策,导致对不同群体的不平等对待。尽管针对反事实公平性进行了大量研究,但如何减少多变量和连续敏感变量对决策结果的影响的方法仍然不够完善。我们提出了一种新颖的数据预处理算法,即正交于偏差(OB),旨在消除一组连续敏感变量的影响,从而促进机器学习应用中的反事实公平性。我们的方法基于在结构因果模型(SCM)中联合正态分布的假设,表明通过确保数据正交于观察到的敏感变量,可以实现反事实公平性。OB算法是与模型无关的,适用于各种机器学习模型和任务。此外,它还包括一种稀疏变体,通过正则化来提高数值稳定性。在模拟和真实世界数据集上进行的实证评估涵盖了包括离散和连续敏感变量的设置,表明我们的方法有效地促进了更公平的结果,而又不损害准确性。
更新时间: 2025-06-25 05:35:44
领域: cs.LG,stat.ML
Accept More, Reject Less: Reducing up to 19% Unnecessary Desk-Rejections over 11 Years of ICLR Data
The explosive growth of AI research has driven paper submissions at flagship AI conferences to unprecedented levels, necessitating many venues in 2025 (e.g., CVPR, ICCV, KDD, AAAI, IJCAI, WSDM) to enforce strict per-author submission limits and to desk-reject any excess papers by simple ID order. While this policy helps reduce reviewer workload, it may unintentionally discard valuable papers and penalize authors' efforts. In this paper, we ask an essential research question on whether it is possible to follow submission limits while minimizing needless rejections. We first formalize the current desk-rejection policies as an optimization problem, and then develop a practical algorithm based on linear programming relaxation and a rounding scheme. Under extensive evaluation on 11 years of real-world ICLR (International Conference on Learning Representations) data, our method preserves up to $19.23\%$ more papers without violating any author limits. Moreover, our algorithm is highly efficient in practice, with all results on ICLR data computed within at most 53.64 seconds. Our work provides a simple and practical desk-rejection strategy that significantly reduces unnecessary rejections, demonstrating strong potential to improve current CS conference submission policies.
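The optimization view is easy to make concrete. Below is a toy sketch in the same spirit, not the paper's algorithm: an LP relaxation that maximizes the number of accepted papers under per-author limits, followed by a simple greedy rounding of the fractional solution.

```python
import numpy as np
from scipy.optimize import linprog

def select_papers(author_lists, limit):
    """author_lists[p] = list of author ids on paper p. Maximize the
    number of accepted papers s.t. each author appears on at most
    `limit` accepted ones (LP relaxation + greedy rounding)."""
    authors = sorted({a for al in author_lists for a in al})
    aidx = {a: i for i, a in enumerate(authors)}
    P, A = len(author_lists), len(authors)
    M = np.zeros((A, P))                      # author-paper incidence
    for p, al in enumerate(author_lists):
        for a in al:
            M[aidx[a], p] = 1.0
    res = linprog(c=-np.ones(P), A_ub=M, b_ub=np.full(A, limit),
                  bounds=[(0, 1)] * P, method="highs")
    load, accepted = np.zeros(A), []
    for p in np.argsort(-res.x):              # round: high LP value first
        if all(load[aidx[a]] < limit for a in author_lists[p]):
            accepted.append(p)
            for a in author_lists[p]:
                load[aidx[a]] += 1
    return sorted(accepted)
```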
Updated: 2025-06-25 05:23:44
标题: 接受更多,拒绝更少:在11年的ICLR数据中减少高达19%的不必要的桌面拒绝
摘要: AI研究的爆炸性增长推动了AI主要会议的论文提交达到了前所未有的水平,迫使2025年的许多会议(如CVPR、ICCV、KDD、AAAI、IJCAI、WSDM)实施严格的每位作者提交限制,并通过简单的ID顺序拒绝任何多余的论文。虽然这一政策有助于减少审稿人的工作量,但可能会无意中丢弃有价值的论文并惩罚作者的努力。在本文中,我们提出了一个关键的研究问题,即是否可能在尽量减少不必要的拒绝的情况下遵守提交限制。我们首先将当前的桌面拒绝政策形式化为一个优化问题,然后基于线性规划松弛和舍入方案开发了一个实用算法。通过对11年实际ICLR(国际学习表示会议)数据的广泛评估,我们的方法可以在不违反任何作者限制的情况下保留多达19.23%的论文。此外,我们的算法在实践中非常高效,在ICLR数据上的所有结果最多在53.64秒内计算完毕。我们的工作提供了一种简单而实用的桌面拒绝策略,显著减少了不必要的拒绝,展示了改进当前计算机科学会议提交政策的巨大潜力。
更新时间: 2025-06-25 05:23:44
领域: cs.DS,cs.CY,cs.DL,cs.IR,cs.LG
C3S3: Complementary Competition and Contrastive Selection for Semi-Supervised Medical Image Segmentation
For the immanent challenge of insufficiently annotated samples in the medical field, semi-supervised medical image segmentation (SSMIS) offers a promising solution. Despite achieving impressive results in delineating primary target areas, most current methodologies struggle to precisely capture the subtle details of boundaries. This deficiency often leads to significant diagnostic inaccuracies. To tackle this issue, we introduce C3S3, a novel semi-supervised segmentation model that synergistically integrates complementary competition and contrastive selection. This design significantly sharpens boundary delineation and enhances overall precision. Specifically, we develop an Outcome-Driven Contrastive Learning module dedicated to refining boundary localization. Additionally, we incorporate a Dynamic Complementary Competition module that leverages two high-performing sub-networks to generate pseudo-labels, thereby further improving segmentation quality. The proposed C3S3 undergoes rigorous validation on two publicly accessible datasets, encompassing the practices of both MRI and CT scans. The results demonstrate that our method achieves superior performance compared to previous cutting-edge competitors. Especially, on the 95HD and ASD metrics, our approach achieves a notable improvement of at least 6%, highlighting the significant advancements. The code is available at https://github.com/Y-TARL/C3S3.
Updated: 2025-06-25 05:23:29
标题: C3S3:半监督医学图像分割的互补竞争和对比选择
摘要: 针对医学领域中标注样本不足这一固有挑战,半监督医学图像分割(SSMIS)提供了一个有前途的解决方案。尽管在勾画主要目标区域方面取得了令人印象深刻的结果,但大多数当前的方法在精确捕捉边界的微妙细节方面仍存在困难。这种不足往往导致严重的诊断不准确。为了解决这个问题,我们引入了C3S3,这是一个新颖的半监督分割模型,它协同整合了互补竞争和对比选择。这种设计显著地增强了边界勾画的清晰度,提高了整体精度。具体来说,我们开发了一个专门用于优化边界定位的结果驱动对比学习(Outcome-Driven Contrastive Learning)模块。此外,我们还整合了一个动态互补竞争模块,利用两个表现优异的子网络生成伪标签,从而进一步提高分割质量。我们提出的C3S3在两个公开可访问的数据集上经过严格的验证,涵盖了MRI和CT扫描的实践。结果表明,我们的方法相对于先前的尖端竞争对手实现了更优越的性能。特别是,在95HD和ASD指标上,我们的方法至少实现了6%的显著改进,突显了重要的进展。代码可以在https://github.com/Y-TARL/C3S3 上找到。
更新时间: 2025-06-25 05:23:29
领域: cs.CV,cs.AI
Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis
A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($\epsilon$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $\epsilon$-PLA fitting algorithms remain underexplored. In this paper, we revisit $\epsilon$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $\Omega(\kappa \cdot \epsilon^2)$ on the expected segment coverage for existing $\epsilon$-PLA fitting algorithms, where $\kappa$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $\epsilon$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.
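For readers new to $\epsilon$-PLA, a one-pass greedy fitter conveys the flavor of the algorithms being benchmarked. This is a simplified variant (it pivots every segment on its first point, whereas optimal fitters maintain a convex-hull-based slope cone), and it assumes strictly increasing keys `xs`:

```python
import math

def epsilon_pla(xs, ys, eps):
    """Greedy epsilon-PLA: grow a segment while some line through its
    first point stays within +/- eps of every later point, tracked as
    a shrinking slope cone [lo, hi]; start a new segment when empty."""
    segments, x0, y0 = [], xs[0], ys[0]
    lo, hi = -math.inf, math.inf
    for x, y in zip(xs[1:], ys[1:]):
        new_lo = max(lo, (y - eps - y0) / (x - x0))
        new_hi = min(hi, (y + eps - y0) / (x - x0))
        if new_lo > new_hi:                        # cone empty: close segment
            slope = 0.0 if math.isinf(lo) else (lo + hi) / 2.0
            segments.append((x0, y0, slope))
            x0, y0, lo, hi = x, y, -math.inf, math.inf
        else:
            lo, hi = new_lo, new_hi
    slope = 0.0 if math.isinf(lo) else (lo + hi) / 2.0
    segments.append((x0, y0, slope))
    return segments            # one (start_x, start_y, slope) per segment
```

In a learned index, each segment then predicts a key's position to within $\epsilon$, and the paper's lower bound concerns how many keys such a segment covers in expectation.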
Updated: 2025-06-25 05:20:54
标题: 学习型索引结构中的分段线性逼近:理论和实证分析
摘要: 数据库和系统社区中一个不断增长的趋势是利用机器学习模型来增强传统的索引结构,例如B+-树。在这些模型中,基于误差限制的分段线性逼近($\epsilon$-PLA)由于其简单性和有效性而成为一个流行的选择。尽管在许多学习索引中扮演重要角色,但$\epsilon$-PLA拟合算法的设计和分析仍未得到充分探讨。本文从理论和实证两个角度重新审视了$\epsilon$-PLA,重点关注其在学习索引结构中的应用。我们首先建立了现有$\epsilon$-PLA拟合算法在期望分段覆盖率上的基本改进下界为$\Omega(\kappa \cdot \epsilon^2)$,其中$\kappa$是一个数据相关的常数。然后我们提出了一项最新$\epsilon$-PLA算法在不同学习数据结构中的全面基准测试。我们的结果突出了模型精度、模型大小和查询性能之间的关键权衡,为未来学习数据结构的原则设计提供了可操作的指导。
更新时间: 2025-06-25 05:20:54
领域: cs.DB,cs.LG
AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation
We present AnchorDP3, a diffusion policy framework for dual-arm robotic manipulation that achieves state-of-the-art performance in highly randomized environments. AnchorDP3 integrates three key innovations: (1) Simulator-Supervised Semantic Segmentation, using rendered ground truth to explicitly segment task-critical objects within the point cloud, which provides strong affordance priors; (2) Task-Conditioned Feature Encoders, lightweight modules processing augmented point clouds per task, enabling efficient multi-task learning through a shared diffusion-based action expert; (3) Affordance-Anchored Keypose Diffusion with Full State Supervision, replacing dense trajectory prediction with sparse, geometrically meaningful action anchors, i.e., keyposes such as pre-grasp pose, grasp pose directly anchored to affordances, drastically simplifying the prediction space; the action expert is forced to predict both robot joint angles and end-effector poses simultaneously, which exploits geometric consistency to accelerate convergence and boost accuracy. Trained on large-scale, procedurally generated simulation data, AnchorDP3 achieves a 98.7% average success rate in the RoboTwin benchmark across diverse tasks under extreme randomization of objects, clutter, table height, lighting, and backgrounds. This framework, when integrated with the RoboTwin real-to-sim pipeline, has the potential to enable fully autonomous generation of deployable visuomotor policies from only scene and instruction, totally eliminating human demonstrations from learning manipulation skills.
Updated: 2025-06-25 05:10:04
标题: AnchorDP3:用于机器人操作的3D可供性引导稀疏扩散策略
摘要: 我们提出AnchorDP3,这是一个用于双臂机器人操作的扩散策略框架,在高度随机化的环境中实现了最先进的性能。AnchorDP3集成了三个关键创新:(1)模拟器监督的语义分割,利用渲染真值在点云中显式分割任务关键对象,提供强大的可供性先验;(2)任务条件特征编码器,即按任务处理增强点云的轻量级模块,通过共享的基于扩散的动作专家实现高效的多任务学习;(3)具有完整状态监督的可供性锚定关键位姿扩散,用稀疏、几何意义明确的动作锚点代替密集的轨迹预测,即预抓取位姿、直接锚定于可供性的抓取位姿等关键位姿,极大简化了预测空间;动作专家被要求同时预测机器人关节角度和末端执行器位姿,利用几何一致性加速收敛并提高准确性。在大规模、程序化生成的模拟数据上训练后,AnchorDP3在RoboTwin基准测试中实现了98.7%的平均成功率,涵盖了物体、杂乱程度、桌面高度、光照和背景极端随机化下的各种任务。当与RoboTwin真实到模拟管道集成时,该框架有潜力仅从场景和指令完全自主生成可部署的视觉运动策略,彻底免除通过人类示范学习操作技能的需要。
更新时间: 2025-06-25 05:10:04
领域: cs.RO,cs.AI
Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon VLMs. Our method manipulates only the visual inputs of a portion of the training samples - without altering their corresponding labels or instructions - thereby injecting malicious behaviors into the model. Once fine-tuned with this tampered data, the agent will exhibit attacker-controlled responses when a specific visual trigger is introduced at inference time. The core of our approach lies in aligning the gradients of poisoned samples with those of a chosen target instance, embedding backdoor-relevant features into the poisoned training data. To maintain stealth and enhance robustness, we develop three realistic visual triggers: static visual patches, dynamic motion cues, and subtle low-opacity overlays. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use. Results show that our attack achieves high attack success rates (up to 94.67 percent) while maintaining high clean-task performance (FSR up to 95.85 percent). Additionally, ablation studies shed light on how various design choices affect the efficacy and concealment of the attack. Overall, this work is the first to expose critical security flaws in VLM-based mobile agents, highlighting their susceptibility to clean-label backdoor attacks and the urgent need for effective defense mechanisms in their training pipelines.
Updated: 2025-06-25 05:05:18
标题: 屏幕劫持:移动环境中VLM代理的视觉中毒
摘要: 随着视觉-语言模型(VLMs)的不断整合,移动代理现在被广泛用于UI自动化和基于摄像头的用户辅助等任务。这些代理通常在有限的用户生成数据集上进行微调,使它们在训练过程中容易受到隐蔽威胁。在这项工作中,我们提出了GHOST,这是第一个专门为基于VLMs构建的移动代理设计的干净标签后门攻击。我们的方法仅操纵部分训练样本的视觉输入,而不改变它们对应的标签或指令,从而将恶意行为注入模型中。一旦用这些篡改的数据进行微调,当在推断时引入特定的视觉触发器时,代理将展示出受攻击者控制的响应。我们方法的核心在于将受污染样本的梯度与所选目标实例的梯度对齐,将后门相关特征嵌入到受污染的训练数据中。为了保持隐蔽性并增强稳健性,我们开发了三种现实的视觉触发器:静态视觉补丁、动态运动线索和微妙的低透明度覆盖物。我们评估了我们的方法在六个真实的Android应用程序和三个为移动使用而调整的VLM架构上的效果。结果显示,我们的攻击实现了高攻击成功率(高达94.67%),同时保持了高的干净任务性能(FSR高达95.85%)。此外,消融研究揭示了各种设计选择如何影响攻击的有效性和隐蔽性。总体而言,这项工作是第一个揭示了基于VLM的移动代理存在关键安全缺陷,突出了它们对干净标签后门攻击的易受性,以及对其训练流程中有效防御机制的迫切需求。
更新时间: 2025-06-25 05:05:18
领域: cs.CR,cs.AI
TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis
The rise of time-series pre-trained models has advanced temporal representation learning, but current state-of-the-art models are often large-scale, requiring substantial compute. We introduce TSPulse, ultra-compact time-series pre-trained models with only 1M parameters, specialized to perform strongly across classification, anomaly detection, imputation, and retrieval tasks. TSPulse introduces innovations at both the architecture and task levels. At the architecture level, it employs a dual-space masked reconstruction, learning from both time and frequency domains to capture complementary signals. This is further enhanced by a dual-embedding disentanglement, generating both detailed embeddings for fine-grained analysis and high-level semantic embeddings for broader task understanding. Notably, TSPulse's semantic embeddings are robust to shifts in time, magnitude, and noise, which is important for robust retrieval. At the task level, TSPulse incorporates TSLens, a fine-tuning component enabling task-specific feature attention. It also introduces a multi-head triangulation technique that correlates deviations from multiple prediction heads, enhancing anomaly detection by fusing complementary model outputs. Additionally, a hybrid mask pretraining is proposed to improves zero-shot imputation by reducing pre-training bias. These architecture and task innovations collectively contribute to TSPulse's significant performance gains: 5-16% on the UEA classification benchmarks, +20% on the TSB-AD anomaly detection leaderboard, +50% in zero-shot imputation, and +25% in time-series retrieval. Remarkably, these results are achieved with just 1M parameters (10-100X smaller than existing SOTA models) and allow GPU-free inference, setting a new standard for efficient time-series pre-trained models. The models can be accessed from https://huggingface.co/ibm-granite/granite-timeseries-tspulse-r1
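The dual-space objective can be sketched in a few lines of PyTorch. This is a schematic under stated assumptions (`model` is a hypothetical reconstruction network; TSPulse's actual patch-level masking, heads, and embedding disentanglement are more elaborate): the reconstruction is penalized both in the time domain on masked positions and in the FFT magnitude domain.

```python
import torch
import torch.nn.functional as F

def dual_space_loss(model, x, mask_ratio=0.3):
    """Dual-space masked reconstruction (sketch). x: (batch, T).
    Mask random positions, reconstruct, and penalize error in both
    the time domain and the frequency-magnitude domain."""
    mask = torch.rand_like(x) < mask_ratio
    x_in = x.masked_fill(mask, 0.0)
    x_hat = model(x_in)                                  # (batch, T)
    time_loss = F.mse_loss(x_hat[mask], x[mask])         # masked positions
    freq_loss = F.mse_loss(torch.fft.rfft(x_hat).abs(),  # complementary signal
                           torch.fft.rfft(x).abs())
    return time_loss + freq_loss
```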
Updated: 2025-06-25 04:59:41
标题: TSPulse:用于快速时间序列分析的双空间微型预训练模型
摘要: 时间序列预训练模型的崛起推动了时间表示学习的进步,但当前的最先进模型通常规模庞大,需要大量计算。我们介绍了TSPulse,这是一种超紧凑的时间序列预训练模型,只有100万个参数,专门设计用于在分类、异常检测、填补和检索任务中表现出色。TSPulse在架构和任务级别上引入了创新。在架构级别上,它采用双空间掩蔽重构,从时间和频率域学习,捕捉互补信号。这进一步由双嵌入解耦增强,生成了既适用于细粒度分析的详细嵌入,又适用于更广泛任务理解的高级语义嵌入。值得注意的是,TSPulse的语义嵌入对时间、幅度和噪声的变化具有很强的鲁棒性,这对于稳健的检索非常重要。在任务级别上,TSPulse集成了TSLens,这是一个微调组件,可以实现任务特定的特征关注。它还引入了一种多头三角测量技术,通过融合多个预测头的偏差来增强异常检测。此外,提出了一种混合掩模预训练方法,通过减少预训练偏差来改善零样本填补。这些架构和任务创新共同促成了TSPulse显著的性能提升:在UEA分类基准测试中提高了5-16%,在TSB-AD异常检测排行榜上提高了20%,零样本填补提高了50%,时间序列检索提高了25%。值得注意的是,这些结果仅使用100万个参数实现(比现有SOTA模型小10-100倍),并且可以进行无GPU推理,为高效的时间序列预训练模型设定了新的标准。可以从https://huggingface.co/ibm-granite/granite-timeseries-tspulse-r1访问这些模型。
更新时间: 2025-06-25 04:59:41
领域: cs.LG,cs.AI
High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data
Wildfires are increasing in intensity and severity at an alarming rate. Recent advances in AI and publicly available satellite data enable monitoring critical wildfire risk factors globally, at high resolution and low latency. Live Fuel Moisture Content (LFMC) is a critical wildfire risk factor and is valuable for both wildfire research and operational response. However, ground-based LFMC samples are both labor intensive and costly to acquire, resulting in sparse and infrequent updates. In this work, we explore the use of a pretrained, highly-multimodal earth-observation model for generating large-scale spatially complete (wall-to-wall) LFMC maps. Our approach achieves significant improvements over previous methods using randomly initialized models (a 20% reduction in RMSE). We provide an automated pipeline that enables rapid generation of these LFMC maps across the United States, and demonstrate its effectiveness in two regions recently impacted by wildfire (Eaton and Palisades).
Updated: 2025-06-25 04:59:10
标题: 基于多模态地球观测数据的野火风险高分辨率活体燃料含水量(LFMC)地图
摘要: 森林火灾正以惊人的速度增加强度和严重程度。人工智能和公开可用卫星数据的最新进展使得全球关键森林火灾风险因素能够以高分辨率和低延迟进行监测。活体燃料含水量(LFMC)是一个关键的森林火灾风险因素,对于森林火灾研究和实际应急响应都非常有价值。然而,基于地面的LFMC采样既耗费人力又成本高昂,导致数据稀疏且更新不频繁。在这项工作中,我们探索了使用预训练的高度多模态地球观测模型来生成大规模空间完整(墙对墙)LFMC地图。我们的方法比使用随机初始化模型的先前方法实现了显著改进(RMSE减少20%)。我们提供了一个自动化流程,可以在美国范围内快速生成这些LFMC地图,并展示了其在最近受到森林火灾影响的两个地区(伊顿和帕利塞德)的有效性。
更新时间: 2025-06-25 04:59:10
领域: cs.LG
AI Copilots for Reproducibility in Science: A Case Study
Open science initiatives seek to make research outputs more transparent, accessible, and reusable, but ensuring that published findings can be independently reproduced remains a persistent challenge. This paper introduces OpenPub, an AI-powered platform that supports researchers, reviewers, and readers through a suite of modular copilots focused on key open science tasks. In this work, we present the Reproducibility Copilot, which analyzes manuscripts, code, and supplementary materials to generate structured Jupyter Notebooks and recommendations aimed at facilitating computational, or "rote", reproducibility. We conducted feasibility tests using previously studied research papers with known reproducibility benchmarks. Results indicate that OpenPub can substantially reduce reproduction time - from over 30 hours to about 1 hour - while achieving high coverage of figures, tables, and results suitable for computational reproduction. The system systematically detects barriers to reproducibility, including missing hyperparameters, undocumented preprocessing steps, and incomplete or inaccessible datasets. These findings suggest that AI-driven tools can meaningfully reduce the burden of reproducibility efforts and contribute to more transparent and verifiable scientific communication. The modular copilot architecture also provides a foundation for extending AI assistance to additional open science objectives beyond reproducibility.
Updated: 2025-06-25 04:56:28
标题: 面向科学可重复性的AI副驾驶:一个案例研究
摘要: 开放科学倡议旨在使研究成果更加透明、可访问和可重复使用,但确保发表的发现可以被独立复现仍然是一个持续的挑战。本文介绍了OpenPub,这是一个由人工智能驱动的平台,通过一套专注于关键开放科学任务的模块化副驾驶支持研究人员、审稿人和读者。在这项工作中,我们提出了可重复性副驾驶,它分析手稿、代码和补充材料,生成结构化的Jupyter笔记本和建议,旨在促进计算式(或“机械式”)可重复性。我们使用先前研究过的、具有已知可重复性基准的研究论文进行了可行性测试。结果表明,OpenPub可以大幅缩短复现时间 - 从超过30小时缩短到约1小时 - 同时对适合计算复现的图、表和结果实现高覆盖率。该系统能系统地检测可重复性的障碍,包括缺失的超参数、未记录的预处理步骤以及不完整或无法访问的数据集。这些发现表明,人工智能驱动的工具可以切实减轻可重复性工作的负担,并有助于更加透明和可验证的科学交流。模块化副驾驶架构还为将人工智能辅助扩展到可重复性之外的其他开放科学目标提供了基础。
更新时间: 2025-06-25 04:56:28
领域: cs.AI
Log-Linear Attention
The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity however remain significant bottlenecks. Linear attention and state-space models enable linear-time, constant-memory sequence modeling and can moreover be trained efficiently through matmul-rich parallelization across sequence length. However, at their core these models are still RNNs, and thus their use of a fixed-size hidden state to model the context is a fundamental limitation. This paper develops log-linear attention, an attention mechanism that balances linear attention's efficiency and the expressiveness of softmax attention. Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants. As case studies, we instantiate log-linear variants of two recent architectures -- Mamba-2 and Gated DeltaNet -- and find they perform well compared to their linear-time variants.
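To make the "logarithmically growing set of hidden states" concrete, here is a naive O(T log T) reference sketch, not the paper's matmul-rich parallel form: the prefix before each position is split into Fenwick-style dyadic buckets, each summarized by a linear-attention state and mixed with a per-level weight `lam` (a supplied function here; the paper's growth function and learned weighting may differ).

```python
import numpy as np

def log_linear_attention(Q, K, V, lam):
    """Q, K, V: (T, d) float arrays; lam(level) -> scalar bucket weight.
    out[t] attends to positions [0, t] through O(log t) dyadic buckets,
    each compressed into a single linear-attention state K^T V."""
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        hi, level = t + 1, 0
        while hi > 0:
            lo = hi - (hi & -hi)        # strip lowest set bit: bucket [lo, hi)
            S = K[lo:hi].T @ V[lo:hi]   # (d, d) summary state for this bucket
            out[t] += lam(level) * (Q[t] @ S)
            hi, level = lo, level + 1
    return out
```

With lam constant this collapses to ordinary (unnormalized) linear attention; letting weights vary across the logarithmically many buckets is what buys the extra expressiveness over a single fixed-size state.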
Updated: 2025-06-25 04:54:28
标题: 对数线性注意力
摘要: Transformer中的注意力机制是准确和可扩展的序列建模的重要基元。然而,其二次计算和线性内存复杂度仍然是重要的瓶颈。线性注意力和状态空间模型使得序列建模具有线性时间和恒定内存,而且可以通过矩阵乘法密集的并行化有效训练。然而,这些模型本质上仍然是RNN,因此其使用固定大小的隐藏状态来建模上下文是一个基本限制。本文开发了对数线性注意力,一种平衡线性注意力效率和softmax注意力表达能力的注意力机制。对数线性注意力用对数增长的一组隐藏状态取代了固定大小的隐藏状态。我们展示了对于特定的增长函数,对数线性注意力具有一个类似于矩阵乘法密集并行形式,其计算成本与序列长度呈对数线性关系。对数线性注意力是一个通用框架,可以应用于现有的线性注意力变体之上。作为案例研究,我们实例化了两个最近架构的对数线性变体 - Mamba-2和Gated DeltaNet,并发现它们与其线性时间变体相比表现良好。
更新时间: 2025-06-25 04:54:28
领域: cs.LG
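To make the logarithmically growing state idea concrete, here is a naive, quadratic-time reference sketch (not the paper's matmul-rich parallel form): the prefix before each query is split into power-of-two buckets, Fenwick-tree style, each bucket keeps its own linear-attention summary, and per-level weights combine them. The feature map `phi` and the weight vector `lam` are illustrative assumptions.

```python
import numpy as np

def phi(x):                       # simple positive feature map (assumption)
    return np.maximum(x, 0.0) + 1e-6

def log_linear_attention(Q, K, V, lam):
    """Q, K: (T, d); V: (T, dv); lam: per-level weights, len >= log2(T)+1."""
    T, _ = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        y, z = np.zeros(V.shape[1]), 0.0
        hi = t + 1
        while hi > 0:             # Fenwick decomposition of prefix [0, t]
            lo = hi - (hi & -hi)  # strip lowest set bit -> bucket [lo, hi)
            level = int(np.log2(hi - lo))
            Kb, Vb = phi(K[lo:hi]), V[lo:hi]
            S = Kb.T @ Vb         # (d, dv) bucket summary state
            s = Kb.sum(axis=0)    # (d,) bucket normalizer
            q = phi(Q[t])
            y += lam[level] * (q @ S)
            z += lam[level] * (q @ s)
            hi = lo
        out[t] = y / z            # with lam == 1 this reduces to linear attention
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
print(log_linear_attention(Q, K, V, lam=np.ones(8)).shape)  # (16, 8)
```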
CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation
RAG systems enhance LLMs by incorporating external knowledge, which is crucial for domains that demand factual accuracy and up-to-date information. However, evaluating the multifaceted quality of RAG outputs, spanning aspects such as contextual coherence, query relevance, factual correctness, and informational completeness, poses significant challenges. Existing evaluation methods often rely on simple lexical overlap metrics, which are inadequate for capturing these nuances, or involve complex multi-stage pipelines with intermediate steps like claim extraction or require finetuning specialized judge models, hindering practical efficiency. To address these limitations, we propose CCRS (Contextual Coherence and Relevance Score), a novel suite of five metrics that utilizes a single, powerful, pretrained LLM as a zero-shot, end-to-end judge. CCRS evaluates: Contextual Coherence (CC), Question Relevance (QR), Information Density (ID), Answer Correctness (AC), and Information Recall (IR). We apply CCRS to evaluate six diverse RAG system configurations on the challenging BioASQ dataset. Our analysis demonstrates that CCRS effectively discriminates between system performances, confirming, for instance, that the Mistral-7B reader outperforms Llama variants. We provide a detailed analysis of CCRS metric properties, including score distributions, convergent/discriminant validity, tie rates, population statistics, and discriminative power. Compared to the complex RAGChecker framework, CCRS offers comparable or superior discriminative power for key aspects like recall and faithfulness, while being significantly more computationally efficient. CCRS thus provides a practical, comprehensive, and efficient framework for evaluating and iteratively improving RAG systems.
Updated: 2025-06-25 04:49:03
标题: CCRS:一种用于全面RAG评估的零样本LLM作为评判框架
摘要: RAG系统通过整合外部知识增强了LLMs,这对于那些需要准确和最新信息的领域至关重要。然而,评估RAG输出的多方面质量,涵盖了诸如语境连贯性、查询相关性、事实正确性和信息完整性等方面,面临着重大挑战。现有的评估方法通常依赖于简单的词汇重叠度量,无法捕捉这些微妙之处,或涉及复杂的多阶段流程,包括中间步骤如主张提取,或需要微调专门的评估模型,阻碍了实际效率。为了解决这些限制,我们提出了CCRS(上下文连贯性和相关性评分),这是一个全新的包含五个度量标准的套件,利用一个强大的预训练LLM作为零样本、端到端的评判者。CCRS评估了:上下文连贯性(CC)、问题相关性(QR)、信息密度(ID)、答案正确性(AC)和信息回忆(IR)。我们将CCRS应用于评估具有挑战性的BioASQ数据集上的六种不同的RAG系统配置。我们的分析表明,CCRS能有效区分系统性能,例如确认Mistral-7B阅读器优于Llama变体。我们提供了对CCRS度量属性的详细分析,包括得分分布、收敛/区分效度、并列率、总体统计量和区分力。与复杂的RAGChecker框架相比,CCRS在关键方面(如召回率和忠实度)提供了可比或更好的区分力,同时在计算效率上显著更高。因此,CCRS为评估和不断改进RAG系统提供了一个实用、全面和高效的框架。
更新时间: 2025-06-25 04:49:03
领域: cs.CL,cs.AI,cs.LG
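A minimal sketch of the zero-shot LLM-as-a-judge pattern the CCRS abstract describes, assuming a generic chat-completion callable `call_llm`; the rubric wording and 1-5 scale here are illustrative assumptions, not the paper's exact prompts.

```python
# Score a RAG answer along the five CCRS dimensions with one pretrained LLM.
METRICS = ["Contextual Coherence", "Question Relevance", "Information Density",
           "Answer Correctness", "Information Recall"]

PROMPT = """You are an impartial judge. Rate the ANSWER on "{metric}"
from 1 (poor) to 5 (excellent), given the QUESTION and retrieved CONTEXT.
Reply with a single integer.

QUESTION: {question}
CONTEXT: {context}
ANSWER: {answer}"""

def judge(call_llm, question, context, answer):
    scores = {}
    for metric in METRICS:
        reply = call_llm(PROMPT.format(metric=metric, question=question,
                                       context=context, answer=answer))
        digits = [c for c in reply if c.isdigit()]   # tolerate chatty replies
        scores[metric] = int(digits[0]) if digits else None
    return scores

# Usage with any client: judge(lambda p: my_model.complete(p), q, ctx, ans)
```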
Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts
We investigate the generalization capabilities of small language models under two popular adaptation paradigms: few-shot prompting and supervised fine-tuning. While prompting is often favored for its parameter efficiency and flexibility, it remains unclear how robust this approach is in low-resource settings and under distributional shifts. This paper presents a comparative study of prompting and fine-tuning across task formats, prompt styles, and model scales, with a focus on their behavior in both in-distribution and out-of-distribution (OOD) settings. Beyond accuracy, we analyze the internal representations learned by each approach to assess the stability and abstraction of task-specific features. Our findings highlight critical differences in how small models internalize and generalize knowledge under different adaptation strategies. This work offers practical guidance for model selection in low-data regimes and contributes empirical insight into the ongoing debate over prompting versus fine-tuning. Code for the experiments is available at the following
Updated: 2025-06-25 04:27:25
标题: 通过提示、微调和分布外提示评估小型语言模型的泛化和表示稳定性
摘要: 我们研究了小型语言模型在两种流行的适应范式下的泛化能力:少样本提示和监督微调。虽然提示通常因其参数效率和灵活性而受人青睐,但在低资源环境和分布转移情况下,这种方法的鲁棒性仍不清楚。本文对提示和微调在任务格式、提示风格和模型规模等方面进行了比较研究,重点关注它们在分布内和分布外(OOD)环境中的行为。除了准确性,我们分析了每种方法学习的内部表示,以评估任务特定特征的稳定性和抽象性。我们的研究结果突出了小型模型在不同适应策略下内化和泛化知识的关键差异。这项工作为低数据制度中的模型选择提供了实用指导,并为关于提示与微调之争提供了实证见解。实验代码可在以下处获得。
更新时间: 2025-06-25 04:27:25
领域: cs.AI,cs.LG
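The abstract above compares internal representations across adaptation strategies without naming a specific tool; one standard choice for this kind of analysis is linear CKA similarity between activation matrices for the same inputs, sketched below with random stand-in activations (an assumed illustration, not the paper's exact analysis).

```python
# Linear CKA between two representations of the same n examples.
import numpy as np

def linear_cka(X, Y):
    """X: (n, d1), Y: (n, d2) activations for the same n examples."""
    X = X - X.mean(axis=0)                       # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)              # in [0, 1]

rng = np.random.default_rng(0)
H_prompted = rng.normal(size=(128, 768))         # stand-in: prompted model layer
H_finetuned = 0.1 * H_prompted @ rng.normal(size=(768, 768)) \
              + rng.normal(size=(128, 768))      # stand-in: fine-tuned model layer
print(f"CKA similarity: {linear_cka(H_prompted, H_finetuned):.3f}")
```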
Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests
Evaluating the abilities of learners is a fundamental objective in the field of education. In particular, there is an increasing need to assess higher-order abilities such as expressive skills and logical thinking. Constructed-response tests such as short-answer and essay-based questions have become widely used as a method to meet this demand. Although these tests are effective, they require substantial manual grading, making them both labor-intensive and costly. Item response theory (IRT) provides a promising solution by enabling the estimation of ability from incomplete score data, where human raters grade only a subset of answers provided by learners across multiple test items. However, the accuracy of ability estimation declines as the proportion of missing scores increases. Although data augmentation techniques for imputing missing scores have been explored in order to address this limitation, they often struggle with inaccuracy for sparse or heterogeneous data. To overcome these challenges, this study proposes a novel method for imputing missing scores by leveraging automated scoring technologies for accurate IRT-based ability estimation. The proposed method achieves high accuracy in ability estimation while markedly reducing manual grading workload.
Updated: 2025-06-25 04:17:57
标题: 利用AI评分器进行缺失分数插补,实现对主观题测试准确能力估计
摘要: 评估学习者能力是教育领域的一个基本目标。特别是,评估表达能力和逻辑思维等高阶能力的需求日益增加。简答题和基于论文的问题等建构应答型测试已被广泛用作满足这一需求的方法。虽然这些测试是有效的,但它们需要大量的手工评分,使其既费时又昂贵。项目反应理论(IRT)提供了一个有希望的解决方案:它能够从不完整的分数数据中估计能力,即人类评分者仅对学习者在多个测试项目上提供的答案的一个子集进行评分。然而,随着缺失分数比例的增加,能力估计的准确性会下降。虽然已经探索了用于填补缺失分数的数据增强技术,以解决这一限制,但它们在处理稀疏或异质数据时往往不够准确。为了克服这些挑战,本研究提出了一种新的方法,通过利用自动评分技术来填补缺失分数,从而实现准确的基于IRT的能力估计。所提出的方法在能力估计方面取得了很高的准确性,同时显著减少了手工评分的工作量。
更新时间: 2025-06-25 04:17:57
领域: cs.CL,cs.LG
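A toy end-to-end sketch of the pipeline described above: a (simulated) automated grader imputes the missing cells of a sparse score matrix, then a simple Rasch (1PL) IRT model is fit by gradient ascent. The data-generating choices, the Rasch variant, and the learning rate are all illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_learners, n_items = 200, 10
theta_true = rng.normal(size=n_learners)                 # true abilities
b_true = rng.normal(size=n_items)                        # item difficulties
p = 1 / (1 + np.exp(-(theta_true[:, None] - b_true)))    # Rasch response probs
Y = (rng.random((n_learners, n_items)) < p).astype(float)
mask = rng.random(Y.shape) < 0.6                         # only 60% human-graded

# Step 1: impute missing cells with a simulated AI grader (noisy copy of truth).
Y_filled = np.where(mask, Y, (rng.random(Y.shape) < p).astype(float))

# Step 2: fit Rasch parameters by gradient ascent on the log-likelihood,
# whose gradient is simply (observed - predicted).
theta, b, lr = np.zeros(n_learners), np.zeros(n_items), 0.1
for _ in range(2000):
    pr = 1 / (1 + np.exp(-(theta[:, None] - b)))
    grad = Y_filled - pr
    theta += lr * grad.mean(axis=1)
    b -= lr * grad.mean(axis=0)
    theta -= theta.mean()                                # fix location for identifiability
print("corr(theta_hat, theta_true):", round(np.corrcoef(theta, theta_true)[0, 1], 3))
```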
U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs
Artificial intelligence, including deep learning models, will play a transformative role in automated medical image analysis for the diagnosis of cardiac disorders and their management. Automated accurate delineation of cardiac images is the first necessary initial step for the quantification and automated diagnosis of cardiac disorders. In this paper, we propose a deep learning based enhanced UNet model, U-R-Veda, which integrates convolution transformations, vision transformer, residual links, channel-attention, and spatial attention, together with edge-detection based skip-connections for an accurate fully-automated semantic segmentation of cardiac magnetic resonance (CMR) images. The model extracts local-features and their interrelationships using a stack of combination convolution blocks, with embedded channel and spatial attention in the convolution block, and vision transformers. Deep embedding of channel and spatial attention in the convolution block identifies important features and their spatial localization. The combined edge information with channel and spatial attention as skip connection reduces information-loss during convolution transformations. The overall model significantly improves the semantic segmentation of CMR images necessary for improved medical image analysis. An algorithm for the dual attention module (channel and spatial attention) has been presented. Performance results show that U-R-Veda achieves an average accuracy of 95.2%, based on DSC metrics. The model outperforms the accuracy attained by other models, based on DSC and HD metrics, especially for the delineation of right-ventricle and left-ventricle-myocardium.
Updated: 2025-06-25 04:10:09
标题: U-R-VEDA:集成UNET、残差连接、边缘和双重注意力以及视觉变换器,实现对CMR的准确语义分割
摘要: 人工智能,包括深度学习模型,将在自动化医学图像分析中发挥转变性作用,用于诊断心脏疾病及其管理。自动准确地勾画心脏图像是量化和自动诊断心脏疾病的第一个必要步骤。本文提出了一种基于深度学习的增强UNet模型,U-R-Veda,它整合了卷积转换、视觉变换器、残差连接、通道关注和空间关注,以及基于边缘检测的跳跃连接,实现了对心脏磁共振(CMR)图像的准确完全自动的语义分割。该模型使用一系列组合卷积块提取局部特征及其相互关系,在卷积块中嵌入通道和空间关注以及视觉变换器。在卷积块中深度嵌入通道和空间关注可识别重要特征及其空间定位。结合边缘信息、通道和空间关注的跳跃连接可在卷积转换过程中减少信息丢失。总体模型显著提高了对CMR图像的语义分割,有助于改善医学图像分析。文中提出了双重关注模块(通道和空间关注)的算法。性能结果显示,基于DSC指标,U-R-Veda达到了95.2%的平均准确率。该模型在DSC和HD指标方面的准确率优于其他模型,特别是在右心室和左心室心肌勾画方面。
更新时间: 2025-06-25 04:10:09
领域: eess.IV,cs.AI,cs.CV,cs.LG,I.4.6; I.2; I.5.2; I.5.1
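A compact PyTorch sketch of a dual attention block (channel attention followed by spatial attention), written in the generic CBAM style; this is an assumed illustration of the module class the abstract names, not the authors' exact U-R-Veda implementation.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared channel-attention MLP
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                       # (B, C) pooled descriptors
        mx = x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx)).view(b, c, 1, 1)
        x = x * ca                                     # channel-reweighted features
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)))
        return x * sa                                  # spatially reweighted features

out = DualAttention(64)(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```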
Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives
Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.
Updated: 2025-06-25 04:06:37
标题: 从树集成中提取可解释模型:计算和统计角度
摘要: 树集成是一种广泛认可的非参数方法,以其准确性和捕获复杂相互作用的能力而闻名。虽然这些模型在预测方面表现出色,但它们很难解释,并且可能无法揭示数据中的有用关系。我们提出了一种估计器,用于从树集成中提取紧凑的决策规则集。提取的模型准确度高,可以手动检查以揭示预测变量和响应变量之间的关系。我们估计器的一个关键创新是灵活地控制提取的规则数量和每个规则的交互深度,从而提高准确性。我们开发了一个定制的精确算法来高效地解决我们估计器基础的优化问题,以及一个用于计算正则化路径的近似算法,这些路径是与不同模型大小相对应的解序列。我们还为我们提出的方法建立了新颖的非渐近预测误差界限,将其与选择最佳数据相关线性组合的oracle进行比较,oracle受与我们估计器相同的复杂性约束。这些界限说明我们估计器的大样本预测性能与oracle相当。通过实验证明,我们的估计器优于现有的规则提取算法。
更新时间: 2025-06-25 04:06:37
领域: stat.ML,cs.LG
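To ground the idea of extracting compact rule sets from a tree ensemble, here is a RuleFit-style sketch: enumerate path-prefix rules up to a chosen interaction depth, binarize them into features, and select a sparse subset with an L1-penalized linear model. The paper's estimator instead uses a tailored exact algorithm with joint control of rule count and depth; this shows only the generic baseline idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

def extract_rules(tree, max_depth=2):
    """Yield rules as lists of (feature, threshold, is_leq) conditions."""
    t = tree.tree_
    def walk(node, conds):
        if conds and len(conds) <= max_depth:      # cap interaction depth
            yield list(conds)
        if t.children_left[node] != -1:            # -1 marks a leaf in sklearn
            f, thr = t.feature[node], t.threshold[node]
            yield from walk(t.children_left[node], conds + [(f, thr, True)])
            yield from walk(t.children_right[node], conds + [(f, thr, False)])
    yield from walk(0, [])

rules = [r for est in forest.estimators_ for r in extract_rules(est)]

def rule_matrix(X, rules):
    cols = []
    for conds in rules:
        col = np.ones(len(X), dtype=bool)
        for f, thr, leq in conds:
            col &= (X[:, f] <= thr) if leq else (X[:, f] > thr)
        cols.append(col)
    return np.column_stack(cols).astype(float)

R = rule_matrix(X, rules)
lasso = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(R, y)
kept = np.flatnonzero(lasso.coef_[0])
print(f"{len(rules)} candidate rules -> {len(kept)} selected")
```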
Peer Review as Structured Commentary: Immutable Identity, Public Dialogue, and Reproducible Scholarship
This paper reconceptualises peer review as structured public commentary. Traditional academic validation is hindered by anonymity, latency, and gatekeeping. We propose a transparent, identity-linked, and reproducible system of scholarly evaluation anchored in open commentary. Leveraging blockchain for immutable audit trails and AI for iterative synthesis, we design a framework that incentivises intellectual contribution, captures epistemic evolution, and enables traceable reputational dynamics. This model empowers fields from computational science to the humanities, reframing academic knowledge as a living process rather than a static credential.
Updated: 2025-06-25 03:57:40
标题: 同行评审作为结构化评论:不可改变的身份、公开对话和可复制的学术成果
摘要: 这篇论文重新构想同行评审为结构化的公开评论。传统的学术验证受到匿名性、延迟和门户监控的阻碍。我们提出了一个透明、与身份链接的、可重复的学术评估系统,以公开评论为基础。利用区块链技术实现不可变的审计追踪和人工智能进行迭代综合,我们设计了一个框架,激励知识贡献,捕捉认识进化,并实现可追踪的声誉动态。这一模式赋予了从计算科学到人文学科的领域力量,将学术知识重新构想为一个活跃的过程,而不是一个静态的证书。
更新时间: 2025-06-25 03:57:40
领域: cs.CY,cs.AI,cs.DL,cs.SI,physics.hist-ph,68T99, 03B30, 91D30,I.2.0; H.3.5; K.4.4
Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation
This paper discusses how ophthalmologists often rely on multimodal data to improve diagnostic accuracy. However, complete multimodal data is rare in real-world applications due to a lack of medical equipment and concerns about data privacy. Traditional deep learning methods typically address these issues by learning representations in latent space. However, the paper highlights two key limitations of these approaches: (i) Task-irrelevant redundant information (e.g., numerous slices) in complex modalities leads to significant redundancy in latent space representations. (ii) Overlapping multimodal representations make it difficult to extract unique features for each modality. To overcome these challenges, the authors propose the Essence-Point and Disentangle Representation Learning (EDRL) strategy, which integrates a self-distillation mechanism into an end-to-end framework to enhance feature selection and disentanglement for more robust multimodal learning. Specifically, the Essence-Point Representation Learning module selects discriminative features that improve disease grading performance. The Disentangled Representation Learning module separates multimodal data into modality-common and modality-unique representations, reducing feature entanglement and enhancing both robustness and interpretability in ophthalmic disease diagnosis. Experiments on multimodal ophthalmology datasets show that the proposed EDRL strategy significantly outperforms current state-of-the-art methods.
Updated: 2025-06-25 03:53:34
标题: 基于解耦表示的鲁棒多模态学习用于眼科疾病分级
摘要: 本文讨论了眼科医生通常依赖多模态数据来提高诊断准确性。然而,在现实世界的应用中,完整的多模态数据很少,这是由于缺乏医疗设备和对数据隐私的担忧。传统的深度学习方法通常通过在潜在空间中学习表示来解决这些问题。然而,本文强调了这些方法的两个关键局限性:(i) 在复杂模态中的任务无关冗余信息(例如,大量的切片)导致潜在空间表示中存在显著冗余。(ii) 重叠的多模态表示使得难以提取每种模态的独特特征。为了克服这些挑战,作者提出了Essence-Point和Disentangle Representation Learning(EDRL)策略,将自我蒸馏机制整合到端到端框架中,以增强特征选择和解缠,从而实现更强大的多模态学习。具体而言,Essence-Point Representation Learning模块选择有益于疾病分级性能的区分性特征。Disentangled Representation Learning模块将多模态数据分离为模态共有和模态独有表示,减少特征纠缠,增强眼科疾病诊断的鲁棒性和可解释性。对多模态眼科学数据集的实验表明,所提出的EDRL策略明显优于当前最先进的方法。
更新时间: 2025-06-25 03:53:34
领域: cs.CV,cs.AI
Evaluating Disassembly Errors With Only Binaries
Disassemblers are crucial in the analysis and modification of binaries. Existing works showing disassembler errors largely rely on practical implementation without specific guarantees and assume source code and compiler toolchains to evaluate ground truth. However, the assumption of source code is contrary to typical binary scenarios where only the binary is available. In this work, we investigate a sound approach to disassembly error evaluation that makes minimal assumptions and does not require source code. Approaches that assume source code do not address the fundamental problem of binary disassembly and fail when only the binary exists. As far as we know, this is the first work to evaluate disassembly errors using only the binary. We propose TraceBin, which uses dynamic execution to find disassembly errors. TraceBin targets the use case where the disassembly is used in an automated fashion for security tasks on a target binary, such as static binary instrumentation, binary hardening, automated code repair, and so on, which may be affected by disassembly errors. Discovering disassembly errors in the target binary aids in reducing problems caused by such errors. Furthermore, we are not aware of existing approaches that can evaluate errors given only a target binary, as they require source code. Our evaluation shows TraceBin finds: (i) errors consistent with existing studies even without source; (ii) disassembly errors due to control flow; (iii) new interesting errors; (iv) errors in non-C/C++ binaries; (v) errors in closed-source binaries; and (vi) that disassembly errors can have significant security implications. Overall, our experimental results show that TraceBin finds many errors in existing popular disassemblers. It is also helpful in automated security tasks on (closed-source) binaries relying on disassemblers.
Updated: 2025-06-25 03:46:19
标题: 仅用二进制文件评估反汇编错误
摘要: 反汇编器在二进制文件的分析和修改中至关重要。现有展示反汇编器错误的研究在很大程度上依赖于没有具体保证的实际实现,并假定可以获得源代码和编译器工具链来评估真实情况。然而,对源代码的假设与典型的二进制场景相矛盾,后者只有二进制文件可用。在这项工作中,我们研究了一种假设最少且可靠的反汇编错误评估方法,不需要源代码。依赖源代码的方法不能解决二进制反汇编的根本问题,在只有二进制文件时会失效。据我们所知,这是第一项仅使用二进制文件评估反汇编错误的工作。我们提出了TraceBin,它利用动态执行来查找反汇编错误。TraceBin针对反汇编被自动化地用于目标二进制文件安全任务的用例,例如静态二进制插桩、二进制加固、自动代码修复等,这些任务可能会受到反汇编错误的影响。发现目标二进制文件中的反汇编错误有助于减少由此类错误引起的问题。此外,我们不知道现有方法可以仅通过目标二进制文件来评估错误,因为它们需要源代码。我们的评估显示TraceBin发现:(i) 即使没有源代码也发现与现有研究一致的错误;(ii) 由控制流导致的反汇编错误;(iii) 新的有趣错误;(iv) 非C/C++二进制文件中的错误;(v) 闭源二进制文件中的错误;以及(vi) 反汇编错误可能具有重要的安全影响。总的来说,我们的实验结果表明TraceBin在现有流行的反汇编器中发现了许多错误。它还有助于依赖反汇编器对(闭源)二进制文件执行自动化安全任务。
更新时间: 2025-06-25 03:46:19
领域: cs.CR
Mitigating Gambling-Like Risk-Taking Behaviors in Large Language Models: A Behavioral Economics Approach to AI Safety
Large Language Models (LLMs) exhibit systematic risk-taking behaviors analogous to those observed in gambling psychology, including overconfidence bias, loss-chasing tendencies, and probability misjudgment. Drawing from behavioral economics and prospect theory, we identify and formalize these "gambling-like" patterns where models sacrifice accuracy for high-reward outputs, exhibit escalating risk-taking after errors, and systematically miscalibrate uncertainty. We propose the Risk-Aware Response Generation (RARG) framework, incorporating insights from gambling research to address these behavioral biases through risk-calibrated training, loss-aversion mechanisms, and uncertainty-aware decision making. Our approach introduces novel evaluation paradigms based on established gambling psychology experiments, including AI adaptations of the Iowa Gambling Task and probability learning assessments. Experimental results demonstrate measurable reductions in gambling-like behaviors: 18.7% decrease in overconfidence bias, 24.3% reduction in loss-chasing tendencies, and improved risk calibration across diverse scenarios. This work establishes the first systematic framework for understanding and mitigating gambling psychology patterns in AI systems.
Updated: 2025-06-25 03:45:35
标题: 减轻大型语言模型中类似赌博风险行为:AI安全的行为经济学方法
摘要: 大型语言模型(LLMs)表现出类似于赌博心理学中观察到的系统性冒险行为,包括过度自信偏差、追逐损失倾向和概率判断错误。借鉴行为经济学和前景理论,我们确定并形式化这些“类似赌博”的模式,其中模型为高回报输出牺牲准确性,在出错后表现出不断升级的冒险行为,并系统性地错误估计不确定性。我们提出了风险感知响应生成(RARG)框架,结合赌博研究的见解,通过风险校准训练、损失厌恶机制和不确定性感知的决策制定来解决这些行为偏差。我们的方法引入了基于已建立的赌博心理学实验的新评估范式,包括爱荷华赌博任务和概率学习评估的人工智能改编。实验结果显示了类似赌博行为的可衡量减少:过度自信偏差减少了18.7%,追逐损失倾向减少了24.3%,在不同情境下改进了风险校准。这项工作建立了第一个系统框架,用于理解和缓解人工智能系统中的赌博心理学模式。
更新时间: 2025-06-25 03:45:35
领域: cs.CY,cs.AI,cs.CL
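The prospect-theory machinery the abstract draws on centers on a value function that is concave for gains and convex but steeper for losses (loss aversion). Below is a sketch using Kahneman and Tversky's classic parameter estimates (alpha = beta = 0.88, lambda = 2.25); using v(.) as a reward-shaping term during training is an assumption for illustration, not the paper's stated mechanism.

```python
import numpy as np

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function over gains/losses x."""
    x = np.asarray(x, dtype=float)
    # Concave over gains, convex and scaled by lam > 1 over losses.
    return np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)

outcomes = np.array([1.0, -1.0])     # symmetric gain vs. loss
print(prospect_value(outcomes))      # loss weighted ~2.25x the equivalent gain
```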
The Impact of the Russia-Ukraine Conflict on the Cloud Computing Risk Landscape
The Russian invasion of Ukraine has fundamentally altered the information technology (IT) risk landscape, particularly in cloud computing environments. This paper examines how this geopolitical conflict has accelerated data sovereignty concerns, transformed cybersecurity paradigms, and reshaped cloud infrastructure strategies worldwide. Through an analysis of documented cyber operations, regulatory responses, and organizational adaptations between 2022 and early 2025, this research demonstrates how the conflict has served as a catalyst for a broader reassessment of IT risk. The research reveals that while traditional IT risk frameworks offer foundational guidance, their standard application may inadequately address the nuances of state-sponsored threats, conflicting data governance regimes, and the weaponization of digital dependencies without specific geopolitical augmentation. The contribution of this paper lies in its focused synthesis and strategic adaptation of existing best practices into a multi-layered approach. This approach uniquely synergizes resilient cloud architectures (including sovereign and hybrid models), enhanced data-centric security strategies (such as advanced encryption and privacy-enhancing technologies), and geopolitically-informed governance to build digital resilience. The interplay between these layers, emphasizing how geopolitical insights directly shape architectural and security choices beyond standard best practices (particularly by integrating the human element, including personnel vulnerabilities and expertise, as a core consideration in technical design and operational management), offers a more robust defense against the specific, multifaceted risks arising from geopolitical conflict in increasingly fractured digital territories.
Updated: 2025-06-25 03:32:36
标题: 俄乌冲突对云计算风险格局的影响
摘要: 乌克兰遭受俄罗斯入侵从根本上改变了信息技术(IT)风险格局,特别是在云计算环境中。本文探讨了这场地缘政治冲突如何加速了数据主权关注,转变了网络安全范式,并在全球范围内重塑了云基础设施战略。通过对2022年至2025年初之间记录的网络行动、监管反应和组织调整的分析,本研究展示了这场冲突如何成为对IT风险进行更广泛重新评估的催化剂。研究表明,虽然传统的IT风险框架提供了基础指导,但其标准应用可能不足以解决国家支持的威胁、冲突的数据治理制度以及数字依赖的武器化等细微之处,没有具体的地缘政治增强。本文的贡献在于将现有最佳实践进行集中综合和战略调整,形成一个多层次的方法。这种方法独特地将具有抵御能力的云架构(包括主权和混合模型)、增强数据中心安全策略(如高级加密和增强隐私技术)以及地缘政治信息化治理相结合,构建数字韧性。这些层次之间的互动强调了地缘政治洞察力如何直接塑造架构和安全选择,超越标准最佳实践,特别是通过整合人员要素,包括人员的脆弱性和专业知识,作为技术设计和运营管理的核心考虑,提供更强有力的防御,以抵御由地缘政治冲突在日益分裂的数字领土中产生的特定、多方面的风险。
更新时间: 2025-06-25 03:32:36
领域: cs.CY,cs.CR,cs.NI
BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos
Recent advances in deep generative models have led to significant progress in video generation, yet the fidelity of AI-generated videos remains limited. Synthesized content often exhibits visual artifacts such as temporally inconsistent motion, physically implausible trajectories, unnatural object deformations, and local blurring that undermine realism and user trust. Accurate detection and spatial localization of these artifacts are crucial for both automated quality control and for guiding the development of improved generative models. However, the research community currently lacks a comprehensive benchmark specifically designed for artifact localization in AI generated videos. Existing datasets either restrict themselves to video or frame level detection or lack the fine-grained spatial annotations necessary for evaluating localization methods. To address this gap, we introduce BrokenVideos, a benchmark dataset of 3,254 AI-generated videos with meticulously annotated, pixel-level masks highlighting regions of visual corruption. Each annotation is validated through detailed human inspection to ensure high quality ground truth. Our experiments show that training state of the art artifact detection models and multi modal large language models (MLLMs) on BrokenVideos significantly improves their ability to localize corrupted regions. Through extensive evaluation, we demonstrate that BrokenVideos establishes a critical foundation for benchmarking and advancing research on artifact localization in generative video models. The dataset is available at: https://broken-video-detection-datetsets.github.io/Broken-Video-Detection-Datasets.github.io/.
Updated: 2025-06-25 03:30:04
标题: 破损视频:一个用于人工智能生成视频中微细瑕疵定位的基准数据集
摘要: 最近深度生成模型的进展已经在视频生成方面取得了显著进展,但人工智能生成的视频的保真度仍然有限。合成内容通常会展现出视觉瑕疵,如时间上不一致的运动、物理上不合理的轨迹、不自然的物体变形和局部模糊,这些都削弱了现实感和用户信任。准确检测和空间定位这些瑕疵对于自动化质量控制以及指导改进生成模型的发展至关重要。然而,研究界目前缺乏一个专门设计用于人工智能生成视频中的瑕疵定位的综合基准。现有数据集要么限制在视频或帧级别的检测,要么缺乏评估定位方法所必需的细粒度空间标注。为了填补这一空白,我们引入了BrokenVideos,一个包含3,254个人工智能生成视频的基准数据集,其中包含精心注释的像素级掩模,突出显示出视觉损坏的区域。每个注释都经过详细的人工检查验证,以确保高质量的标准答案。我们的实验表明,在BrokenVideos上训练最先进的瑕疵检测模型和多模态大语言模型(MLLMs)显著提高了它们定位受损区域的能力。通过广泛的评估,我们展示了BrokenVideos为评估和推动生成视频模型中瑕疵定位研究奠定了重要的基础。该数据集可在以下网址获取:https://broken-video-detection-datetsets.github.io/Broken-Video-Detection-Datasets.github.io/。
更新时间: 2025-06-25 03:30:04
领域: cs.CV,cs.AI,I.4
Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox
The convergence of IT and OT has created hyper-connected ICS, exposing critical infrastructure to a new class of adaptive, intelligent adversaries that render static defenses obsolete. Existing security paradigms often fail to address a foundational "Trinity of Trust," comprising the fidelity of the system model, the integrity of synchronizing data, and the resilience of the analytical engine against sophisticated evasion. This paper introduces the ARC framework, a method for achieving analytical resilience through an autonomous, closed-loop hardening process. ARC establishes a perpetual co-evolutionary arms race within the high-fidelity sandbox of a F-SCDT. A DRL agent, the "Red Agent," is formalized and incentivized to autonomously discover stealthy, physically-plausible attack paths that maximize process disruption while evading detection. Concurrently, an ensemble-based "Blue Agent" defender is continuously hardened via adversarial training against the evolving threats discovered by its adversary. This co-evolutionary dynamic forces both agents to become progressively more sophisticated, enabling the system to autonomously probe and patch its own vulnerabilities. Experimental validation on both the TEP and the SWaT testbeds demonstrates the framework's superior performance. A comprehensive ablation study, supported by extensive visualizations including ROC curves and SHAP plots, reveals that the co-evolutionary process itself is responsible for a significant performance increase in detecting novel attacks. By integrating XAI to ensure operator trust and proposing a scalable F-ARC architecture, this work presents ARC not merely as an improvement, but as a necessary paradigm shift toward dynamic, self-improving security for the future of critical infrastructure.
Updated: 2025-06-25 03:28:48
标题: 自主网络韧性:通过加强数字孪生防护沙箱内的共同进化军备竞赛
摘要: IT和OT的融合已经创造了超级连接的ICS,使关键基础设施暴露于一类新型的适应性、智能的对手,这些对手使静态防御策略变得过时。现有的安全范式通常未能解决一个基本的“信任三位一体”,包括系统模型的忠实度、同步数据的完整性以及分析引擎对复杂躲避的韧性。本文介绍了ARC框架,这是一种通过自主、闭环强化过程实现分析韧性的方法。ARC在高保真度F-SCDT的沙盒中建立了一个永恒的共同进化军备竞赛。一个DRL代理,即“红方代理”,被正式化并被激励自主发现隐秘的、在物理上可信的攻击路径,最大程度地干扰过程同时避免被检测。同时,一个基于集成的“蓝方代理”防御者通过对其对手发现的不断演变的威胁进行对抗训练,不断强化。这种共同进化动态迫使两个代理变得越来越复杂,使系统能够自主地探测和修补自己的漏洞。在TEP和SWaT实验平台上的实验验证展示了该框架的卓越性能。一项全面的消融研究,支持广泛的可视化,包括ROC曲线和SHAP图,揭示了共同进化过程本身对于检测新型攻击的显著性能提升负有责任。通过整合XAI以确保操作员的信任并提出可扩展的F-ARC架构,这项工作将ARC不仅仅作为一种改进,而是作为未来关键基础设施动态、自我改进安全的必要范式转变。
更新时间: 2025-06-25 03:28:48
领域: cs.CR,cs.LG,cs.SY,eess.SY
Secure Multi-Key Homomorphic Encryption with Application to Privacy-Preserving Federated Learning
Multi-Key Homomorphic Encryption (MKHE), proposed by Lopez-Alt et al. (STOC 2012), allows for performing arithmetic computations directly on ciphertexts encrypted under distinct keys. Subsequent works by Chen and Dai et al. (CCS 2019) and Kim and Song et al. (CCS 2023) extended this concept by proposing multi-key BFV/CKKS variants, referred to as the CDKS scheme. These variants incorporate asymptotically optimal techniques to facilitate secure computation across multiple data providers. In this paper, we identify a critical security vulnerability in the CDKS scheme when applied to multiparty secure computation tasks, such as privacy-preserving federated learning (PPFL). In particular, we show that CDKS may inadvertently leak plaintext information from one party to others. To mitigate this issue, we propose a new scheme, SMHE (Secure Multi-Key Homomorphic Encryption), which incorporates a novel masking mechanism into the multi-key BFV and CKKS frameworks to ensure that plaintexts remain confidential throughout the computation. We implement a PPFL application using SMHE and demonstrate that it provides significantly improved security with only a modest overhead in homomorphic evaluation. For instance, our PPFL model based on multi-key CKKS incurs less than a 2x runtime and communication traffic increase compared to the CDKS-based PPFL model. The code is publicly available at https://github.com/JiahuiWu2022/SMHE.git.
Updated: 2025-06-25 03:28:25
标题: 安全多密钥同态加密及其在保护隐私的联邦学习中的应用
摘要: 多密钥同态加密(MKHE)是由Lopez-Alt等人(STOC 2012)提出的,允许在不同密钥下加密的密文上直接执行算术计算。陈和戴等人(CCS 2019)以及金和宋等人(CCS 2023)的后续作品通过提出多密钥BFV/CKKS变体,即CDKS方案,扩展了这一概念。这些变体采用了渐近最优的技术,以促进跨多个数据提供方的安全计算。在本文中,我们发现CDKS方案在应用于多方安全计算任务(如隐私保护联邦学习(PPFL))时存在关键的安全漏洞。特别地,我们展示了CDKS可能会无意间将一个参与方的明文信息泄露给其他参与方。为了缓解这一问题,我们提出了一种新方案SMHE(安全多密钥同态加密),该方案在多密钥BFV和CKKS框架中引入了一种新颖的掩码机制,以确保明文在整个计算过程中保持机密性。我们使用SMHE实现了一个PPFL应用程序,并展示其在同态评估方面只有适度的开销的同时提供了显著改进的安全性。例如,我们基于多密钥CKKS的PPFL模型与基于CDKS的PPFL模型相比,运行时间和通信流量增加不到2倍。代码可以在https://github.com/JiahuiWu2022/SMHE.git上公开获取。
更新时间: 2025-06-25 03:28:25
领域: cs.CR
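The confidentiality goal of a masking mechanism can be illustrated with a much simpler construction: zero-sum pairwise masks that cancel only in the aggregate, as in secure aggregation for federated learning. Note this toy models only the masking intuition; SMHE embeds its mask inside multi-key BFV/CKKS ciphertexts, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
n_parties, dim = 3, 4
updates = rng.normal(size=(n_parties, dim))       # e.g., local model updates

# Each unordered pair (i, j) shares a random mask; i adds it, j subtracts it,
# so every individual share is blinded but the sum is unchanged.
masked = updates.copy()
for i in range(n_parties):
    for j in range(i + 1, n_parties):
        m = rng.normal(size=dim)                  # pairwise shared mask
        masked[i] += m
        masked[j] -= m

print(np.allclose(masked.sum(axis=0), updates.sum(axis=0)))  # True: aggregate intact
print(np.allclose(masked[0], updates[0]))                    # False: share hides input
```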
Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses
The diagnostic value of electrocardiogram (ECG) lies in its dynamic characteristics, ranging from rhythm fluctuations to subtle waveform deformations that evolve across time and frequency domains. However, supervised ECG models tend to overfit dominant and repetitive patterns, overlooking fine-grained but clinically critical cues, a phenomenon known as Simplicity Bias (SB), where models favor easily learnable signals over subtle but informative ones. In this work, we first empirically demonstrate the presence of SB in ECG analyses and its negative impact on diagnostic performance, while simultaneously discovering that self-supervised learning (SSL) can alleviate it, providing a promising direction for tackling the bias. Following the SSL paradigm, we propose a novel method comprising two key components: 1) Temporal-Frequency aware Filters to capture temporal-frequency features reflecting the dynamic characteristics of ECG signals, and 2) building on this, Multi-Grained Prototype Reconstruction for coarse and fine representation learning across dual domains, further mitigating SB. To advance SSL in ECG analyses, we curate a large-scale multi-site ECG dataset with 1.53 million recordings from over 300 clinical centers. Experiments on three downstream tasks across six ECG datasets demonstrate that our method effectively reduces SB and achieves state-of-the-art performance. Code and dataset will be released publicly.
Updated: 2025-06-25 03:25:49
标题: 具有掩码自动编码器的心脏感知能力:揭示心电图分析的简单偏见
摘要: 心电图(ECG)的诊断价值在于其动态特性,从节律波动到随时间和频率领域演变的微妙波形变形。然而,监督学习的心电图模型往往过度拟合主导性和重复性模式,忽视了精细但临床关键的线索,这种现象被称为简单性偏差(SB),即模型偏好易学习的信号而不是微妙但信息丰富的信号。在这项工作中,我们首先在心电图分析中实证证明了SB的存在以及对诊断性能的负面影响,同时发现自监督学习(SSL)可以减轻这种影响,为解决偏差提供了一个有前途的方向。遵循SSL范式,我们提出了一种包含两个关键组件的新方法:1)时频感知滤波器,用于捕捉反映心电图信号动态特性的时频特征,2)在此基础上构建的多粒度原型重建,用于在双重领域中进行粗细表示学习,进一步减轻SB。为推进SSL在心电图分析中的应用,我们整理了一个规模庞大的多中心心电图数据集,包括来自300多个临床中心的153万条记录。在六个心电图数据集上进行的三个下游任务实验表明,我们的方法有效减少了SB,并实现了最先进的性能。代码和数据集将公开发布。
更新时间: 2025-06-25 03:25:49
领域: eess.SP,cs.AI,cs.LG
Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models
In this paper, we present Morse, a simple dual-sampling framework for accelerating diffusion models losslessly. The key insight of Morse is to reformulate the iterative generation (from noise to data) process via taking advantage of fast jump sampling and adaptive residual feedback strategies. Specifically, Morse involves two models called Dash and Dot that interact with each other. The Dash model is just the pre-trained diffusion model of any type, but operates in a jump sampling regime, creating sufficient space for sampling efficiency improvement. The Dot model is significantly faster than the Dash model, which is learnt to generate residual feedback conditioned on the observations at the current jump sampling point on the trajectory of the Dash model, lifting the noise estimate to easily match the next-step estimate of the Dash model without jump sampling. By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse exhibits the merit of flexibly attaining desired image generation performance while improving overall runtime efficiency. With our proposed weight sharing strategy between the Dash and Dot models, Morse is efficient for training and inference. Our method shows a lossless speedup of 1.78X to 3.31X on average over a wide range of sampling step budgets relative to 9 baseline diffusion models on 6 image generation tasks. Furthermore, we show that our method can be also generalized to improve the Latent Consistency Model (LCM-SDXL, which is already accelerated with consistency distillation technique) tailored for few-step text-to-image synthesis. The code and models are available at https://github.com/deep-optimization/Morse.
Updated: 2025-06-25 03:25:37
标题: 莫尔斯:双采样用于扩散模型的无损加速
摘要: 在本文中,我们提出了Morse,一种用于无损加速扩散模型的简单双采样框架。Morse的关键见解是通过利用快速跳跃采样和自适应残差反馈策略,重新构建迭代生成(从噪声到数据)的过程。具体而言,Morse包括两个相互交互的模型Dash和Dot。Dash模型只是任何类型的预训练扩散模型,但在跳跃采样机制下运行,为采样效率的提高创造了充分的空间。Dot模型比Dash模型快得多,它学习生成以Dash模型轨迹上当前跳跃采样点处的观测为条件的残差反馈,将噪声估计提升到无需跳跃采样即可轻松匹配Dash模型的下一步估计。通过以时间交错方式串联运行Dash和Dot模型的输出,Morse展示了灵活获得所需图像生成性能同时提高整体运行效率的优点。通过我们提出的Dash和Dot模型之间的权重共享策略,Morse在训练和推断方面效率高。相对于6个图像生成任务上的9个基线扩散模型,我们的方法在广泛的采样步骤预算范围内平均实现了1.78倍到3.31倍的无损加速。此外,我们展示了我们的方法也可以推广到改进为少步文本到图像合成定制的潜在一致性模型(LCM-SDXL,已经使用一致性蒸馏技术加速)。代码和模型可在 https://github.com/deep-optimization/Morse 获取。
更新时间: 2025-06-25 03:25:37
领域: cs.GR,cs.AI,cs.CV
What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning
With the remarkable generative capabilities of large language models (LLMs), using LLM-generated data to train downstream models has emerged as a promising approach to mitigate data scarcity in specific domains and reduce time-consuming annotations. However, recent studies have highlighted a critical issue: iterative training on self-generated data results in model collapse, where model performance degrades over time. Despite extensive research on the implications of LLM-generated data, these works often neglect the importance of data diversity, a key factor in data quality. In this work, we aim to understand the implications of the diversity of LLM-generated data on downstream model performance. Specifically, we explore how varying levels of diversity in LLM-generated data affect downstream model performance. Additionally, we investigate the performance of models trained on data that mixes different proportions of LLM-generated data, which we refer to as synthetic data. Our experimental results show that, with minimal distribution shift, moderately diverse LLM-generated data can enhance model performance in scenarios with insufficient labeled data, whereas highly diverse generated data has a negative impact. We hope our empirical findings will offer valuable guidance for future studies on LLMs as data generators.
Updated: 2025-06-25 03:25:04
标题: LLM生成数据中重要的因素:多样性及其对模型微调的影响
摘要: 随着大型语言模型(LLMs)出色的生成能力,利用LLM生成的数据来训练下游模型已经成为一种有希望的方法,以减轻特定领域中数据稀缺问题并降低耗时的注释。然而,最近的研究已经凸显出一个关键问题:在自动生成数据上进行迭代训练会导致模型崩溃,即模型性能随时间降低。尽管对LLM生成数据的影响进行了广泛研究,但这些作品往往忽视了数据多样性的重要性,这是数据质量的关键因素。在本研究中,我们旨在了解LLM生成数据的多样性对下游模型性能的影响。具体来说,我们探讨了LLM生成数据中不同多样性水平如何影响下游模型性能。此外,我们还调查了在混合不同比例的LLM生成数据的数据上训练模型的性能,我们将其称为合成数据。我们的实验结果显示,具有适度分布偏移的中度多样性的LLM生成数据可以提高在标记数据不足的情况下模型性能,而高度多样性的生成数据则会产生负面影响。我们希望我们的实证研究结果能为未来关于LLMs作为数据生成器的研究提供有价值的指导。
更新时间: 2025-06-25 03:25:04
领域: cs.CL,cs.LG
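The abstract leaves the diversity measure abstract; one common operationalization is mean pairwise cosine distance between example embeddings, sketched below with random stand-ins for a real sentence-embedding model. The metric choice is an assumption, not necessarily the paper's.

```python
import numpy as np

def diversity(E):
    """E: (n, d) embedding matrix -> mean pairwise cosine distance."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ E.T
    off_diag = sims[~np.eye(len(E), dtype=bool)]   # exclude self-similarity
    return 1.0 - off_diag.mean()

rng = np.random.default_rng(0)
near_duplicates = rng.normal(size=(1, 64)) + 0.05 * rng.normal(size=(50, 64))
varied = rng.normal(size=(50, 64))
print(f"low-diversity corpus:  {diversity(near_duplicates):.3f}")
print(f"high-diversity corpus: {diversity(varied):.3f}")
```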
MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
We introduce MIRAGE, a new benchmark for multimodal expert-level reasoning and decision-making in consultative interaction settings. Designed for the agriculture domain, MIRAGE captures the full complexity of expert consultations by combining natural user queries, expert-authored responses, and image-based context, offering a high-fidelity benchmark for evaluating models on grounded reasoning, clarification strategies, and long-form generation in a real-world, knowledge-intensive domain. Grounded in over 35,000 real user-expert interactions and curated through a carefully designed multi-step pipeline, MIRAGE spans diverse crop health, pest diagnosis, and crop management scenarios. The benchmark includes more than 7,000 unique biological entities, covering plant species, pests, and diseases, making it one of the most taxonomically diverse benchmarks available for vision-language models, grounded in the real world. Unlike existing benchmarks that rely on well-specified user inputs and closed-set taxonomies, MIRAGE features underspecified, context-rich scenarios with open-world settings, requiring models to infer latent knowledge gaps, handle rare entities, and either proactively guide the interaction or respond. Project Page: https://mirage-benchmark.github.io
Updated: 2025-06-25 03:07:54
标题: MIRAGE:农业专家引导对话中多模态信息搜索和推理的基准
摘要: 我们介绍了MIRAGE,这是一个面向咨询性互动场景中多模态专家级推理和决策的新基准。MIRAGE专为农业领域设计,通过结合自然用户查询、专家编写的回复和基于图像的背景,捕捉了专家咨询的完整复杂性,为评估模型在现实世界、知识密集型领域中的基于理性的推理、澄清策略和长篇生成提供了一个高保真度的基准。MIRAGE基于超过35,000个真实用户专家交互,并通过精心设计的多步骤管道策划,涵盖了多样的作物健康、虫害诊断和作物管理场景。该基准包括超过7,000个独特的生物实体,涵盖了植物物种、害虫和疾病,使其成为基于视觉语言模型的最具分类多样性的基准之一,基于现实世界。与现有的依赖于明确定义用户输入和封闭集合分类法的基准不同,MIRAGE具有未明确定、上下文丰富的场景和开放世界设置,需要模型推断潜在的知识缺口、处理罕见实体,并主动引导交互或做出回应。项目页面:https://mirage-benchmark.github.io
更新时间: 2025-06-25 03:07:54
领域: cs.LG,cs.AI,cs.CL,cs.CV
BeltCrack: the First Sequential-image Industrial Conveyor Belt Crack Detection Dataset and Its Baseline with Triple-domain Feature Learning
Conveyor belts are important equipment in modern industry, widely applied in production and manufacturing. Their health is critical to operational efficiency and safety, and cracks are a major threat to it. Currently, considering safety, how to intelligently detect belt cracks is attracting increasing attention. To implement intelligent detection with machine learning, real crack samples are believed to be necessary. However, existing crack datasets primarily focus on pavement scenarios or synthetic data; there are no real-world industrial belt crack datasets at all. To fill this gap, we build two whole-new sequential-image industrial conveyor belt crack datasets. Furthermore, to validate their usability and effectiveness, we propose a special baseline method with triple-domain (i.e., time-space-frequency) feature hierarchical fusion learning for the two whole-new datasets. Experimental results demonstrate the availability and effectiveness of our datasets. Besides, they also show that our baseline is obviously superior to other similar detection methods. Our datasets and source codes are available at https://github.com/UESTC-nnLab/BeltCrack.
Updated: 2025-06-25 02:44:04
标题: BeltCrack: 第一个序列图像工业输送带裂缝检测数据集及其三域特征学习基线
摘要: 传送带是现代工业中重要的设备,在生产和制造中被广泛应用。它们的健康状态对运行效率和安全性至关重要,而裂缝是对传送带健康的主要威胁。目前,出于安全考虑,如何智能检测传送带裂缝正引起越来越多的关注。为了利用机器学习实现智能检测,真实的裂缝样本被认为是必要的。然而,现有的裂缝数据集主要集中在路面场景或合成数据上,完全没有真实的工业传送带裂缝数据集。为了填补这一空白,我们构建了两个全新的序列图像工业传送带裂缝数据集。此外,为了验证其可用性和有效性,我们为这两个全新数据集提出了一种特殊的基线方法,采用三重域(即时间-空间-频率)特征层次融合学习。实验结果表明了我们数据集的可用性和有效性。此外,它们还显示我们的基线方法明显优于其他类似的检测方法。我们的数据集和源代码可以在 https://github.com/UESTC-nnLab/BeltCrack 上找到。
更新时间: 2025-06-25 02:44:04
领域: cs.CV,cs.LG
PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding
This report introduces PP-DocBee2, an advanced version of PP-DocBee, designed to enhance multimodal document understanding. Built on a large multimodal model architecture, PP-DocBee2 addresses the limitations of its predecessor through key technological improvements, including enhanced synthetic data quality, improved visual feature fusion strategy, and optimized inference methodologies. These enhancements yield an 11.4% performance boost on internal benchmarks for Chinese business documents, and reduce inference latency by 73.0% relative to the vanilla version. A key innovation of our work is a data quality optimization strategy for multimodal document tasks. By employing a large-scale multimodal pre-trained model to evaluate data, we apply a novel statistical criterion to filter outliers, ensuring high-quality training data. Inspired by insights into underutilized intermediate features in multimodal models, we enhance the ViT representational capacity by decomposing it into layers and applying a novel feature fusion strategy to improve complex reasoning. The source code and pre-trained model are available at https://github.com/PaddlePaddle/PaddleMIX.
Updated: 2025-06-25 02:40:39
标题: PP-DocBee2:利用高效数据改进多模态文档理解的基线
摘要: 这份报告介绍了PP-DocBee2,这是PP-DocBee的一个先进版本,旨在增强多模态文档理解能力。基于一个庞大的多模态模型架构构建,PP-DocBee2通过关键技术改进,包括提升合成数据质量、改进视觉特征融合策略和优化推理方法,解决了其前身的局限性。这些改进使得在中国商务文档的内部基准测试中性能提升了11.4%,并将推理延迟相对基础版本降低了73.0%。我们工作的一个关键创新是针对多模态文档任务的数据质量优化策略。通过利用大规模多模态预训练模型评估数据,我们应用了一种新颖的统计准则来过滤异常值,确保高质量的训练数据。受对多模态模型中未充分利用的中间特征的洞察启发,我们通过将ViT分解成层,并应用一种新颖的特征融合策略来提高复杂推理,增强了其表征能力。源代码和预训练模型可在 https://github.com/PaddlePaddle/PaddleMIX 上找到。
更新时间: 2025-06-25 02:40:39
领域: cs.CV,cs.AI,cs.CL
Fine-Grained Perturbation Guidance via Attention Head Selection
Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose "HeadHunter", a systematic framework for iteratively selecting attention heads that align with user-centric objectives, enabling fine-grained control over generation quality and visual attributes. In addition, we introduce SoftPAG, which linearly interpolates each selected head's attention map toward an identity matrix, providing a continuous knob to tune perturbation strength and suppress artifacts. Our approach not only mitigates the oversmoothing issues of existing layer-level perturbation but also enables targeted manipulation of specific visual styles through compositional head selection. We validate our method on modern large-scale DiT-based text-to-image models including Stable Diffusion 3 and FLUX.1, demonstrating superior performance in both general quality enhancement and style-specific guidance. Our work provides the first head-level analysis of attention perturbation in diffusion models, uncovering interpretable specialization within attention layers and enabling practical design of effective perturbation strategies.
Updated: 2025-06-25 02:37:46
标题: 通过注意力头选择进行细粒度扰动引导
摘要: 最近扩散模型中的指导方法通过扰动模型以构建隐式弱模型,并引导生成远离它。在这些方法中,注意力扰动在无条件情况下表现出强大的实证性能,其中无需分类器的指导不适用。然而,现有的注意力扰动方法缺乏确定何处应用扰动的原则方法,特别是在扩散变压器(DiT)架构中,质量相关的计算分布在各层之间。在本文中,我们研究了注意力扰动的细粒度,从层级到单个注意力头,发现特定的头控制着不同的视觉概念,如结构、风格和纹理质量。基于这一洞察,我们提出了“HeadHunter”,这是一个系统框架,用于迭代选择与用户中心目标一致的注意力头,实现对生成质量和视觉属性的精细控制。此外,我们引入了SoftPAG,它将每个选择的头的注意力图线性插值到一个单位矩阵,提供一个连续的旋钮来调整扰动强度并抑制伪影。我们的方法不仅缓解了现有层级扰动的过度平滑问题,还通过构成头选择实现了对特定视觉风格的有针对性的操作。我们在现代大规模基于DiT的文本到图像模型上验证了我们的方法,包括稳定扩散3和FLUX.1,在一般质量增强和特定风格指导方面表现出优越性能。我们的工作提供了扩散模型中注意力扰动的头级分析,揭示了注意力层内的可解释专业化,并实现了有效扰动策略的实际设计。
更新时间: 2025-06-25 02:37:46
领域: cs.CV,cs.AI,cs.LG
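SoftPAG as described above reduces to a one-line operation on the selected heads: linearly interpolate each (row-stochastic) attention map toward the identity matrix, with alpha as the continuous perturbation-strength knob. A minimal sketch, with head selection assumed to have happened upstream (e.g., via the HeadHunter procedure):

```python
import numpy as np

def softpag(attn, alpha):
    """attn: (heads, T, T) attention maps for the selected heads."""
    T = attn.shape[-1]
    # Convex combination of a stochastic matrix and the identity stays stochastic.
    return (1.0 - alpha) * attn + alpha * np.eye(T)

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax rows
out = softpag(attn, alpha=0.3)
print(out.sum(-1))   # rows still sum to 1
```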
MEL: Multi-level Ensemble Learning for Resource-Constrained Environments
AI inference at the edge is becoming increasingly common for low-latency services. However, edge environments are power- and resource-constrained, and susceptible to failures. Conventional failure resilience approaches, such as cloud failover or compressed backups, often compromise latency or accuracy, limiting their effectiveness for critical edge inference services. In this paper, we propose Multi-Level Ensemble Learning (MEL), a new framework for resilient edge inference that simultaneously trains multiple lightweight backup models capable of operating collaboratively, refining each other when multiple servers are available, and independently under failures while maintaining good accuracy. Specifically, we formulate our approach as a multi-objective optimization problem with a loss formulation that inherently encourages diversity among individual models to promote mutually refining representations, while ensuring each model maintains good standalone performance. Empirical evaluations across vision, language, and audio datasets show that MEL provides performance comparable to original architectures while also providing fault tolerance and deployment flexibility across edge platforms. Our results show that our ensemble model, sized at 40\% of the original model, achieves similar performance, while preserving 95.6\% of ensemble accuracy in the case of failures when trained using MEL.
Updated: 2025-06-25 02:33:57
标题: MEL:面向资源受限环境的多层集成学习
摘要: 边缘AI推理在低延迟服务中变得越来越常见。然而,边缘环境受到功耗和资源限制,并容易发生故障。传统的故障容忍方法,如云故障转移或压缩备份,往往会影响延迟或准确性,限制它们对关键边缘推理服务的有效性。在本文中,我们提出了多级集成学习(MEL)框架,这是一个新的弹性边缘推理框架,同时训练多个轻量级备份模型,这些模型能够在多个服务器可用时协同运行,在故障时独立运行,同时保持良好的准确性。具体来说,我们将我们的方法构建为一个多目标优化问题,采用损失公式,从根本上鼓励个体模型之间的多样性,以促进相互精炼的表征,同时确保每个模型保持良好的独立性能。通过对视觉、语言和音频数据集的实证评估显示,MEL提供了与原始架构相当的性能,同时还提供了跨边缘平台的容错性和部署灵活性。我们的结果表明,我们的集成模型,其大小为原始模型的40%,在使用MEL训练时,在故障情况下实现了类似的性能,同时保持了95.6%的集成准确性。
更新时间: 2025-06-25 02:33:57
领域: cs.LG
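A hedged PyTorch sketch of the multi-objective flavor described above: each lightweight member is trained for standalone accuracy (cross-entropy) while a pairwise term pushes members' predictive distributions apart. The negative-KL diversity penalty and its weight are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def mel_loss(logits_list, targets, div_weight=0.1):
    """logits_list: list of (B, C) outputs, one per ensemble member."""
    ce = sum(F.cross_entropy(lg, targets) for lg in logits_list)  # standalone accuracy
    div = 0.0
    for i in range(len(logits_list)):
        for j in range(len(logits_list)):
            if i != j:  # subtracting KL encourages members to disagree
                div -= F.kl_div(F.log_softmax(logits_list[i], dim=-1),
                                F.softmax(logits_list[j], dim=-1),
                                reduction="batchmean")
    return ce + div_weight * div

members = [torch.nn.Linear(16, 4) for _ in range(3)]
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = mel_loss([m(x) for m in members], y)
loss.backward()
print(loss.item())
```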
Understanding World or Predicting Future? A Comprehensive Survey of World Models
The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. Initially, we examine the current progress in these two categories. We then explore the application of world models in key domains, including autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these aspects. Finally, we outline key challenges and provide insights into potential future research directions. We summarize the representative papers along with their code repositories in https://github.com/tsinghua-fib-lab/World-Model.
Updated: 2025-06-25 02:31:33
标题: 理解世界还是预测未来?世界模型的综合调查
摘要: 世界模型的概念受到了广泛关注,这主要归因于多模态大型语言模型如GPT-4和视频生成模型如Sora的进展,它们是追求人工通用智能的核心。本调查综述了关于世界模型的文献。通常,世界模型被视为一种了解世界当前状态或预测其未来动态的工具。本综述提出了对世界模型的系统分类,强调了两个主要功能:(1)构建内部表示以理解世界的机制,和(2)预测未来状态以模拟和指导决策。首先,我们检查了这两个类别的当前进展。然后,我们探讨了世界模型在关键领域中的应用,包括自动驾驶、机器人技术和社会模拟,重点关注每个领域如何利用这些方面。最后,我们概述了主要挑战,并提供了关于潜在未来研究方向的见解。我们总结了代表性论文以及它们的代码存储库,网址为https://github.com/tsinghua-fib-lab/World-Model。
更新时间: 2025-06-25 02:31:33
领域: cs.CL,cs.AI,cs.LG
From System 1 to System 2: A Survey of Reasoning Large Language Models
Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational Large Language Models (LLMs) excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time GitHub repository (https://github.com/zzli2022/Awesome-Slow-Reason-System) to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.
Updated: 2025-06-25 02:24:46
标题: 从系统1到系统2:对大型语言模型推理的调查
摘要: 实现人类水平的智能需要优化从快速、直觉的系统1到更为缓慢、更为深思熟虑的系统2推理的过渡。虽然系统1在快速的启发式决策方面表现出色,但系统2依赖逻辑推理来进行更准确的判断并减少偏见。基础大型语言模型(LLMs)擅长快速决策,但缺乏复杂推理的深度,因为它们尚未完全接受真正系统2思维特征的逐步分析。最近,像OpenAI的o1/o3和DeepSeek的R1等推理LLMs在数学和编码等领域展示了专家级表现,紧密模仿系统2的深思熟虑推理并展示了类似人类的认知能力。本调查从基础LLMs的进展和系统2技术的早期发展的简要概述开始,探讨它们的结合如何为推理LLMs铺平道路。接下来,我们讨论如何构建推理LLMs,分析它们的特点、实现高级推理的核心方法以及各种推理LLMs的演变。此外,我们提供推理基准的概述,深入比较代表性推理LLMs的性能。最后,我们探讨推进推理LLMs的有希望方向,并维护一个实时的GitHub存储库,跟踪最新进展。我们希望这项调查能成为一个有价值的资源,激发创新并推动这个快速发展领域的进步。
更新时间: 2025-06-25 02:24:46
领域: cs.AI
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression
Predictive maintenance (PdM) has become a crucial element of modern industrial practice. PdM plays a significant role in operational dependability and cost management by decreasing unforeseen downtime and optimizing asset life cycle management. Machine learning and deep learning have enabled more precise forecasts of equipment failure and remaining useful life (RUL). Although many studies have been conducted on PdM, there has not yet been a standalone comparative study between regression- and classification-based approaches. In this review, we look across a range of PdM methodologies, while focusing more strongly on the comparative use of classification and regression methods in prognostics. While regression-based methods typically provide estimates of RUL, classification-based methods present a forecast of the probability of failure across defined time intervals. Through a comprehensive analysis of recent literature, we highlight key advancements, challenges (such as data imbalance and high-dimensional feature spaces), and emerging trends, including hybrid approaches and AI-enabled prognostic systems. This review aims to provide researchers and practitioners with an awareness of the strengths and compromises of various PdM methods, and to help identify future research directions and build more robust, targeted adaptive maintenance systems. Future work may include a systematic review of practical aspects such as public datasets, benchmarking platforms, and open-source tools to support the advancement of PdM research.
Updated: 2025-06-25 02:22:23
标题: 预测性维护方法概览:通过分类和回归分析预测的研究
摘要: 预测性维护(PdM)已经成为现代工业实践中至关重要的元素。PdM通过减少意外停机和优化资产寿命周期管理,在运营可靠性和成本管理方面发挥着重要作用。机器学习和深度学习使设备故障和剩余寿命(RUL)的预测变得更加精确。尽管已经进行了许多关于PdM的研究,但尚未进行过分类和回归方法之间的独立比较研究。在这篇综述中,我们审视了一系列PdM方法,更加强调了分类和回归方法在预测中的比较应用。回归方法通常提供RUL的估计,而分类方法则提供了在定义的时间间隔内发生故障的概率预测。通过对最近文献的全面分析,我们突出了关键进展、挑战(如数据不平衡和高维特征空间)以及新兴趋势,包括混合方法和AI驱动的预测系统。这篇综述旨在让研究者和实践者了解各种PdM方法的优势和妥协,帮助他们确定未来研究方向,并构建更加健壮、有针对性的自适应维护系统。未来的工作可能包括对公共数据集、基准平台和开源工具等实际方面进行系统评估,以支持PdM研究的进展。
更新时间: 2025-06-25 02:22:23
领域: cs.LG
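The regression-versus-classification distinction at the heart of the survey is easy to show on one synthetic degradation dataset: regression predicts remaining useful life (RUL) directly, while classification predicts failure within a fixed horizon. All data-generating choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
wear = rng.uniform(0, 1, size=n)                    # sensor-derived health index
rul = np.maximum(0, 100 * (1 - wear) + rng.normal(0, 5, size=n))  # true RUL (hours)
X = np.column_stack([wear, rng.normal(size=n)])     # one informative, one noise feature

# Regression framing: estimate RUL in hours.
reg = GradientBoostingRegressor().fit(X, rul)

# Classification framing: will the asset fail within the next 24 hours?
horizon = 24
clf = GradientBoostingClassifier().fit(X, (rul <= horizon).astype(int))

x_new = np.array([[0.9, 0.0]])                      # heavily worn asset
print(f"predicted RUL: {reg.predict(x_new)[0]:.1f} h, "
      f"P(fail within {horizon} h): {clf.predict_proba(x_new)[0, 1]:.2f}")
```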
Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications
Supervised Quantum Machine Learning (QML) represents an intersection of quantum computing and classical machine learning, aiming to use quantum resources to support model training and inference. This paper reviews recent developments in supervised QML, focusing on methods such as variational quantum circuits, quantum neural networks, and quantum kernel methods, along with hybrid quantum-classical workflows. We examine recent experimental studies that show partial indications of quantum advantage and describe current limitations including noise, barren plateaus, scalability issues, and the lack of formal proofs of performance improvement over classical methods. The main contribution is a ten-year outlook (2025-2035) that outlines possible developments in supervised QML, including a roadmap describing conditions under which QML may be used in applied research and enterprise systems over the next decade.
Updated: 2025-06-25 02:08:22
标题: 监督的量子机器学习:从量子比特到企业应用的未来展望
摘要: 监督式量子机器学习(QML)代表了量子计算和经典机器学习的交叉点,旨在利用量子资源支持模型训练和推断。本文回顾了监督式QML的最新发展,重点关注变分量子电路、量子神经网络和量子核方法等方法,以及混合量子-经典工作流程。我们审查了最近的实验研究,显示了部分量子优势的迹象,并描述了当前的限制,包括噪声、贫瘠高原、可扩展性问题以及对经典方法的性能改进缺乏正式证明。主要贡献是对监督式QML可能发展的十年展望(2025年至2035年),包括描述在未来十年中QML可用于应用研究和企业系统的条件的路线图。
更新时间: 2025-06-25 02:08:22
领域: quant-ph,cs.AI
Turing Test 2.0: The General Intelligence Threshold
With the rise of artificial intelligence (A.I.) and large language models like ChatGPT, a new race for achieving artificial general intelligence (A.G.I.) has started. While many speculate how and when A.I. will achieve A.G.I., there is no clear agreement on how A.G.I. can be detected in A.I. models, even when popular tools like the Turing test (and its modern variations) are used to measure their intelligence. In this work, we discuss why traditional methods like the Turing test do not suffice for measuring or detecting A.G.I. and provide a new, practical method that can be used to decide if a system (computer or any other) has reached or surpassed A.G.I. To achieve this, we make two new contributions. First, we present a clear definition for general intelligence (G.I.) and set a G.I. Threshold (G.I.T.) that can be used to distinguish between systems that achieve A.G.I. and systems that do not. Second, we present a new framework on how to construct tests that can detect if a system has achieved G.I. in a simple, comprehensive, and clear-cut fail/pass way. We call this novel framework the Turing test 2.0. We then demonstrate real-life examples of applying tests that follow our Turing test 2.0 framework on modern A.I. models.
Updated: 2025-06-25 01:55:54
标题: 图灵测试2.0:智能通用门槛
摘要: 随着人工智能(A.I.)和大型语言模型如ChatGPT的崛起,一个新的竞赛为实现人工通用智能(A.G.I)已经开始。尽管许多人猜测A.I.将如何以及何时实现A.G.I.,但对于A.G.I.如何在A.I.模型中被检测出来,甚至在使用图灵测试(及其现代变体)等流行工具来衡量其智能时,都没有明确的共识。在这项工作中,我们讨论了为什么传统方法如图灵测试不足以衡量或检测A.G.I.,并提出了一种新的实用方法,可用于判断系统(计算机或其他任何系统)是否已达到或超越A.G.I.。为了实现这一目标,我们做出了两个新的贡献。首先,我们提出了通用智能(G.I.)的明确定义,并设定了一个G.I.阈值(G.I.T.),可用于区分达到A.G.I.的系统和未达到A.G.I.的系统。其次,我们提出了一个新的框架,介绍了如何构建测试,以便简单、全面和明确的通过/不通过方式检测系统是否已实现G.I.。我们将这一新颖框架称为图灵测试2.0。然后,我们演示了应用遵循我们的图灵测试2.0框架的测试在现代A.I.模型上的实际示例。
更新时间: 2025-06-25 01:55:54
领域: cs.AI
AIDRIN 2.0: A Framework to Assess Data Readiness for AI
AI Data Readiness Inspector (AIDRIN) is a framework to evaluate and improve data preparedness for AI applications. It addresses critical data readiness dimensions such as data quality, bias, fairness, and privacy. This paper details enhancements to AIDRIN by focusing on user interface improvements and integration with a privacy-preserving federated learning (PPFL) framework. By refining the UI and enabling smooth integration with decentralized AI pipelines, AIDRIN becomes more accessible and practical for users with varying technical expertise. Integrating with an existing PPFL framework ensures that data readiness and privacy are prioritized in federated learning environments. A case study involving a real-world dataset demonstrates AIDRIN's practical value in identifying data readiness issues that impact AI model performance.
Updated: 2025-06-25 01:49:52
标题: AIDRIN 2.0:用于评估AI数据准备性的框架
摘要: AI数据准备检查器(AIDRIN)是一个评估和改进为AI应用准备数据的框架。它解决了关键数据准备维度,如数据质量、偏见、公平性和隐私。本文详细介绍了对AIDRIN的增强,重点放在用户界面改进和与隐私保护的联邦学习(PPFL)框架的集成上。通过优化用户界面并实现与分散式AI管道的平滑集成,AIDRIN变得更加易于访问和实用,适用于具有不同技术专长的用户。与现有的PPFL框架集成确保在联邦学习环境中优先考虑数据准备和隐私。涉及真实数据集的案例研究展示了AIDRIN在识别影响AI模型性能的数据准备问题方面的实际价值。
更新时间: 2025-06-25 01:49:52
领域: cs.CY,cs.AI
Attack Smarter: Attention-Driven Fine-Grained Webpage Fingerprinting Attacks
Website Fingerprinting (WF) attacks aim to infer which websites a user is visiting by analyzing traffic patterns, thereby compromising user anonymity. Although this technique has been demonstrated to be effective in controlled experimental environments, it remains largely limited to small-scale scenarios, typically restricted to recognizing website homepages. In practical settings, however, users frequently access multiple subpages in rapid succession, often before previous content fully loads. WebPage Fingerprinting (WPF) generalizes the WF framework to large-scale environments by modeling subpages of the same site as distinct classes. These pages often share similar page elements, resulting in lower inter-class variance in traffic features. Furthermore, we consider multi-tab browsing scenarios, in which a single trace encompasses multiple categories of webpages. This leads to overlapping traffic segments, and similar features may appear in different positions within the traffic, thereby increasing the difficulty of classification. To address these challenges, we propose an attention-driven fine-grained WPF attack, named ADWPF. Specifically, during the training phase, we apply targeted augmentation to salient regions of the traffic based on attention maps, including attention cropping and attention masking. ADWPF then extracts low-dimensional features from both the original and augmented traffic and applies self-attention modules to capture the global contextual patterns of the trace. Finally, to handle the multi-tab scenario, we employ the residual attention to generate class-specific representations of webpages occurring at different temporal positions. Extensive experiments demonstrate that the proposed method consistently surpasses state-of-the-art baselines across datasets of different scales.
Updated: 2025-06-25 01:45:55
Categories: cs.CR,cs.LG
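To make the attention-cropping and attention-masking augmentations concrete, here is a toy NumPy sketch over a 1-D traffic trace; in ADWPF the attention map is learned by the model, whereas here it is just a given saliency vector, and all names are illustrative:
```python
import numpy as np

def attention_mask(trace, attn, top_frac=0.2):
    """Zero out the most-attended positions so a model must rely on other evidence."""
    k = max(1, int(top_frac * len(trace)))
    salient = np.argsort(attn)[-k:]           # indices with the highest attention
    out = trace.copy()
    out[salient] = 0.0
    return out

def attention_crop(trace, attn, keep_frac=0.5):
    """Keep a contiguous window centred on the attention peak."""
    k = max(1, int(keep_frac * len(trace)))
    centre = int(np.argmax(attn))
    lo = max(0, min(centre - k // 2, len(trace) - k))
    return trace[lo:lo + k]

rng = np.random.default_rng(1)
trace = rng.normal(size=100)                  # toy packet-size sequence
attn = np.abs(rng.normal(size=100))           # stand-in for a learned attention map
print(attention_mask(trace, attn)[:5])
print(attention_crop(trace, attn).shape)      # (50,)
```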
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization
Retrieval-Augmented Code Generation (RACG) is a critical technique for enhancing code generation by retrieving relevant information. In this work, we conduct an in-depth analysis of code retrieval by systematically masking specific features while preserving code functionality. Our discoveries include: (1) although trained on code, current retrievers heavily rely on surface-level textual features (e.g., docstrings, identifier names), and (2) they exhibit a strong bias towards well-documented code, even if the documentation is irrelevant. Based on our discoveries, we propose SACL, a framework that enriches textual information and reduces bias by augmenting code or structural knowledge with semantic information. Extensive experiments show that SACL substantially improves code retrieval (e.g., by 12.8% / 9.4% / 7.0% Recall@1 on HumanEval / MBPP / SWE-Bench-Lite), which also leads to better code generation performance (e.g., by 4.88% Pass@1 on HumanEval).
Updated: 2025-06-25 01:44:28
Categories: cs.CL,cs.AI
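The paper's masking analysis (hiding docstrings and identifier names while preserving functionality) can be reproduced in miniature with Python's standard ast module. The helper below is a hypothetical stand-in, not SACL's released code:
```python
import ast

class SurfaceMasker(ast.NodeTransformer):
    """Strip docstrings and rename identifiers to probe reliance on surface text."""
    def __init__(self):
        self.names = {}

    def _alias(self, name):
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = self._alias(node.name)
        # Drop a leading docstring expression if one is present.
        if (node.body and isinstance(node.body[0], ast.Expr)
                and isinstance(node.body[0].value, ast.Constant)
                and isinstance(node.body[0].value.value, str)):
            node.body = node.body[1:] or [ast.Pass()]
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

src = '''
def add(a, b):
    """Add two numbers."""
    return a + b
'''
tree = SurfaceMasker().visit(ast.parse(src))
print(ast.unparse(ast.fix_missing_locations(tree)))
# -> def v2(v0, v1):
#        return v0 + v1
```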
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
Large Language Models (LLMs) often generate responses with inherent biases, undermining their reliability in real-world applications. Existing evaluation methods often overlook biases in long-form responses and the intrinsic variability of LLM outputs. To address these challenges, we propose FiSCo (Fine-grained Semantic Computation), a novel statistical framework to evaluate group-level fairness in LLMs by detecting subtle semantic differences in long-form responses across demographic groups. Unlike prior work focusing on sentiment or token-level comparisons, FiSCo goes beyond surface-level analysis by operating at the claim level, leveraging entailment checks to assess the consistency of meaning across responses. We decompose model outputs into semantically distinct claims and apply statistical hypothesis testing to compare inter- and intra-group similarities, enabling robust detection of subtle biases. We formalize a new group counterfactual fairness definition and validate FiSCo on both synthetic and human-annotated datasets spanning gender, race, and age. Experiments show that FiSCo more reliably identifies nuanced biases while reducing the impact of stochastic LLM variability, outperforming various evaluation metrics.
Updated: 2025-06-25 01:21:47
Categories: cs.CL,cs.AI,cs.CY,68T50,I.2.7
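A minimal statistical skeleton of the inter- versus intra-group comparison: the real FiSCo scores entailment between extracted claims, while this sketch substitutes cosine similarity between toy embeddings and a Welch t-test for the full testing procedure; all names and numbers are illustrative:
```python
import numpy as np
from scipy import stats

def pairwise_sims(A, B):
    """Cosine similarities between all rows of A and B (toy claim embeddings)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T).ravel()

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, size=(30, 16))   # responses for demographic group 1
g2 = rng.normal(0.3, 1.0, size=(30, 16))   # responses for demographic group 2

intra = np.concatenate([pairwise_sims(g1, g1), pairwise_sims(g2, g2)])
inter = pairwise_sims(g1, g2)

# If inter-group similarity is systematically lower than intra-group similarity,
# responses differ by group more than by chance -> evidence of group-level bias.
t, p = stats.ttest_ind(intra, inter, equal_var=False)
print(f"t={t:.2f}, p={p:.3g}")
```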
Federated Learning Clients Clustering with Adaptation to Data Drifts
Federated Learning (FL) trains deep models across edge devices without centralizing raw data, preserving user privacy. However, client heterogeneity slows down convergence and limits global model accuracy. Clustered FL (CFL) mitigates this by grouping clients with similar representations and training a separate model for each cluster. In practice, client data evolves over time, a phenomenon we refer to as data drift, which breaks cluster homogeneity and degrades performance. Data drift can take different forms depending on whether changes occur in the output values, the input features, or the relationship between them. We propose FIELDING, a CFL framework for handling diverse types of data drift with low overhead. FIELDING detects drift at individual clients and performs selective re-clustering to balance cluster quality and model performance, while remaining robust to malicious clients and varying levels of heterogeneity. Experiments show that FIELDING improves final model accuracy by 1.9-5.9% and achieves target accuracy 1.16x-2.23x faster than existing state-of-the-art CFL methods.
Updated: 2025-06-25 01:20:58
Categories: cs.LG,cs.CR
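The shape of the detect-then-selectively-recluster loop can be sketched as follows; the drift test (an L1 shift in a client's label histogram) and the client representations are hypothetical simplifications, not FIELDING's actual criteria:
```python
import numpy as np

def detect_drift(prev_hist, curr_hist, tol=0.15):
    """Flag a client whose label histogram moves by more than `tol` in total variation."""
    return 0.5 * np.abs(prev_hist - curr_hist).sum() > tol

def reassign(client_reprs, centroids, drifted):
    """Selective re-clustering: only drifted clients get a new cluster id."""
    assign = {}
    for cid, r in client_reprs.items():
        if cid in drifted:
            assign[cid] = int(np.argmin(np.linalg.norm(centroids - r, axis=1)))
    return assign

rng = np.random.default_rng(0)
prev = {0: np.array([0.5, 0.5]), 1: np.array([0.9, 0.1])}
curr = {0: np.array([0.5, 0.5]), 1: np.array([0.4, 0.6])}   # client 1 drifts
drifted = {c for c in prev if detect_drift(prev[c], curr[c])}
centroids = rng.normal(size=(3, 4))                          # toy cluster centroids
reprs = {c: rng.normal(size=4) for c in prev}
print(drifted, reassign(reprs, centroids, drifted))
```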
Quantum-Classical Hybrid Quantized Neural Network
In this work, we present a novel Quadratic Binary Optimization (QBO) model for quantized neural network training, enabling the use of arbitrary activation and loss functions through spline interpolation. We introduce Forward Interval Propagation (FIP), a method designed to tackle the challenges of non-linearity and the multi-layer composite structure in neural networks by discretizing activation functions into linear subintervals. This approach preserves the universal approximation properties of neural networks while allowing complex nonlinear functions to be optimized using quantum computers, thus broadening their applicability in artificial intelligence. We provide theoretical upper bounds on the approximation error and the number of Ising spins required, by deriving the sample complexity of the empirical risk minimization problem from an optimization perspective. A significant challenge in solving the associated Quadratic Constrained Binary Optimization (QCBO) model on a large scale is the presence of numerous constraints. When employing the penalty method to handle these constraints, tuning a large number of penalty coefficients becomes a critical hyperparameter optimization problem, increasing computational complexity and potentially affecting solution quality. To address this, we employ the Quantum Conditional Gradient Descent (QCGD) algorithm, which leverages quantum computing to directly solve the QCBO problem. We prove the convergence of QCGD under a quantum oracle with randomness and bounded variance in objective value, as well as under limited precision constraints in the coefficient matrix. Additionally, we provide an upper bound on the time-to-solution for the QCBO solving process. Experimental results using a coherent Ising machine (CIM) demonstrate 94.95% accuracy on the Fashion MNIST classification task, with only 1.1-bit precision.
Updated: 2025-06-25 01:01:03
Categories: cs.LG,cs.AI,physics.optics
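The core of Forward Interval Propagation, discretizing an activation function into linear subintervals, can be illustrated directly with generic piecewise-linear interpolation (this shows the approximation step only, not the paper's QUBO encoding or Ising-spin bounds):
```python
import numpy as np

def piecewise_linear(f, lo, hi, n_segments):
    """Return breakpoints approximating f on [lo, hi] with n linear pieces."""
    xs = np.linspace(lo, hi, n_segments + 1)
    return xs, f(xs)

def eval_pwl(xs, ys, x):
    """Evaluate the piecewise-linear approximation at x (np.interp does the work)."""
    return np.interp(x, xs, ys)

# Approximate tanh on [-3, 3] with 8 segments and check the worst-case error.
xs, ys = piecewise_linear(np.tanh, -3.0, 3.0, 8)
grid = np.linspace(-3, 3, 2001)
err = np.max(np.abs(np.tanh(grid) - eval_pwl(xs, ys, grid)))
print(f"max abs error with 8 segments: {err:.4f}")
```
Doubling the number of segments roughly quarters the worst-case error of a piecewise-linear fit, which is the trade-off behind the paper's bound relating approximation error to the number of Ising spins.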
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Large language models (LLMs) have demonstrated impressive performance on a wide range of tasks, including in multimodal settings such as speech. However, their evaluation is often limited to English and a few high-resource languages. For low-resource languages, there is no standardized evaluation benchmark. In this paper, we address this gap by introducing mSTEB, a new benchmark to evaluate the performance of LLMs on a wide range of tasks covering language identification, text classification, question answering, and translation, on both speech and text modalities. We evaluated the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B. Our evaluation shows a wide gap in performance between high-resource and low-resource languages, especially for languages spoken in Africa and the Americas/Oceania. Our findings show that more investment is needed to address their under-representation in LLM coverage.
Updated: 2025-06-25 00:58:19
Categories: cs.CL,cs.LG,cs.SD,eess.AS
A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs
Spatio-temporal data mining plays a pivotal role in informed decision making across diverse domains. However, existing models are often restricted to narrow tasks, lacking the capacity for multi-task inference and complex long-form reasoning that require generation of in-depth, explanatory outputs. These limitations restrict their applicability to real-world, multi-faceted decision scenarios. In this work, we introduce STReason, a novel framework that integrates the reasoning strengths of large language models (LLMs) with the analytical capabilities of spatio-temporal models for multi-task inference and execution. Without requiring task-specific finetuning, STReason leverages in-context learning to decompose complex natural language queries into modular, interpretable programs, which are then systematically executed to generate both solutions and detailed rationales. To facilitate rigorous evaluation, we construct a new benchmark dataset and propose a unified evaluation framework with metrics specifically designed for long-form spatio-temporal reasoning. Experimental results show that STReason significantly outperforms advanced LLM baselines across all metrics, particularly excelling in complex, reasoning-intensive spatio-temporal scenarios. Human evaluations further validate STReason's credibility and practical utility, demonstrating its potential to reduce expert workload and broaden the applicability to real-world spatio-temporal tasks. We believe STReason provides a promising direction for developing more capable and generalizable spatio-temporal reasoning systems.
Updated: 2025-06-25 00:55:34
Categories: cs.CL,cs.AI,cs.LG
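The decompose-then-execute pattern can be shown without any LLM in the loop: a toy registry of spatio-temporal modules executed in order over a shared context. The module names and the program format below are invented for illustration; in STReason the program itself is produced by in-context LLM decomposition:
```python
import numpy as np

MODULES = {}

def module(name):
    def deco(fn):
        MODULES[name] = fn
        return fn
    return deco

@module("load_series")
def load_series(ctx, key):
    rng = np.random.default_rng(0)
    ctx[key] = rng.normal(size=48).cumsum()     # stand-in sensor series

@module("detect_anomaly")
def detect_anomaly(ctx, key):
    x = ctx[key]
    z = (x - x.mean()) / x.std()
    ctx["anomalies"] = np.where(np.abs(z) > 2)[0].tolist()

@module("explain")
def explain(ctx, key):
    ctx["rationale"] = (f"{len(ctx['anomalies'])} points exceed 2 sigma "
                        f"in series '{key}', at indices {ctx['anomalies']}.")

# A 'program' as a decomposer might emit it: ordered (module, argument) steps.
program = [("load_series", "traffic"), ("detect_anomaly", "traffic"),
           ("explain", "traffic")]
ctx = {}
for name, arg in program:
    MODULES[name](ctx, arg)
print(ctx["rationale"])
```
Because each step writes into a shared context and emits its own rationale, the final answer comes packaged with an interpretable execution trace, which is the property the paper's long-form reasoning evaluation targets.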
Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges
Pedestrian detection has become a cornerstone for several high-level tasks, including autonomous driving, intelligent transportation, and traffic surveillance. Several works have focused on pedestrian detection using visible images, mainly in the daytime. However, the task becomes considerably more challenging when environmental conditions change to poor lighting or nighttime. Recently, new approaches have emerged that use alternative sources, such as Far InfraRed (FIR) temperature sensor feeds, for detecting pedestrians in low-light conditions. This study reviews recent developments in low-light pedestrian detection approaches. It systematically categorizes and analyses various algorithms, from region-based to non-region-based and graph-based learning methodologies, highlighting their methods, implementation issues, and challenges. It also outlines the key benchmark datasets that can be used for the research and development of advanced pedestrian detection algorithms, particularly in low-light situations.
Updated: 2025-06-25 00:47:40
Categories: cs.CV,cs.AI,cs.LG
DriveBLIP2: Attention-Guided Explanation Generation for Complex Driving Scenarios
This paper introduces a new framework, DriveBLIP2, built upon the BLIP2-OPT architecture, to generate accurate and contextually relevant explanations for emerging driving scenarios. While existing vision-language models perform well in general tasks, they encounter difficulties in understanding complex, multi-object environments, particularly in real-time applications such as autonomous driving, where the rapid identification of key objects is crucial. To address this limitation, an Attention Map Generator is proposed to highlight significant objects relevant to driving decisions within critical video frames. By directing the model's focus to these key regions, the generated attention map helps produce clear and relevant explanations, enabling drivers to better understand the vehicle's decision-making process in critical situations. Evaluations on the DRAMA dataset reveal significant improvements in explanation quality, as indicated by higher BLEU, ROUGE, CIDEr, and SPICE scores compared to baseline models. These findings underscore the potential of targeted attention mechanisms in vision-language models for enhancing explainability in real-time autonomous driving.
Updated: 2025-06-25 00:46:38
Categories: cs.RO,cs.CV,cs.LG
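As a toy rendition of the attention-map step, the snippet below picks the top-k most-attended patches of a frame, the regions an explanation would be grounded in; in DriveBLIP2 the Attention Map Generator is learned inside the BLIP2-OPT pipeline, so this NumPy stand-in is purely illustrative:
```python
import numpy as np

def top_k_patches(attn_map, k=3):
    """Return (row, col) coordinates of the k most-attended grid cells."""
    flat = np.argsort(attn_map.ravel())[-k:][::-1]
    return [tuple(np.unravel_index(i, attn_map.shape)) for i in flat]

rng = np.random.default_rng(42)
attn = rng.random((6, 6))          # stand-in for a 6x6 patch attention map
attn[2, 4] = 2.0                   # pretend a pedestrian patch dominates
print(top_k_patches(attn, k=3))    # (2, 4) comes first
```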
Computation Mechanism Behind LLM Position Generalization
Most written natural languages are composed of sequences of words and sentences. Similar to humans, large language models (LLMs) exhibit flexibility in handling textual positions - a phenomenon we term position generalization. They can understand texts with position perturbations and generalize to longer texts than those encountered during training with the latest techniques. These phenomena suggest that LLMs handle positions tolerantly, but how LLMs computationally process positional relevance remains largely unexplored. This work connects the linguistic phenomenon with LLMs' computational mechanisms. We show how LLMs enforce certain computational mechanisms for the aforementioned tolerance in position perturbations. Despite the complex design of the self-attention mechanism, this work reveals that LLMs learn a counterintuitive disentanglement of attention logits. Their values show a 0.959 linear correlation with an approximation of the arithmetic sum of positional relevance and semantic importance. Furthermore, we identify a prevalent pattern in intermediate features, which we prove theoretically enables this effect. The pattern, which is different from how randomly initialized parameters would behave, suggests that it is a learned behavior rather than a natural result of the model architecture. Based on these findings, we provide computational explanations and criteria for LLMs' position flexibilities. This work takes a pioneering step in linking position generalization with modern LLMs' internal mechanisms.
Updated: 2025-06-25 00:26:59
Categories: cs.CL,cs.AI
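The paper's headline number, a 0.959 linear correlation between attention logits and the sum of positional relevance and semantic importance, is simply a Pearson correlation against an additive approximation. The synthetic sketch below shows how such a correlation would be measured; the decomposition here is constructed by hand, not extracted from a real model:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
positional = rng.normal(size=n)            # stand-in positional-relevance scores
semantic = rng.normal(size=n)              # stand-in semantic-importance scores
noise = rng.normal(scale=0.3, size=n)      # everything the additive model misses
logits = positional + semantic + noise     # synthetic attention logits

approx = positional + semantic             # the arithmetic-sum approximation
r = np.corrcoef(logits, approx)[0, 1]      # Pearson r, analogous to the paper's 0.959
print(f"linear correlation: {r:.3f}")
```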
Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision
Existing multimedia retrieval models either rely on creating a common subspace with modality-specific representation models or require schema mapping among modalities to measure similarities among multimedia data. Our goal is to avoid the annotation overhead incurred by treating retrieval as a supervised classification task and to re-use the pretrained encoders from large language models and vision tasks. We propose "FemmIR", a framework to retrieve multimodal results relevant to information needs expressed with multimodal queries by example, without any similarity label. Such label-free retrieval is necessary for real-world applications where data annotations are scarce and satisfactory performance is required from a common framework across applications without fine-tuning. We curate a new dataset called MuQNOL for benchmarking progress on this task. Our technique is based on weak supervision introduced through edit distance between samples: graph edit distance can be modified to consider the cost of replacing a data sample in terms of its properties, and relevance can be measured through the implicit signal from the amount of edit cost among the objects. Unlike metric learning or encoding networks, FemmIR re-uses the high-level properties and maintains the property value and relationship constraints with a multi-level interaction score between data samples and the query example provided by the user. We empirically evaluate FemmIR on a missing person use case with MuQNOL. FemmIR performs comparably to similar retrieval systems in delivering on-demand retrieval results with exact and approximate similarities while using the existing property identifiers in the system.
Updated: 2025-06-25 00:25:08
Categories: cs.IR,cs.LG,cs.MM
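The weak-supervision signal can be sketched as a property-level edit cost: each attribute on which the query and a candidate disagree contributes a replacement cost, and lower total cost means higher relevance. The attribute names and costs below are invented; the paper derives them from a modified graph edit distance over richer structures:
```python
# Hypothetical per-property replacement costs; the paper derives these from a
# modified graph edit distance, so the values here are illustrative only.
COSTS = {"hair_color": 1.0, "shirt_color": 1.0, "age_band": 2.0, "location": 3.0}

def edit_cost(query: dict, candidate: dict) -> float:
    """Sum replacement costs over properties where query and candidate disagree."""
    return sum(c for prop, c in COSTS.items()
               if query.get(prop) != candidate.get(prop))

def rank(query, candidates):
    """Lower edit cost implies higher relevance; no similarity labels needed."""
    return sorted(candidates, key=lambda c: edit_cost(query, c))

query = {"hair_color": "brown", "shirt_color": "red",
         "age_band": "20s", "location": "station"}
candidates = [
    {"hair_color": "brown", "shirt_color": "blue",
     "age_band": "20s", "location": "station"},   # cost 1.0
    {"hair_color": "black", "shirt_color": "red",
     "age_band": "40s", "location": "station"},   # cost 3.0
]
for c in rank(query, candidates):
    print(edit_cost(query, c), c)
```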
Thought Anchors: Which LLM Reasoning Steps Matter?
Reasoning large language models have recently achieved state-of-the-art performance in many fields. However, their long-form chain-of-thought reasoning creates interpretability challenges as each generated token depends on all previous ones, making the computation harder to decompose. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We present three complementary attribution methods: (1) a black-box method measuring each sentence's counterfactual importance by comparing final answers across 100 rollouts conditioned on the model generating that sentence or one with a different meaning; (2) a white-box method of aggregating attention patterns between pairs of sentences, which identified "broadcasting" sentences that receive disproportionate attention from all future sentences via "receiver" attention heads; (3) a causal attribution method measuring logical connections between sentences by suppressing attention toward one sentence and measuring the effect on each future sentence's tokens. Each method provides evidence for the existence of thought anchors, reasoning steps that have outsized importance and that disproportionately influence the subsequent reasoning process. These thought anchors are typically planning or backtracking sentences. We provide an open-source tool (www.thought-anchors.com) for visualizing the outputs of our methods, and present a case study showing converging patterns across methods that map how a model performs multi-step reasoning. The consistency across methods demonstrates the potential of sentence-level analysis for a deeper understanding of reasoning models.
Updated: 2025-06-25 00:18:53
Categories: cs.LG,cs.AI,cs.CL
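The black-box attribution method boils down to a resampling comparison. The skeleton below stubs out the generator (a real run would condition an actual LLM on the original versus an altered sentence over roughly 100 rollouts); the probabilities are invented purely to show how the importance score behaves:
```python
import numpy as np

def rollout_accuracy(p_correct, n=100, rng=None):
    """Stub: fraction of n stochastic rollouts that reach the correct answer."""
    rng = rng or np.random.default_rng(0)
    return (rng.random(n) < p_correct).mean()

def counterfactual_importance(p_with, p_without, n=100, seed=0):
    """Accuracy shift when a sentence is kept vs. replaced by a paraphrase."""
    rng = np.random.default_rng(seed)
    return rollout_accuracy(p_with, n, rng) - rollout_accuracy(p_without, n, rng)

# A planning sentence whose alteration drops accuracy from ~0.9 to ~0.4 scores
# as a strong "thought anchor"; a filler sentence scores near zero.
print(counterfactual_importance(0.9, 0.4))    # large  -> anchor
print(counterfactual_importance(0.9, 0.88))   # ~0     -> not an anchor
```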
Conformal Prediction with Upper and Lower Bound Models
This paper studies a Conformal Prediction (CP) methodology for building prediction intervals in a regression setting, given only deterministic lower and upper bounds on the target variable. It proposes a new CP mechanism (CPUL) that goes beyond post-processing by adopting a model selection approach over multiple nested interval construction methods. Paradoxically, many well-established CP methods, including CPUL, may fail to provide adequate coverage in regions where the bounds are tight. To remedy this limitation, the paper proposes an optimal thresholding mechanism, OMLT, that adjusts CPUL intervals in tight regions with undercoverage. The combined CPUL-OMLT is validated on large-scale learning tasks where the goal is to bound the optimal value of a parametric optimization problem. The experimental results demonstrate substantial improvements over baseline methods across various datasets.
Updated: 2025-06-25 00:04:42
Categories: stat.ML,cs.LG
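A minimal split-conformal sketch in the paper's setting: deterministic lower/upper bound models define a raw interval, and a held-out calibration set supplies a width-scaling quantile for the target coverage. This is generic normalized split conformal adapted to bound models, not the exact CPUL/OMLT procedure:
```python
import numpy as np

def conformal_from_bounds(lo_cal, hi_cal, y_cal, alpha=0.1):
    """Calibrate a width-scaling factor for the bound-model interval."""
    mid = (lo_cal + hi_cal) / 2
    half = (hi_cal - lo_cal) / 2
    scores = np.abs(y_cal - mid) / half          # normalized nonconformity
    n = len(y_cal)
    # Finite-sample-corrected quantile, standard in split conformal prediction.
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def predict_interval(lo, hi, q):
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return mid - q * half, mid + q * half

rng = np.random.default_rng(0)
y = rng.normal(size=500)
lo = y - rng.uniform(0.5, 2.0, 500)              # valid deterministic lower bounds
hi = y + rng.uniform(0.5, 2.0, 500)              # valid deterministic upper bounds
q = conformal_from_bounds(lo[:250], hi[:250], y[:250], alpha=0.1)
L, U = predict_interval(lo[250:], hi[250:], q)
print("empirical coverage:", np.mean((y[250:] >= L) & (y[250:] <= U)))
```
Because the nonconformity score is normalized by the interval half-width, tight bound regions get proportionally tighter conformal intervals, which is exactly where the abstract notes that naive methods can undercover.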