ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning
Event stream based scene text recognition is a newly arising research topic that performs better than widely used RGB cameras in extremely challenging scenarios, especially under low illumination and fast motion. Existing works either adopt an end-to-end encoder-decoder framework or large language models for enhanced recognition; however, they are still limited by insufficient interpretability and weak contextual logical reasoning. In this work, we propose a novel chain-of-thought reasoning based event stream scene text recognition framework, termed ESTR-CoT. Specifically, we first adopt the vision encoder EVA-CLIP (ViT-G/14) to transform the input event stream into tokens and utilize a Llama tokenizer to encode the given generation prompt. A Q-Former is used to align the vision tokens with the pre-trained large language model Vicuna-7B, which outputs both the answer and the chain-of-thought (CoT) reasoning process simultaneously. Our framework can be optimized via supervised fine-tuning in an end-to-end manner. In addition, we propose a large-scale CoT dataset to train our framework through a three-stage process (i.e., generation, polishing, and expert verification). This dataset provides a solid data foundation for the development of subsequent reasoning-based large models. Extensive experiments on three event stream STR benchmark datasets (i.e., EventSTR, WordArt*, IC15*) fully validate the effectiveness and interpretability of our proposed framework. The source code and pre-trained models will be released at https://github.com/Event-AHU/ESTR-CoT.
Updated: 2025-07-02 23:41:31
Categories: cs.CV,cs.AI,cs.CL
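A minimal PyTorch sketch of the alignment step this abstract describes: a Q-Former-style module compresses vision tokens (e.g., from a frozen EVA-CLIP ViT-G/14) into a fixed set of query tokens and projects them into the LLM's embedding space. The module name, dimensions, and toy shapes are illustrative assumptions, not the released ESTR-CoT code.

```python
import torch
import torch.nn as nn

class QFormerBridge(nn.Module):
    """Compress vision tokens into a fixed set of query tokens via cross-attention,
    then project them into the LLM embedding space (illustrative dimensions)."""
    def __init__(self, num_queries=32, vis_dim=1408, llm_dim=4096):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, vis_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(vis_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, vis_tokens):                        # (B, N, vis_dim)
        q = self.queries.expand(vis_tokens.size(0), -1, -1)
        out, _ = self.cross_attn(q, vis_tokens, vis_tokens)
        return self.proj(out)                             # (B, num_queries, llm_dim)

# Event frames -> vision tokens (frozen encoder) -> aligned prefix tokens that are
# prepended to the prompt embeddings of the LLM (e.g., Vicuna-7B) and trained with
# next-token cross-entropy on "<answer> + <CoT>" targets.
bridge = QFormerBridge()
vis_tokens = torch.randn(2, 257, 1408)                    # dummy ViT-style tokens
print(bridge(vis_tokens).shape)                           # torch.Size([2, 32, 4096])
```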
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning. However, in standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency. To capture reasoning that is not easily represented in words, many works have explored recurrent architectures that aim to internalize reasoning in latent space, potentially supporting latent CoT. In this paper, we investigate whether such reasoning structures emerge in Huginn-3.5B, a depth-recurrent Transformer that reuses layers at inference time without increasing parameter count. We examine the model's internal behavior on arithmetic tasks using a suite of probing techniques including the Logit Lens and Coda Lens. Our findings reveal limited evidence of interpretable latent CoT by tracking rank trajectories of final and intermediate result tokens. Furthermore, we uncover significant probing inconsistencies across recurrent blocks, where the interpretability of hidden states depends heavily on both the layer index and the decoding method. Finally, we empirically show that increasing recurrence depth yields only marginal gains and falls well short of models that explicitly externalize reasoning steps. The code is available at https://github.com/wenquanlu/huginn-latent-cot.
Updated: 2025-07-02 23:35:21
Categories: cs.CL,cs.AI,cs.LG
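For readers unfamiliar with the probes named above, here is a minimal Logit Lens sketch: decode each layer's hidden state through the final LayerNorm and unembedding, then track the rank of a target answer token across depth. GPT-2 stands in for Huginn-3.5B purely to keep the example self-contained; the paper applies this style of probe to the recurrent blocks.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "12 + 29 ="
target_id = tok.encode(" 41")[0]
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Note: HF's last hidden state already has ln_f applied; re-applying it there is a
# common Logit Lens simplification.
for layer, h in enumerate(out.hidden_states):             # embeddings + one per block
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))   # decode last position
    rank = (logits[0] > logits[0, target_id]).sum().item() + 1
    print(f"layer {layer:2d}: rank of ' 41' = {rank}")
```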
How Good LLM-Generated Password Policies Are?
Generative AI technologies, particularly Large Language Models (LLMs), are rapidly being adopted across industry, academia, and government sectors, owing to their remarkable capabilities in natural language processing. However, despite their strengths, the inconsistency and unpredictability of LLM outputs present substantial challenges, especially in security-critical domains such as access control. One critical issue that emerges prominently is the consistency of LLM-generated responses, which is paramount for ensuring secure and reliable operations. In this paper, we study the application of LLMs within the context of Cybersecurity Access Control Systems. Specifically, we investigate the consistency and accuracy of LLM-generated password policies, translating natural language prompts into executable pwquality.conf configuration files. Our experimental methodology adopts two distinct approaches: firstly, we utilize pre-trained LLMs to generate configuration files purely from natural language prompts without additional guidance. Secondly, we provide these models with official pwquality.conf documentation to serve as an informative baseline. We systematically assess the soundness, accuracy, and consistency of these AI-generated configurations. Our findings underscore significant challenges in the current generation of LLMs and contribute valuable insights into refining the deployment of LLMs in Access Control Systems.
Updated: 2025-07-02 23:34:36
Categories: cs.CR,cs.AI
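A sketch of the kind of soundness check the study implies: parse an LLM-generated pwquality.conf and flag unknown options or malformed values. The known-option list below is a subset of documented libpwquality integer options; the paper's actual evaluation criteria may be broader.

```python
KNOWN_INT_KEYS = {
    "difok", "minlen", "dcredit", "ucredit", "lcredit", "ocredit",
    "minclass", "maxrepeat", "maxclassrepeat", "gecoscheck", "dictcheck",
    "usercheck", "retry",
}

def check_pwquality_conf(text: str) -> list[str]:
    issues = []
    for n, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue                                  # comments / blanks are fine
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep:
            issues.append(f"line {n}: not a key=value setting: {line!r}")
        elif key not in KNOWN_INT_KEYS:
            issues.append(f"line {n}: unknown option {key!r}")
        elif not value.lstrip("-").isdigit():
            issues.append(f"line {n}: non-integer value for {key}: {value!r}")
    return issues

generated = "minlen = 12\ndcredit = -1\nstrength = high\n"
print(check_pwquality_conf(generated))   # flags the invented 'strength' option
```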
Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
As LLMs are increasingly studied as role-playing agents to generate synthetic data for human behavioral research, ensuring that their outputs remain coherent with their assigned roles has become a critical concern. In this paper, we investigate how consistently LLM-based role-playing agents' stated beliefs about the behavior of the people they are asked to role-play ("what they say") correspond to their actual behavior during role-play ("how they act"). Specifically, we establish an evaluation framework to rigorously measure how well beliefs obtained by prompting the model can predict simulation outcomes in advance. Using an augmented version of the GenAgents persona bank and the Trust Game (a standard economic game used to quantify players' trust and reciprocity), we introduce a belief-behavior consistency metric to systematically investigate how it is affected by factors such as: (1) the types of beliefs we elicit from LLMs, like expected outcomes of simulations versus task-relevant attributes of individual characters LLMs are asked to simulate; (2) when and how we present LLMs with relevant information about Trust Game; and (3) how far into the future we ask the model to forecast its actions. We also explore how feasible it is to impose a researcher's own theoretical priors in the event that the originally elicited beliefs are misaligned with research objectives. Our results reveal systematic inconsistencies between LLMs' stated (or imposed) beliefs and the outcomes of their role-playing simulation, at both an individual- and population-level. Specifically, we find that, even when models appear to encode plausible beliefs, they may fail to apply them in a consistent way. These findings highlight the need to identify how and when LLMs' stated beliefs align with their simulated behavior, allowing researchers to use LLM-based agents appropriately in behavioral studies.
Updated: 2025-07-02 23:30:51
Categories: cs.AI
Temporally Consistent Koopman Autoencoders for Forecasting Dynamical Systems
Absence of sufficiently high-quality data often poses a key challenge in data-driven modeling of high-dimensional spatio-temporal dynamical systems. Koopman Autoencoders (KAEs) harness the expressivity of deep neural networks (DNNs), the dimension reduction capabilities of autoencoders, and the spectral properties of the Koopman operator to learn a reduced-order feature space with simpler, linear dynamics. However, the effectiveness of KAEs is hindered by limited and noisy training datasets, leading to poor generalizability. To address this, we introduce the temporally consistent Koopman autoencoder (tcKAE), designed to generate accurate long-term predictions even with limited and noisy training data. This is achieved through a consistency regularization term that enforces prediction coherence across different time steps, thus enhancing the robustness and generalizability of tcKAE over existing models. We provide analytical justification for this approach based on Koopman spectral theory and empirically demonstrate tcKAE's superior performance over state-of-the-art KAE models across a variety of test cases, including simple pendulum oscillations, kinetic plasma, and fluid flow data.
Updated: 2025-07-02 23:12:52
Categories: cs.LG
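A hedged sketch of the temporal-consistency idea: besides reconstruction and one-step prediction, penalize disagreement between latent predictions of the same future state rolled out from different start times. The exact regularizer in the paper may be weighted or structured differently.

```python
import torch
import torch.nn as nn

def tckae_losses(E, D, K, x):
    """x: (B, T, d) trajectory batch. Returns (recon, pred, consistency) losses.
    Consistency: predictions of the latent at time t rolled forward from t-1 and
    from t-2 should coincide (a sketch of the tcKAE idea)."""
    B, T, d = x.shape
    z = E(x.reshape(B * T, d)).reshape(B, T, -1)          # encode whole trajectory
    recon = ((D(z.reshape(B * T, -1)).reshape(B, T, d) - x) ** 2).mean()

    pred, cons = 0.0, 0.0
    for t in range(1, T):
        z_from_prev = z[:, t - 1] @ K.T                   # one step from t-1
        pred = pred + ((D(z_from_prev) - x[:, t]) ** 2).mean()
        if t >= 2:
            z_from_prev2 = (z[:, t - 2] @ K.T) @ K.T      # two steps from t-2
            cons = cons + ((z_from_prev - z_from_prev2) ** 2).mean()
    return recon, pred / (T - 1), cons / max(T - 2, 1)

# Usage with toy modules:
d, k = 3, 8
E = nn.Sequential(nn.Linear(d, 32), nn.Tanh(), nn.Linear(32, k))
D = nn.Sequential(nn.Linear(k, 32), nn.Tanh(), nn.Linear(32, d))
K = nn.Parameter(torch.eye(k) + 0.01 * torch.randn(k, k))
recon, pred, cons = tckae_losses(E, D, K, torch.randn(4, 10, d))
loss = recon + pred + 0.1 * cons                          # weight is a hyperparameter
```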
AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration
While multi-vehicular collaborative driving demonstrates clear advantages over single-vehicle autonomy, traditional infrastructure-based V2X systems remain constrained by substantial deployment costs and the creation of "uncovered danger zones" in rural and suburban areas. We present AirV2X-Perception, a large-scale dataset that leverages Unmanned Aerial Vehicles (UAVs) as a flexible alternative or complement to fixed Road-Side Units (RSUs). Drones offer unique advantages over ground-based perception: complementary bird's-eye-views that reduce occlusions, dynamic positioning capabilities that enable hovering, patrolling, and escorting navigation rules, and significantly lower deployment costs compared to fixed infrastructure. Our dataset comprises 6.73 hours of drone-assisted driving scenarios across urban, suburban, and rural environments with varied weather and lighting conditions. The AirV2X-Perception dataset facilitates the development and standardized evaluation of Vehicle-to-Drone (V2D) algorithms, addressing a critical gap in the rapidly expanding field of aerial-assisted autonomous driving systems. The dataset and development kits are open-sourced at https://github.com/taco-group/AirV2X-Perception.
Updated: 2025-07-02 23:07:24
Categories: cs.CV,cs.AI,cs.RO
cVLA: Towards Efficient Camera-Space VLAs
Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks, but they are often expensive to train. In this paper, we propose a novel VLA approach that leverages the competitive performance of Vision Language Models (VLMs) on 2D images to directly infer robot end-effector poses in image frame coordinates. Unlike prior VLA models that output low-level controls, our model predicts trajectory waypoints, making it both more efficient to train and robot embodiment agnostic. Despite its lightweight design, our next-token prediction architecture effectively learns meaningful and executable robot trajectories. We further explore the underutilized potential of incorporating depth images, inference-time techniques such as decoding strategies, and demonstration-conditioned action generation. Our model is trained on a simulated dataset and exhibits strong sim-to-real transfer capabilities. We evaluate our approach using a combination of simulated and real data, demonstrating its effectiveness on a real robotic system.
Updated: 2025-07-02 22:56:41
Categories: cs.RO,cs.LG
Distinguishing Predictive and Generative AI in Regulation
Over the past decade, policymakers have developed a set of regulatory tools to ensure AI development aligns with key societal goals. Many of these tools were initially developed in response to concerns with predictive AI and therefore encode certain assumptions about the nature of AI systems and the utility of certain regulatory approaches. With the advent of generative AI, however, some of these assumptions no longer hold, even as policymakers attempt to maintain a single regulatory target that covers both types of AI. In this paper, we identify four distinct aspects of generative AI that call for meaningfully different policy responses. These are the generality and adaptability of generative AI that make it a poor regulatory target, the difficulty of designing effective evaluations, new legal concerns that change the ecosystem of stakeholders and sources of expertise, and the distributed structure of the generative AI value chain. In light of these distinctions, policymakers will need to evaluate where the past decade of policy work remains relevant and where new policies, designed to address the unique risks posed by generative AI, are necessary. We outline three recommendations for policymakers to more effectively identify regulatory targets and leverage constraints across the broader ecosystem to govern generative AI.
Updated: 2025-07-02 22:50:44
Categories: cs.CY,cs.AI
KAIROS: Scalable Model-Agnostic Data Valuation
Training data increasingly shapes not only model accuracy but also regulatory compliance and market valuation of AI assets. Yet existing valuation methods remain inadequate: model-based techniques depend on a single fitted model and inherit its biases, while algorithm-based approaches such as Data Shapley require costly retrainings at web scale. Recent Wasserstein-based model-agnostic methods rely on approximations that misrank examples relative to their true leave-one-out (LOO) utility. We introduce KAIROS, a scalable, model-agnostic valuation framework that assigns each example a distributional influence score: its contribution to the Maximum Mean Discrepancy (MMD) between the empirical training distribution and a clean reference set. Unlike Wasserstein surrogates, our MMD-based influence admits a closed-form solution that faithfully approximates the exact LOO ranking within $O(1/N^2)$ error, requires no retraining, and naturally extends to conditional kernels for unified label- and feature-error detection. Moreover, KAIROS supports efficient online updates: when a new batch of size m arrives, all scores can be updated in $O(mN)$ time, delivering up to 50x speedup without compromising ranking quality. Empirical evaluations on noise, mislabeling, and poisoning benchmarks show that KAIROS consistently outperforms state-of-the-art model-, Shapley-, and Wasserstein-based baselines in both accuracy and runtime. We provide rigorous theoretical guarantees, including symmetry for reproducible rankings and density-separation for interpretable thresholds.
Updated: 2025-07-02 22:50:21
Categories: cs.LG
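A minimal sketch of a distributional influence score in the spirit described above: each training point's closed-form contribution to MMD^2 between the empirical training distribution and a clean reference set, computed from kernel sums with no retraining. The precise weighting in KAIROS may differ.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd_influence(X_train, X_ref, gamma=0.5):
    """Per-point contribution to MMD^2(train, ref): large scores mark points
    that pull the training distribution away from the clean reference."""
    N = len(X_train)
    k_tt = rbf_kernel(X_train, X_train, gamma)     # (N, N)
    k_tr = rbf_kernel(X_train, X_ref, gamma)       # (N, M)
    return (2.0 / N) * (k_tt.mean(axis=1) - k_tr.mean(axis=1))

rng = np.random.default_rng(0)
clean = rng.normal(0, 1, size=(200, 2))
train = np.vstack([rng.normal(0, 1, size=(190, 2)),
                   rng.normal(6, 1, size=(10, 2))])       # 10 corrupted points
scores = mmd_influence(train, clean)
print(np.argsort(scores)[-10:])                           # corrupted indices rank highest
```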
Bayesian Invariance Modeling of Multi-Environment Data
Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recovers them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.
Updated: 2025-07-02 22:33:39
Categories: stat.ML,cs.LG
Extended c-differential distinguishers of full 9 and reduced-round Kuznyechik cipher
This paper introduces {\em truncated inner $c$-differential cryptanalysis}, a novel technique that for the first time enables the practical application of $c$-differential uniformity to block ciphers. While Ellingsen et al. (IEEE Trans. Inf. Theory, 2020) established the notion of $c$-differential uniformity using $(F(x\oplus a), cF(x))$, a key challenge remained: multiplication by $c$ disrupts the structural properties essential for block cipher analysis, particularly key addition. We resolve this challenge by developing an \emph{inner} $c$-differential approach where multiplication by $c$ affects the input: $(F(cx\oplus a), F(x))$. We prove that the inner $c$-differential uniformity of a function $F$ equals the outer $c$-differential uniformity of $F^{-1}$, establishing a fundamental duality. This modification preserves cipher structure while enabling practical cryptanalytic applications. Our main contribution is a comprehensive multi-faceted statistical-computational framework, implementing truncated $c$-differential analysis against the full 9-round Kuznyechik cipher (the inner $c$-differentials are immune to the key whitening at the backend). Through extensive computational analysis involving millions of differential pairs, we demonstrate statistically significant non-randomness across all tested round counts. For the full 9-round cipher, we identify multiple configurations triggering critical security alerts, with bias ratios reaching $1.7\times$ and corrected p-values as low as $1.85 \times 10^{-3}$, suggesting insufficient security margin against this new attack vector. This represents the first practical distinguisher against the full 9-round Kuznyechik.
Updated: 2025-07-02 22:27:33
Categories: cs.CR,cs.IT,math.IT,94A60, 11T71, 12E20, 68P25, 62P99
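To make the inner $c$-differential concrete, here is a sketch that, for an 8-bit S-box $F$, counts solutions of $F(c \cdot x \oplus a) \oplus F(x) = b$ and reports the worst case. Field multiplication uses the AES polynomial 0x11B and the S-box is a toy power map, both purely for illustration; Kuznyechik's S-box and the paper's truncated, statistical analysis are more involved.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the AES reduction polynomial is used for illustration."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def inner_c_diff_uniformity(F, c):
    """Max over (a, b) of #{x : F(c*x XOR a) XOR F(x) = b}."""
    worst = 0
    for a in range(256):
        if c == 1 and a == 0:
            continue                       # trivial derivative
        counts = [0] * 256
        for x in range(256):
            counts[F[gf_mul(c, x) ^ a] ^ F[x]] += 1
        worst = max(worst, max(counts))
    return worst

F = [gf_mul(gf_mul(x, x), x) for x in range(256)]   # toy power map x^3, not Kuznyechik's pi
print(inner_c_diff_uniformity(F, c=2))
```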
Road Graph Generator: Mapping roads at construction sites from GPS data
We propose a new method for inferring roads from GPS trajectories to map construction sites. This task presents a unique challenge due to the erratic and non-standard movement patterns of construction machinery, which significantly diverge from typical vehicular traffic on established roads. Our proposed method first identifies intersections in the road network that serve as critical decision points, and then connects them with edges to produce a graph, which can subsequently be used for planning and task-allocation. We demonstrate the approach by mapping roads at a real-life construction site in Norway. The method is validated on four increasingly complex segments of the map. In our tests, the method achieved perfect accuracy in detecting intersections and inferring roads in data with no or low noise, while its performance was reduced in areas with significant noise and consistently missing GPS updates.
Updated: 2025-07-02 22:21:58
Categories: cs.AI
Tree-based Learning for High-Fidelity Prediction of Chaos
Model-free forecasting of the temporal evolution of chaotic systems is crucial but challenging. Existing solutions require hyperparameter tuning, significantly hindering their wider adoption. In this work, we introduce a tree-based approach not requiring hyperparameter tuning: TreeDOX. It uses time delay overembedding as explicit short-term memory and Extra-Trees Regressors to perform feature reduction and forecasting. We demonstrate the state-of-the-art performance of TreeDOX using the Henon map, Lorenz and Kuramoto-Sivashinsky systems, and the real-world Southern Oscillation Index.
Updated: 2025-07-02 22:20:54
Categories: cs.LG,math.DS,nlin.CD,physics.data-an,stat.ML
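The recipe above is simple enough to sketch end to end: build a time-delay overembedding of the series and fit an ExtraTreesRegressor, whose feature importances can then drive the feature-reduction step TreeDOX describes. The logistic map stands in for the paper's benchmark systems, and the embedding dimension and lag below are illustrative defaults.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Toy chaotic series: the logistic map (the paper uses Henon, Lorenz, etc.).
x = np.empty(2000)
x[0] = 0.4
for t in range(1999):
    x[t + 1] = 3.99 * x[t] * (1 - x[t])

def delay_overembed(series, dim=8, lag=1):
    """Rows are [x_{t-(dim-1)lag}, ..., x_t]; the target is x_{t+1}."""
    n = len(series) - (dim - 1) * lag - 1
    X = np.stack([series[i * lag : i * lag + n] for i in range(dim)], axis=1)
    y = series[(dim - 1) * lag + 1 : (dim - 1) * lag + 1 + n]
    return X, y

X, y = delay_overembed(x)
model = ExtraTreesRegressor(n_estimators=300, random_state=0).fit(X[:-200], y[:-200])
pred = model.predict(X[-200:])
print("1-step RMSE:", np.sqrt(np.mean((pred - y[-200:]) ** 2)))
# model.feature_importances_ can then inform which delays to keep.
```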
ARMOUR US: Android Runtime Zero-permission Sensor Usage Monitoring from User Space
This work investigates how to monitor access to Android zero-permission sensors which could cause privacy leakage to users. Moreover, monitoring such sensitive access allows security researchers to characterize potential sensor abuse patterns. Zero-permission sensors such as accelerometers have become an indispensable part of Android devices. The critical information they provide has attracted extensive research investigating how data collectors could capture more sensor data to enable both benign and exploitative applications. In contrast, little work has explored how to enable data providers, such as end users, to understand sensor usage. While existing methods such as static analysis and hooking-based dynamic analysis face challenges of requiring complicated development chains, rooting privilege, and app-specific reverse engineering analysis, our work aims to bridge this gap by developing ARMOUR for user-space runtime monitoring, leveraging the intrinsic sampling rate variation and convergence behaviors of Android. ARMOUR enables privacy-aware users to easily monitor how third-party apps use sensor data and support security researchers to perform rapid app-agnostic sensor access analysis. Our evaluation with 1,448 commercial applications shows the effectiveness of ARMOUR in detecting sensor usage in obfuscated code and other conditions, and observes salient sensor abuse patterns such as 50% of apps from seemingly sensor-independent categories accessing data of multiple zero-permission sensors. We analyze the impact of Android's recent policy changes on zero-permission sensors and remaining technical and regulatory problems.
Updated: 2025-07-02 22:18:54
Categories: cs.CR,K.6.5; D.4.6
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
Modeling voice identity is challenging due to its multifaceted nature. In generative speech systems, identity is often assessed using automatic speaker verification (ASV) embeddings, designed for discrimination rather than characterizing identity. This paper investigates which aspects of a voice are captured in such representations. We find that widely used ASV embeddings focus mainly on static features like timbre and pitch range, while neglecting dynamic elements such as rhythm. We also identify confounding factors that compromise speaker similarity measurements and suggest mitigation strategies. To address these gaps, we propose U3D, a metric that evaluates speakers' dynamic rhythm patterns. This work contributes to the ongoing challenge of assessing speaker identity consistency in the context of ever-better voice cloning systems. We publicly release our code.
Updated: 2025-07-02 22:16:42
Categories: cs.SD,cs.CL,cs.LG,eess.AS
Hardware and Software Platform Inference
It is now a common business practice to buy access to large language model (LLM) inference rather than self-host, because of significant upfront hardware infrastructure and energy costs. However, as a buyer, there is no mechanism to verify the authenticity of the advertised service including the serving hardware platform, e.g. that it is actually being served using an NVIDIA H100. Furthermore, there are reports suggesting that model providers may deliver models that differ slightly from the advertised ones, often to make them run on less expensive hardware. That way, a client pays premium for a capable model access on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce hardware and software platform inference (HSPI) -- a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. Our method leverages the inherent differences of various GPU architectures and compilers to distinguish between different GPU types and software stacks. By analyzing the numerical patterns in the model's outputs, we propose a classification framework capable of accurately identifying the GPU used for model inference as well as the underlying software configuration. Our findings demonstrate the feasibility of inferring GPU type from black-box models. We evaluate HSPI against models served on different real hardware and find that in a white-box setting we can distinguish between different GPUs with between $83.9\%$ and $100\%$ accuracy. Even in a black-box setting we achieve results that are up to 3x higher than random guess accuracy. Our code is available at https://github.com/ChengZhang-98/HSPI.
Updated: 2025-07-02 22:14:13
Categories: cs.LG
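A toy sketch of the HSPI idea: derive numerical fingerprints from a model's outputs (here, a histogram of the low mantissa bits of float32 values, which shift with kernel and compiler choices) and train a classifier over platforms. Real traces require real hardware, so synthetic "platforms" with different rounding grids stand in below; the paper's feature set and classifiers are more sophisticated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mantissa_features(values, bits=8):
    """Histogram of the lowest mantissa bits of float32 values -- the kind of
    numerical fingerprint that differs across GPU kernels / compiler stacks."""
    raw = np.frombuffer(np.float32(values).tobytes(), dtype=np.uint32)
    low = raw & ((1 << bits) - 1)
    return np.bincount(low, minlength=1 << bits) / low.size

rng = np.random.default_rng(1)
def fake_platform(n, quantum):                 # quantum mimics accumulation-order effects
    return np.round(rng.normal(size=(n, 1000)) / quantum) * quantum

Xa = np.stack([mantissa_features(v) for v in fake_platform(100, 1e-6)])
Xb = np.stack([mantissa_features(v) for v in fake_platform(100, 3e-6)])
X = np.vstack([Xa, Xb])
y = np.array([0] * 100 + [1] * 100)
clf = LogisticRegression(max_iter=1000).fit(X[::2], y[::2])
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```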
Data Diversification Methods In Alignment Enhance Math Performance In LLMs
While recent advances in preference learning have enhanced alignment in human feedback, mathematical reasoning remains a persistent challenge. We investigate how data diversification strategies in preference optimization can improve the mathematical reasoning abilities of large language models (LLMs). We evaluate three common data generation methods: temperature sampling, Chain-of-Thought prompting, and Monte Carlo Tree Search (MCTS), and introduce Diversified-ThinkSolve (DTS), a novel structured approach that systematically decomposes problems into diverse reasoning paths. Our results show that with strategically diversified preference data, models can substantially improve mathematical reasoning performance, with the best approach yielding gains of 7.1% on GSM8K and 4.2% on MATH over the base model. Despite its strong performance, DTS incurs only a marginal computational overhead (1.03x) compared to the baseline, while MCTS is nearly five times more costly with lower returns. These findings demonstrate that structured exploration of diverse problem-solving methods creates more effective preference data for mathematical alignment than traditional approaches.
Updated: 2025-07-02 22:12:03
Categories: cs.AI
A physics informed neural network approach to simulating ice dynamics governed by the shallow ice approximation
In this article we develop a Physics Informed Neural Network (PINN) approach to simulate ice sheet dynamics governed by the Shallow Ice Approximation. This problem takes the form of a time-dependent parabolic obstacle problem. Prior work has used this approach to address the stationary obstacle problem and here we extend it to the time dependent problem. Through comprehensive 1D and 2D simulations, we validate the model's effectiveness in capturing complex free-boundary conditions. By merging traditional mathematical modeling with cutting-edge deep learning methods, this approach provides a scalable and robust solution for predicting temporal variations in ice thickness. To illustrate this approach in a real world setting, we simulate the dynamics of the Devon Ice Cap, incorporating aerogeophysical data from 2000 and 2018.
Updated: 2025-07-02 22:11:34
Categories: cs.LG,cs.NA,math.NA,physics.ao-ph
The unstable formula theorem revisited via algorithms
This paper is about the surprising interaction of a foundational result from model theory, about stability of theories, with algorithmic stability in learning. First, in response to gaps in existing learning models, we introduce a new statistical learning model, called "Probably Eventually Correct" or PEC. We characterize Littlestone (stable) classes in terms of this model. As a corollary, Littlestone classes have frequent short definitions in a natural statistical sense. In order to obtain a characterization of Littlestone classes in terms of frequent definitions, we build an equivalence theorem highlighting what is common to many existing approximation algorithms, and to the new PEC. This is guided by an analogy to definability of types in model theory, but has its own character. Drawing on these theorems and on other recent work, we present a complete algorithmic analogue of Shelah's celebrated Unstable Formula Theorem, with algorithmic properties taking the place of the infinite.
Updated: 2025-07-02 22:11:09
Categories: math.LO,cs.DM,cs.LG,cs.LO,math.CO
OblivIO: Securing reactive programs by oblivious execution with bounded traffic overheads
Traffic analysis attacks remain a significant problem for online security. Communication between nodes can be observed by network level attackers as it inherently takes place in the open. Despite online services increasingly using encrypted traffic, the shape of the traffic is not hidden. To prevent traffic analysis, the shape of a system's traffic must be independent of secrets. We investigate adapting the data-oblivious approach to the reactive setting and present OblivIO, a secure language for writing reactive programs driven by network events. Our approach pads with dummy messages to hide which program sends are genuinely executed. We use an information-flow type system to provably enforce timing-sensitive noninterference. The type system is extended with potentials to bound the overhead in traffic introduced by our approach. We address challenges that arise from joining data-oblivious and reactive programming and demonstrate the feasibility of our resulting language by developing an interpreter that implements security critical operations as constant-time algorithms.
Updated: 2025-07-02 22:09:25
Categories: cs.CR,cs.PL
Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN
Trajectory planning in robotics is understood as generating a sequence of joint configurations that will lead a robotic agent, or its manipulator, from an initial state to the desired final state, thus completing a manipulation task while considering constraints like robot kinematics and the environment. Typically, this is achieved via sampling-based planners, which are computationally intensive. Recent advances demonstrate that trajectory planning can also be performed by supervised sequence learning of trajectories, often requiring only a single or fixed number of passes through a neural architecture, thus ensuring a bounded computation time. Such fully supervised approaches, however, perform imitation learning; they do not learn based on whether the trajectories can successfully reach a goal, but try to reproduce observed trajectories. In our work, we build on this approach and propose a cognitively inspired self-supervised learning scheme based on a recurrent architecture for building a trajectory model. We evaluate the feasibility of the proposed method on a task of kinematic planning for a robotic arm. The results suggest that the model is able to learn to generate trajectories only using given paired forward and inverse kinematics models, and indicate that this novel method could facilitate planning for more complex manipulation tasks requiring adaptive solutions.
Updated: 2025-07-02 22:05:58
Categories: cs.RO,cs.AI,cs.LG
Deep Learning-Based Forecasting of Hotel KPIs: A Cross-City Analysis of Global Urban Markets
This study employs Long Short-Term Memory (LSTM) networks to forecast key performance indicators (KPIs), namely Occupancy (OCC), Average Daily Rate (ADR), and Revenue per Available Room (RevPAR), across five major cities: Manchester, Amsterdam, Dubai, Bangkok, and Mumbai. The cities were selected for their diverse economic profiles and hospitality dynamics. Monthly data from 2018 to 2025 were used, with 80% for training and 20% for testing. Advanced time series decomposition and machine learning techniques enabled accurate forecasting and trend identification. Results show that Manchester and Mumbai exhibited the highest predictive accuracy, reflecting stable demand patterns, while Dubai and Bangkok demonstrated higher variability due to seasonal and event-driven influences. The findings validate the effectiveness of LSTM models for urban hospitality forecasting and provide a comparative framework for data-driven decision-making. The model's generalisability across global cities highlights its potential utility for tourism stakeholders and urban planners.
Updated: 2025-07-02 22:05:51
Categories: cs.LG,cs.AI
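A minimal sketch of the forecasting setup described: a windowed LSTM trained on the first 80% of a monthly series and tested on the last 20%, in chronological order. Synthetic trend-plus-seasonality data stands in for the proprietary hotel KPIs, and the window size and network width are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

# Synthetic monthly RevPAR-like series with trend + seasonality (stand-in data).
t = np.arange(96)                                     # 8 years of monthly points
series = 80 + 0.3 * t + 12 * np.sin(2 * np.pi * t / 12) \
         + np.random.default_rng(0).normal(0, 2, 96)
series = (series - series.mean()) / series.std()

def windows(s, w=12):
    X = np.stack([s[i : i + w] for i in range(len(s) - w)])
    return (torch.tensor(X[:, :, None], dtype=torch.float32),
            torch.tensor(s[w:], dtype=torch.float32))

X, y = windows(series)
split = int(0.8 * len(X))                             # 80% train / 20% test

class KPILSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(1, 32, batch_first=True)
        self.head = nn.Linear(32, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)

model = KPILSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X[:split]), y[:split])
    loss.backward()
    opt.step()
print("test MSE:", nn.functional.mse_loss(model(X[split:]), y[split:]).item())
```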
Threat Modeling for AI: The Case for an Asset-Centric Approach
Recent advances in AI are transforming AI's ubiquitous presence in our world from that of standalone AI-applications into deeply integrated AI-agents. These changes have been driven by agents' increasing capability to autonomously make decisions and initiate actions, using existing applications; whether those applications are AI-based or not. This evolution enables unprecedented levels of AI integration, with agents now able to take actions on behalf of systems and users -- including, in some cases, the powerful ability for the AI to write and execute scripts as it deems necessary. With AI systems now able to autonomously execute code, interact with external systems, and operate without human oversight, traditional security approaches fall short. This paper introduces an asset-centric methodology for threat modeling AI systems that addresses the unique security challenges posed by integrated AI agents. Unlike existing top-down frameworks that analyze individual attacks within specific product contexts, our bottom-up approach enables defenders to systematically identify how vulnerabilities -- both conventional and AI-specific -- impact critical AI assets across distributed infrastructures used to develop and deploy these agents. This methodology allows security teams to: (1) perform comprehensive analysis that communicates effectively across technical domains, (2) quantify security assumptions about third-party AI components without requiring visibility into their implementation, and (3) holistically identify AI-based vulnerabilities relevant to their specific product context. This approach is particularly relevant for securing agentic systems with complex autonomous capabilities. By focusing on assets rather than attacks, our approach scales with the rapidly evolving threat landscape while accommodating increasingly complex and distributed AI development pipelines.
Updated: 2025-07-02 22:04:43
Categories: cs.CR,cs.AI
Design and Evaluation of Privacy-Preserving Protocols for Agent-Facilitated Mobile Money Services in Kenya
Mobile Money (MoMo), a technology that allows users to complete financial transactions using a mobile phone without requiring a bank account, is a common method for processing financial transactions in Africa and other developing regions. Users can deposit and withdraw money with the help of human agents. During deposit and withdraw operations, know-your-customer (KYC) processes require agents to access and verify customer information such as name and ID number, which can introduce privacy and security risks. In this work, we design alternative protocols for MoMo deposits/withdrawals that protect users' privacy while enabling KYC checks by redirecting the flow of sensitive information from the agent to the MoMo provider. We evaluate the usability and efficiency of our proposed protocols in a role-play and semi-structured interview study with 32 users and 15 agents in Kenya. We find that users and agents prefer the new protocols, due in part to convenient and efficient verification using biometrics as well as better data privacy and access control. However, our study also surfaced challenges that need to be addressed before these protocols can be deployed.
Updated: 2025-07-02 21:57:06
Categories: cs.CR,cs.HC
Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data
Current trends in pre-training Large Language Models (LLMs) primarily focus on the scaling of model and dataset size. While the quality of pre-training data is considered an important factor for training powerful LLMs, it remains a nebulous concept that has not been rigorously characterized. To this end, we propose a formalization of one key aspect of data quality -- measuring the variability of natural language data -- specifically via a measure we call the diversity coefficient. Our empirical analysis shows that the proposed diversity coefficient aligns with the intuitive properties of diversity and variability, e.g., it increases as the number of latent concepts increases. Then, we measure the diversity coefficient of publicly available pre-training datasets and demonstrate that their formal diversity is high compared to theoretical lower and upper bounds. Finally, we conduct a comprehensive set of controlled interventional experiments with GPT-2 and LLaMAv2 that demonstrate the diversity coefficient of pre-training data characterizes useful aspects of downstream model evaluation performance -- totaling 44 models of various sizes (51M to 7B parameters). We conclude that our formal notion of diversity is an important aspect of data quality that captures variability and causally leads to improved evaluation performance.
Updated: 2025-07-02 21:53:51
Categories: cs.CL,cs.AI,cs.LG,cs.NE
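A self-contained sketch of the diversity coefficient's shape: embed random batches of a corpus and average the pairwise distances between batch embeddings. A mean TF-IDF vector stands in for the paper's Fisher-information-based Task2Vec embedding, so the numbers are only illustrative of the construction, not comparable to the paper's values.

```python
import numpy as np
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.spatial.distance import cosine

def diversity_coefficient(texts, batch_size=16, n_batches=20, seed=0):
    """Expected pairwise cosine distance between batch embeddings.
    Stand-in embedding: mean TF-IDF row per batch (the paper uses Task2Vec)."""
    rng = np.random.default_rng(seed)
    vec = TfidfVectorizer().fit(texts)
    embs = []
    for _ in range(n_batches):
        batch = rng.choice(texts, size=batch_size, replace=False)
        embs.append(np.asarray(vec.transform(batch).mean(axis=0)).ravel())
    dists = [cosine(a, b) for a, b in combinations(embs, 2)]
    return float(np.mean(dists))

texts = [f"paper about topic {i % 5} with variant {i}" for i in range(200)]
print(diversity_coefficient(texts))
```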
Non-Convex Optimization with Spectral Radius Regularization
We develop regularization methods to find flat minima while training deep neural networks. These minima generalize better than sharp minima, yielding models outperforming baselines on real-world test data (which may be distributed differently than the training data). Specifically, we propose a method of regularized optimization to reduce the spectral radius of the Hessian of the loss function. We also derive algorithms to efficiently optimize neural network models and prove that these algorithms almost surely converge. Furthermore, we demonstrate that our algorithm works effectively on applications in different domains, including healthcare. To show that our models generalize well, we introduced various methods for testing generalizability and found that our models outperform comparable baseline models on these tests.
Updated: 2025-07-02 21:53:39
Categories: cs.LG
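A sketch of one way to implement the regularizer described: estimate the Hessian's dominant eigenvalue with power iteration on Hessian-vector products and add it to the training loss. The paper's exact algorithm and its convergence guarantees go beyond this minimal version; the penalty weight below is arbitrary.

```python
import torch

def hessian_top_eig(loss, params, iters=10):
    """Estimate the dominant Hessian eigenvalue via power iteration on
    Hessian-vector products (v is treated as fixed in the final quotient)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, create_graph=True)
    return sum((a * b).sum() for a, b in zip(v, hv))     # Rayleigh quotient, ||v|| = 1

# Usage: penalize sharpness alongside the task loss.
model = torch.nn.Linear(5, 1)
x, y = torch.randn(32, 5), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
rho = hessian_top_eig(loss, list(model.parameters()))
(loss + 0.1 * rho).backward()                            # flat-minima regularization
```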
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework
With the emergence of Multimodal Large Language Model (MLLM) technology, it has become possible to exploit its video understanding capability on different classification tasks. In practice, however, deploying MLLMs online is difficult because of their huge GPU resource requirements. In this paper, we propose COEF-VQ, a novel cascaded MLLM framework designed to enhance video quality understanding on the short-video platform while optimizing computational efficiency. Our approach integrates an entropy-based pre-filtering stage, where a lightweight model assesses uncertainty and selectively filters cases before passing them to the more computationally intensive MLLM for final evaluation. By prioritizing high-uncertainty samples for deeper analysis, our framework significantly reduces GPU usage while maintaining the strong classification performance of a full MLLM deployment. To demonstrate the effectiveness of COEF-VQ, we deploy this new framework onto the video management platform (VMP) at the short-video platform, and perform a series of detailed experiments on two in-house tasks related to video quality understanding. We show that COEF-VQ leads to substantial performance gains in the offline evaluation of these two tasks and effectively enhances platform safety with limited resource consumption, significantly reducing the inappropriate-content video view rate by 9.9% in an online A/B test without affecting engagement. Post-launch monitoring confirmed sustained improvements, validating its real-world impact.
Updated: 2025-07-02 21:50:59
Categories: cs.CV,cs.AI
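The cascade's routing rule is easy to sketch: keep the lightweight model's prediction when its predictive entropy is low, and escalate only high-entropy cases to the expensive MLLM. The threshold trades GPU cost against accuracy; the numbers below are illustrative, not the paper's.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def cascade(light_probs, mllm_predict, threshold=0.5):
    """light_probs: (N, C) softmax outputs of the lightweight model.
    mllm_predict: callable on indices -> labels (expensive, called sparingly)."""
    preds = light_probs.argmax(axis=-1)
    uncertain = np.where(entropy(light_probs) > threshold)[0]
    if len(uncertain):
        preds[uncertain] = mllm_predict(uncertain)       # escalate hard cases only
    return preds, len(uncertain)

rng = np.random.default_rng(0)
light = rng.dirichlet([0.3, 0.3], size=1000)             # mostly confident 2-class scores
preds, escalated = cascade(light, lambda idx: np.ones(len(idx), dtype=int))
print(f"escalated to MLLM: {escalated}/1000")
```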
Statistical Inference for Responsiveness Verification
Many safety failures in machine learning arise when models are used to assign predictions to people (often in settings like lending, hiring, or content moderation) without accounting for how individuals can change their inputs. In this work, we introduce a formal validation procedure for the responsiveness of predictions with respect to interventions on their features. Our procedure frames responsiveness as a type of sensitivity analysis in which practitioners control a set of changes by specifying constraints over interventions and distributions over downstream effects. We describe how to estimate responsiveness for the predictions of any model and any dataset using only black-box access, and how to use these estimates to support tasks such as falsification and failure probability estimation. We develop algorithms that construct these estimates by generating a uniform sample of reachable points, and demonstrate how they can promote safety in real-world applications such as recidivism prediction, organ transplant prioritization, and content moderation.
Updated: 2025-07-02 21:50:08
Categories: cs.LG
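A minimal Monte Carlo sketch of the black-box estimation step: draw a uniform sample of reachable feature vectors under user-specified per-feature action sets and report the fraction whose prediction differs from the status quo. The paper's procedure adds statistical machinery (e.g., for falsification and failure-probability bounds) on top of estimates like this; the toy model and action sets are assumptions.

```python
import numpy as np

def responsiveness(predict, x, feasible_values, n=2000, seed=0):
    """Black-box responsiveness estimate: fraction of reachable points (uniform
    over per-feature action sets) whose prediction differs from predict(x).
    feasible_values[j] lists the allowed values of feature j."""
    rng = np.random.default_rng(seed)
    base = predict(x[None, :])[0]
    samples = np.stack(
        [rng.choice(vals, size=n) for vals in feasible_values], axis=1
    ).astype(float)
    return float(np.mean(predict(samples) != base))

# Toy model: "approve if income - debt > 1".
predict = lambda X: (X[:, 0] - X[:, 1] > 1).astype(int)
x = np.array([1.5, 1.0])                                  # currently denied
actions = [np.linspace(1.0, 3.0, 21),                     # income can move in this range
           np.array([1.0])]                               # debt held fixed
print("responsiveness:", responsiveness(predict, x, actions))
```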
Generating Large Semi-Synthetic Graphs of Any Size
Graph generation is an important area in network science. Traditional approaches focus on replicating specific properties of real-world graphs, such as small diameters or power-law degree distributions. Recent advancements in deep learning, particularly with Graph Neural Networks, have enabled data-driven methods to learn and generate graphs without relying on predefined structural properties. Despite these advances, current models are limited by their reliance on node IDs, which restricts their ability to generate graphs larger than the input graph and ignores node attributes. To address these challenges, we propose Latent Graph Sampling Generation (LGSG), a novel framework that leverages diffusion models and node embeddings to generate graphs of varying sizes without retraining. The framework eliminates the dependency on node IDs and captures the distribution of node embeddings and subgraph structures, enabling scalable and flexible graph generation. Experimental results show that LGSG performs on par with baseline models for standard metrics while outperforming them in overlooked ones, such as the tendency of nodes to form clusters. Additionally, it maintains consistent structural characteristics across graphs of different sizes, demonstrating robustness and scalability.
Updated: 2025-07-02 21:46:28
Categories: cs.SI,cs.AI
Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains
Transfer learning in Reinforcement Learning (RL) enables agents to leverage knowledge from source tasks to accelerate learning in target tasks. While prior work, such as the Attend, Adapt, and Transfer (A2T) framework, addresses negative transfer and selective transfer, other critical challenges remain underexplored. This paper introduces the Generalized Adaptive Transfer Network (GATN), a deep RL architecture designed to tackle task generalization across domains, robustness to environmental changes, and computational efficiency in transfer. GATN employs a domain-agnostic representation module, a robustness-aware policy adapter, and an efficient transfer scheduler to achieve these goals. We evaluate GATN on diverse benchmarks, including Atari 2600, MuJoCo, and a custom chatbot dialogue environment, demonstrating superior performance in cross-domain generalization, resilience to dynamic environments, and reduced computational overhead compared to baselines. Our findings suggest GATN is a versatile framework for real-world RL applications, such as adaptive chatbots and robotic control.
Updated: 2025-07-02 21:33:48
Categories: cs.LG,cs.AI
GeMID: Generalizable Models for IoT Device Identification
With the proliferation of devices on the Internet of Things (IoT), ensuring their security has become paramount. Device identification (DI), which distinguishes IoT devices based on their traffic patterns, plays a crucial role in both differentiating devices and identifying vulnerable ones, closing a serious security gap. However, existing approaches to DI that build machine learning models often overlook the challenge of model generalizability across diverse network environments. In this study, we propose a novel framework to address this limitation and to evaluate the generalizability of DI models across data sets collected within different network environments. Our approach involves a two-step process: first, we develop a feature and model selection method that is more robust to generalization issues by using a genetic algorithm with external feedback and datasets from distinct environments to refine the selections. Second, the resulting DI models are then tested on further independent datasets to robustly assess their generalizability. We demonstrate the effectiveness of our method by empirically comparing it to alternatives, highlighting how fundamental limitations of commonly employed techniques such as sliding window and flow statistics limit their generalizability. Moreover, we show that statistical methods, widely used in the literature, are unreliable for device identification due to their dependence on network-specific characteristics rather than device-intrinsic properties, challenging the validity of a significant portion of existing research. Our findings advance research in IoT security and device identification, offering insight into improving model effectiveness and mitigating risks in IoT networks.
Updated: 2025-07-02 21:26:35
标题: GeMID: 通用化的物联网设备识别模型
摘要: 随着物联网(IoT)设备的激增,确保其安全性变得至关重要。设备识别(DI)根据流量模式区分物联网设备,在区分设备和识别易受攻击设备方面起着至关重要的作用,弥补了一个严重的安全缺口。然而,现有的构建机器学习模型的DI方法往往忽视了模型在不同网络环境中泛化的挑战。在本研究中,我们提出了一个新颖的框架来解决这一局限,并评估DI模型在不同网络环境中收集的数据集上的泛化能力。我们的方法包括两个步骤:首先,我们开发了一种对泛化问题更具鲁棒性的特征和模型选择方法,它使用带外部反馈的遗传算法以及来自不同环境的数据集来优化选择。其次,在进一步的独立数据集上测试所得的DI模型,以稳健地评估其泛化能力。我们通过与替代方法的实证比较展示了我们方法的有效性,突显了滑动窗口和流统计等常用技术的根本局限性如何限制其泛化能力。此外,我们还表明,文献中广泛使用的统计方法对于设备识别是不可靠的,因为它们依赖于网络特定特征而非设备固有属性,这对现有研究中相当一部分工作的有效性提出了质疑。我们的研究推动了物联网安全和设备识别领域的进展,为改善模型有效性和减轻物联网网络风险提供了见解。
更新时间: 2025-07-02 21:26:35
领域: cs.CR,cs.AI,cs.NI
Quantifying the Importance of Data Alignment in Downstream Model Performance
Contrary to the conventional emphasis on dataset size, we explore the role of data alignment -- an often overlooked aspect of data quality -- in training capable Large Language Models (LLMs). To do so, we use the Task2Vec-based alignment coefficient, a quantitative measure of the similarity between two datasets, to quantify the impact of alignment between training data and evaluation data on downstream performance. In particular, we conduct controlled interventional experiments for two settings: 1. the impact of increased alignment coefficients between various pre-training (pt) against evaluation datasets, and 2. the impact of increased alignment coefficients between domain specific fine-tuning (ft) against domain specific evaluation. The domain specific task we explore is Autoformalization -- the machine translation task between natural language and code for formal verification. In both settings, we find a strong, predictable negative correlation between the alignment coefficient of a model's training and evaluation data and the model's loss/perplexity on the respective downstream task. These findings suggest a re-evaluation of LLM training approaches, demonstrating the relevance of data alignment compared to data quantity, especially in specialized downstream tasks such as Autoformalization.
Updated: 2025-07-02 21:19:42
标题: 量化数据对齐对下游模型性能的重要性
摘要: 与传统上对数据集规模的强调相反,我们探讨数据对齐——数据质量中一个经常被忽视的方面——在训练能力强大的大型语言模型(LLMs)中的作用。为此,我们使用基于Task2Vec的对齐系数(一种衡量两个数据集之间相似性的定量指标)来量化训练数据与评估数据之间的对齐对下游性能的影响。特别是,我们针对两种设置进行了受控的干预实验:1. 提高各种预训练(pt)数据集与评估数据集之间对齐系数的影响;2. 提高领域特定微调(ft)数据集与领域特定评估之间对齐系数的影响。我们探讨的领域特定任务是自动形式化(Autoformalization)——即自然语言与用于形式验证的代码之间的机器翻译任务。在两种设置中,我们都发现模型训练数据与评估数据的对齐系数和模型在相应下游任务上的损失/困惑度之间存在强烈、可预测的负相关。这些发现提示需要重新评估LLM的训练方法,表明相对于数据量,数据对齐同样重要,特别是在自动形式化等专门的下游任务中。
更新时间: 2025-07-02 21:19:42
领域: cs.CL,cs.AI,cs.LG,cs.PL
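The Task2Vec alignment coefficient itself is defined via task embeddings from probe networks; as a loose, hypothetical proxy for the idea, one can score alignment between two datasets by the average cosine similarity of their batch-level embeddings, as in this sketch.

```python
import numpy as np

def alignment_coefficient(emb_a, emb_b):
    """Hypothetical proxy for a Task2Vec-style alignment coefficient:
    average pairwise cosine similarity between (batch-level) embeddings
    of two datasets. emb_a: (n, d) embeddings of batches from dataset A;
    emb_b: (m, d) from dataset B."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = a @ b.T                      # (n, m) pairwise cosine similarities
    return float(cos.mean())           # higher => better aligned train/eval data
```

The paper's finding is then that this coefficient between training and evaluation data correlates negatively (and predictably) with downstream loss/perplexity.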
Reliable algorithm selection for machine learning-guided design
Algorithms for machine learning-guided design, or design algorithms, use machine learning-based predictions to propose novel objects with desired property values. Given a new design task -- for example, to design novel proteins with high binding affinity to a therapeutic target -- one must choose a design algorithm and specify any hyperparameters and predictive and/or generative models involved. How can these decisions be made such that the resulting designs are successful? This paper proposes a method for design algorithm selection, which aims to select design algorithms that will produce a distribution of design labels satisfying a user-specified success criterion -- for example, that at least ten percent of designs' labels exceed a threshold. It does so by combining designs' predicted property values with held-out labeled data to reliably forecast characteristics of the label distributions produced by different design algorithms, building upon techniques from prediction-powered inference. The method is guaranteed with high probability to return design algorithms that yield successful label distributions (or the null set if none exist), if the density ratios between the design and labeled data distributions are known. We demonstrate the method's effectiveness in simulated protein and RNA design tasks, in settings with either known or estimated density ratios.
Updated: 2025-07-02 21:17:20
标题: 可靠的算法选择用于机器学习引导设计
摘要: 机器学习引导设计的算法(或称设计算法)利用基于机器学习的预测来提出具有期望属性值的新颖对象。面对一个新的设计任务——例如设计对治疗靶点具有高结合亲和力的新型蛋白质——必须选择一个设计算法,并指定所涉及的超参数以及预测和/或生成模型。如何做出这些决策才能使最终的设计成功?本文提出了一种设计算法选择方法,旨在选择能产生满足用户指定成功标准的设计标签分布的设计算法——例如,至少百分之十的设计标签超过某个阈值。该方法将设计的预测属性值与留出的有标签数据相结合,可靠地预测不同设计算法所产生标签分布的特征,并借鉴了预测驱动推断(prediction-powered inference)的技术。在设计数据与有标签数据分布之间的密度比已知的情况下,该方法以高概率保证返回能产生成功标签分布的设计算法(若不存在则返回空集)。我们在模拟的蛋白质和RNA设计任务中,在密度比已知或需估计的两种设定下展示了该方法的有效性。
更新时间: 2025-07-02 21:17:20
领域: cs.LG,q-bio.QM,stat.ML
The Illusion of Fairness: Auditing Fairness Interventions with Audit Studies
Artificial intelligence systems, especially those using machine learning, are being deployed in domains from hiring to loan issuance in order to automate these complex decisions. Judging both the effectiveness and fairness of these AI systems, and their human decision making counterpart, is a complex and important topic studied across both computational and social sciences. Within machine learning, a common way to address bias in downstream classifiers is to resample the training data to offset disparities. For example, if hiring rates vary by some protected class, then one may equalize the rate within the training set to alleviate bias in the resulting classifier. While simple and seemingly effective, these methods have typically only been evaluated using data obtained through convenience samples, introducing selection bias and label bias into metrics. Within the social sciences, psychology, public health, and medicine, audit studies, in which fictitious "testers" (e.g., resumes, emails, patient actors) are sent to subjects (e.g., job openings, businesses, doctors) in randomized control trials, provide high quality data that support rigorous estimates of discrimination. In this paper, we investigate how data from audit studies can be used to improve our ability to both train and evaluate automated hiring algorithms. We find that such data reveals cases where the common fairness intervention method of equalizing base rates across classes appears to achieve parity using traditional measures, but in fact has roughly 10% disparity when measured appropriately. We additionally introduce interventions based on individual treatment effect estimation methods that further reduce algorithmic discrimination using this data.
Updated: 2025-07-02 21:15:56
标题: 公平的幻觉:用审计研究审计公平干预措施
摘要: 人工智能系统,尤其是使用机器学习的系统,正被部署在从招聘到贷款发放等领域,以自动化这些复杂的决策。评估这些AI系统及其人类决策对应方的有效性和公平性,是一个横跨计算科学和社会科学的复杂而重要的课题。在机器学习中,解决下游分类器偏见的常见方法是对训练数据进行重采样以抵消差异。例如,如果招聘率因某个受保护类别而异,则可以在训练集中使各类别的比率相等,以减轻结果分类器中的偏见。虽然这些方法简单且看似有效,但通常只用通过便利抽样获得的数据进行评估,从而将选择偏差和标签偏差引入了评估指标。在社会科学、心理学、公共卫生和医学领域,审计研究——即在随机对照试验中向受试对象(例如职位空缺、企业、医生)发送虚构的"测试者"(例如简历、电子邮件、患者演员)——提供了支持对歧视进行严格估计的高质量数据。在本文中,我们研究了如何利用审计研究的数据来提升我们训练和评估自动化招聘算法的能力。我们发现,这些数据揭示了这样的情况:常见的公平干预方法(即在类别之间平衡基础比率)按传统指标衡量似乎实现了平等,但在适当衡量时实际上存在大约10%的差异。此外,我们还基于个体处理效应估计方法引入了干预措施,利用这些数据进一步减少算法歧视。
更新时间: 2025-07-02 21:15:56
领域: cs.AI
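A minimal sketch of the fairness intervention the paper audits (resampling training data to equalize base rates across groups), together with a simple disparity metric; the group labels and resampling scheme below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def equalize_base_rates(X, y, group, rng=None):
    """Resample the training set so each group has the same positive rate
    (the common fairness intervention the paper audits). Assumes each group
    contains both classes."""
    rng = np.random.default_rng(rng)
    target = y.mean()                                # overall positive rate
    keep = []
    for g in np.unique(group):
        idx = np.where(group == g)[0]
        pos, neg = idx[y[idx] == 1], idx[y[idx] == 0]
        n_pos = int(round(target * len(idx)))        # positives needed for parity
        keep.extend(rng.choice(pos, size=n_pos, replace=True))
        keep.extend(rng.choice(neg, size=len(idx) - n_pos, replace=True))
    keep = np.array(keep)
    return X[keep], y[keep]

def disparity(decisions, group):
    """Max absolute gap in selection rates across groups (audit-style metric)."""
    rates = [decisions[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)
```

The paper's point is that a classifier trained after `equalize_base_rates` can look fair on convenience-sample metrics yet still show roughly 10% `disparity` when measured against audit-study ground truth.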
Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks
Conformal prediction for graph neural networks (GNNs) offers a promising framework for quantifying uncertainty, enhancing GNN reliability in high-stakes applications. However, existing methods predominantly focus on static graphs, neglecting the evolving nature of real-world graphs. Temporal dependencies in graph structure, node attributes, and ground truth labels violate the fundamental exchangeability assumption of standard conformal prediction methods, limiting their applicability. To address these challenges, in this paper, we introduce NCPNET, a novel end-to-end conformal prediction framework tailored for temporal graphs. Our approach extends conformal prediction to dynamic settings, mitigating statistical coverage violations induced by temporal dependencies. To achieve this, we propose a diffusion-based non-conformity score that captures both topological and temporal uncertainties within evolving networks. Additionally, we develop an efficiency-aware optimization algorithm that improves the conformal prediction process, enhancing computational efficiency and reducing coverage violations. Extensive experiments on diverse real-world temporal graphs, including WIKI, REDDIT, DBLP, and IBM Anti-Money Laundering dataset, demonstrate NCPNET's capability to ensure guaranteed coverage in temporal graphs, achieving up to a 31% reduction in prediction set size on the WIKI dataset, significantly improving efficiency compared to state-of-the-art methods. Our data and code are available at https://github.com/ODYSSEYWT/NCPNET.
Updated: 2025-07-02 21:15:00
标题: 用于时间图神经网络的非可交换一致性预测
摘要: 图神经网络(GNNs)的一致性预测为量化不确定性提供了一个有希望的框架,增强了GNN在高风险应用中的可靠性。然而,现有方法主要集中在静态图上,忽视了现实世界图的演变性质。图结构、节点属性和真实标签中的时间依赖性违背了标准一致性预测方法的基本可交换性假设,限制了它们的适用性。为了解决这些挑战,在本文中,我们引入了NCPNET,一个专门针对时间图的端到端一致性预测框架。我们的方法将一致性预测扩展到动态设置,减少了由时间依赖性引起的统计覆盖违规。为了实现这一目标,我们提出了一种基于扩散的非一致性评分,捕捉演化网络中的拓扑和时间不确定性。此外,我们开发了一种效率感知的优化算法,改善了一致性预测过程,增强了计算效率并减少了覆盖违规。对包括WIKI、REDDIT、DBLP和IBM反洗钱数据集在内的多样的真实世界时间图进行的大量实验,展示了NCPNET在时间图中确保覆盖率的能力,在WIKI数据集上将预测集大小减少了高达31%,与最先进方法相比显著提高了效率。我们的数据和代码可在https://github.com/ODYSSEYWT/NCPNET获取。
更新时间: 2025-07-02 21:15:00
领域: cs.LG,H.1.0; I.2.0
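For intuition, non-exchangeable conformal prediction typically replaces the empirical quantile of calibration scores with a weighted quantile, with weights encoding how relevant each calibration point is (e.g., temporal recency). The generic sketch below illustrates that device only; it is not NCPNET's diffusion-based non-conformity score.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration non-conformity scores,
    the standard device for conformal prediction under non-exchangeability:
    more temporally relevant calibration points get larger weights."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    w = w / w.sum()
    cdf = np.cumsum(w)
    # smallest score whose weighted CDF reaches 1 - alpha
    return s[np.searchsorted(cdf, 1.0 - alpha)]

# prediction set: all candidate labels whose non-conformity score
# falls at or below the returned quantile
```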
Fundamental Limits of Hierarchical Secure Aggregation with Cyclic User Association
Secure aggregation is motivated by federated learning (FL) where a cloud server aims to compute an averaged model (i.e., weights of deep neural networks) of the locally-trained models of numerous clients, while adhering to data security requirements. Hierarchical secure aggregation (HSA) extends this concept to a three-layer hierarchical network, where clustered users communicate with the server through an intermediate layer of relays. In HSA, beyond conventional server security, relay security is also enforced to ensure that the relays remain oblivious to the users' inputs (an abstraction of the local models in FL). Existing study on HSA assumes that each user is associated with only one relay, limiting opportunities for coding across inter-cluster users to achieve efficient communication and key generation. In this paper, we consider HSA with a cyclic association pattern where each user is connected to $B$ consecutive relays in a wrap-around manner. We propose an efficient aggregation scheme which includes a message design for the inputs inspired by gradient coding-a well-known technique for efficient communication in distributed computing-along with a highly non-trivial security key design. We also derive novel converse bounds on the minimum achievable communication and key rates using information-theoretic arguments.
Updated: 2025-07-02 21:12:22
标题: 层次安全聚合与循环用户关联的基本限制
摘要: 安全聚合的动机来自联邦学习(FL),其中云服务器旨在计算众多客户端本地训练模型的平均模型(即深度神经网络的权重),同时遵守数据安全要求。分层安全聚合(HSA)将这一概念扩展到三层分层网络,其中成簇的用户通过中间层的中继与服务器通信。在HSA中,除了传统的服务器安全性外,还强制实施中继安全性,以确保中继对用户的输入(FL中本地模型的抽象)保持无知。现有关于HSA的研究假设每个用户只与一个中继相关联,限制了通过跨簇用户编码来实现高效通信和密钥生成的机会。在本文中,我们考虑了一种循环关联模式的HSA,其中每个用户以环绕方式与$B$个连续中继相连。我们提出了一种高效的聚合方案,其中包括受梯度编码(分布式计算中一种著名的高效通信技术)启发的输入消息设计,以及一个高度非平凡的安全密钥设计。我们还利用信息论论证,推导出了关于最小可达通信速率和密钥速率的新颖逆向界(converse bounds)。
更新时间: 2025-07-02 21:12:22
领域: cs.IT,cs.AI,cs.CR,cs.DC,math.IT
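The cyclic wrap-around association pattern itself is easy to state in code; a tiny sketch:

```python
def cyclic_relays(user, num_relays, B):
    """Relays serving a user under the cyclic wrap-around association:
    user u connects to relays u, u+1, ..., u+B-1 (mod number of relays)."""
    return [(user + b) % num_relays for b in range(B)]

# e.g. 5 relays, B = 2: user 4 wraps around to relays [4, 0]
print(cyclic_relays(4, 5, 2))  # [4, 0]
```

Connecting each user to $B$ consecutive relays is what opens up the inter-cluster coding opportunities (gradient-coding-style message design) that a one-relay-per-user association rules out.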
Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center
Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, that is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research mission, and (3) the shared responsibility model required to benefit from Customer Copyright Commitment in Azure OpenAI Service products, we deemed rigorous copyright compliance testing necessary. Case Description: We conducted a structured red teaming exercise in Nov. 2024, with 42 participants from academic, industry, and government institutions. Four teams attempted to extract copyrighted content from GPT4DFCI across four domains: literary works, news articles, scientific publications, and access-restricted clinical notes. Teams successfully extracted verbatim book dedications and near-exact passages through various strategies. News article extraction failed despite jailbreak attempts. Scientific article reproduction yielded only high-level summaries. Clinical note testing revealed appropriate privacy safeguards. Discussion: The successful extraction of literary content indicates potential copyrighted material presence in training data, necessitating inference-time filtering. Differential success rates across content types suggest varying protective mechanisms. The event led to implementation of a copyright-specific meta-prompt in GPT4DFCI; this mitigation has been in production since Jan. 2025. Conclusion: Systematic red teaming revealed specific vulnerabilities in generative AI copyright compliance, leading to concrete mitigation strategies. Academic medical institutions deploying generative AI should implement continuous testing protocols to ensure legal and ethical compliance.
Updated: 2025-07-02 21:04:41
标题: 生成式人工智能的红队行动:在学术医疗中心完成的以版权为重点的演习报告
摘要: 背景:在学术医疗环境中部署生成式人工智能(AI)引发了版权合规方面的担忧。丹娜-法伯癌症研究所实施了GPT4DFCI,这是一个利用OpenAI模型的内部生成式AI工具,已获批准用于研究和运营。鉴于(1)该工具在我们组织中的极广泛采用,(2)我们的研究使命,以及(3)要从Azure OpenAI服务产品的客户版权承诺中受益所需的共享责任模型,我们认为有必要进行严格的版权合规测试。 案例描述:我们在2024年11月进行了一次结构化的红队演练,有来自学术、行业和政府机构的42名参与者。四个团队尝试从GPT4DFCI中提取受版权保护的内容,涵盖文学作品、新闻文章、科学出版物和受访问限制的临床笔记四个领域。通过各种策略,各团队成功提取了逐字的书籍献词和近乎原文的段落。尽管进行了越狱尝试,新闻文章提取仍然失败。科学文章复现仅产生了高层次摘要。临床笔记测试显示出适当的隐私保障。 讨论:文学内容的成功提取表明训练数据中可能存在受版权保护的材料,需要在推理时进行过滤。不同内容类型之间成功率的差异表明存在不同的保护机制。此事件促使在GPT4DFCI中实施了一个特定于版权的元提示;这一缓解措施自2025年1月起已投入生产。 结论:系统性的红队演练揭示了生成式AI版权合规方面的具体漏洞,并带来了具体的缓解策略。部署生成式AI的学术医疗机构应实施持续的测试协议,以确保法律和道德合规。
更新时间: 2025-07-02 21:04:41
领域: cs.CY,cs.AI
Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization
Dialogue summarization is a challenging task with significant practical value in customer service, meeting analysis, and conversational AI. Although large language models (LLMs) have achieved substantial progress in summarization tasks, the performance of step-by-step reasoning architectures-specifically Long Chain-of-Thought (CoT) implementations such as OpenAI-o1 and DeepSeek-R1-remains unexplored for dialogue scenarios requiring concurrent abstraction and conciseness. In this work, we present the first comprehensive and systematic evaluation of state-of-the-art reasoning LLMs and non-reasoning LLMs across three major paradigms-generic, role-oriented, and query-oriented dialogue summarization. Our study spans diverse languages, domains, and summary lengths, leveraging strong benchmarks (SAMSum, DialogSum, CSDS, and QMSum) and advanced evaluation protocols that include both LLM-based automatic metrics and human-inspired criteria. Contrary to trends in other reasoning-intensive tasks, our findings show that explicit stepwise reasoning does not consistently improve dialogue summarization quality. Instead, reasoning LLMs are often prone to verbosity, factual inconsistencies, and less concise summaries compared to their non-reasoning counterparts. Through scenario-specific analyses and detailed case studies, we further identify when and why explicit reasoning may fail to benefit-or even hinder-summarization in complex dialogue contexts. Our work provides new insights into the limitations of current reasoning LLMs and highlights the need for targeted modeling and evaluation strategies for real-world dialogue summarization.
Updated: 2025-07-02 21:02:41
标题: 推理还是不推理?对话摘要的推理LLMs的全面评估
摘要: 对话摘要是一项具有重要实际价值的挑战性任务,适用于客户服务、会议分析和对话式人工智能。尽管大型语言模型(LLMs)在摘要任务中取得了重大进展,但逐步推理架构——特别是长链式思维(CoT)实现(如OpenAI-o1和DeepSeek-R1)——在需要同时兼顾抽象与简洁的对话场景中的性能尚未得到探索。在这项工作中,我们首次对最先进的推理LLMs和非推理LLMs在三种主要范式(通用、角色导向和查询导向的对话摘要)上进行了全面而系统的评估。我们的研究涵盖多种语言、领域和摘要长度,利用强大的基准(SAMSum、DialogSum、CSDS和QMSum)以及结合基于LLM的自动指标与类人评判标准的先进评估协议。与其他推理密集型任务的趋势相反,我们的研究结果显示,显式逐步推理并不能始终提高对话摘要的质量。相反,与非推理模型相比,推理LLMs往往更容易产生冗长、事实不一致和不够简洁的摘要。通过针对具体场景的分析和详细的案例研究,我们进一步确定了在复杂对话背景下显式推理何时以及为何可能无益于甚至妨碍摘要。我们的工作为当前推理LLMs的局限性提供了新的见解,并强调了针对真实世界对话摘要进行定向建模和评估的必要性。
更新时间: 2025-07-02 21:02:41
领域: cs.CL,cs.AI
When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search
Large language models (LLMs) are increasingly used to assign document relevance labels in information retrieval pipelines, especially in domains lacking human-labeled data. However, different models often disagree on borderline cases, raising concerns about how such disagreement affects downstream retrieval. This study examines labeling disagreement between two open-weight LLMs, LLaMA and Qwen, on a corpus of scholarly abstracts related to Sustainable Development Goals (SDGs) 1, 3, and 7. We isolate disagreement subsets and examine their lexical properties, rank-order behavior, and classification predictability. Our results show that model disagreement is systematic, not random: disagreement cases exhibit consistent lexical patterns, produce divergent top-ranked outputs under shared scoring functions, and are distinguishable with AUCs above 0.74 using simple classifiers. These findings suggest that LLM-based filtering introduces structured variability in document retrieval, even under controlled prompting and shared ranking logic. We propose using classification disagreement as an object of analysis in retrieval evaluation, particularly in policy-relevant or thematic search tasks.
Updated: 2025-07-02 20:53:51
标题: 当LLM不一致时:诊断SDG搜索中的相关性过滤偏见和检索差异
摘要: 大型语言模型(LLMs)越来越被用于在信息检索管道中分配文档相关性标签,特别是在缺乏人工标记数据的领域。然而,不同模型在边缘案例上经常存在分歧,引发了对这种分歧如何影响下游检索的担忧。本研究检验了两个开放权重LLMs,LLaMA和Qwen,在与可持续发展目标(SDGs)1、3和7相关的学术摘要语料库上的标记分歧。我们分离了分歧子集,并检查它们的词汇特性、排名行为和分类可预测性。我们的结果表明,模型分歧是系统性的,而不是随机的:分歧案例表现出一致的词汇模式,在共享评分函数下产生不同的排名输出,并且可以使用简单分类器得到高于0.74的AUC来区分。这些发现表明,基于LLM的过滤在文档检索中引入了结构化的变异性,即使在受控提示和共享排名逻辑下也是如此。我们建议将分类分歧作为检索评估的分析对象,特别是在政策相关或主题搜索任务中。
更新时间: 2025-07-02 20:53:51
领域: cs.IR,cs.AI,cs.DL
ODE$_t$(ODE$_l$): Shortcutting the Time and Length in Diffusion and Flow Models for Faster Sampling
Recently, continuous normalizing flows (CNFs) and diffusion models (DMs) have been studied using the unified theoretical framework. Although such models can generate high-quality data points from a noise distribution, the sampling demands multiple iterations to solve an ordinary differential equation (ODE) with high computational complexity. Most existing methods focus on reducing the number of time steps during the sampling process to improve efficiency. In this work, we explore a complementary direction in which the quality-complexity tradeoff can be dynamically controlled in terms of time steps and in the length of the neural network. We achieve this by rewiring the blocks in the transformer-based architecture to solve an inner discretized ODE w.r.t. its length. Then, we employ time- and length-wise consistency terms during flow matching training, and as a result, the sampling can be performed with an arbitrary number of time steps and transformer blocks. Unlike others, our ODE$_t$(ODE$_l$) approach is solver-agnostic in time dimension and decreases both latency and memory usage. Compared to the previous state of the art, image generation experiments on CelebA-HQ and ImageNet show a latency reduction of up to 3$\times$ in the most efficient sampling mode, and a FID score improvement of up to 3.5 points for high-quality sampling. We release our code and model weights with fully reproducible experiments.
Updated: 2025-07-02 20:53:10
标题: ODE$_t$(ODE$_l$):在扩散与流模型中对时间和长度走捷径以实现更快采样
摘要: 最近,连续归一化流(CNFs)和扩散模型(DMs)已在统一的理论框架下得到研究。尽管这些模型可以从噪声分布中生成高质量的数据点,但采样需要多次迭代来求解计算复杂度很高的常微分方程(ODE)。大多数现有方法侧重于减少采样过程中的时间步数以提高效率。在这项工作中,我们探索了一个互补的方向:在时间步数和神经网络长度两个维度上动态控制质量与复杂度的权衡。我们通过重新连接基于Transformer架构中的块,来求解一个关于网络长度的内部离散化ODE。然后,在流匹配训练过程中引入时间和长度方向的一致性项,从而可以用任意数量的时间步和Transformer块进行采样。与其他方法不同,我们的ODE$_t$(ODE$_l$)方法在时间维度上与求解器无关,并同时降低了延迟和内存使用。与先前的最新技术相比,在CelebA-HQ和ImageNet上的图像生成实验中,最高效的采样模式下延迟降低了高达3倍,高质量采样的FID分数提高了高达3.5分。我们发布了代码和模型权重,实验完全可复现。
更新时间: 2025-07-02 20:53:10
领域: cs.LG,cs.CV
Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series
The field of hypothesis generation promises to reduce costs in neuroscience by narrowing the range of interventional studies needed to study various phenomena. Existing machine learning methods can generate scientific hypotheses from complex datasets, but many approaches assume causal relationships are static over time, limiting their applicability to systems with dynamic, state-dependent behavior, such as the brain. While some techniques attempt dynamic causal discovery through factor models, they often restrict relationships to linear patterns or impose other simplifying assumptions. We propose a novel method that models dynamic graphs as a conditionally weighted superposition of static graphs, where each static graph can capture nonlinear relationships. This approach enables the detection of complex, time-varying interactions between variables beyond linear limitations. Our method improves f1-scores of predicted dynamic causal patterns by roughly 22-28% on average over baselines in some of our experiments, with some improvements reaching well over 60%. A case study on real brain data demonstrates our method's ability to uncover relationships linked to specific behavioral states, offering valuable insights into neural dynamics.
Updated: 2025-07-02 20:42:02
标题: 在神经科学中生成动态因果图的假设:利用观察时间序列的生成性因子模型
摘要: 假设生成领域有望通过缩小研究各种现象所需的干预性研究范围来降低神经科学的研究成本。现有的机器学习方法可以从复杂数据集中生成科学假设,但许多方法假定因果关系随时间保持不变,从而限制了它们对具有动态、状态依赖行为的系统(如大脑)的适用性。虽然一些技术尝试通过因子模型进行动态因果发现,但它们通常将关系限制在线性模式或施加其他简化假设。我们提出了一种新颖的方法,将动态图建模为静态图的条件加权叠加,其中每个静态图都可以捕捉非线性关系。这种方法能够检测超越线性限制的变量之间复杂、时变的相互作用。在我们的部分实验中,我们的方法将预测的动态因果模式的f1分数相对基线平均提高了约22-28%,有些改进甚至超过60%。对真实脑数据的案例研究展示了我们的方法揭示与特定行为状态相关联的关系的能力,为神经动力学提供了宝贵的见解。
更新时间: 2025-07-02 20:42:02
领域: cs.LG,cs.AI,stat.AP,stat.ML
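The core modeling idea, a dynamic graph as a conditionally weighted superposition of static graphs, can be sketched as follows. The weight function and toy graphs are placeholders; in the paper the static graphs and the conditioning weights are learned jointly from the observed time series.

```python
import numpy as np

def dynamic_adjacency(static_graphs, state, weight_fn):
    """Dynamic causal graph as a conditionally weighted superposition of
    K static graphs: A(t) = sum_k w_k(state_t) * A_k, with each A_k free
    to encode nonlinear relationships learned separately.
    static_graphs: (K, n, n) array; weight_fn: state -> K nonnegative weights."""
    w = weight_fn(state)                            # (K,) state-dependent mixture
    w = w / w.sum()
    return np.tensordot(w, static_graphs, axes=1)   # (n, n) effective graph at t

# toy usage: two static graphs blended by a scalar "behavioral state" s in [0, 1]
K, n = 2, 4
graphs = np.random.rand(K, n, n)
weights = lambda s: np.array([1.0 - s, s])
A_t = dynamic_adjacency(graphs, 0.3, weights)
```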
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Scaling the capacity of language models has consistently proven to be a reliable approach for improving performance and unlocking new capabilities. Capacity can be primarily defined by two dimensions: the number of model parameters and the compute per example. While scaling typically involves increasing both, the precise interplay between these factors and their combined contribution to overall capacity remains not fully understood. We explore this relationship in the context of sparse Mixture-of-Experts (MoEs), which allow scaling the number of parameters without proportionally increasing the FLOPs per example. We investigate how varying the sparsity level, i.e., the fraction of inactive parameters, impacts model's performance during pretraining and downstream few-shot evaluation. We find that under different constraints (e.g., parameter size and total training compute), there is an optimal level of sparsity that improves both training efficiency and model performance. These results provide a better understanding of the impact of sparsity in scaling laws for MoEs and complement existing works in this area, offering insights for designing more efficient architectures.
Updated: 2025-07-02 20:37:41
标题: 参数与FLOPs:专家混合语言模型最优稀疏度的扩展定律
摘要: 扩展语言模型的容量一直被证明是提高性能和解锁新能力的可靠方法。容量主要由两个维度定义:模型参数数量和每个示例的计算量。虽然扩展通常涉及同时增加这两者,但它们之间的精确相互作用及其对整体容量的综合贡献仍未完全理解。我们在稀疏专家混合(MoEs)的背景下探讨了这种关系,这种方法允许扩展参数数量而无需成比例增加每个示例的FLOPs。我们研究了改变稀疏水平(即非活跃参数的比例)如何影响模型在预训练和下游少样本评估中的性能。我们发现,在不同的约束条件下(例如参数规模和总训练计算量),存在一个能同时提升训练效率和模型性能的最优稀疏水平。这些结果增进了我们对MoEs扩展定律中稀疏性影响的理解,补充了这一领域的现有研究,为设计更高效的架构提供了见解。
更新时间: 2025-07-02 20:37:41
领域: cs.LG,cs.AI
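A back-of-the-envelope sketch of the two capacity axes discussed above (total parameters vs FLOPs per example) and the induced sparsity level for an MoE FFN stack; the parameter count ignores attention, gating, and biases, so treat it as rough accounting, not the paper's formula.

```python
def moe_capacity(n_layers, d_model, d_ff, n_experts, top_k):
    """Rough accounting of the two capacity axes for an MoE FFN stack:
    total parameters grow with n_experts, FLOPs per token only with top_k.
    Sparsity = fraction of expert parameters inactive per token."""
    expert_params = 2 * d_model * d_ff                  # up + down projection
    total_params = n_layers * n_experts * expert_params
    active_params = n_layers * top_k * expert_params    # per token
    flops_per_token = 2 * active_params                 # multiply-accumulate
    sparsity = 1.0 - top_k / n_experts
    return total_params, flops_per_token, sparsity

# e.g. 32 layers, 64 experts, top-2 routing => sparsity ~ 0.97
print(moe_capacity(32, 4096, 16384, 64, 2))
```

The paper's question is then: holding one axis fixed (parameter budget or training compute), which sparsity value along this trade-off curve is optimal?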
Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction
Generative models have demonstrated strong performance in conditional settings and can be viewed as a form of data compression, where the condition serves as a compact representation. However, their limited controllability and reconstruction accuracy restrict their practical application to data compression. In this work, we propose an efficient latent diffusion framework that bridges this gap by combining a variational autoencoder with a conditional diffusion model. Our method compresses only a small number of keyframes into latent space and uses them as conditioning inputs to reconstruct the remaining frames via generative interpolation, eliminating the need to store latent representations for every frame. This approach enables accurate spatiotemporal reconstruction while significantly reducing storage costs. Experimental results across multiple datasets show that our method achieves up to 10 times higher compression ratios than rule-based state-of-the-art compressors such as SZ3, and up to 63 percent better performance than leading learning-based methods under the same reconstruction error.
Updated: 2025-07-02 20:27:38
标题: 用于高效时空数据缩减的生成式潜在扩散
摘要: 生成模型在条件设置下表现出强大的性能,并可被视为一种数据压缩形式,其中条件充当紧凑的表示。然而,它们有限的可控性和重建精度限制了其在数据压缩方面的实际应用。在这项工作中,我们提出了一种高效的潜在扩散框架,通过将变分自编码器与条件扩散模型结合来弥合这一差距。我们的方法仅将少量关键帧压缩到潜在空间,并将它们用作条件输入,通过生成式插值重建其余帧,从而无需为每一帧存储潜在表示。这种方法在显著降低存储成本的同时实现了准确的时空重建。跨多个数据集的实验结果显示,我们的方法实现了比基于规则的最先进压缩器(如SZ3)高达10倍的压缩比,并且在相同重建误差下比领先的基于学习的方法性能高出多达63%。
更新时间: 2025-07-02 20:27:38
领域: cs.LG,cs.CV
CROP: Circuit Retrieval and Optimization with Parameter Guidance using LLMs
Modern very large-scale integration (VLSI) design requires the implementation of integrated circuits using electronic design automation (EDA) tools. Due to the complexity of EDA algorithms, the vast parameter space poses a huge challenge to chip design optimization, as the combination of even moderate numbers of parameters creates an enormous solution space to explore. Manual parameter selection remains industrial practice despite being excessively laborious and limited by expert experience. To address this issue, we present CROP, the first large language model (LLM)-powered automatic VLSI design flow tuning framework. Our approach includes: (1) a scalable methodology for transforming RTL source code into dense vector representations, (2) an embedding-based retrieval system for matching designs with semantically similar circuits, and (3) a retrieval-augmented generation (RAG)-enhanced LLM-guided parameter search system that constrains the search process with prior knowledge from similar designs. Experiment results demonstrate CROP's ability to achieve superior quality-of-results (QoR) with fewer iterations than existing approaches on industrial designs, including a 9.9% reduction in power consumption.
Updated: 2025-07-02 20:25:47
标题: CROP: 利用LLMs进行电路检索和优化的参数引导
摘要: 现代超大规模集成(VLSI)设计需要使用电子设计自动化(EDA)工具实现集成电路。由于EDA算法的复杂性,庞大的参数空间对芯片设计优化构成了巨大挑战:即使是适度数量的参数组合也会产生一个巨大的解空间需要探索。尽管手动参数选择过于繁琐且受限于专家经验,它仍是业界的通行做法。为了解决这个问题,我们提出了CROP,这是第一个由大型语言模型(LLM)驱动的自动VLSI设计流程调优框架。我们的方法包括:(1)一种可扩展的方法,将RTL源代码转化为密集向量表示;(2)一个基于嵌入的检索系统,用于将设计与语义相似的电路进行匹配;以及(3)一个检索增强生成(RAG)强化的、由LLM引导的参数搜索系统,该系统利用相似设计的先验知识来约束搜索过程。实验结果表明,CROP能够在工业设计上以比现有方法更少的迭代次数实现更优的结果质量(QoR),包括功耗降低9.9%。
更新时间: 2025-07-02 20:25:47
领域: cs.LG
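Step (2) of the pipeline above, embedding-based retrieval of semantically similar circuits, reduces to a nearest-neighbor search once RTL designs have been embedded; a minimal sketch, assuming the dense vectors are already given:

```python
import numpy as np

def retrieve_similar_designs(query_emb, design_embs, k=5):
    """Rank stored RTL design embeddings by cosine similarity to a query
    design and return the top-k matches (whose tuned EDA parameters would
    then seed/constrain the LLM-guided search)."""
    q = query_emb / np.linalg.norm(query_emb)
    D = design_embs / np.linalg.norm(design_embs, axis=1, keepdims=True)
    sims = D @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]   # indices and scores of the k most similar designs
```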
Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers
The investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a critical role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on the significant improvement on structured identification precision when compared to classical reservoir computing methods for the simulation of equivariant dynamical systems.
Updated: 2025-07-02 20:23:07
标题: 用等变自回归储备池计算机识别具有对称性的系统
摘要: 本文报告的研究重点是利用等变自回归储备池计算机识别具有对称性的系统。文中给出了结构化矩阵逼近理论中的一般结果,并探讨了一种双重方法。首先,对保持对称性的通用非线性时间延迟嵌入进行了全面研究,这涉及分析从所研究的等变系统中采样的时间序列数据。其次,应用稀疏最小二乘方法来辨识输出耦合矩阵的近似表示。这些矩阵在确定等变系统的非线性自回归表示中起着关键作用,其结构特征由系统固有的对称性集合决定。文中概述了由上述技术导出的原型算法,为其实际应用提供了洞察。重点在于:与用于模拟等变动力系统的经典储备池计算方法相比,结构化辨识精度有显著提升。
更新时间: 2025-07-02 20:23:07
领域: eess.SY,cs.LG,cs.SY,math.OC
Can Artificial Intelligence solve the blockchain oracle problem? Unpacking the Challenges and Possibilities
The blockchain oracle problem, which refers to the challenge of injecting reliable external data into decentralized systems, remains a fundamental limitation to the development of trustless applications. While recent years have seen a proliferation of architectural, cryptographic, and economic strategies to mitigate this issue, no one has yet fully resolved the fundamental question of how a blockchain can gain knowledge about the off-chain world. In this position paper, we critically assess the role artificial intelligence (AI) can play in tackling the oracle problem. Drawing from both academic literature and practitioner implementations, we examine how AI techniques such as anomaly detection, language-based fact extraction, dynamic reputation modeling, and adversarial resistance can enhance oracle systems. We observe that while AI introduces powerful tools for improving data quality, source selection, and system resilience, it cannot eliminate the reliance on unverifiable off-chain inputs. Therefore, this study supports the idea that AI should be understood as a complementary layer of inference and filtering within a broader oracle design, not a substitute for trust assumptions.
Updated: 2025-07-02 20:15:21
标题: 人工智能能解决区块链预言机问题吗?揭示挑战和可能性
摘要: 区块链预言机问题,即向去中心化系统注入可靠外部数据的挑战,仍然是免信任(trustless)应用发展的一个基本限制。尽管近年来出现了大量架构、密码学和经济学策略来缓解这一问题,但尚无人完全解决区块链如何获取链下世界知识这一根本问题。在这篇立场论文中,我们批判性地评估了人工智能(AI)在解决预言机问题中可以发挥的作用。我们从学术文献和从业者的实现中取材,考察了异常检测、基于语言的事实提取、动态声誉建模和对抗性抵抗等AI技术如何增强预言机系统。我们观察到,虽然AI为提高数据质量、来源选择和系统弹性引入了强大的工具,但它无法消除对不可验证链下输入的依赖。因此,本研究支持这样的观点:AI应被理解为更广泛预言机设计中推理和过滤的补充层,而不是信任假设的替代品。
更新时间: 2025-07-02 20:15:21
领域: cs.CR,cs.AI,cs.CY,cs.GT,cs.LG,11, 62, 68, 90, 91,F.0; F.4; H.4; H.5; I.2
TAROT: Targeted Data Selection via Optimal Transport
We propose TAROT, a targeted data selection framework grounded in optimal transport theory. Previous targeted data selection methods primarily rely on influence-based greedy heuristics to enhance domain-specific performance. While effective on limited, unimodal data (i.e., data following a single pattern), these methods struggle as target data complexity increases. Specifically, in multimodal distributions, these heuristics fail to account for multiple inherent patterns, leading to suboptimal data selection. This work identifies two primary factors contributing to this limitation: (i) the disproportionate impact of dominant feature components in high-dimensional influence estimation, and (ii) the restrictive linear additive assumptions inherent in greedy selection strategies. To address these challenges, TAROT incorporates whitened feature distance to mitigate dominant feature bias, providing a more reliable measure of data influence. Building on this, TAROT uses whitened feature distance to quantify and minimize the optimal transport distance between the selected data and target domains. Notably, this minimization also facilitates the estimation of optimal selection ratios. We evaluate TAROT across multiple tasks, including semantic segmentation, motion prediction, and instruction tuning. Results consistently show that TAROT outperforms state-of-the-art methods, highlighting its versatility across various deep learning tasks. Code is available at https://github.com/vita-epfl/TAROT.
Updated: 2025-07-02 19:48:29
标题: TAROT:通过最优输运进行有针对性的数据选择
摘要: 我们提出了TAROT,一个基于最优输运理论的有针对性数据选择框架。先前的有针对性数据选择方法主要依赖基于影响力的贪婪启发式来提升特定领域的性能。虽然这些方法在有限的单峰数据(即遵循单一模式的数据)上有效,但随着目标数据复杂度的增加便难以应对。具体来说,在多峰分布中,这些启发式方法未能考虑多个固有模式,导致数据选择次优。本文确定了导致这一限制的两个主要因素:(i)高维影响力估计中主导特征分量的不成比例影响,以及(ii)贪婪选择策略中固有的限制性线性可加假设。为了解决这些挑战,TAROT引入白化特征距离来减轻主导特征偏差,提供更可靠的数据影响度量。在此基础上,TAROT使用白化特征距离来量化并最小化所选数据与目标领域之间的最优输运距离。值得注意的是,这种最小化还有助于估计最优选择比例。我们在多个任务上评估了TAROT,包括语义分割、运动预测和指令微调。结果一致表明,TAROT优于最先进的方法,突显了其在各种深度学习任务中的多功能性。代码可在https://github.com/vita-epfl/TAROT 上找到。
更新时间: 2025-07-02 19:48:29
领域: cs.LG,cs.CV,stat.ML
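A small sketch of TAROT's two ingredients under stated simplifications: ZCA-style whitening to damp dominant feature components, and an optimal-transport distance between whitened candidate and target features, here solved exactly as an assignment problem rather than with the paper's solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def whiten(X, eps=1e-6):
    """ZCA-style whitening to damp dominant feature components before
    measuring influence/distance (one of TAROT's two fixes)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False) + eps * np.eye(X.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return Xc @ W

def ot_distance(cand, target):
    """Optimal-transport distance between whitened feature sets, computed
    as a minimum-cost matching (a small-scale stand-in for the paper's
    OT objective); lower means the candidate set covers the target better."""
    cost = np.linalg.norm(cand[:, None, :] - target[None, :, :], axis=-1)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()
```

Selecting the subset of training data that minimizes `ot_distance` to the target domain is then the (much harder) combinatorial problem TAROT addresses.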
Parametric Neural Amp Modeling with Active Learning
We introduce PANAMA, an active learning framework for the training of end-to-end parametric guitar amp models using a WaveNet-like architecture. With PANAMA, one can create a virtual amp by recording samples that are determined by an active learning strategy to use a minimum amount of datapoints (i.e., amp knob settings). We show that gradient-based optimization algorithms can be used to determine the optimal datapoints to sample, and that the approach helps under a constrained number of samples.
Updated: 2025-07-02 19:47:23
标题: 基于主动学习的参数化神经放大器建模
摘要: 我们介绍了PANAMA,一个使用类WaveNet架构训练端到端参数化吉他放大器模型的主动学习框架。借助PANAMA,可以通过录制由主动学习策略确定的样本来创建虚拟放大器,从而使用最少量的数据点(即放大器旋钮设置)。我们展示了基于梯度的优化算法可用于确定要采样的最优数据点,并且该方法在样本数量受限时颇有助益。
更新时间: 2025-07-02 19:47:23
领域: cs.LG,cs.SD,eess.AS
Resolving Turbulent Magnetohydrodynamics: A Hybrid Operator-Diffusion Framework
We present a hybrid machine learning framework that combines Physics-Informed Neural Operators (PINOs) with score-based generative diffusion models to simulate the full spatio-temporal evolution of two-dimensional, incompressible, resistive magnetohydrodynamic (MHD) turbulence across a broad range of Reynolds numbers ($\mathrm{Re}$). The framework leverages the equation-constrained generalization capabilities of PINOs to predict coherent, low-frequency dynamics, while a conditional diffusion model stochastically corrects high-frequency residuals, enabling accurate modeling of fully developed turbulence. Trained on a comprehensive ensemble of high-fidelity simulations with $\mathrm{Re} \in \{100, 250, 500, 750, 1000, 3000, 10000\}$, the approach achieves state-of-the-art accuracy in regimes previously inaccessible to deterministic surrogates. At $\mathrm{Re}=1000$ and $3000$, the model faithfully reconstructs the full spectral energy distributions of both velocity and magnetic fields late into the simulation, capturing non-Gaussian statistics, intermittent structures, and cross-field correlations with high fidelity. At extreme turbulence levels ($\mathrm{Re}=10000$), it remains the first surrogate capable of recovering the high-wavenumber evolution of the magnetic field, preserving large-scale morphology and enabling statistically meaningful predictions.
Updated: 2025-07-02 19:33:57
标题: 解析湍流磁流体力学:一种混合算子-扩散框架
摘要: 我们提出了一种混合机器学习框架,将物理信息神经算子(PINOs)与基于得分的生成扩散模型相结合,以模拟二维不可压缩电阻性磁流体力学(MHD)湍流在宽广雷诺数($\mathrm{Re}$)范围内的完整时空演化。该框架利用PINOs受方程约束的泛化能力来预测连贯的低频动态,同时用条件扩散模型随机校正高频残差,从而实现对充分发展湍流的准确建模。该方法在包含$\mathrm{Re} \in \{100, 250, 500, 750, 1000, 3000, 10000\}$的高保真模拟集上训练,在确定性替代模型以前无法企及的区域达到了最先进的准确性。在$\mathrm{Re}=1000$和$3000$时,该模型在模拟后期忠实地重建了速度场和磁场的完整谱能量分布,以高保真度捕捉非高斯统计、间歇性结构和跨场相关性。在极端湍流水平($\mathrm{Re}=10000$)下,它仍是首个能够恢复磁场高波数演化、保留大尺度形态并给出统计上有意义预测的替代模型。
更新时间: 2025-07-02 19:33:57
领域: physics.flu-dyn,cs.AI,cs.LG,gr-qc,physics.comp-ph,J.2; I.2
What Neuroscience Can Teach AI About Learning in Continuously Changing Environments
Modern AI models, such as large language models, are usually trained once on a huge corpus of data, potentially fine-tuned for a specific task, and then deployed with fixed parameters. Their training is costly, slow, and gradual, requiring billions of repetitions. In stark contrast, animals continuously adapt to the ever-changing contingencies in their environments. This is particularly important for social species, where behavioral policies and reward outcomes may frequently change in interaction with peers. The underlying computational processes are often marked by rapid shifts in an animal's behaviour and rather sudden transitions in neuronal population activity. Such computational capacities are of growing importance for AI systems operating in the real world, like those guiding robots or autonomous vehicles, or for agentic AI interacting with humans online. Can AI learn from neuroscience? This Perspective explores this question, integrating the literature on continual and in-context learning in AI with the neuroscience of learning on behavioral tasks with shifting rules, reward probabilities, or outcomes. We will outline an agenda for how specifically insights from neuroscience may inform current developments in AI in this area, and - vice versa - what neuroscience may learn from AI, contributing to the evolving field of NeuroAI.
Updated: 2025-07-02 19:30:57
标题: 神经科学如何教导人工智能在不断变化的环境中学习
摘要: 现代人工智能模型,如大型语言模型,通常在一个巨大的数据语料库上进行一次训练,可能会针对特定任务进行微调,然后以固定参数部署。它们的训练成本高昂、速度慢且是渐进式的,需要数十亿次重复。与之形成鲜明对比的是,动物不断适应环境中不断变化的情况。这对于社会性物种尤为重要:在与同伴的互动中,行为策略和奖励结果可能频繁变化。其底层计算过程通常表现为动物行为的快速转变和神经元群体活动的相当突然的转变。这种计算能力对于在现实世界中运行的人工智能系统越来越重要,比如指导机器人或自动驾驶车辆的系统,或者在线与人类互动的智能体AI。人工智能能够从神经科学中学习吗?本文探讨了这个问题,将人工智能中关于持续学习和上下文学习的文献,与神经科学中关于在规则、奖励概率或结果不断变化的行为任务上学习的文献整合在一起。我们将概述一个议程:神经科学的洞察具体可以如何指导人工智能在该领域的当前发展,以及反过来,神经科学可以从人工智能中学到什么,从而为不断发展的神经人工智能(NeuroAI)领域做出贡献。
更新时间: 2025-07-02 19:30:57
领域: cs.AI,q-bio.NC,I.2; I.6; A.1
A robust and adaptive MPC formulation for Gaussian process models
In this paper, we present a robust and adaptive model predictive control (MPC) framework for uncertain nonlinear systems affected by bounded disturbances and unmodeled nonlinearities. We use Gaussian Processes (GPs) to learn the uncertain dynamics based on noisy measurements, including those collected during system operation. As a key contribution, we derive robust predictions for GP models using contraction metrics, which are incorporated in the MPC formulation. The proposed design guarantees recursive feasibility, robust constraint satisfaction and convergence to a reference state, with high probability. We provide a numerical example of a planar quadrotor subject to difficult-to-model ground effects, which highlights significant improvements achieved through the proposed robust prediction method and through online learning.
Updated: 2025-07-02 19:27:14
标题: 一种面向高斯过程模型的鲁棒自适应MPC公式
摘要: 在本文中,我们提出了一种鲁棒且自适应的模型预测控制(MPC)框架,用于受到有界干扰和未建模非线性影响的不确定非线性系统。我们使用高斯过程(GPs)来学习基于嘈杂测量的不确定动力学,包括在系统运行期间收集的测量数据。作为一个重要贡献,我们利用收缩度量导出了GP模型的鲁棒预测,并将其纳入MPC的制定中。所提出的设计确保了递归可行性、鲁棒约束满足和高概率收敛到参考状态。我们提供了一个平面四旋翼机器人的数值例子,该机器人受到难以建模的地面效应影响,这突出了通过所提出的鲁棒预测方法和在线学习实现的显著改进。
更新时间: 2025-07-02 19:27:14
领域: eess.SY,cs.LG,cs.SY,math.OC
Asymptotically perfect seeded graph matching without edge correlation (and applications to inference)
We present the OmniMatch algorithm for seeded multiple graph matching. In the setting of $d$-dimensional Random Dot Product Graphs (RDPG), we prove that under mild assumptions, OmniMatch with $s$ seeds asymptotically and efficiently perfectly aligns $O(s^{\alpha})$ unseeded vertices -- for $\alpha<2\wedge d/4$ -- across multiple networks even in the presence of no edge correlation. We demonstrate the effectiveness of our algorithm across numerous simulations and in the context of shuffled graph hypothesis testing. In the shuffled testing setting, testing power is lost due to the misalignment/shuffling of vertices across graphs, and we demonstrate the capacity of OmniMatch to correct for misaligned vertices prior to testing and hence recover the lost testing power. We further demonstrate the algorithm on a pair of data examples from connectomics and machine translation.
Updated: 2025-07-02 19:19:46
标题: 无需边相关性的渐近完美有种子图匹配(及其在推断中的应用)
摘要: 我们提出了用于有种子多图匹配的OmniMatch算法。在$d$维随机点积图(RDPG)的设定中,我们证明在温和的假设下,带$s$个种子的OmniMatch即使在没有边相关性的情况下,也能在多个网络间渐近且高效地完美对齐$O(s^{\alpha})$个无种子顶点,其中$\alpha<2\wedge d/4$。我们通过大量模拟以及洗牌图假设检验的场景展示了算法的有效性。在洗牌检验的设定中,图之间顶点的错位/洗牌会导致检验功效的损失;我们展示了OmniMatch在检验前校正错位顶点、从而恢复损失的检验功效的能力。我们进一步在来自连接组学和机器翻译的两个数据示例上演示了该算法。
更新时间: 2025-07-02 19:19:46
领域: stat.ML,cs.LG
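To make the seeded-matching idea concrete, here is a simplified embedding-space stand-in for OmniMatch: fit an orthogonal Procrustes map on the seed pairs, apply it to all of graph A's vertex embeddings, then match the remaining vertices by minimum-cost assignment. The omnibus embedding and the theoretical guarantees of the paper are not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def seeded_match(emb_a, emb_b, seeds_a, seeds_b):
    """Seeded graph matching in embedding space: learn an orthogonal map
    from the known seed correspondences (Procrustes), align graph A's
    embeddings, then match all vertices by nearest-cost assignment.
    emb_a: (n_a, d), emb_b: (n_b, d); seeds_a/seeds_b: index arrays of
    corresponding seed vertices."""
    M = emb_a[seeds_a].T @ emb_b[seeds_b]
    U, _, Vt = np.linalg.svd(M)
    Q = U @ Vt                           # orthogonal alignment fit on seeds
    aligned = emb_a @ Q
    cost = np.linalg.norm(aligned[:, None] - emb_b[None, :], axis=-1)
    return linear_sum_assignment(cost)   # vertex i of A matched to vertex j of B
```

In the shuffled-testing application, this recovered correspondence is applied before the hypothesis test, which is what restores the lost power.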
Energy-Based Transformers are Scalable Learners and Thinkers
Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question "Is it possible to generalize these System 2 Thinking approaches, and develop models that learn to think solely from unsupervised learning?" Interestingly, we find the answer is yes, by learning to explicitly verify the compatibility between inputs and candidate-predictions, and then re-framing prediction problems as optimization with respect to this verifier. Specifically, we train Energy-Based Transformers (EBTs) -- a new class of Energy-Based Models (EBMs) -- to assign an energy value to every input and candidate-prediction pair, enabling predictions through gradient descent-based energy minimization until convergence. Across both discrete (text) and continuous (visual) modalities, we find EBTs scale faster than the dominant Transformer++ approach during training, achieving an up to 35% higher scaling rate with respect to data, batch size, parameters, FLOPs, and depth. During inference, EBTs improve performance with System 2 Thinking by 29% more than the Transformer++ on language tasks, and EBTs outperform Diffusion Transformers on image denoising while using fewer forward passes. Further, we find that EBTs achieve better results than existing models on most downstream tasks given the same or worse pretraining performance, suggesting that EBTs generalize better than existing approaches. Consequently, EBTs are a promising new paradigm for scaling both the learning and thinking capabilities of models.
Updated: 2025-07-02 19:17:29
标题: 基于能量的Transformer是可扩展的学习者和思考者
摘要: 推理时计算技术,类似于人类的系统2思维,最近因能提高模型性能而变得流行。然而,大多数现有方法存在几个限制:它们是特定于模态的(例如仅适用于文本)、特定于问题的(例如数学和编码等可验证领域),或者需要在无监督预训练之上进行额外的监督/训练(例如验证器或可验证的奖励)。在本文中,我们提出一个问题:"是否可能推广这些系统2思维方法,开发仅通过无监督学习就学会思考的模型?"有趣的是,我们发现答案是肯定的:通过学习显式验证输入与候选预测之间的兼容性,再将预测问题重新表述为针对该验证器的优化。具体来说,我们训练基于能量的Transformer(EBTs)——一类新的基于能量的模型(EBMs)——为每个输入与候选预测对分配能量值,并通过基于梯度下降的能量最小化进行预测直至收敛。在离散(文本)和连续(视觉)模态中,我们发现EBTs在训练过程中比主流的Transformer++方法扩展更快,在数据、批量大小、参数、FLOPs和深度方面的扩展速率高出多达35%。在推理过程中,在语言任务上,EBTs借助系统2思维获得的性能提升比Transformer++高29%;在图像去噪上,EBTs以更少的前向传递次数超过了Diffusion Transformer。此外,我们发现,在预训练性能相同或更差的情况下,EBTs在大多数下游任务上取得了比现有模型更好的结果,表明EBTs比现有方法具有更好的泛化能力。因此,EBTs是同时扩展模型学习能力和思考能力的一种有前景的新范式。
更新时间: 2025-07-02 19:17:29
领域: cs.LG,cs.AI,cs.CL,cs.CV
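The inference procedure described above (predict by descending a learned energy over candidate predictions) can be sketched in a few lines of PyTorch; the quadratic toy energy below is purely illustrative, standing in for a trained EBT.

```python
import torch

def ebt_predict(energy, x, y_init, steps=50, lr=0.1):
    """System-2-style inference for an energy-based model: start from a
    candidate prediction and descend the learned energy E(x, y) in y until
    convergence. `energy` is any differentiable scalar function of (x, y)."""
    y = y_init.clone().requires_grad_(True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        e = energy(x, y)
        e.backward()
        opt.step()
    return y.detach()

# toy verifier: energy is low exactly when y matches a "correct" transform of x
x = torch.randn(8)
energy = lambda x, y: ((y - 2.0 * x) ** 2).sum()
y_hat = ebt_predict(energy, x, torch.zeros(8))
print(torch.allclose(y_hat, 2.0 * x, atol=1e-2))  # True
```

Spending more descent steps is the model's analogue of "thinking longer", which is what the paper exploits at inference time.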
Completion of the DrugMatrix Toxicogenomics Database using 3-Dimensional Tensors
We explore applying a tensor completion approach to complete the DrugMatrix toxicogenomics dataset. Our hypothesis is that by preserving the 3-dimensional structure of the data, which comprises tissue, treatment, and transcriptomic measurements, and by leveraging a machine learning formulation, our approach will improve upon prior state-of-the-art results. Our results demonstrate that the new tensor-based method more accurately reflects the original data distribution and effectively captures organ-specific variability. The proposed tensor-based methodology achieved lower mean squared errors and mean absolute errors compared to both conventional Canonical Polyadic decomposition and 2-dimensional matrix factorization methods. In addition, our non-negative tensor completion implementation reveals relationships among tissues. Our findings not only complete the world's largest in-vivo toxicogenomics database with improved accuracy but also offer a promising methodology for future studies of drugs that may cross species barriers, for example, from rats to humans.
Updated: 2025-07-02 19:15:51
标题: 使用三维张量完成DrugMatrix毒理基因组学数据库
摘要: 我们探讨了应用张量补全方法来补全DrugMatrix毒理基因组学数据集。我们的假设是,通过保留数据的三维结构(包括组织、处理和转录组测量),并利用机器学习的形式化,我们的方法将改进先前的最先进结果。我们的结果表明,新的基于张量的方法更准确地反映了原始数据分布,并有效捕获了器官特异性变异。与传统的Canonical Polyadic分解和二维矩阵分解方法相比,所提出的基于张量的方法实现了更低的均方误差和平均绝对误差。此外,我们的非负张量补全实现揭示了组织之间的关系。我们的发现不仅以更高的准确性补全了世界上最大的体内毒理基因组学数据库,而且为未来研究可能跨越物种障碍的药物(例如从大鼠到人类)提供了一种有前途的方法。
更新时间: 2025-07-02 19:15:51
领域: cs.LG,cs.AI,q-bio.QM
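A minimal sketch of 3-d tensor completion in the spirit of this work, assuming a tissue x treatment x gene tensor with an observation mask. It fits a plain CP factorization by masked gradient descent; the paper's solver is non-negative and tuned to DrugMatrix, so treat this as an illustration of the idea only.

```python
import numpy as np

def cp_complete(T, mask, rank=8, lr=0.01, iters=2000, seed=0):
    """Complete a 3-d tensor from observed entries (mask == 1) with a
    CP-style factorization fit by masked gradient descent on the
    squared reconstruction error."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = 0.1 * rng.standard_normal((I, rank))
    B = 0.1 * rng.standard_normal((J, rank))
    C = 0.1 * rng.standard_normal((K, rank))
    for _ in range(iters):
        R = (np.einsum('ir,jr,kr->ijk', A, B, C) - T) * mask  # residual on observed
        A -= lr * np.einsum('ijk,jr,kr->ir', R, B, C)          # gradient steps
        B -= lr * np.einsum('ijk,ir,kr->jr', R, A, C)
        C -= lr * np.einsum('ijk,ir,jr->kr', R, A, B)
    return np.einsum('ir,jr,kr->ijk', A, B, C)   # dense reconstruction
```

Keeping all three modes in one factorization (rather than flattening to a matrix) is what lets the model share structure across tissues, which matches the paper's finding that the tensor approach captures organ-specific variability better.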
Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model
We consider infinite-horizon $\gamma$-discounted (linear) constrained Markov decision processes (CMDPs) where the objective is to find a policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Given access to a generative model, we propose to solve CMDPs with a primal-dual framework that can leverage any black-box unconstrained MDP solver. For linear CMDPs with feature dimension $d$, we instantiate the framework by using mirror descent value iteration (MDVI) (Kitamura et al., 2023) as an example MDP solver. We provide sample complexity bounds for the resulting CMDP algorithm in two cases: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to exactly satisfy the constraint. For (i), we prove that the algorithm can return an $\epsilon$-optimal policy with high probability by using $\tilde{O}\left(\frac{d^2}{(1-\gamma)^4\epsilon^2}\right)$ samples. We note that these results exhibit a near-optimal dependence on both $d$ and $\epsilon$. For (ii), we show that the algorithm requires $\tilde{O}\left(\frac{d^2}{(1-\gamma)^6\epsilon^2\zeta^2}\right)$ samples, where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we instantiate our framework for tabular CMDPs and show that it can be used to recover near-optimal sample complexities in this setting.
Updated: 2025-07-02 19:07:37
标题: 具有生成模型的线性约束MDP的样本复杂度界
摘要: 我们考虑无限时域$\gamma$-折扣(线性)约束马尔可夫决策过程(CMDPs),其目标是找到一种策略,在满足期望累积约束的前提下最大化期望累积奖励。在可访问生成模型的情况下,我们建议使用一个可以利用任何黑盒无约束MDP求解器的原始-对偶框架来求解CMDPs。对于特征维度为$d$的线性CMDPs,我们以镜像下降值迭代(MDVI;Kitamura等,2023)作为示例MDP求解器来实例化该框架。我们在两种情况下为所得CMDP算法提供了样本复杂度界:(i)松弛可行性,允许小的约束违反;(ii)严格可行性,要求输出策略完全满足约束。对于(i),我们证明该算法可以使用$\tilde{O}\left(\frac{d^2}{(1-\gamma)^4\epsilon^2}\right)$个样本以高概率返回一个$\epsilon$-最优策略。我们注意到这些结果在$d$和$\epsilon$上都表现出接近最优的依赖性。对于(ii),我们证明该算法需要$\tilde{O}\left(\frac{d^2}{(1-\gamma)^6\epsilon^2\zeta^2}\right)$个样本,其中$\zeta$是刻画可行域大小、与具体问题相关的Slater常数。最后,我们将框架实例化到表格型CMDPs,并证明它可以在该设定下恢复接近最优的样本复杂度。
更新时间: 2025-07-02 19:07:37
领域: cs.LG,stat.ML
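The primal-dual framework is easy to sketch: alternately solve an unconstrained MDP under a Lagrangian reward and update the dual variable on the observed constraint violation. Here `solve_mdp` and `eval_policy` are assumed black boxes, matching the paper's use of any unconstrained solver (e.g., MDVI); the schedule and averaging needed for the formal guarantees are omitted.

```python
def primal_dual_cmdp(solve_mdp, eval_policy, b, lam0=0.0, T=100, eta=0.05):
    """Primal-dual wrapper around a black-box unconstrained MDP solver.
    solve_mdp(lam) -> policy maximizing E[reward] - lam * E[cost];
    eval_policy(pi) -> (expected reward, expected cumulative cost);
    b: cost budget the output policy must respect."""
    lam = lam0
    pi = None
    for _ in range(T):
        pi = solve_mdp(lam)                       # primal step (Lagrangian MDP)
        _, cost = eval_policy(pi)
        lam = max(0.0, lam + eta * (cost - b))    # projected dual ascent
    return pi, lam
```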
Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective
One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data are dispatched to multiple annotators. The annotator-produced labels are then fused to serve downstream learning and inference tasks. This annotation process often creates noisy labels due to various reasons, such as the limited expertise, or unreliability of annotators, among others. Therefore, a core objective in crowdsourcing is to develop methods that effectively mitigate the negative impact of such label noise on learning tasks. This feature article introduces advances in learning from noisy crowdsourced labels. The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches, emphasizing analytical insights and algorithmic developments. In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions of longstanding challenges in crowdsourcing -- showing how SP perspectives drive the advancements of this field. Furthermore, this article touches upon emerging topics that are critical for developing cutting-edge AI/ML systems, such as crowdsourcing in reinforcement learning with human feedback (RLHF) and direct preference optimization (DPO) that are key techniques for fine-tuning large language models (LLMs).
Updated: 2025-07-02 19:06:48
标题: 从众包噪声标签中学习:信号处理的视角
摘要: 推动人工智能(AI)和机器学习(ML)进展的主要因素之一是大规模精选数据集的可用性。整理这类大规模数据集的常用技术是众包,即把数据分发给多个标注者,再将标注者产生的标签融合起来,服务于下游的学习和推理任务。由于各种原因,例如标注者专业知识有限或不可靠等,这一标注过程往往会产生带噪标签。因此,众包的一个核心目标是开发能有效减轻此类标签噪声对学习任务负面影响的方法。本特辑文章介绍了从带噪众包标签中学习的进展,重点是关键的众包模型及其方法论处理,从经典统计模型到最近基于深度学习的方法,强调分析见解和算法发展。特别地,本文回顾了信号处理(SP)理论和方法(如张量与非负矩阵分解的可辨识性)与众包领域长期挑战的新颖而有原则的解决方案之间的联系,展示了SP视角如何推动这一领域的进步。此外,本文还涉及了对发展尖端AI/ML系统至关重要的新兴主题,例如基于人类反馈的强化学习(RLHF)中的众包,以及作为微调大型语言模型(LLMs)关键技术的直接偏好优化(DPO)。
更新时间: 2025-07-02 19:06:48
领域: eess.SP,cs.AI,cs.HC,cs.LG
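As a concrete instance of the label-fusion step described above, here is the classical one-coin Dawid-Skene EM for binary crowdsourced labels, one of the statistical models this kind of survey covers rather than its full signal-processing treatment.

```python
import numpy as np

def dawid_skene_binary(L, iters=20):
    """One-coin Dawid-Skene EM for binary crowdsourced labels.
    L: (n_items, n_workers) matrix with entries in {0, 1}, or -1 if missing.
    Returns posterior P(y_i = 1) and estimated per-worker accuracies."""
    obs = L >= 0
    # init: per-item fraction of 1-votes (majority-vote style)
    q = np.where(obs, L, 0).sum(1) / np.maximum(obs.sum(1), 1)
    for _ in range(iters):
        # M-step: worker accuracy = expected rate of agreeing with the truth
        agree = (L == 1) * q[:, None] + (L == 0) * (1 - q[:, None])
        acc = (agree * obs).sum(0) / np.maximum(obs.sum(0), 1)
        acc = np.clip(acc, 1e-3, 1 - 1e-3)
        # E-step: posterior over the true label given each worker's vote
        log1 = np.where(L == 1, np.log(acc), np.where(L == 0, np.log(1 - acc), 0)).sum(1)
        log0 = np.where(L == 0, np.log(acc), np.where(L == 1, np.log(1 - acc), 0)).sum(1)
        q = 1.0 / (1.0 + np.exp(log0 - log1))
    return q, acc
```

Compared with plain majority voting, the EM iterations downweight unreliable annotators automatically, which is the basic mechanism more sophisticated tensor-based identifiability results build on.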
Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions
The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs - including models from OpenAI, Anthropic, Google, Meta, and Deepseek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model's predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups. Notably, Match Score attains a minimum race-wise impact ratio of 0.957 (near-parity), versus 0.809 or lower for the best LLMs, (0.906 vs 0.773 for the intersectionals, respectively). We discuss why pretraining biases may cause LLMs with insufficient safeguards to propagate societal biases in hiring scenarios, whereas a bespoke supervised model can more effectively mitigate these biases. Our findings highlight the importance of domain-specific modeling and bias auditing when deploying AI in high-stakes domains such as hiring, and caution against relying on off-the-shelf LLMs for such tasks without extensive fairness safeguards. Furthermore, we show with empirical evidence that there shouldn't be a dichotomy between choosing accuracy and fairness in hiring: a well-designed algorithm can achieve both accuracy in hiring and fairness in outcomes.
Updated: 2025-07-02 19:02:18
标题: 评估LLM在招聘决策中的优势和局限性
摘要: 在招聘中使用大型语言模型(LLMs)有望简化候选人筛选流程,但在缺乏足够保障措施的情况下,也会引发关于准确性和算法偏见的严重担忧。在这项工作中,我们对几个最先进的基础LLMs进行了基准测试,包括来自OpenAI、Anthropic、Google、Meta和Deepseek的模型,并将它们与我们专有的领域特定招聘模型(Match Score)在求职者匹配任务上进行比较。我们评估每个模型的预测准确性(ROC AUC、Precision-Recall AUC、F1-score)和公平性(在申报的性别、种族及交叉子群上进行阈值截断分析的影响比)。我们在一个包含约10,000个真实世界近期候选人-职位对的数据集上进行的实验显示,Match Score在准确性方面优于通用LLMs(ROC AUC 0.85对0.77),并在各人口统计群体之间实现了显著更公平的结果。值得注意的是,Match Score实现了0.957的最小种族影响比(接近平等),而最佳LLMs为0.809或更低(交叉子群的影响比分别为0.906对0.773)。我们讨论了为什么预训练偏见可能导致缺乏足够保障措施的LLMs在招聘场景中传播社会偏见,而定制的监督模型可以更有效地减轻这些偏见。我们的研究结果强调了在招聘等高风险领域部署人工智能时领域特定建模和偏见审计的重要性,并警告不要在没有全面公平保障措施的情况下依赖现成的LLMs执行此类任务。此外,我们用实证证据表明,在招聘中不应在准确性和公平性之间二选一:设计良好的算法可以同时实现招聘的准确性和结果的公平性。
更新时间: 2025-07-02 19:02:18
领域: cs.LG,cs.CL,cs.CY
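The impact-ratio fairness metric reported in the paper (in the style of the four-fifths rule) is straightforward to compute from screen-in decisions; the toy scores and threshold below are invented purely for illustration.

```python
import numpy as np

def impact_ratio(selected, group):
    """Minimum ratio of any group's selection rate to the best group's rate.
    A value near 1.0 means near-parity; below ~0.8 flags adverse impact.
    selected: boolean array of screen-in decisions; group: group labels."""
    rates = np.array([selected[group == g].mean() for g in np.unique(group)])
    return rates.min() / rates.max()

# cut-off analysis: select candidates whose match score clears a threshold
scores = np.array([0.9, 0.7, 0.8, 0.6, 0.85, 0.65])
group = np.array(["a", "a", "a", "b", "b", "b"])
print(impact_ratio(scores >= 0.75, group))  # 2/3 of group a vs 1/3 of group b -> 0.5
```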
Selective Feature Re-Encoded Quantum Convolutional Neural Network with Joint Optimization for Image Classification
Quantum Machine Learning (QML) has seen significant advancements, driven by recent improvements in Noisy Intermediate-Scale Quantum (NISQ) devices. Leveraging quantum principles such as entanglement and superposition, quantum convolutional neural networks (QCNNs) have demonstrated promising results in classifying both quantum and classical data. This study examines QCNNs in the context of image classification and proposes a novel strategy to enhance feature processing and a QCNN architecture for improved classification accuracy. First, a selective feature re-encoding strategy is proposed, which directs the quantum circuits to prioritize the most informative features, thereby effectively navigating the crucial regions of the Hilbert space to find the optimal solution space. Secondly, a novel parallel-mode QCNN architecture is designed to simultaneously incorporate features extracted by two classical methods, Principal Component Analysis (PCA) and Autoencoders, within a unified training scheme. The joint optimization involved in the training process allows the QCNN to benefit from complementary feature representations, enabling better mutual readjustment of model parameters. To assess these methodologies, comprehensive experiments have been performed using the widely used MNIST and Fashion MNIST datasets for binary classification tasks. Experimental findings reveal that the selective feature re-encoding method significantly improves the quantum circuit's feature processing capability and performance. Furthermore, the jointly optimized parallel QCNN architecture consistently outperforms the individual QCNN models and the traditional ensemble approach involving independent learning followed by decision fusion, confirming its superior accuracy and generalization capabilities.
Updated: 2025-07-02 18:51:56
标题: 具有联合优化的选择性特征重新编码的量子卷积神经网络用于图像分类
摘要: 量子机器学习(QML)已经取得了显著进展,这得益于最近噪声中等规模量子(NISQ)设备的改进。利用纠缠和叠加等量子原理,量子卷积神经网络(QCNNs)在分类量子和经典数据方面展示了有希望的结果。本研究在图像分类的背景下考察了QCNNs,提出了一种增强特征处理的新策略以及一种旨在提升分类准确率的QCNN架构。首先,提出了一种选择性特征重编码策略,引导量子电路优先处理最具信息量的特征,从而有效地探索希尔伯特空间的关键区域以找到最优解空间。其次,设计了一种新颖的并行模式QCNN架构,在统一的训练方案中同时整合由两种经典方法(主成分分析(PCA)和自编码器)提取的特征。训练过程中的联合优化使QCNN能够受益于互补的特征表示,实现模型参数更好的相互调整。为了评估这些方法,我们使用广泛采用的MNIST和Fashion MNIST数据集对二分类任务进行了全面实验。实验结果显示,选择性特征重编码方法显著提高了量子电路的特征处理能力和性能。此外,联合优化的并行QCNN架构始终优于单独的QCNN模型以及先独立学习再进行决策融合的传统集成方法,证实了其更优的准确性和泛化能力。
更新时间: 2025-07-02 18:51:56
领域: quant-ph,cs.LG
GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters
Geometric diffusion models have shown remarkable success in molecular dynamics and structure generation. However, efficiently fine-tuning them for downstream tasks with varying geometric controls remains underexplored. In this work, we propose an SE(3)-equivariant adapter framework ( GeoAda) that enables flexible and parameter-efficient fine-tuning for controlled generative tasks without modifying the original model architecture. GeoAda introduces a structured adapter design: control signals are first encoded through coupling operators, then processed by a trainable copy of selected pretrained model layers, and finally projected back via decoupling operators followed by an equivariant zero-initialized convolution. By fine-tuning only these lightweight adapter modules, GeoAda preserves the model's geometric consistency while mitigating overfitting and catastrophic forgetting. We theoretically prove that the proposed adapters maintain SE(3)-equivariance, ensuring that the geometric inductive biases of the pretrained diffusion model remain intact during adaptation. We demonstrate the wide applicability of GeoAda across diverse geometric control types, including frame control, global control, subgraph control, and a broad range of application domains such as particle dynamics, molecular dynamics, human motion prediction, and molecule generation. Empirical results show that GeoAda achieves state-of-the-art fine-tuning performance while preserving original task accuracy, whereas other baselines experience significant performance degradation due to overfitting and catastrophic forgetting.
Updated: 2025-07-02 18:44:03
标题: GeoAda:使用等变适配器高效微调几何扩散模型
摘要: 几何扩散模型在分子动力学和结构生成方面取得了显著的成功。然而,如何在几何控制各异的下游任务上对其进行高效微调仍未得到充分探讨。在这项工作中,我们提出了一个SE(3)-等变适配器框架(GeoAda),可以在不修改原始模型架构的情况下,实现对受控生成任务的灵活且参数高效的微调。GeoAda引入了一种结构化的适配器设计:控制信号首先通过耦合算子进行编码,然后由选定预训练模型层的可训练副本进行处理,最后经解耦算子投影回去,再经过一个等变的零初始化卷积。通过仅微调这些轻量级适配器模块,GeoAda保留了模型的几何一致性,同时减轻了过拟合和灾难性遗忘。我们从理论上证明了所提出的适配器保持SE(3)-等变性,确保预训练扩散模型的几何归纳偏置在适配过程中保持不变。我们展示了GeoAda在各种几何控制类型(包括框架控制、全局控制、子图控制)以及粒子动力学、分子动力学、人体运动预测和分子生成等广泛应用领域的适用性。实证结果表明,GeoAda在保持原始任务准确性的同时实现了最先进的微调性能,而其他基线方法则因过拟合和灾难性遗忘出现显著的性能下降。
更新时间: 2025-07-02 18:44:03
领域: cs.LG,cs.AI
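A PyTorch sketch of the adapter pattern described above: coupling, a trainable copy of a pretrained block, and a zero-initialized decoupling projection so the adapted model initially matches the frozen base. Equivariance of each component is assumed rather than enforced in this toy; GeoAda's actual operators are SE(3)-equivariant and the final layer is an equivariant convolution.

```python
import copy
import torch
import torch.nn as nn

class AdapterBlock(nn.Module):
    """Control-conditioned adapter around a frozen pretrained block.
    Because zero_out starts at zero, the adapted forward pass equals the
    base model at initialization, which mitigates catastrophic forgetting."""
    def __init__(self, pretrained_block, dim, ctrl_dim):
        super().__init__()
        self.couple = nn.Linear(ctrl_dim, dim)         # coupling: control -> features
        self.branch = copy.deepcopy(pretrained_block)  # trainable copy of the layer
        self.zero_out = nn.Linear(dim, dim)            # decoupling projection
        nn.init.zeros_(self.zero_out.weight)           # zero-init => identity at start
        nn.init.zeros_(self.zero_out.bias)

    def forward(self, h, ctrl):
        return h + self.zero_out(self.branch(h + self.couple(ctrl)))
```

Only these adapter modules are trained; the original diffusion backbone stays frozen, which is where the parameter efficiency comes from.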
Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation
The adaptive Iterative Soft-Thresholding Algorithm (ISTA) has been a popular algorithm for finding a desirable solution to the LASSO problem without explicitly tuning the regularization parameter $\lambda$. Despite that the adaptive ISTA is a successful practical algorithm, few theoretical results exist. In this paper, we present the theoretical analysis on the adaptive ISTA with the thresholding strategy of estimating noise level by median absolute deviation. We show properties of the fixed points of the algorithm, including scale equivariance, non-uniqueness, and local stability, prove the local linear convergence guarantee, and show its global convergence behavior.
Updated: 2025-07-02 18:41:59
标题: 基于中位数绝对偏差的自适应迭代软阈值算法
摘要: 自适应迭代软阈值算法(ISTA)是一种流行的算法,用于在不显式调整正则化参数λ的情况下找到LASSO问题的理想解。尽管自适应ISTA在实践中是一种成功的算法,相关理论结果却很少。本文针对采用"以中位数绝对偏差估计噪声水平"这一阈值策略的自适应ISTA给出理论分析。我们展示了该算法不动点的性质,包括尺度等变性、非唯一性和局部稳定性,证明了局部线性收敛保证,并展示了其全局收敛行为。
更新时间: 2025-07-02 18:41:59
领域: stat.ML,cs.LG,eess.SP
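As a rough illustration of the thresholding strategy analyzed above, the following NumPy sketch runs ISTA while re-estimating the noise level each iteration from the median absolute deviation (MAD) of the residual correlations; the exact placement of the MAD estimate is our simplification, not necessarily the paper's update rule.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator, the proximal map of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def adaptive_ista(A, y, n_iter=200):
    """ISTA for min_x 0.5||Ax - y||^2 + lam * ||x||_1, with lam set each
    iteration from a MAD-based noise-level estimate (illustrative)."""
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(n)
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)              # gradient of the smooth part
        z = x - g / L
        r = A.T @ (y - A @ x)              # residual correlations with the columns
        sigma_hat = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD noise estimate
        x = soft(z, sigma_hat / L)         # threshold scales with the noise level
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 200)) / 10
x_true = np.zeros(200); x_true[:5] = 3.0
y = A @ x_true + 0.01 * rng.standard_normal(100)
print(np.nonzero(adaptive_ista(A, y))[0][:10])
```

The constant 0.6745 makes the MAD a consistent estimator of the standard deviation under Gaussian noise, which is the usual motivation for MAD-based thresholds.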
Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab
Designing experiments and result interpretations are core scientific competencies, particularly in biology, where researchers perturb complex systems to uncover the underlying systems. Recent efforts to evaluate the scientific capabilities of large language models (LLMs) fail to test these competencies because wet-lab experimentation is prohibitively expensive: in expertise, time and equipment. We introduce SciGym, a first-in-class benchmark that assesses LLMs' iterative experiment design and analysis abilities in open-ended scientific discovery tasks. SciGym overcomes the challenge of wet-lab costs by running a dry lab of biological systems. These models, encoded in Systems Biology Markup Language, are efficient for generating simulated data, making them ideal testbeds for experimentation on realistically complex systems. We evaluated six frontier LLMs on 137 small systems, and released a total of 350 systems. Our evaluation shows that while more capable models demonstrated superior performance, all models' performance declined significantly as system complexity increased, suggesting substantial room for improvement in the scientific capabilities of LLM agents.
Updated: 2025-07-02 18:41:44
标题: 用系统生物学干实验测量语言模型的科学能力
摘要: 设计实验和结果解释是核心科学能力,特别是在生物学领域,研究人员会扰动复杂系统以揭示潜在的系统。最近评估大型语言模型(LLMs)的科学能力的努力未能测试这些能力,因为湿实验室实验费用昂贵:在专业知识、时间和设备方面。我们介绍了SciGym,这是一个首创的基准测试,评估LLMs在开放式科学发现任务中的迭代实验设计和分析能力。SciGym通过运行生物系统的干实验室来克服湿实验室成本的挑战。这些模型以系统生物学标记语言编码,对生成模拟数据非常高效,使它们成为在真实复杂系统上进行实验的理想测试平台。我们对137个小系统上的六个前沿LLMs进行了评估,并发布了总共350个系统。我们的评估显示,虽然更有能力的模型表现出优越性能,但随着系统复杂性的增加,所有模型的性能都显著下降,表明LLM代理的科学能力有很大的改进空间。
更新时间: 2025-07-02 18:41:44
领域: cs.AI
Unraveling particle dark matter with Physics-Informed Neural Networks
We parametrically solve the Boltzmann equations governing freeze-in dark matter (DM) in alternative cosmologies with Physics-Informed Neural Networks (PINNs), a mesh-free method. Through inverse PINNs, using a single DM experimental point -- observed relic density -- we determine the physical attributes of the theory, namely power-law cosmologies, inspired by braneworld scenarios, and particle interaction cross sections. The expansion of the Universe in such alternative cosmologies has been parameterized through a switch-like function reproducing the Hubble law at later times. Without loss of generality, we model more realistically this transition with a smooth function. We predict a distinct pair-wise relationship between power-law exponent and particle interactions: for a given cosmology with negative (positive) exponent, smaller (larger) cross sections are required to reproduce the data. Lastly, via Bayesian methods, we quantify the epistemic uncertainty of theoretical parameters found in inverse problems.
Updated: 2025-07-02 18:41:25
标题: 用物理信息神经网络揭示粒子暗物质
摘要: 我们使用物理信息神经网络(PINNs)这种无网格方法,参数化地求解了替代宇宙学中支配冻入(freeze-in)暗物质(DM)的玻尔兹曼方程。通过逆 PINNs,利用单个 DM 实验点 -- 观测到的遗迹密度 -- 我们确定了理论的物理属性,即受膜宇宙场景启发的幂律宇宙学和粒子相互作用截面。在这类替代宇宙学中,宇宙的膨胀通过一个类似开关的函数进行参数化,以在后期再现哈勃定律。在不失一般性的情况下,我们用一个平滑函数更真实地模拟这一过渡。我们预测了幂律指数和粒子相互作用之间明显的成对关系:对于具有负(正)指数的给定宇宙学,需要较小(较大)的截面才能再现数据。最后,通过贝叶斯方法,我们量化了逆问题中所得理论参数的认知不确定性。
更新时间: 2025-07-02 18:41:25
领域: hep-ph,cs.LG
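To make the PINN setup concrete, here is a toy sketch in PyTorch: a generic first-order ODE with a smooth switch-like expansion-rate function stands in for the actual Boltzmann equation, and all functional forms (the sigmoid switch, the exponential source term) are illustrative assumptions, not the paper's equations.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def hubble(x, n=0.5, x_t=5.0, k=4.0):
    """Smooth switch between a power-law era ~x^n and a later standard era,
    mimicking the smooth transition function described in the abstract."""
    w = torch.sigmoid(k * (x - x_t))
    return (1 - w) * x.pow(n) + w * x.pow(2)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.linspace(0.1, 10.0, 256).unsqueeze(1).requires_grad_(True)
for step in range(2000):
    Y = net(x)
    dY = torch.autograd.grad(Y.sum(), x, create_graph=True)[0]
    source = torch.exp(-x) / hubble(x)        # toy freeze-in style source term
    residual = dY - source                    # physics residual: dY/dx = S(x)/H(x)
    ic = (net(torch.tensor([[0.1]])) - 0.0) ** 2   # initial condition Y(x0) = 0
    loss = (residual ** 2).mean() + ic.squeeze()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

The inverse-PINN variant described in the abstract would additionally treat quantities like the power-law exponent as trainable parameters constrained by the observed data point.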
Towards Explaining Deep Neural Network Compression Through a Probabilistic Latent Space
Despite the impressive performance of deep neural networks (DNNs), their computational complexity and storage space consumption have led to the concept of network compression. While DNN compression techniques such as pruning and low-rank decomposition have been extensively studied, there has been insufficient attention paid to their theoretical explanation. In this paper, we propose a novel theoretical framework that leverages a probabilistic latent space of DNN weights and explains the optimal network sparsity by using the information-theoretic divergence measures. We introduce new analogous projected patterns (AP2) and analogous-in-probability projected patterns (AP3) notions for DNNs and prove that there exists a relationship between AP3/AP2 property of layers in the network and its performance. Further, we provide a theoretical analysis that explains the training process of the compressed network. The theoretical results are empirically validated through experiments conducted on standard pre-trained benchmarks, including AlexNet, ResNet50, and VGG16, using CIFAR10 and CIFAR100 datasets. Through our experiments, we highlight the relationship of AP3 and AP2 properties with fine-tuning pruned DNNs and sparsity levels.
Updated: 2025-07-02 18:37:53
标题: 朝向通过概率潜在空间解释深度神经网络压缩
摘要: 尽管深度神经网络(DNN)表现出色,但它们的计算复杂性和存储空间消耗导致了网络压缩的概念。虽然DNN压缩技术如剪枝和低秩分解已被广泛研究,但对它们的理论解释还未受到足够关注。本文提出了一个新的理论框架,利用DNN权重的概率潜在空间,并通过信息论分歧度量来解释最佳网络稀疏性。我们引入了新的类比投影模式(AP2)和类比概率投影模式(AP3)的概念用于DNN,并证明网络中层的AP3/AP2属性与其性能之间存在关系。此外,我们提供了一个理论分析,解释了压缩网络的训练过程。理论结果经过在标准预训练基准测试集上进行的实验验证,包括AlexNet、ResNet50和VGG16,使用CIFAR10和CIFAR100数据集。通过我们的实验,我们突出了AP3和AP2属性与微调剪枝DNN和稀疏水平之间的关系。
更新时间: 2025-07-02 18:37:53
领域: cs.LG
A Model-Consistent Data-Driven Computational Strategy for PDE Joint Inversion Problems
The task of simultaneously reconstructing multiple physical coefficients in partial differential equations (PDEs) from observed data is ubiquitous in applications. In this work, we propose an integrated data-driven and model-based iterative reconstruction framework for such joint inversion problems where additional data on the unknown coefficients are supplemented for better reconstructions. Our method couples the supplementary data with the PDE model to make the data-driven modeling process consistent with the model-based reconstruction procedure. We characterize the impact of learning uncertainty on the joint inversion results for two typical inverse problems. Numerical evidence is provided to demonstrate the feasibility of using data-driven models to improve the joint inversion of multiple coefficients in PDEs.
Updated: 2025-07-02 18:33:58
标题: 一个与模型一致的数据驱动计算策略用于PDE联合反演问题
摘要: 在应用中,同时从观测数据中重建偏微分方程(PDEs)中的多个物理系数是一种普遍的任务。在这项工作中,我们提出了一个集成的数据驱动和基于模型的迭代重建框架,用于这种联合反演问题,其中附加数据用于未知系数的更好重建。我们的方法将补充数据与PDE模型耦合,使数据驱动建模过程与基于模型的重建过程保持一致。我们对两种典型的反问题的联合反演结果受学习不确定性的影响进行了表征。提供了数值证据,证明了使用数据驱动模型改进PDE中多个系数的联合反演的可行性。
更新时间: 2025-07-02 18:33:58
领域: math.NA,cs.LG,cs.NA,math.OC,stat.ME,35R30, 49M41, 65M32, 65N21, 68Q32, 68T07, 78A46
Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation
Connected and Autonomous Vehicles (CAVs) rely on a range of devices that often process privacy-sensitive data. Among these, roadside units play a critical role particularly through the use of AI-equipped (AIE) cameras for applications such as violation detection. However, the privacy risks associated with captured imagery remain a major concern, as such data can be misused for identity theft, profiling, or unauthorized commercial purposes. While traditional techniques such as face blurring and obfuscation have been applied to mitigate privacy risks, individual privacy remains at risk, as individuals can still be tracked using other features such as their clothing. This paper introduces a novel privacy-preserving framework that leverages feedback-based reinforcement learning (RL) and vision-language models (VLMs) to protect sensitive visual information captured by AIE cameras. The main idea is to convert images into semantically equivalent textual descriptions, ensuring that scene-relevant information is retained while visual privacy is preserved. A hierarchical RL strategy is employed to iteratively refine the generated text, enhancing both semantic accuracy and privacy. Evaluation results demonstrate significant improvements in both privacy protection and textual quality, with the Unique Word Count increasing by approximately 77\% and Detail Density by around 50\% compared to existing approaches.
Updated: 2025-07-02 18:31:53
标题: 通过视觉到文本转换实现连接和自动驾驶车辆中的隐私保护
摘要: 连接和自动驾驶车辆(CAVs)依赖于一系列设备,这些设备通常处理隐私敏感数据。其中,路边单元在违章检测等应用中通过使用配备人工智能(AIE)摄像头发挥关键作用。然而,与捕获的图像相关的隐私风险仍然是一个主要关注点,因为这些数据可能被滥用于身份盗窃、个人画像或未经授权的商业用途。虽然传统的技术如人脸模糊和混淆已被应用于减轻隐私风险,但个人隐私仍然面临风险,因为个人仍然可以通过其他特征如他们的服装来进行跟踪。本文介绍了一种新颖的隐私保护框架,利用基于反馈的强化学习(RL)和视觉语言模型(VLMs)来保护AIE摄像头捕获的敏感视觉信息。主要思想是将图像转换为语义上等效的文本描述,确保场景相关信息得以保留,同时保护视觉隐私。采用分层RL策略来迭代地改进生成的文本,提高语义准确性和隐私性。评估结果显示,在隐私保护和文本质量方面都取得了显着的改善,与现有方法相比,独特单词计数增加了约77%,细节密度增加了约50%。
更新时间: 2025-07-02 18:31:53
领域: cs.CV,cs.LG
Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks. However, current models remain inefficient at reasoning: they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute (TTC) strategies, which aim to improve the computational efficiency of LLM reasoning. We introduce a two-tiered taxonomy that distinguishes between L1-controllability, methods that operate under fixed compute budgets, and L2-adaptiveness, methods that dynamically scale inference based on input difficulty or model confidence. We benchmark leading proprietary LLMs across diverse datasets, highlighting critical trade-offs between reasoning performance and token usage. Compared to prior surveys on efficient reasoning, our review emphasizes the practical control, adaptability, and scalability of TTC methods. Finally, we discuss emerging trends such as hybrid thinking models and identify key challenges for future work towards making LLMs more computationally efficient, robust, and responsive to user constraints.
Updated: 2025-07-02 18:27:42
标题: 预算推理:LLM中自适应和可控测试时间计算的调查
摘要: 大型语言模型(LLMs)已迅速发展成为能够解决广泛任务范围的通用代理。然而,当前模型在推理方面仍然效率低下:它们在推理时采用固定的计算时间,无论任务复杂性如何,通常在处理简单问题时过度思考,而在处理困难问题时则思考不足。本调查综述了有效的测试时间计算(TTC)策略,旨在提高LLM推理的计算效率。我们引入了一个两层分类法,区分了L1-控制性方法,这些方法在固定计算预算下运行,以及L2-适应性方法,这些方法根据输入难度或模型置信度动态调整推理。我们在各种数据集上对领先的专有LLMs进行了基准测试,突出了推理性能和令牌使用之间的关键权衡。与以往关于有效推理的调查相比,我们的综述强调了TTC方法的实际控制性、适应性和可扩展性。最后,我们讨论了新兴趋势,如混合思维模型,并确定未来工作的关键挑战,以使LLMs在计算上更加高效、稳健,并能够响应用户的约束。
更新时间: 2025-07-02 18:27:42
领域: cs.AI,cs.LG
Exploring the Design Space of Diffusion Bridge Models
Diffusion bridge models and stochastic interpolants enable high-quality image-to-image (I2I) translation by creating paths between distributions in pixel space. However, the proliferation of techniques based on incompatible mathematical assumptions has impeded progress. In this work, we unify and expand the space of bridge models by extending Stochastic Interpolants (SIs) with preconditioning, endpoint conditioning, and an optimized sampling algorithm. These enhancements expand the design space of diffusion bridge models, leading to state-of-the-art performance in both image quality and sampling efficiency across diverse I2I tasks. Furthermore, we identify and address a previously overlooked issue of low sample diversity under fixed conditions. We introduce a quantitative analysis for output diversity and demonstrate how we can modify the base distribution for further improvements.
Updated: 2025-07-02 18:24:40
标题: 探索扩散桥模型的设计空间
摘要: 扩散桥模型和随机插值器通过在像素空间中构建分布之间的路径,实现了高质量的图像到图像(I2I)转换。然而,基于互不兼容数学假设的技术的泛滥阻碍了进展。在这项工作中,我们通过为随机插值器(SIs)引入预条件化、端点条件化和优化的采样算法,统一并扩展了桥模型的空间。这些增强扩展了扩散桥模型的设计空间,在各种I2I任务中,在图像质量和采样效率两方面均实现了最先进的性能。此外,我们发现并解决了一个此前被忽视的问题,即固定条件下样本多样性较低。我们引入了一个针对输出多样性的定量分析,并展示了如何修改基础分布以进一步改进。
更新时间: 2025-07-02 18:24:40
领域: cs.LG
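A minimal stochastic-interpolant training loop, for orientation, might look like the following sketch; this is the standard linear interpolant with a Brownian-bridge noise term and a velocity-matching loss, without the preconditioning and endpoint-conditioning extensions the paper introduces.

```python
import torch

def interpolant(x0, x1, t, z):
    """Linear stochastic interpolant x_t and its time derivative (the
    regression target for the velocity network)."""
    g = torch.sqrt(t * (1 - t))                       # Brownian-bridge noise scale
    xt = (1 - t) * x0 + t * x1 + g * z
    vt = (x1 - x0) + (1 - 2 * t) / (2 * g + 1e-8) * z  # d/dt of xt
    return xt, vt

velocity_net = torch.nn.Sequential(
    torch.nn.Linear(2 + 1, 128), torch.nn.SiLU(),
    torch.nn.Linear(128, 2),
)
opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(256, 2)             # samples from the source distribution
    x1 = torch.randn(256, 2) + 3.0       # paired samples from the target distribution
    t = torch.rand(256, 1)
    z = torch.randn(256, 2)
    xt, v_target = interpolant(x0, x1, t, z)
    v_pred = velocity_net(torch.cat([xt, t], dim=1))
    loss = ((v_pred - v_target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling then amounts to integrating the learned velocity field from a source image to a target image, which is the path-between-distributions picture the abstract describes.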
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges
Crash detection from video feeds is a critical problem in intelligent transportation systems. Recent developments in large language models (LLMs) and vision-language models (VLMs) have transformed how we process, reason about, and summarize multimodal information. This paper surveys recent methods leveraging LLMs for crash detection from video data. We present a structured taxonomy of fusion strategies, summarize key datasets, analyze model architectures, compare performance benchmarks, and discuss ongoing challenges and opportunities. Our review provides a foundation for future research in this fast-growing intersection of video understanding and foundation models.
Updated: 2025-07-02 18:21:01
标题: 用于视频碰撞检测的大型语言模型:方法、数据集与挑战综述
摘要: 视频流中的碰撞检测是智能交通系统中的一个关键问题。近年来,大型语言模型(LLMs)和视觉语言模型(VLMs)的发展改变了我们处理、推理和总结多模态信息的方式。本文调查了最近利用LLMs从视频数据中进行碰撞检测的方法。我们提出了融合策略的结构化分类,总结了关键数据集,分析了模型架构,比较了性能基准,并讨论了持续的挑战和机遇。我们的综述为未来在视频理解和基础模型快速增长的交叉领域的研究奠定了基础。
更新时间: 2025-07-02 18:21:01
领域: cs.CV,cs.AI
HCVR: A Hybrid Approach with Correlation-aware Voting Rules for Feature Selection
In this paper, we propose HCVR (Hybrid approach with Correlation-aware Voting Rules), a lightweight rule-based feature selection method that combines Parameter-to-Parameter (P2P) and Parameter-to-Target (P2T) correlations to eliminate redundant features and retain relevant ones. This method is a hybrid of non-iterative and iterative filtering approaches for dimensionality reduction. It is a greedy method that works by backward elimination, possibly eliminating multiple features at every step. The rules contribute votes for features, and a decision to keep or discard each feature is made by majority voting. The rules make use of correlation thresholds between every pair of features, and between features and the target. We provide the results from the application of HCVR to the SPAMBASE dataset. The results showed improved performance compared to traditional non-iterative (CFS, mRMR and MI) and iterative (RFE, SFS and Genetic Algorithm) techniques. The effectiveness was assessed based on the performance of different classifiers after applying filtering.
Updated: 2025-07-02 18:20:56
标题: HCVR:一种具有相关性感知投票规则的混合方法用于特征选择
摘要: 在本文中,我们提出了HCVR(具有相关性感知投票规则的混合方法),这是一种轻量级基于规则的特征选择方法,结合了参数与参数(P2P)和参数与目标(P2T)之间的相关性,以消除冗余特征并保留相关特征。该方法是一个非迭代和迭代过滤方法的混合体,用于降低维度。这是一种贪婪方法,通过向后消除,每一步可能消除多个特征。规则有助于对特征进行投票,并且根据多数投票决定保留或丢弃。规则利用特征之间以及特征与目标之间的相关性阈值。我们提供了将HCVR应用于SPAMBASE数据集的结果。与传统的非迭代(CFS、mRMR和MI)和迭代(RFE、SFS和遗传算法)技术相比,结果显示了性能的提升。根据应用过滤后不同分类器的性能来评估有效性。
更新时间: 2025-07-02 18:20:56
领域: cs.AI,cs.LG
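Since the abstract specifies the ingredients (P2P/P2T correlation thresholds, rule votes, majority decision, backward elimination) but not the exact rules, the sketch below is a plausible reconstruction: three illustrative rules vote per feature and a majority removes it. The thresholds and rule definitions are our assumptions, not the paper's.

```python
import numpy as np

def hcvr(X, y, p2p_thresh=0.9, p2t_thresh=0.05):
    """Backward elimination with correlation-aware voting (illustrative).
    Three rules each cast a discard vote; a majority (>= 2 of 3) removes
    the feature, and possibly several features are removed per round."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        C = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(C, 0.0)
        p2t = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in keep]))
        drop = []
        for i, j in enumerate(keep):
            votes = (
                # P2P rule: redundant with a peer that better explains the target
                int(C[i].max() > p2p_thresh and p2t[i] <= p2t[C[i].argmax()])
                # P2T rule: weakly correlated with the target
                + int(p2t[i] < p2t_thresh)
                # redundancy rule: above-median average correlation with peers
                + int(C[i].mean() > np.median(C))
            )
            if votes >= 2:
                drop.append(j)
        if not drop:
            break
        keep = [j for j in keep if j not in drop]
    return keep

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
X[:, 9] = X[:, 0] + 0.01 * rng.standard_normal(500)   # near-duplicate feature
y = X[:, 1] + X[:, 2]
print(hcvr(X, y))
```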
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
Updated: 2025-07-02 18:17:45
标题: 非线性情况下的良性过拟合:使用梯度下降训练神经网络分类器处理嘈杂线性数据
摘要: 良性过拟合是指在存在噪声数据的情况下,插值模型能够良好泛化的现象,最初是在使用梯度下降训练的神经网络模型中观察到的。为了更好地理解这一经验观察,我们考虑了在随机初始化后,通过梯度下降在逻辑损失上训练至插值的两层神经网络的泛化误差。我们假设数据来自于良好分离的类条件对数凹分布,并允许训练标签的一个恒定比例受到对手的破坏。我们展示了在这种设置下,神经网络表现出了良性过拟合:它们可以被驱动到零训练误差,完美拟合任何有噪声的训练标签,并同时实现极小极大(minimax)最优的测试误差。与以往需要线性或基于核的预测器的良性过拟合研究不同,我们的分析适用于模型和学习动态都本质上非线性的情况。
更新时间: 2025-07-02 18:17:45
领域: cs.LG,math.ST,stat.ML,stat.TH
Byzantine-Robust Gossip: Insights from a Dual Approach
Distributed learning has many computational benefits but is vulnerable to attacks from a subset of devices transmitting incorrect information. This paper investigates Byzantine-resilient algorithms in a decentralized setting, where devices communicate directly in a peer-to-peer manner within a communication network. We leverage the so-called dual approach for decentralized optimization and propose a Byzantine-robust algorithm. We provide convergence guarantees in the average consensus subcase, discuss the potential of the dual approach beyond this subcase, and re-interpret existing algorithms using the dual framework. Lastly, we experimentally show the soundness of our method.
Updated: 2025-07-02 18:12:55
标题: 拜占庭鲁棒的Gossip:来自对偶方法的洞察
摘要: 分布式学习具有许多计算优势,但容易受到一部分传输错误信息的设备的攻击。本文研究了去中心化设置中的拜占庭容错算法,在这种设置中,设备在通信网络内以点对点方式直接通信。我们利用所谓的对偶方法进行去中心化优化,并提出了一个拜占庭鲁棒算法。我们在平均共识子情形下提供收敛保证,讨论了对偶方法在这一子情形之外的潜力,并使用对偶框架重新解释了现有算法。最后,我们通过实验证明了我们方法的有效性。
更新时间: 2025-07-02 18:12:55
领域: cs.LG,stat.ML
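The paper's algorithm is derived from the dual approach; as a generic stand-in for robust peer-to-peer aggregation (not the authors' method), the sketch below runs gossip averaging where each node aggregates neighbor messages with a coordinate-wise trimmed mean, which keeps honest nodes near consensus despite a persistently misbehaving node.

```python
import numpy as np

def trimmed_mean(values, f):
    """Coordinate-wise trimmed mean discarding the f lowest/highest entries."""
    v = np.sort(np.asarray(values), axis=0)
    return v[f:len(v) - f].mean(axis=0)

def gossip_round(x, neighbors, f):
    """One peer-to-peer round: every node mixes its own value with a robust
    aggregate of its neighbors' (possibly Byzantine) messages."""
    new_x = np.empty_like(x)
    for i, nbrs in neighbors.items():
        msgs = [x[j] for j in nbrs] + [x[i]]
        new_x[i] = 0.5 * x[i] + 0.5 * trimmed_mean(msgs, f)
    return new_x

rng = np.random.default_rng(0)
n, d = 8, 3
x = rng.standard_normal((n, d))
neighbors = {i: [(i - 1) % n, (i + 1) % n, (i + 3) % n] for i in range(n)}
for _ in range(50):
    x[0] = 1e3                         # node 0 keeps injecting an outlier
    x = gossip_round(x, neighbors, f=1)
print(x[1:].std(axis=0))               # honest nodes stay close to agreement
```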
SPACE-SUIT: An Artificial Intelligence Based Chromospheric Feature Extractor and Classifier for SUIT
The Solar Ultraviolet Imaging Telescope(SUIT) onboard Aditya-L1 is an imager that observes the solar photosphere and chromosphere through observations in the wavelength range of 200-400 nm. A comprehensive understanding of the plasma and thermodynamic properties of chromospheric and photospheric morphological structures requires a large sample statistical study, necessitating the development of automatic feature detection methods. To this end, we develop the feature detection algorithm SPACE-SUIT: Solar Phenomena Analysis and Classification using Enhanced vision techniques for SUIT, to detect and classify the solar chromospheric features to be observed from SUIT's Mg II k filter. Specifically, we target plage regions, sunspots, filaments, and off-limb structures. SPACE uses YOLO, a neural network-based model to identify regions of interest. We train and validate SPACE using mock-SUIT images developed from Interface Region Imaging Spectrometer(IRIS) full-disk mosaic images in Mg II k line, while we also perform detection on Level-1 SUIT data. SPACE achieves an approximate precision of 0.788, recall 0.863 and MAP of 0.874 on the validation mock SUIT FITS dataset. Given the manual labeling of our dataset, we perform "self-validation" by applying statistical measures and Tamura features on the ground truth and predicted bounding boxes. We find the distributions of entropy, contrast, dissimilarity, and energy to show differences in the features. These differences are qualitatively captured by the detected regions predicted by SPACE and validated with the observed SUIT images, even in the absence of labeled ground truth. This work not only develops a chromospheric feature extractor but also demonstrates the effectiveness of statistical metrics and Tamura features for distinguishing chromospheric features, offering independent validation for future detection schemes.
Updated: 2025-07-02 18:01:19
标题: SPACE-SUIT:基于人工智能的SUIT色球特征提取器和分类器
摘要: 太阳紫外成像望远镜(SUIT)是搭载在Aditya-L1上的一种成像仪,通过200-400纳米波长范围内的观测来观测太阳光球和色球。对色球和光球形态结构的等离子体和热力学特性进行全面理解需要进行大样本统计研究,因此需要开发自动特征检测方法。为此,我们开发了特征检测算法SPACE-SUIT:使用增强视觉技术进行SUIT太阳现象分析和分类,以检测和分类从SUIT的Mg II k滤波器中观测到的太阳色球特征。具体而言,我们针对谱斑区域、太阳黑子、暗条和临边外结构。SPACE使用基于神经网络的YOLO模型来识别感兴趣区域。我们使用从界面区成像光谱仪(IRIS)全盘镶嵌图像中的Mg II k线开发的模拟SUIT图像对SPACE进行训练和验证,同时也对Level-1 SUIT数据进行检测。SPACE在验证模拟SUIT FITS数据集上实现了约0.788的精度、0.863的召回率和0.874的平均精度。鉴于我们数据集的手动标记,我们通过在真实和预测边界框上应用统计度量和Tamura特征进行"自我验证"。我们发现熵、对比度、不相似度和能量的分布显示出特征上的差异。这些差异在SPACE预测的检测区域中得到了定性捕捉,并通过观察到的SUIT图像进行了验证,即使在缺乏标注真值的情况下也是如此。这项工作不仅开发了一种色球特征提取器,还展示了统计度量和Tamura特征用于区分色球特征的有效性,为未来的检测方案提供了独立验证。
更新时间: 2025-07-02 18:01:19
领域: astro-ph.SR,astro-ph.IM,cs.CV,cs.LG
MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation
Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications. This accessibility extends to malicious software development, raising significant security concerns. While LLM providers have implemented alignment mechanisms to prevent direct generation of overtly malicious code, these safeguards predominantly evaluate individual prompts in isolation, overlooking a critical vulnerability: malicious operations can be systematically decomposed into benign-appearing sub-tasks. In this paper, we introduce the Malware Generation Compiler (MGC), a novel framework that leverages this vulnerability through modular decomposition and alignment-evasive generation. MGC employs a specialized Malware Description Intermediate Representation (MDIR) to bridge high-level malicious intents and benign-appearing code snippets. Extensive evaluation demonstrates that our attack reliably generates functional malware across diverse task specifications and categories, outperforming jailbreaking methods by +365.79% and underground services by +78.07% in correctness on three benchmark datasets. Case studies further show that MGC can reproduce and even enhance 16 real-world malware samples. This work provides critical insights for security researchers by exposing the risks of compositional attacks against aligned AI systems. Demonstrations are available at https://sites.google.com/view/malware-generation-compiler.
Updated: 2025-07-02 18:00:49
标题: MGC:利用对齐LLM中的组合失明进行恶意软件生成的编译器框架
摘要: 大型语言模型(LLMs)已经使软件开发民主化,降低了编程复杂应用程序的专业知识门槛。这种可访问性延伸到恶意软件开发,引发了重大的安全顾虑。虽然LLM提供商已经实施了对齐机制来防止直接生成明显恶意代码,但这些保障主要孤立地评估单个提示,忽视了一个关键的漏洞:恶意操作可以被系统地分解成看似良性的子任务。在本文中,我们介绍了恶意软件生成编译器(MGC),这是一个通过模块化分解和规避对齐的生成来利用这一漏洞的新框架。MGC采用专门的恶意软件描述中间表示(MDIR)来连接高层次的恶意意图和看似良性的代码片段。广泛的评估表明,我们的攻击可靠地生成功能性的恶意软件,涵盖不同的任务规范和类别,在三个基准数据集上的正确性比越狱方法高出365.79%,比地下服务高出78.07%。案例研究进一步显示,MGC可以复制甚至增强16个真实世界的恶意软件样本。这项工作通过揭示针对对齐的人工智能系统的组合攻击风险,为安全研究人员提供了重要见解。演示可在https://sites.google.com/view/malware-generation-compiler上查看。
更新时间: 2025-07-02 18:00:49
领域: cs.CR,cs.AI
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction to accelerate the process, but only achieved limited parallelization. To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) Flexible Parallelized Autoregressive Modeling, a novel architecture that enables arbitrary generation ordering and degrees of parallelization. It uses learnable position query tokens to guide generation at target positions while ensuring mutual visibility among concurrently generated tokens for consistent parallel decoding. (2) Locality-aware Generation Ordering, a novel schedule that forms groups to minimize intra-group dependencies and maximize contextual support, enhancing generation quality. With these designs, we reduce the generation steps from 256 to 20 (256$\times$256 res.) and 1024 to 48 (512$\times$512 res.) without compromising quality on the ImageNet class-conditional generation, and achieving at least 3.4$\times$ lower latency than previous parallelized autoregressive models.
Updated: 2025-07-02 17:59:23
标题: 基于局部性的并行解码,实现高效的自回归图像生成
摘要: 我们提出了一种局部感知并行解码(LPD)方法来加速自回归图像生成。传统的自回归图像生成依赖于下一个补丁的预测,这是一个内存受限的过程,导致高延迟。现有的工作已经尝试通过转换到多补丁预测来并行化下一个补丁的预测,以加速该过程,但只实现了有限的并行化。为了实现高度并行化并保持生成质量,我们引入了两个关键技术:(1)灵活并行自回归建模,这是一种新颖的架构,能够实现任意生成顺序和并行化程度。它使用可学习的位置查询标记来引导在目标位置生成,同时确保同时生成的标记之间的相互可见性,以实现一致的并行解码。(2)局部感知生成排序,一种新颖的调度方法,形成组以最小化组内依赖性并最大化上下文支持,从而增强生成质量。通过这些设计,我们将ImageNet类别条件生成的生成步骤从256减少到20(256×256分辨率),从1024减少到48(512×512分辨率),而不会影响生成质量,并且至少比以前的并行自回归模型低3.4倍的延迟。
更新时间: 2025-07-02 17:59:23
领域: cs.CV,cs.AI
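One way to picture a locality-aware generation ordering is the greedy farthest-point grouping below: positions within a group are kept mutually far apart, so they can be decoded in parallel with weak intra-group dependence. This schedule is an illustrative reconstruction rather than the paper's exact algorithm, though the group count reproduces the 256-to-20-step arithmetic.

```python
import numpy as np

def locality_aware_groups(grid=16, group_size=13):
    """Partition grid*grid patch positions into parallel-decoding groups.
    Greedy farthest-point selection keeps positions within a group far
    apart, so concurrently generated tokens depend mostly on the already
    generated context rather than on each other (illustrative)."""
    ys, xs = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    remaining = list(range(len(pos)))
    groups = []
    while remaining:
        group = [remaining.pop(0)]
        while len(group) < group_size and remaining:
            # distance of each candidate to its nearest already-chosen member
            d = np.array([
                min(np.abs(pos[r] - pos[g]).sum() for g in group)
                for r in remaining
            ])
            group.append(remaining.pop(int(d.argmax())))  # farthest point next
        groups.append(group)
    return groups

groups = locality_aware_groups()
print(len(groups), [len(g) for g in groups[:3]])  # 20 groups of <=13 positions
```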
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Multimodal foundation models, such as GPT-4o, have recently made remarkable progress, but it is not clear where exactly these models stand in terms of understanding vision. In this paper, we benchmark the performance of popular multimodal foundation models (GPT-4o, o4-mini, Gemini 1.5 Pro and Gemini 2.0 Flash, Claude 3.5 Sonnet, Qwen2-VL, Llama 3.2) on standard computer vision tasks (semantic segmentation, object detection, image classification, depth and surface normal prediction) using established datasets (e.g., COCO, ImageNet and its variants, etc). The main challenges to performing this are: 1) most models are trained to output text and cannot natively express versatile domains, such as segments or 3D geometry, and 2) many leading models are proprietary and accessible only at an API level, i.e., there is no weight access to adapt them. We address these challenges by translating standard vision tasks into equivalent text-promptable and API-compatible tasks via prompt chaining to create a standardized benchmarking framework. We observe that 1) the models are not close to the state-of-the-art specialist models at any task. However, 2) they are respectable generalists; this is remarkable as they are presumably trained on primarily image-text-based tasks. 3) They perform semantic tasks notably better than geometric ones. 4) While the prompt-chaining techniques affect performance, better models exhibit less sensitivity to prompt variations. 5) GPT-4o performs the best among non-reasoning models, securing the top position in 4 out of 6 tasks, 6) reasoning models, e.g. o3, show improvements in geometric tasks, and 7) a preliminary analysis of models with native image generation, like the latest GPT-4o, shows they exhibit quirks like hallucinations and spatial misalignments.
Updated: 2025-07-02 17:59:07
标题: GPT-4o对视觉理解得多好?评估多模态基础模型在标准计算机视觉任务上的表现
摘要: 多模态基础模型,如GPT-4o,最近取得了显著进展,但目前还不清楚这些模型在视觉理解方面具体处于什么水平。在本文中,我们对流行的多模态基础模型(GPT-4o、o4-mini、Gemini 1.5 Pro和Gemini 2.0 Flash、Claude 3.5 Sonnet、Qwen2-VL、Llama 3.2)在标准计算机视觉任务(语义分割、目标检测、图像分类、深度和表面法线预测)上的性能进行基准测试,使用已建立的数据集(如COCO、ImageNet及其变种等)。 执行此任务的主要挑战是:1)大多数模型经过训练输出文本,无法原生表达多样化的领域,如分段或3D几何,2)许多领先模型是专有的,只能通过API级别访问,即无法访问权重以进行调整。我们通过提示链将标准视觉任务转换为等效的、文本可提示且与API兼容的任务,从而创建一个标准化的基准测试框架来解决这些挑战。 我们观察到:1)这些模型在任何任务上都不接近最先进的专业模型。但是,2)它们是表现可观的通才;这是令人瞩目的,因为它们大概主要是在基于图像-文本的任务上训练的。3)它们在语义任务上的表现明显优于几何任务。4)虽然提示链技术会影响性能,但更好的模型对提示变化的敏感性较低。5)GPT-4o在非推理模型中表现最佳,在6项任务中有4项位居榜首。6)推理模型,例如o3,在几何任务中显示出改进,7)对具有原生图像生成功能的模型(如最新的GPT-4o)的初步分析显示,它们表现出幻觉和空间错位等怪癖。
更新时间: 2025-07-02 17:59:07
领域: cs.CV,cs.AI,cs.LG
Test-Time Scaling with Reflective Generative Model
We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3's performance via the self-supervised process reward model (SPRM). Through sharing the backbone network and using task-specific heads for next token prediction and process scoring respectively, SPRM successfully integrates the policy model and process reward model(PRM) into a unified interface without extra process annotation, reducing over 99% PRM parameters for efficient reasoning. Equipped with SPRM, MetaStone-S1 is naturally suitable for test time scaling (TTS), and we provide three reasoning effort modes (low, medium, and high), based on the controllable thinking length. Moreover, we empirically establish a scaling law that reveals the relationship between total thinking computation and TTS performance. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI-o3-mini's series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.
Updated: 2025-07-02 17:58:01
标题: 测试时间缩放与反思生成模型
摘要: 我们介绍了我们的第一个反思性生成模型MetaStone-S1,通过自监督过程奖励模型(SPRM)获得了OpenAI o3的性能。通过共享骨干网络并分别使用任务特定头部进行下一个标记预测和过程评分,SPRM成功地将策略模型和过程奖励模型(PRM)集成到一个统一接口中,而无需额外的过程注释,从而降低了超过99%的PRM参数,以实现高效推理。配备SPRM,MetaStone-S1自然适用于测试时间缩放(TTS),我们提供了三种推理努力模式(低、中、高),基于可控制的思考长度。此外,我们从经验上建立了一个揭示总体思考计算与TTS性能之间关系的扩展规律。实验证明,我们的MetaStone-S1仅具有32B参数大小,就实现了与OpenAI-o3-mini系列相当的性能。为了支持研究社区,我们已经在https://github.com/MetaStone-AI/MetaStone-S1上开源了MetaStone-S1。
更新时间: 2025-07-02 17:58:01
领域: cs.LG,cs.CL
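The shared-backbone design (one trunk, a next-token head, and a lightweight process-scoring head) is easy to sketch; the dimensions, the sigmoid step score, and the omission of causal masking below are all illustrative simplifications, not MetaStone-S1's actual architecture.

```python
import torch
import torch.nn as nn

class SharedBackboneWithSPRM(nn.Module):
    """Policy head and process-reward head on one backbone (illustrative).
    Sharing the trunk is what lets the reward model add only a tiny
    fraction of extra parameters compared with a separate PRM."""

    def __init__(self, vocab=32000, dim=512, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        block = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(block, num_layers=layers)
        self.lm_head = nn.Linear(dim, vocab)     # next-token prediction head
        self.score_head = nn.Linear(dim, 1)      # per-step process score head

    def forward(self, tokens):
        # causal masking omitted for brevity
        h = self.trunk(self.embed(tokens))
        return self.lm_head(h), torch.sigmoid(self.score_head(h)).squeeze(-1)

model = SharedBackboneWithSPRM()
logits, step_scores = model(torch.randint(0, 32000, (2, 16)))
print(logits.shape, step_scores.shape)   # (2, 16, 32000) and (2, 16)
```

At test time, the per-step scores can rank candidate reasoning trajectories, which is what makes the shared-head design a natural fit for test-time scaling.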
Characterizing control between interacting subsystems with deep Jacobian estimation
Biological function arises through the dynamical interactions of multiple subsystems, including those between brain areas, within gene regulatory networks, and more. A common approach to understanding these systems is to model the dynamics of each subsystem and characterize communication between them. An alternative approach is through the lens of control theory: how the subsystems control one another. This approach involves inferring the directionality, strength, and contextual modulation of control between subsystems. However, methods for understanding subsystem control are typically linear and cannot adequately describe the rich contextual effects enabled by nonlinear complex systems. To bridge this gap, we devise a data-driven nonlinear control-theoretic framework to characterize subsystem interactions via the Jacobian of the dynamics. We address the challenge of learning Jacobians from time-series data by proposing the JacobianODE, a deep learning method that leverages properties of the Jacobian to directly estimate it for arbitrary dynamical systems from data alone. We show that JacobianODEs outperform existing Jacobian estimation methods on challenging systems, including high-dimensional chaos. Applying our approach to a multi-area recurrent neural network (RNN) trained on a working memory selection task, we show that the "sensory" area gains greater control over the "cognitive" area over learning. Furthermore, we leverage the JacobianODE to directly control the trained RNN, enabling precise manipulation of its behavior. Our work lays the foundation for a theoretically grounded and data-driven understanding of interactions among biological subsystems.
Updated: 2025-07-02 17:55:53
标题: 用深度雅可比估计表征相互作用子系统之间的控制
摘要: 生物功能是通过多个子系统之间的动态相互作用产生的,包括脑区域之间的相互作用,在基因调控网络内部的相互作用等。理解这些系统的常见方法是对每个子系统的动态进行建模,并描述它们之间的通信。另一种方法是通过控制理论的视角:了解子系统如何相互控制。这种方法涉及推断子系统之间控制的方向性、强度和上下文调节。然而,用于理解子系统控制的方法通常是线性的,不能充分描述非线性复杂系统所能实现的丰富上下文效应。为了弥合这一差距,我们设计了一个数据驱动的非线性控制理论框架,通过系统动态的雅可比矩阵来表征子系统之间的相互作用。我们通过提出雅可比ODE方法来解决从时间序列数据中学习雅可比矩阵的挑战,这是一种深度学习方法,利用雅可比矩阵的性质直接从数据中估计任意动态系统的雅可比矩阵。我们展示了雅可比ODE在挑战性系统中的表现优于现有的雅可比估计方法,包括高维混沌系统。将我们的方法应用于一个在工作记忆选择任务上训练过的多区域递归神经网络(RNN),我们展示了“感觉”区域在学习过程中对“认知”区域获得更大的控制。此外,我们利用雅可比ODE直接控制训练过的RNN,实现对其行为的精确操纵。我们的工作为理论上有基础且数据驱动的了解生物子系统之间的相互作用奠定了基础。
更新时间: 2025-07-02 17:55:53
领域: q-bio.QM,cs.LG,math.DS,q-bio.NC
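The core idea, reading directed control off the off-diagonal blocks of the dynamics Jacobian, can be illustrated without the learning component: the sketch below uses a hand-coded vector field in place of a JacobianODE-learned one and measures subsystem influence via Frobenius norms of the Jacobian blocks.

```python
import torch

def vector_field(x):
    """Stand-in for a learned dynamics model with two 2-d subsystems,
    where subsystem A (x[:2]) drives subsystem B (x[2:]) strongly."""
    A, B = x[:2], x[2:]
    dA = -A
    dB = -0.5 * B + 2.0 * torch.tanh(A)    # strong A -> B coupling
    return torch.cat([dA, dB])

x0 = torch.randn(4)
J = torch.autograd.functional.jacobian(vector_field, x0)   # shape (4, 4)

# Off-diagonal Jacobian blocks quantify directed control between subsystems.
influence_A_on_B = torch.linalg.matrix_norm(J[2:, :2])
influence_B_on_A = torch.linalg.matrix_norm(J[:2, 2:])
print(float(influence_A_on_B), float(influence_B_on_A))    # first >> second
```

JacobianODE's contribution is to estimate J directly from time-series data without ever writing down the vector field, which this sketch deliberately sidesteps.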
SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. As a proof of concept, SpecCLIP involves pre-training on two spectral types--LAMOST low-resolution and Gaia XP--followed by contrastive alignment using the CLIP (Contrastive Language-Image Pre-training) framework, adapted to associate spectra from different instruments. This alignment is complemented by auxiliary decoders that preserve spectrum-specific information and enable translation (prediction) between spectral types, with the former achieved by maximizing mutual information between embeddings and input spectra. The result is a cross-spectrum framework enabling intrinsic calibration and flexible applications across instruments. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination. SpecCLIP also enhances the accuracy and precision of parameter estimates benchmarked against external survey data. Additionally, its similarity search and cross-spectrum prediction capabilities offer potential for anomaly detection. Our results suggest that contrastively trained foundation models enriched with spectrum-aware decoders can advance precision stellar spectroscopy.
Updated: 2025-07-02 17:49:52
标题: SpecCLIP:恒星光谱测量的对齐与翻译
摘要: 近年来,大型语言模型(LLMs)通过庞大的数据集和大规模参数化改变了自然语言理解。受到这一成功的启发,我们提出了SpecCLIP,这是一个基础模型框架,将LLM启发的方法论扩展到恒星光谱分析。恒星光谱类似于结构化语言,编码了关于恒星的丰富物理和化学信息。通过在大规模光谱数据集上训练基础模型,我们的目标是学习出支持多样化下游应用的稳健和信息丰富的嵌入。作为一个概念验证,SpecCLIP包括在两种光谱类型(LAMOST低分辨率和Gaia XP)上进行预训练,然后使用适应于关联来自不同仪器的光谱的CLIP(对比语言-图像预训练)框架进行对比对齐。这种对齐由保留光谱特定信息并实现光谱类型之间翻译(预测)的辅助解码器来补充,前者通过最大化嵌入和输入光谱之间的互信息来实现。结果是一个跨光谱框架,实现了内在校准和跨仪器的灵活应用。我们展示了在中等规模标记数据集上微调这些模型可以提高对任务的适应性,如恒星参数估计和化学丰度确定。SpecCLIP还提高了参数估计的准确性和精度,与外部调查数据进行了基准测试。此外,其相似性搜索和跨光谱预测功能为异常检测提供了潜力。我们的结果表明,通过对比训练的基础模型结合光谱感知解码器可以推进精密恒星光谱学。
更新时间: 2025-07-02 17:49:52
领域: astro-ph.IM,astro-ph.SR,cs.AI,cs.LG
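The contrastive-alignment stage can be pictured as a standard symmetric InfoNCE loss between two spectrum encoders, as in the sketch below; the encoder architectures, input sizes, and temperature are placeholder assumptions rather than SpecCLIP's actual configuration, and the auxiliary decoders are omitted.

```python
import torch
import torch.nn.functional as F

def clip_loss(za, zb, temperature=0.07):
    """Symmetric InfoNCE: paired spectra of the same star from two
    instruments should embed close together, all other pairs apart."""
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature
    labels = torch.arange(len(za))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

enc_lamost = torch.nn.Sequential(torch.nn.Linear(3000, 256), torch.nn.GELU(),
                                 torch.nn.Linear(256, 128))
enc_xp = torch.nn.Sequential(torch.nn.Linear(110, 256), torch.nn.GELU(),
                             torch.nn.Linear(256, 128))

lamost = torch.randn(32, 3000)   # placeholder LAMOST low-resolution spectra
xp = torch.randn(32, 110)        # placeholder Gaia XP coefficient vectors
loss = clip_loss(enc_lamost(lamost), enc_xp(xp))
loss.backward()
```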
A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Łojasiewicz condition
We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-{\L}ojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-{\L}ojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally H\"older smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.
Updated: 2025-07-02 17:45:10
标题: 一个在局部Kurdyka-Łojasiewicz条件下解决非凸非凹极小极大问题的一阶方法
摘要: 我们研究了一类非凸-非凹极小极大问题,其中内部最大化问题满足一个可能随外部最小化变量变化的局部Kurdyka-Łojasiewicz(KL)条件。与文献中通常假设的全局KL或Polyak-Łojasiewicz(PL)条件相比(这些条件明显更强,在实践中往往限制性过大),这一局部KL条件能够涵盖更广泛的实际情形。然而,它也带来了新的分析挑战。特别是,随着优化算法朝问题的稳定点推进,KL条件成立的区域可能收缩,导致更复杂且可能病态的优化景观。为了应对这一挑战,我们证明了相关的最大值函数是局部Hölder光滑的。利用这一关键性质,我们开发了一种求解极小极大问题的不精确近端梯度方法,其中最大值函数的不精确梯度通过对一个具有KL结构的子问题应用近端梯度法来计算。在温和的假设下,我们为计算极小极大问题的近似稳定点建立了复杂度保证。
更新时间: 2025-07-02 17:45:10
领域: math.OC,cs.LG,cs.NA,math.NA,stat.ML,90C26, 90C30, 90C47, 90C99, 65K05
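The algorithmic skeleton (an inexact inner maximization feeding an outer proximal-gradient step) can be sketched on a toy problem; the version below uses plain gradient ascent for the inner solve and an l1 proximal step outside, with none of the KL-specific machinery, so it illustrates only the loop structure.

```python
import numpy as np

def prox_l1(x, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Toy problem: min_x max_y f(x, y) = 0.5||x||^2 + x^T A y - 0.25||y||^4
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) * 0.1
x, y = rng.standard_normal(5), rng.standard_normal(5)
eta_x, eta_y, lam = 0.1, 0.05, 0.01

for outer in range(200):
    for inner in range(20):                        # inexact inner maximization
        grad_y = A.T @ x - y * np.dot(y, y)        # gradient of f in y (concave)
        y = y + eta_y * grad_y
    grad_x = x + A @ y                             # inexact gradient of the max-function
    x = prox_l1(x - eta_x * grad_x, eta_x * lam)   # outer proximal-gradient step
print(np.round(x, 3))
```

In the paper, the inner subproblem is itself solved by a proximal gradient method and the accuracy of the inner solve is what drives the complexity analysis; the toy above fixes the inner iteration count for simplicity.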
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
In recent years, neural models trained on large multilingual text and speech datasets have shown great potential for supporting low-resource languages. This study investigates the performances of two state-of-the-art Automatic Speech Recognition (ASR) models, OpenAI's Whisper (Small & Large-V2) and Facebook's Wav2Vec-BERT on Bangla, a low-resource language. We have conducted experiments using two publicly available datasets: Mozilla Common Voice-17 and OpenSLR to evaluate model performances. Through systematic fine-tuning and hyperparameter optimization, including learning rate, epochs, and model checkpoint selection, we have compared the models based on Word Error Rate (WER), Character Error Rate (CER), Training Time, and Computational Efficiency. The Wav2Vec-BERT model outperformed Whisper across all key evaluation metrics, demonstrated superior performance while requiring fewer computational resources, and offered valuable insights to develop robust speech recognition systems in low-resource linguistic settings.
Updated: 2025-07-02 17:44:54
标题: 低资源语言的ASR模型适应性:对孟加拉语中Whisper和Wav2Vec-BERT的比较研究
摘要: 最近,训练在大型多语言文本和语音数据集上的神经模型已经显示出支持低资源语言的巨大潜力。本研究调查了两种最先进的自动语音识别(ASR)模型,OpenAI的Whisper(Small&Large-V2)和Facebook的Wav2Vec-BERT在孟加拉语(一种低资源语言)上的表现。我们使用两个公开可用的数据集:Mozilla Common Voice-17和OpenSLR来评估模型的性能。通过系统的微调和超参数优化,包括学习率、epochs和模型检查点选择,我们根据词错误率(WER)、字符错误率(CER)、训练时间和计算效率比较了这些模型。Wav2Vec-BERT模型在所有关键评估指标上表现优于Whisper,表现出更优越的性能,同时需要更少的计算资源,并为在低资源语言环境中开发强大的语音识别系统提供了有价值的见解。
更新时间: 2025-07-02 17:44:54
领域: cs.CL,cs.AI,cs.SD,eess.AS
A Multi-Resolution Dynamic Game Framework for Cross-Echelon Decision-Making in Cyber Warfare
Cyber warfare has become a critical dimension of modern conflict, driven by society's increasing dependence on interconnected digital and physical infrastructure. Effective cyber defense often requires decision-making at different echelons, where the tactical layer focuses on detailed actions such as techniques, tactics, and procedures, while the strategic layer addresses long-term objectives and coordinated planning. Modeling these interactions at different echelons remains challenging due to the dynamic, large-scale, and interdependent nature of cyber environments. To address this, we propose a multi-resolution dynamic game framework in which the tactical layer captures fine-grained interactions using high-resolution extensive-form game trees, while the strategic layer is modeled as a Markov game defined over lower-resolution states abstracted from those game trees. This framework supports scalable reasoning and planning across different levels of abstraction through zoom-in and zoom-out operations that adjust the granularity of the modeling based on operational needs. A case study demonstrates how the framework works and its effectiveness in improving the defender's strategic advantage.
Updated: 2025-07-02 17:42:34
标题: 一个用于网络战中跨层级决策的多分辨率动态博弈框架
摘要: 网络战已经成为现代冲突的一个关键维度,这是由社会对互联数字和物理基础设施日益增加的依赖所驱动的。有效的网络防御通常需要在不同层次上做出决策,其中战术层面侧重于详细行动,如技术、战术和程序,而战略层面则涉及长期目标和协调规划。由于网络环境的动态、大规模和相互依赖的特性,对不同层次的交互建模仍然具有挑战性。为了解决这个问题,我们提出了一个多分辨率动态博弈框架,其中战术层面利用高分辨率的广泛形式博弈树来捕捉细粒度的交互作用,而战略层面则被建模为一个马尔可夫博弈,其定义在从这些博弈树中抽象出的低分辨率状态之上。这个框架通过调整建模的粒度来支持不同抽象层次上的可扩展推理和规划,通过放大和缩小操作根据运营需求调整模型的粒度。一项案例研究展示了这个框架的工作原理以及它在提高防御者战略优势方面的有效性。
更新时间: 2025-07-02 17:42:34
领域: cs.CR,cs.GT
Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection
The complexity of mental healthcare billing enables anomalies, including fraud. While machine learning methods have been applied to anomaly detection, they often struggle with class imbalance, label scarcity, and complex sequential patterns. This study explores a hybrid deep learning approach combining Long Short-Term Memory (LSTM) networks and Transformers, with pseudo-labeling via Isolation Forests (iForest) and Autoencoders (AE). Prior work has not evaluated such hybrid models trained on pseudo-labeled data in the context of healthcare billing. The approach is evaluated on two real-world billing datasets related to mental healthcare. The iForest LSTM baseline achieves the highest recall (0.963) on declaration-level data. On the operation-level data, the hybrid iForest-based model achieves the highest recall (0.744), though at the cost of lower precision. These findings highlight the potential of combining pseudo-labeling with hybrid deep learning in complex, imbalanced anomaly detection settings.
Updated: 2025-07-02 17:33:47
标题: 探索一种混合深度学习方法,用于精神卫生保健提供者账单异常检测:通过半监督异常检测解决标签稀缺问题
摘要: 精神保健账单的复杂性为异常(包括欺诈)提供了可乘之机。虽然机器学习方法已被应用于异常检测,但它们往往难以应对类别不平衡、标签稀缺和复杂的序列模式。本研究探讨了一种混合深度学习方法,结合了长短期记忆(LSTM)网络和Transformer,并通过隔离森林(iForest)和自动编码器(AE)进行伪标注。先前的研究尚未在医疗账单背景下评估基于伪标注数据训练的此类混合模型。该方法在与精神保健相关的两个真实世界账单数据集上进行了评估。iForest LSTM基线在申报级别数据上实现了最高的召回率(0.963)。在操作级别数据上,基于iForest的混合模型实现了最高的召回率(0.744),尽管以较低的精确率为代价。这些发现突显了在复杂、不平衡的异常检测场景中,将伪标注与混合深度学习相结合的潜力。
更新时间: 2025-07-02 17:33:47
领域: cs.LG,cs.AI
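The two-stage pipeline (unsupervised anomaly scores turned into pseudo-labels, then a supervised sequence model) might look like the following sketch on synthetic data; the 5% pseudo-label quantile and the tiny LSTM are our placeholder choices, not the paper's configuration.

```python
import numpy as np
import torch
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10, 4))            # 2000 claim sequences, 10 steps, 4 features
flat = X.reshape(len(X), -1)

# Stage 1: pseudo-label with Isolation Forest anomaly scores.
scores = IsolationForest(random_state=0).fit(flat).score_samples(flat)
pseudo = (scores < np.quantile(scores, 0.05)).astype(np.float32)  # flag lowest 5%

# Stage 2: supervised sequence model trained on the pseudo-labels.
class LSTMDetector(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(4, 32, batch_first=True)
        self.head = torch.nn.Linear(32, 1)
    def forward(self, x):
        _, (h, _) = self.lstm(x)                  # final hidden state summarizes the sequence
        return self.head(h[-1]).squeeze(-1)

model = LSTMDetector()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xb = torch.tensor(X, dtype=torch.float32)
yb = torch.tensor(pseudo)
for epoch in range(5):
    logits = model(xb)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, yb)
    opt.zero_grad(); loss.backward(); opt.step()
```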
Artificial Intelligence Index Report 2025
Welcome to the eighth edition of the AI Index report. The 2025 Index is our most comprehensive to date and arrives at an important moment, as AI's influence across society, the economy, and global governance continues to intensify. New in this year's report are in-depth analyses of the evolving landscape of AI hardware, novel estimates of inference costs, and new analyses of AI publication and patenting trends. We also introduce fresh data on corporate adoption of responsible AI practices, along with expanded coverage of AI's growing role in science and medicine. Since its founding in 2017 as an offshoot of the One Hundred Year Study of Artificial Intelligence, the AI Index has been committed to equipping policymakers, journalists, executives, researchers, and the public with accurate, rigorously validated, and globally sourced data. Our mission has always been to help these stakeholders make better-informed decisions about the development and deployment of AI. In a world where AI is discussed everywhere - from boardrooms to kitchen tables - this mission has never been more essential. The AI Index continues to lead in tracking and interpreting the most critical trends shaping the field - from the shifting geopolitical landscape and the rapid evolution of underlying technologies, to AI's expanding role in business, policymaking, and public life. Longitudinal tracking remains at the heart of our mission. In a domain advancing at breakneck speed, the Index provides essential context - helping us understand where AI stands today, how it got here, and where it may be headed next. Recognized globally as one of the most authoritative resources on artificial intelligence, the AI Index has been cited in major media outlets such as The New York Times, Bloomberg, and The Guardian; referenced in hundreds of academic papers; and used by policymakers and government agencies around the world.
Updated: 2025-07-02 17:28:08
标题: 人工智能指数报告2025
摘要: 欢迎阅读第八版《AI指数报告》。2025年的指数是迄今为止我们最全面的,它在AI对社会、经济和全球治理的影响持续加剧的重要时刻出现。今年报告中新增了对AI硬件不断变化的深入分析、推理成本的新估算以及对AI出版和专利趋势的新分析。我们还介绍了企业采用负责任AI实践的最新数据,并扩大了对AI在科学和医学领域不断增加的角色的涵盖范围。自2017年作为人工智能一百年研究的分支成立以来,《AI指数报告》一直致力于为政策制定者、记者、高管、研究人员和公众提供准确、经过严格验证和全球资源的数据。我们的使命始终是帮助这些利益相关者更明智地决策发展和部署AI。在AI被广泛讨论的今天 - 从董事会到餐桌上 - 这一使命变得更加重要。《AI指数报告》继续引领追踪和解释塑造该领域最关键趋势的努力 - 从地缘政治格局的变化和基础技术的快速发展,到AI在商业、政策制定和公共生活中扮演的日益重要的角色。纵向追踪仍然是我们使命的核心。在一个飞速发展的领域中,指数提供了必要的背景信息 - 帮助我们了解AI当前的状况、它是如何发展至今的,以及可能的未来发展方向。作为全球公认的人工智能权威资源之一,AI指数已被《纽约时报》、彭博社和《卫报》等主要媒体引用;在数百篇学术论文中被引用;并被世界各地的政策制定者和政府机构使用。
更新时间: 2025-07-02 17:28:08
领域: cs.AI
End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning
We develop a rotation-invariant neural network that provides the global minimum-variance portfolio by jointly learning how to lag-transform historical returns and how to regularise both the eigenvalues and the marginal volatilities of large equity covariance matrices. This explicit mathematical mapping offers clear interpretability of each module's role, so the model cannot be regarded as a pure black-box. The architecture mirrors the analytical form of the global minimum-variance solution yet remains agnostic to dimension, so a single model can be calibrated on panels of a few hundred stocks and applied, without retraining, to one thousand US equities-a cross-sectional jump that demonstrates robust out-of-sample generalisation. The loss function is the future realized minimum portfolio variance and is optimized end-to-end on real daily returns. In out-of-sample tests from January 2000 to December 2024 the estimator delivers systematically lower realised volatility, smaller maximum drawdowns, and higher Sharpe ratios than the best analytical competitors, including state-of-the-art non-linear shrinkage. Furthermore, although the model is trained end-to-end to produce an unconstrained (long-short) minimum-variance portfolio, we show that its learned covariance representation can be used in general optimizers under long-only constraints with virtually no loss in its performance advantage over competing estimators. These gains persist when the strategy is executed under a highly realistic implementation framework that models market orders at the auctions, empirical slippage, exchange fees, and financing charges for leverage, and they remain stable during episodes of acute market stress.
Updated: 2025-07-02 17:27:29
标题: 通过协方差清理的神经网络实现端到端大型投资组合优化以最小化方差
摘要: 我们开发了一个旋转不变的神经网络,通过联合学习如何对历史收益进行滞后变换以及如何正则化大型股票协方差矩阵的特征值和边际波动性,来提供全局最小方差组合。这种明确的数学映射提供了对每个模块角色的清晰解释,因此该模型不能被视为纯粹的黑盒。该架构反映了全局最小方差解的分析形式,但对维度保持不可知性,因此可以在几百只股票的面板上校准单个模型,并且无需重新训练即可应用到一千只美国股票——这种横截面跃迁展示了强大的样本外泛化能力。损失函数是未来实现的最小投资组合方差,并且在真实的每日收益上进行端到端优化。在2000年1月至2024年12月的样本外测试中,该估计器相比最佳的分析式竞争方法(包括最先进的非线性收缩方法),系统性地实现了更低的已实现波动率、更小的最大回撤和更高的夏普比率。此外,尽管该模型被端到端训练以产生不受限制的(多头-空头)最小方差组合,我们展示了其学习的协方差表示可以在长仓约束下的常规优化器中使用,几乎不会损失其相对于竞争估计器的性能优势。当策略在一个高度现实的实施框架下执行时,这些收益依然存在;该框架模拟了拍卖时段的市价订单、经验滑点、交易所费用以及杠杆融资成本。此外,这些收益在市场急剧承压的时期也保持稳定。
更新时间: 2025-07-02 17:27:29
领域: q-fin.PM,cs.AI,math.OC,physics.data-an,stat.ML,91G10 (Primary) 68T07, 91G60, 62P05 (Secondary),I.2.6; I.5.1; G.3; J.4
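For orientation, the analytical core being learned here is the global minimum-variance solution w = S^{-1} 1 / (1' S^{-1} 1) applied to a cleaned covariance matrix; the sketch below uses naive eigenvalue clipping as a crude stand-in for the paper's learned eigenvalue and volatility regularization.

```python
import numpy as np

def gmv_weights(returns, k=10):
    """Global minimum-variance weights on a cleaned covariance: keep the
    top-k eigenvalues and replace the bulk by their average (a simple
    stand-in for a learned regularizer)."""
    S = np.cov(returns, rowvar=False)
    vals, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
    bulk = vals[:-k].mean()
    cleaned_vals = np.concatenate([np.full(len(vals) - k, bulk), vals[-k:]])
    S_clean = (vecs * cleaned_vals) @ vecs.T       # V diag(lambda) V^T
    ones = np.ones(len(S))
    w = np.linalg.solve(S_clean, ones)
    return w / w.sum()

rng = np.random.default_rng(0)
R = rng.standard_normal((500, 100)) * 0.01 + 0.0002   # 500 days, 100 stocks
w = gmv_weights(R)
print(w.sum(), float(w @ np.cov(R, rowvar=False) @ w))  # weights sum to 1; realized variance
```

Cleaning the bulk eigenvalues is what keeps S_clean well-conditioned when the number of stocks approaches the number of observations, which is exactly the regime the abstract targets.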
Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when they conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions. GAPO adaptively rescales the gradients for each objective to determine an update direction that optimally balances the trade-offs between objectives. Additionally, we introduce P-GAPO, which incorporates user preferences across different objectives and achieves Pareto solutions that better align with the user's specific needs. Our theoretical analysis demonstrates that GAPO converges towards a Pareto optimal solution for multiple objectives. Empirical results on Mistral-7B show that GAPO outperforms current state-of-the-art methods, achieving superior performance in both helpfulness and harmlessness.
Updated: 2025-07-02 17:25:26
标题: 梯度自适应策略优化:迈向大型语言模型的多目标对齐
摘要: 人类反馈的强化学习(RLHF)已经成为将大型语言模型(LLMs)与人类偏好对齐的强大技术。然而,有效地将LLMs与不同人类偏好对齐仍然是一个重大挑战,特别是当它们存在冲突时。为了解决这个问题,我们将人类价值对齐看作是一个多目标优化问题,旨在最大化一组潜在冲突的目标。我们引入了梯度自适应策略优化(GAPO),这是一种新颖的微调范式,采用多梯度下降方法将LLMs与不同偏好分布对齐。GAPO自适应地重新缩放每个目标的梯度,以确定一个更新方向,最优地平衡目标之间的权衡。此外,我们引入了P-GAPO,它将用户偏好整合到不同目标中,并实现更好地与用户特定需求对齐的帕累托解决方案。我们的理论分析表明,GAPO收敛于多个目标的帕累托最优解。Mistral-7B上的实证结果表明,GAPO优于当前最先进的方法,在帮助性和无害性方面表现出更好的性能。
更新时间: 2025-07-02 17:25:26
领域: cs.CL,cs.AI,cs.LG
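The multiple-gradient-descent core of such methods has a closed form for two objectives: the min-norm point of the convex hull of the two gradients, whose negative is a common descent direction whenever one exists. The sketch below shows that classical closed form; treating "helpfulness" and "harmlessness" gradients as plain vectors is our simplification, and GAPO's actual adaptive rescaling may differ.

```python
import numpy as np

def mgda_two(g1, g2):
    """Min-norm point in the convex hull of two gradients, the classic
    multiple-gradient descent (MGDA) direction for two objectives."""
    diff = g1 - g2
    denom = float(diff @ diff)
    alpha = 0.5 if denom == 0 else float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    return alpha * g1 + (1 - alpha) * g2

g_helpful = np.array([1.0, 0.2, -0.5])    # e.g. gradient of a helpfulness objective
g_harmless = np.array([-0.4, 0.9, 0.3])   # e.g. gradient of a harmlessness objective
d = mgda_two(g_helpful, g_harmless)
print(d, d @ g_helpful, d @ g_harmless)   # both inner products >= ||d||^2 (equal when alpha is interior)
```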
Advancing Magnetic Materials Discovery -- A structure-based machine learning approach for magnetic ordering and magnetic moment prediction
Accurately predicting magnetic behavior across diverse materials systems remains a longstanding challenge due to the complex interplay of structural and electronic factors and is pivotal for the accelerated discovery and design of next-generation magnetic materials. In this work, a refined descriptor is proposed that significantly improves the prediction of two critical magnetic properties -- magnetic ordering (Ferromagnetic vs. Ferrimagnetic) and magnetic moment per atom -- using only the structural information of materials. Unlike previous models limited to Mn-based or lanthanide-transition metal compounds, the present approach generalizes across a diverse dataset of 5741 stable, binary and ternary, ferromagnetic and ferrimagnetic compounds sourced from the Materials Project. Leveraging an enriched elemental vector representation and advanced feature engineering, including nonlinear terms and reduced matrix sparsity, the LightGBM-based model achieves an accuracy of 82.4% for magnetic ordering classification and balanced recall across FM and FiM classes, addressing a key limitation in prior studies. The model predicts magnetic moment per atom with a correlation coefficient of 0.93, surpassing the Hund's matrix and orbital field matrix descriptors. Additionally, it accurately estimates formation energy per atom, enabling assessment of both magnetic behavior and material stability. This generalized and computationally efficient framework offers a robust tool for high-throughput screening of magnetic materials with tailored properties.
Updated: 2025-07-02 17:24:50
标题: 推进磁性材料的发现--基于结构的机器学习方法用于磁性排序和磁矩预测
摘要: 精确预测不同材料系统的磁性行为仍然是一个长期存在的挑战,这是由于结构和电子因素的复杂相互作用,并且对于加速发现和设计下一代磁性材料至关重要。在这项工作中,提出了一个精细的描述符,通过仅使用材料的结构信息,显著改善了两个关键磁性属性的预测 -- 磁性排序(铁磁 vs. 亚铁磁)和每个原子的磁矩。与先前仅限于基于锰或镧系-过渡金属化合物的模型不同,当前方法泛化到来自Materials Project的5741种稳定、二元和三元、铁磁和亚铁磁化合物的多样数据集。通过利用丰富的元素向量表示和先进的特征工程,包括非线性项和减少矩阵稀疏性,基于LightGBM的模型在磁性排序分类上实现了82.4%的准确度,并在FM和FiM类之间实现了平衡的召回率,解决了先前研究中的一个关键限制。该模型预测每个原子的磁矩具有0.93的相关系数,超越了Hund矩阵和轨道场矩阵描述符。此外,它准确估计每个原子的形成能量,从而实现对磁性行为和材料稳定性的评估。这种泛化和计算高效的框架为定制性能的磁性材料的高通量筛选提供了一个强大的工具。
更新时间: 2025-07-02 17:24:50
领域: cond-mat.mtrl-sci,cs.LG
AI4Research: A Survey of Artificial Intelligence for Scientific Research
Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs) such as OpenAI-o1 and DeepSeek-R1, have demonstrated remarkable capabilities in complex domains such as logical reasoning and experimental coding. Motivated by these advancements, numerous studies have explored the application of AI in the innovation process, particularly in the context of scientific research. These AI technologies primarily aim to develop systems that can autonomously conduct research processes across a wide range of scientific disciplines. Despite these significant strides, a comprehensive survey on AI for Research (AI4Research) remains absent, which hampers our understanding and impedes further development in this field. To address this gap, we present a comprehensive survey and offer a unified perspective on AI4Research. Specifically, the main contributions of our work are as follows: (1) Systematic taxonomy: We first introduce a systematic taxonomy to classify five mainstream tasks in AI4Research. (2) New frontiers: Then, we identify key research gaps and highlight promising future directions, focusing on the rigor and scalability of automated experiments, as well as the societal impact. (3) Abundant applications and resources: Finally, we compile a wealth of resources, including relevant multidisciplinary applications, data corpora, and tools. We hope our work will provide the research community with quick access to these resources and stimulate innovative breakthroughs in AI4Research.
Updated: 2025-07-02 17:19:20
标题: AI4Research:科学研究中的人工智能调查
摘要: 最近人工智能(AI)特别是大型语言模型(LLMs)如OpenAI-o1和DeepSeek-R1的发展,展示了在逻辑推理和实验编码等复杂领域的显著能力。受到这些进展的启发,许多研究探讨了AI在创新过程中的应用,尤其是在科学研究的背景下。这些AI技术主要旨在开发能够自主进行跨多个科学学科的研究过程的系统。尽管取得了重大进展,但仍然缺乏AI用于研究(AI4Research)的全面调查,这阻碍了我们对该领域的理解并阻碍了进一步发展。为了填补这一空白,我们提供了一份全面的调查,并提供了一个统一的AI4Research视角。具体来说,我们工作的主要贡献如下:(1)系统分类:我们首先引入了一个系统分类来对AI4Research中的五项主流任务进行分类。 (2)新领域:然后,我们确定了关键的研究差距,并突出了有前途的未来方向,重点放在自动实验的严谨性和可扩展性以及社会影响上。 (3)丰富的应用和资源:最后,我们整理了大量资源,包括相关的跨学科应用、数据语料库和工具。我们希望我们的工作能为研究社区提供对这些资源的快速访问,并激发AI4Research中的创新突破。
更新时间: 2025-07-02 17:19:20
领域: cs.CL,cs.AI
High-Layer Attention Pruning with Rescaling
Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that indiscriminately removes some attention heads across all pruning layers, without considering their positions within the network architecture. In this work, we propose a novel pruning algorithm that strategically prunes attention heads in the model's higher layers. Since the removal of attention heads can alter the magnitude of token representations, we introduce an adaptive rescaling parameter that calibrates the representation scale post-pruning to counteract this effect. We conduct comprehensive experiments on a wide range of LLMs, including LLaMA3.1-8B, Mistral-7B-v0.3, Qwen2-7B, and Gemma2-9B. Our evaluation includes both generation and discriminative tasks across 27 datasets. The results consistently demonstrate that our method outperforms existing structured pruning methods. This improvement is particularly notable in generation tasks, where our approach significantly outperforms existing baselines.
Updated: 2025-07-02 17:15:05
标题: 高层次注意力修剪与重缩放
摘要: 修剪是一种高效的方法,用于压缩大型语言模型(LLMs),显著减少推断延迟。然而,传统的无训练结构修剪方法通常使用启发式度量,不加区分地删除所有修剪层中的一些注意力头,而不考虑它们在网络架构中的位置。在这项工作中,我们提出了一种新颖的修剪算法,可以有策略地修剪模型中较高层的注意力头。由于删除注意力头可能改变令牌表示的大小,我们引入了一个自适应的重新缩放参数,用于在修剪后校准表示尺度以抵消这种影响。我们在各种LLMs上进行了全面的实验,包括LLaMA3.1-8B、Mistral-7B-v0.3、Qwen2-7B和Gemma2-9B。我们的评估涵盖了27个数据集上的生成和判别任务。结果一致表明,我们的方法优于现有的结构修剪方法。这种改进在生成任务中特别显著,我们的方法明显优于现有基线。
更新时间: 2025-07-02 17:15:05
领域: cs.CL,cs.LG
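A rough sketch of prune-then-rescale on a single attention layer follows; summing head outputs (instead of concatenation plus output projection), a norm-based head-importance score, and norm-matching calibration in place of the paper's adaptive rescaling parameter are all illustrative simplifications.

```python
import torch

def prune_heads_with_rescale(head_outputs, keep_ratio=0.75):
    """head_outputs: (batch, heads, seq, dim). Drop the least important
    heads (importance = mean output norm, a common heuristic) and rescale
    survivors so the summed representation keeps its pre-pruning scale."""
    b, h, s, d = head_outputs.shape
    importance = head_outputs.norm(dim=-1).mean(dim=(0, 2))     # per-head score
    k = max(1, int(h * keep_ratio))
    keep = importance.topk(k).indices
    full = head_outputs.sum(dim=1)                              # pre-pruning representation
    pruned = head_outputs[:, keep].sum(dim=1)
    scale = full.norm() / (pruned.norm() + 1e-8)                # rescaling calibration
    return pruned * scale, keep

x = torch.randn(2, 8, 16, 64)
out, kept = prune_heads_with_rescale(x)
print(kept.tolist(), out.shape)   # kept head indices, (2, 16, 64)
```

The rescaling step addresses exactly the effect the abstract names: removing heads shrinks the magnitude of token representations, and downstream layers expect the original scale.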
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
Large language models (LLMs) are increasingly used in the creation of online content, creating feedback loops as subsequent generations of models will be trained on this synthetic data. Such loops were shown to lead to distribution shifts - models misrepresenting the true underlying distributions of human data (also called model collapse). However, how human data properties affect such shifts remains poorly understood. In this paper, we provide the first empirical examination of the effect of such properties on the outcome of recursive training. We first confirm that using different human datasets leads to distribution shifts of different magnitudes. Through exhaustive manipulation of dataset properties combined with regression analyses, we then identify a set of properties predicting distribution shift magnitudes. Lexical diversity is found to amplify these shifts, while semantic diversity and data quality mitigate them. Furthermore, we find that these influences are highly modular: data scraped from a given internet domain has little influence on the content generated for another domain. Finally, experiments on political bias reveal that human data properties affect whether the initial bias will be amplified or reduced. Overall, our results portray a novel view, where different parts of the internet may undergo different types of distribution shift.
Updated: 2025-07-02 17:14:11
标题: LLMs中的递归训练循环:训练数据属性如何调节生成数据中的分布偏移?
摘要: 大型语言模型(LLMs)在创作在线内容中越来越被广泛使用,从而创建反馈循环,随后的模型代际将在这些合成数据上进行训练。这种循环被证明会导致分布偏移 - 模型误代表人类数据的真实基础分布(也称为模型崩溃)。然而,人类数据属性如何影响这种偏移仍然知之甚少。在本文中,我们首次对这些属性对递归训练结果的影响进行了实证检验。我们首先证实使用不同的人类数据集会导致不同幅度的分布偏移。通过对数据集属性的详尽操纵结合回归分析,我们随后确定了一组预测分布偏移幅度的属性。发现词汇多样性会增强这些偏移,而语义多样性和数据质量则会减轻它们。此外,我们发现这些影响是高度模块化的:从给定互联网域中抓取的数据对于为另一个域生成的内容几乎没有影响。最后,对政治偏见的实验显示,人类数据属性影响初始偏见是被放大还是被减少。总的来说,我们的结果描绘了一种新颖的观点,即互联网的不同部分可能经历不同类型的分布偏移。
更新时间: 2025-07-02 17:14:11
领域: cs.LG,cs.AI,cs.CL,68T50,I.2.7
STEM Diffraction Pattern Analysis with Deep Learning Networks
Accurate grain orientation mapping is essential for understanding and optimizing the performance of polycrystalline materials, particularly in energy-related applications. Lithium nickel oxide (LiNiO$_{2}$) is a promising cathode material for next-generation lithium-ion batteries, and its electrochemical behaviour is closely linked to microstructural features such as grain size and crystallographic orientations. Traditional orientation mapping methods--such as manual indexing, template matching (TM), or Hough transform-based techniques--are often slow and noise-sensitive when handling complex or overlapping patterns, creating a bottleneck in large-scale microstructural analysis. This work presents a machine learning-based approach for predicting Euler angles directly from scanning transmission electron microscopy (STEM) diffraction patterns (DPs). This enables the automated generation of high-resolution crystal orientation maps, facilitating the analysis of internal microstructures at the nanoscale. Three deep learning architectures--convolutional neural networks (CNNs), Dense Convolutional Networks (DenseNets), and Shifted Windows (Swin) Transformers--are evaluated, using an experimentally acquired dataset labelled via a commercial TM algorithm. While the CNN model serves as a baseline, both DenseNets and Swin Transformers demonstrate superior performance, with the Swin Transformer achieving the highest evaluation scores and the most consistent microstructural predictions. The resulting crystal maps exhibit clear grain boundary delineation and coherent intra-grain orientation distributions, underscoring the potential of attention-based architectures for analyzing diffraction-based image data. These findings highlight the promise of combining advanced machine learning models with STEM data for robust, high-throughput microstructural characterization.
Updated: 2025-07-02 16:58:09
标题: 用深度学习网络分析STEM衍射图案
摘要: 准确的晶粒取向映射对于理解和优化多晶材料的性能至关重要,特别是在与能源相关的应用中。氧化镍锂(LiNiO$_{2}$)是下一代锂离子电池的有前途的阴极材料,其电化学行为与微观结构特征(如晶粒大小和晶体取向)密切相关。传统的取向映射方法--如手动索引、模板匹配(TM)或基于Hough变换的技术--在处理复杂或重叠图案时通常较慢且易受噪声影响,从而在大规模微观结构分析中造成瓶颈。本研究提出了一种基于机器学习的方法,可直接从扫描透射电子显微镜(STEM)衍射图案(DPs)中预测欧拉角。这使得可以自动生成高分辨率的晶体取向图,便于在纳米尺度上分析内部微观结构。评估了三种深度学习架构--卷积神经网络(CNN)、稠密卷积网络(DenseNets)和移位窗口(Swin)变换器,使用通过商业TM算法标记的实验获取的数据集。虽然CNN模型作为基准,但DenseNets和Swin变换器表现出更优异的性能,Swin变换器获得了最高的评估分数和最一致的微观结构预测。结果显示的晶体图展示了清晰的晶界描绘和一致的晶内取向分布,强调了基于注意力的架构在分析基于衍射的图像数据方面的潜力。这些发现突显了将先进的机器学习模型与STEM数据相结合,进行强大、高通量的微观结构表征的前景。
更新时间: 2025-07-02 16:58:09
领域: cond-mat.dis-nn,cond-mat.mtrl-sci,cs.LG
A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
Updated: 2025-07-02 16:52:10
标题: 一个计算节俭的开源基础模型,用于在肺癌筛查项目中检测胸部疾病
摘要: 低剂量计算机断层扫描(LDCT)成像在全球肺癌筛查(LCS)项目中的应用正在逐渐增加。LCS项目标志着一个代际机会,可以同时检测癌症和非癌症相关的早期肺部疾病。然而,这些努力受到了放射科医生短缺的阻碍,无法按规模解释扫描结果。在这里,我们介绍了TANGERINE,这是一个计算节俭、开源的体积LDCT分析视觉基础模型。TANGERINE设计宽泛,易于快速适应,可以直接进行微调,用有限的计算资源和训练数据完成各种特定疾病任务。与从头开始训练的模型相比,TANGERINE在微调过程中显示出快速收敛,因此需要较少的GPU小时,并且显示出较强的标签效率,在使用少量微调数据时实现了可比或更优秀的性能。TANGERINE在超过98000个胸部LDCT进行了自监督学习预训练,包括迄今为止英国最大的LCS计划和27个公共数据集,跨越14个疾病分类任务取得了最先进的性能,包括肺癌和多种呼吸系统疾病,同时在不同的临床中心之间具有很好的泛化能力。通过将自编码器框架扩展到3D成像,TANGERINE为LDCT分析提供了可伸缩的解决方案,使其与最近的封闭、资源密集型模型有所区别,结合了架构简单、公开可用和适度的计算需求。其易于访问的开源轻量级设计为快速整合到下一代医学成像工具奠定了基础,这可以改变LCS倡议的方向,使其从单一关注肺癌检测转变为在高风险人群中进行全面呼吸系统疾病管理。
更新时间: 2025-07-02 16:52:10
领域: eess.IV,cs.CV,cs.LG
Evolving HPC services to enable ML workloads on HPE Cray EX
The Alps Research Infrastructure leverages GH200 technology at scale, featuring 10,752 GPUs. Accessing Alps provides a significant computational advantage for researchers in Artificial Intelligence (AI) and Machine Learning (ML). While Alps serves a broad range of scientific communities, traditional HPC services alone are not sufficient to meet the dynamic needs of the ML community. This paper presents an initial investigation into extending HPC service capabilities to better support ML workloads. We identify key challenges and gaps we have observed since the early-access phase (2023) of Alps by the Swiss AI community and propose several technological enhancements. These include a user environment designed to facilitate the adoption of HPC for ML workloads, balancing performance with flexibility; a utility for rapid performance screening of ML applications during development; observability capabilities and data products for inspecting ongoing large-scale ML workloads; a utility to simplify the vetting of allocated nodes for compute readiness; a service plane infrastructure to deploy various types of workloads, including support and inference services; and a storage infrastructure tailored to the specific needs of ML workloads. These enhancements aim to facilitate the execution of ML workloads on HPC systems, increase system usability and resilience, and better align with the needs of the ML community. We also discuss our current approach to security aspects. This paper concludes by placing these proposals in the broader context of changes in the communities served by HPC infrastructure like ours.
Updated: 2025-07-02 16:50:49
标题: 演进HPC服务以支持HPE Cray EX上的ML工作负载
摘要: 阿尔卑斯研究基础设施利用GH200技术规模化,拥有10,752个GPU。访问阿尔卑斯为人工智能(AI)和机器学习(ML)研究人员提供了显著的计算优势。尽管阿尔卑斯为广泛的科学社区提供服务,但传统的HPC服务单独并不足以满足ML社区的动态需求。本文首次调查了如何扩展HPC服务能力,以更好地支持ML工作负载。我们从瑞士AI社区自2023年早期访问阿尔卑斯开始观察到的关键挑战和差距,并提出了几项技术增强措施。这些措施包括设计用于促进ML工作负载采用HPC的用户环境,平衡性能与灵活性;在开发过程中快速筛查ML应用程序性能的实用工具;用于检查正在进行中的大规模ML工作负载的可观察性功能和数据产品;简化为计算准备就绪的分配节点审核的实用工具;用于部署各种类型工作负载的服务平面基础设施,包括支持和推断服务;以及专门满足ML工作负载特定需求的存储基础设施。这些增强措施旨在促进在HPC系统上执行ML工作负载,提高系统可用性和韧性,并更好地满足ML社区的需求。我们还讨论了当前的安全方面的处理方法。本文最后将这些提议置于服务于类似我们的HPC基础设施的社区变化的更广泛背景中。
更新时间: 2025-07-02 16:50:49
领域: cs.DC,cs.LG
Towards Foundation Auto-Encoders for Time-Series Anomaly Detection
We investigate a novel approach to time-series modeling, inspired by the successes of large pretrained foundation models. We introduce FAE (Foundation Auto-Encoders), a foundation generative-AI model for anomaly detection in time-series data, based on Variational Auto-Encoders (VAEs). By foundation, we mean a model pretrained on massive amounts of time-series data which can learn complex temporal patterns useful for accurate modeling, forecasting, and detection of anomalies on previously unseen datasets. FAE leverages VAEs and Dilated Convolutional Neural Networks (DCNNs) to build a generic model for univariate time-series modeling, which could eventually perform properly in out-of-the-box, zero-shot anomaly detection applications. We introduce the main concepts of FAE, and present preliminary results in different multi-dimensional time-series datasets from various domains, including a real dataset from an operational mobile ISP, and the well known KDD 2021 Anomaly Detection dataset.
Updated: 2025-07-02 16:39:36
标题: 朝向用于时间序列异常检测的基础自动编码器
摘要: 我们调查了一种新颖的时间序列建模方法,灵感来源于大型预训练基础模型的成功。我们引入了FAE(Foundation Auto-Encoders),这是一种基于变分自动编码器(VAEs)的基础生成式AI模型,用于时间序列数据的异常检测。通过基础,我们指的是在大量时间序列数据上预训练的模型,它可以学习复杂的时间模式,有助于准确建模、预测和检测先前未见的数据集中的异常。FAE利用VAEs和扩展卷积神经网络(DCNNs)构建了一个用于单变量时间序列建模的通用模型,最终可以在开箱即用、零样本异常检测应用中正常运行。我们介绍了FAE的主要概念,并在来自不同领域的多维时间序列数据集上呈现了初步结果,包括来自运营移动ISP的真实数据集和著名的KDD 2021异常检测数据集。
更新时间: 2025-07-02 16:39:36
领域: cs.LG,cs.AI
Transparent Machine Learning: Training and Refining an Explainable Boosting Machine to Identify Overshooting Tops in Satellite Imagery
An Explainable Boosting Machine (EBM) is an interpretable machine learning (ML) algorithm that has benefits in high-risk applications but has not yet found much use in atmospheric science. The overall goal of this work is twofold: (1) explore the use of EBMs, in combination with feature engineering, to obtain interpretable, physics-based machine learning algorithms for meteorological applications; (2) illustrate these methods for the detection of overshooting tops (OTs) in satellite imagery. Specifically, we seek to simplify the process of OT detection by first using mathematical methods to extract key features, such as cloud texture using Gray-Level Co-occurrence Matrices, followed by applying an EBM. Our EBM focuses on the classification task of predicting OT regions, utilizing Channel 2 (visible imagery) and Channel 13 (infrared imagery) of the Advanced Baseline Imager sensor of the Geostationary Operational Environmental Satellite 16. Multi-Radar/Multi-Sensor system convection flags are used as labels to train the EBM model. Note, however, that detecting convection, while related, is different from detecting OTs. Once trained, the EBM was examined and minimally altered to more closely match strategies used by domain scientists to identify OTs. The result of our efforts is a fully interpretable ML algorithm that was developed in a human-machine collaboration. While the final model does not reach the accuracy of more complex approaches, it performs well and represents a significant step toward building fully interpretable ML algorithms for this and other meteorological applications.
Updated: 2025-07-02 16:34:50
标题: 透明机器学习:训练和改进可解释提升机,以识别卫星图像中的上冲云顶
摘要: 可解释性提升机(EBM)是一种可解释的机器学习(ML)算法,在高风险应用中具有优势,但在大气科学中尚未得到广泛应用。本文的总体目标是双重的:(1)探索利用EBMs与特征工程相结合,为气象应用获得可解释的、基于物理学的机器学习算法;(2)展示这些方法用于卫星图像中上冲云顶(OTs)的检测。 具体来说,我们试图通过首先使用数学方法提取关键特征,如使用灰度共生矩阵的云纹理,然后应用EBM来简化OT检测过程。我们的EBM专注于预测OT区域的分类任务,利用地球同步操作环境卫星16号的先进基线成像仪传感器的通道2(可见光图像)和通道13(红外图像)。多雷达/多传感器系统对流标志被用作标签来训练EBM模型。然而,需要注意的是,检测对流虽然相关,但与检测OTs是不同的。 训练完成后,对EBM进行检查并进行最小修改,以更贴近领域科学家用于识别OTs的策略。我们努力的结果是一个完全可解释的ML算法,是由人机协作开发的。虽然最终模型没有达到更复杂方法的准确度,但它表现良好,并代表了朝着为这个和其他气象应用构建完全可解释的ML算法迈出的重要一步。
更新时间: 2025-07-02 16:34:50
领域: cs.CV,cs.LG
Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach
This work addresses mechanical defocus in Earth observation images from the IMAGIN-e mission aboard the ISS, proposing a blind deblurring approach adapted to space-based edge computing constraints. Leveraging Sentinel-2 data, our method estimates the defocus kernel and trains a restoration model within a GAN framework, effectively operating without reference images. On Sentinel-2 images with synthetic degradation, SSIM improved by 72.47% and PSNR by 25.00%, confirming the model's ability to recover lost details when the original clean image is known. On IMAGIN-e, where no reference images exist, perceptual quality metrics indicate a substantial enhancement, with NIQE improving by 60.66% and BRISQUE by 48.38%, validating real-world onboard restoration. The approach is currently deployed aboard the IMAGIN-e mission, demonstrating its practical application in an operational space environment. By efficiently handling high-resolution images under edge computing constraints, the method enables applications such as water body segmentation and contour detection while maintaining processing viability despite resource limitations.
Updated: 2025-07-02 16:31:32
标题: 面向地球观测的实时盲散焦去模糊:IMAGIN-e任务方法
摘要: 这项工作针对国际空间站上IMAGIN-e任务的地球观测图像中的机械散焦问题,提出了一种适应空间边缘计算约束的盲去模糊方法。利用Sentinel-2数据,我们的方法估计了散焦核并在GAN框架内训练了一个恢复模型,有效地在没有参考图像的情况下运行。 在具有合成退化的Sentinel-2图像上,SSIM改善了72.47%,PSNR改善了25.00%,证实了该模型在原始清晰图像已知时恢复丢失细节的能力。在IMAGIN-e上,没有参考图像存在,感知质量度量指标表明明显改善,NIQE改善了60.66%,BRISQUE改善了48.38%,验证了实际的机载恢复。该方法目前已部署在IMAGIN-e任务上,展示了在操作空间环境中的实际应用。 通过在边缘计算约束下高效处理高分辨率图像,该方法实现了水体分割和轮廓检测等应用,同时保持了在资源有限情况下的处理可行性。
更新时间: 2025-07-02 16:31:32
领域: cs.CV,cs.AI
Adapting Probabilistic Risk Assessment for AI
Modern general-purpose artificial intelligence (AI) systems present an urgent risk management challenge, as their rapidly evolving capabilities and potential for catastrophic harm outpace our ability to reliably assess their risks. Current methods often rely on selective testing and undocumented assumptions about risk priorities, frequently failing to make a serious attempt at assessing the set of pathways through which AI systems pose direct or indirect risks to society and the biosphere. This paper introduces the probabilistic risk assessment (PRA) for AI framework, adapting established PRA techniques from high-reliability industries (e.g., nuclear power, aerospace) for the new challenges of advanced AI. The framework guides assessors in identifying potential risks, estimating likelihood and severity bands, and explicitly documenting evidence, underlying assumptions, and analyses at appropriate granularities. The framework's implementation tool synthesizes the results into a risk report card with aggregated risk estimates from all assessed risks. It introduces three methodological advances: (1) Aspect-oriented hazard analysis provides systematic hazard coverage guided by a first-principles taxonomy of AI system aspects (e.g. capabilities, domain knowledge, affordances); (2) Risk pathway modeling analyzes causal chains from system aspects to societal impacts using bidirectional analysis and incorporating prospective techniques; and (3) Uncertainty management employs scenario decomposition, reference scales, and explicit tracing protocols to structure credible projections with novelty or limited data. Additionally, the framework harmonizes diverse assessment methods by integrating evidence into comparable, quantified absolute risk estimates for lifecycle decisions. We have implemented this as a workbook tool for AI developers, evaluators, and regulators.
Updated: 2025-07-02 16:26:45
标题: 《为人工智能调整概率风险评估》
摘要: 现代通用人工智能(AI)系统提出了一个紧迫的风险管理挑战,因为它们快速发展的能力和潜在的灾难性危害超出了我们可靠评估其风险的能力。目前的方法通常依赖于选择性测试和未记录的风险优先级假设,经常未能认真评估AI系统对社会和生物圈构成直接或间接风险的途径集。本文介绍了适用于AI的概率风险评估(PRA)框架,将高可靠性行业(如核能、航空航天)已建立的PRA技术调整为先进AI的新挑战。该框架引导评估者确定潜在风险,估计可能性和严重性范围,并明确记录证据、基本假设和适当细粒度的分析。该框架的实施工具将结果合成为一份风险报告卡,其中包括所有评估风险的综合风险估计。它引入了三项方法论进步:(1)面向方面的危险分析提供由AI系统方面的首要分类指导的系统危险覆盖;(2)风险路径建模分析从系统方面到社会影响的因果链,使用双向分析并结合前瞻性技术;(3)不确定性管理采用场景分解、参考标度和明确的追踪协议,以结构化可信的具有新颖性或有限数据的预测。此外,该框架通过将证据整合到可比较的、量化的绝对风险估计中,实现了对生命周期决策的多样评估方法的协调。我们已将其实施为AI开发人员、评估者和监管者的工作簿工具。
更新时间: 2025-07-02 16:26:45
领域: cs.AI,cs.CY,cs.LG,cs.SY,eess.SY,stat.AP
Bridging UI Design and Chatbot Interactions: Applying Form-Based Principles to Conversational Agents
Domain-specific chatbot applications often involve multi-step interactions, such as refining search filters, selecting multiple items, or performing comparisons. Traditional graphical user interfaces (GUIs) handle these workflows by providing explicit "Submit" (commit data) and "Reset" (discard data) actions, allowing back-end systems to track user intent unambiguously. In contrast, conversational agents rely on subtle language cues, which can lead to confusion and incomplete context management. This paper proposes modeling these GUI-inspired metaphors, acknowledgment (submit-like) and context switching (reset-like), as explicit tasks within large language model (LLM) prompts. By capturing user acknowledgment, reset actions, and chain-of-thought (CoT) reasoning as structured session data, we preserve clarity, reduce user confusion, and align domain-specific chatbot interactions with back-end logic. We demonstrate our approach in hotel booking and customer management scenarios, highlighting improvements in multi-turn task coherence, user satisfaction, and efficiency.
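To make the commit/reset idea concrete, here is a minimal sketch (our own illustration, not the paper's implementation; all class and method names are hypothetical) of the structured session state that submit-like and reset-like turns would update:

    from dataclasses import dataclass, field

    @dataclass
    class SessionState:
        """Form-like state for a multi-step chatbot workflow."""
        pending: dict = field(default_factory=dict)    # filled, not committed
        committed: dict = field(default_factory=dict)  # confirmed by the user

        def acknowledge(self):
            """Submit-like action: commit pending slots to back-end state."""
            self.committed.update(self.pending)
            self.pending.clear()

        def reset(self):
            """Reset-like action: discard uncommitted slots only."""
            self.pending.clear()

    # Hotel-booking turns, as an LLM might classify them: fill / submit / reset.
    state = SessionState()
    state.pending.update({"city": "Lisbon", "nights": 3})
    state.acknowledge()                  # user: "yes, book that"
    state.pending.update({"rooms": 2})
    state.reset()                        # user: "actually, scrap that change"
    print(state.committed)               # {'city': 'Lisbon', 'nights': 3}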
Updated: 2025-07-02 16:24:50
标题: UI设计与聊天机器人交互的桥梁:将基于表单的原则应用于对话代理
摘要: 特定领域的聊天机器人应用通常涉及多步交互,例如细化搜索过滤器、选择多个项目或进行比较。传统的图形用户界面(GUI)通过提供明确的“提交”(提交数据)和“重置”(丢弃数据)操作来处理这些工作流程,允许后端系统明确跟踪用户意图。相比之下,对话代理依赖于微妙的语言线索,这可能导致混乱和不完整的上下文管理。本文提出将这些受GUI启发的隐喻确认(类似提交)和上下文切换(类似重置)建模为大型语言模型(LLM)提示中的明确任务。通过将用户确认、重置操作和思维链(CoT)推理作为结构化会话数据捕获,我们保持了清晰度,减少了用户混淆,并将特定领域的聊天机器人交互与后端逻辑对齐。我们在酒店预订和客户管理场景中展示了我们的方法,突显了多轮任务连贯性、用户满意度和效率的改进。
更新时间: 2025-07-02 16:24:50
领域: cs.HC,cs.AI,H.5.2; I.2.7
Distribution Matching for Self-Supervised Transfer Learning
In this paper, we propose a novel self-supervised transfer learning method called Distribution Matching (DM), which drives the representation distribution toward a predefined reference distribution while preserving augmentation invariance. DM results in a learned representation space that is intuitively structured and therefore easy to interpret. Experimental results across multiple real-world datasets and evaluation metrics demonstrate that DM performs competitively on target classification tasks compared to existing self-supervised transfer learning methods. Additionally, we provide robust theoretical guarantees for DM, including a population theorem and an end-to-end sample theorem. The population theorem bridges the gap between the self-supervised learning task and target classification accuracy, while the sample theorem shows that, even with a limited number of samples from the target domain, DM can deliver exceptional classification performance, provided the unlabeled sample size is sufficiently large.
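The abstract does not pin down the reference distribution or the matching criterion, so the following is only a hedged sketch: an augmentation-invariance term plus a sliced one-dimensional match of the representations to a standard Gaussian reference (the sliced matching and all names are our assumptions, not the paper's formulation):

    import torch

    def dm_style_loss(z1, z2, n_proj=64, lam=1.0):
        """Invariance between two augmented views plus a sliced 1-D match
        of the representation distribution to N(0, I)."""
        invariance = (z1 - z2).pow(2).sum(dim=1).mean()
        z = torch.cat([z1, z2], dim=0)                   # (2B, d)
        dirs = torch.randn(z.size(1), n_proj, device=z.device)
        dirs = dirs / dirs.norm(dim=0, keepdim=True)     # unit projections
        proj = z @ dirs                                  # (2B, n_proj)
        ref = torch.randn_like(proj)                     # reference samples
        match = (proj.sort(dim=0).values
                 - ref.sort(dim=0).values).pow(2).mean()
        return invariance + lam * match

    z1, z2 = torch.randn(128, 32), torch.randn(128, 32)  # two augmented views
    print(dm_style_loss(z1, z2))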
Updated: 2025-07-02 16:12:55
标题: 自监督迁移学习的分布匹配
摘要: 在本文中,我们提出了一种新颖的自监督迁移学习方法,称为分布匹配(DM),它将表示分布驱动到预定义的参考分布,同时保留增强不变性。DM导致学习到的表示空间具有直观结构,因此易于解释。 通过多个真实世界数据集和评估指标的实验结果表明,与现有的自监督迁移学习方法相比,DM在目标分类任务上表现出竞争力。 此外,我们为DM提供了稳健的理论保证,包括一个总体定理和一个端到端样本定理。总体定理弥合了自监督学习任务和目标分类准确性之间的差距,而样本定理表明,即使从目标域中有限数量的样本,只要无标签样本数量足够大,DM可以提供出色的分类性能。
更新时间: 2025-07-02 16:12:55
领域: stat.ML,cs.AI,cs.LG,stat.ME
Low-Perplexity LLM-Generated Sequences and Where To Find Them
As Large Language Models (LLMs) become increasingly widespread, understanding how specific training data shapes their outputs is crucial for transparency, accountability, privacy, and fairness. To explore how LLMs leverage and replicate their training data, we introduce a systematic approach centered on analyzing low-perplexity sequences - high-probability text spans generated by the model. Our pipeline reliably extracts such long sequences across diverse topics while avoiding degeneration, then traces them back to their sources in the training data. Surprisingly, we find that a substantial portion of these low-perplexity spans cannot be mapped to the corpus. For those that do match, we quantify the distribution of occurrences across source documents, highlighting the scope and nature of verbatim recall and paving a way toward better understanding of how LLMs training data impacts their behavior.
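A minimal version of the extraction step might look like the sketch below (the model, window width, and perplexity threshold are illustrative choices of ours, not the paper's settings):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def low_perplexity_spans(text, width=16, threshold=5.0):
        """Slide a window over token-level NLLs and keep spans whose
        perplexity falls below the threshold."""
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # NLL of each token given its prefix
        nll = torch.nn.functional.cross_entropy(
            logits[0, :-1], ids[0, 1:], reduction="none")
        spans = []
        for i in range(nll.numel() - width + 1):
            ppl = nll[i:i + width].mean().exp().item()
            if ppl < threshold:
                spans.append((ppl, tok.decode(ids[0, i + 1:i + 1 + width])))
        return spans

    print(low_perplexity_spans("The quick brown fox jumps over the lazy dog. " * 3))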
Updated: 2025-07-02 15:58:51
标题: 低困惑度LLM生成的序列以及如何找到它们
摘要: 随着大型语言模型(LLMs)的日益普及,了解特定训练数据如何塑造它们的输出对于透明度、问责制、隐私和公平性至关重要。为了探究LLMs如何利用和复制它们的训练数据,我们引入了一种系统化方法,重点分析低困惑度序列 - 模型生成的高概率文本片段。我们的流程能可靠地提取这些长序列,涵盖各种主题,同时避免退化,然后将它们追溯到训练数据的来源。令人惊讶的是,我们发现这些低困惑度片段中有相当一部分无法与语料库匹配。对于那些匹配的片段,我们量化它们在源文档中的分布,突出逐字复述的范围和性质,为更好地理解LLMs训练数据如何影响其行为铺平道路。
更新时间: 2025-07-02 15:58:51
领域: cs.CL,cs.LG
Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization
In this paper, we propose SubLoRA, a rank determination method for Low-Rank Adaptation (LoRA) based on submodular function maximization. In contrast to prior approaches, such as AdaLoRA, that rely on first-order (linearized) approximations of the loss function, SubLoRA utilizes second-order information to capture the potentially complex loss landscape by incorporating the Hessian matrix. We show that the linearization becomes inaccurate and ill-conditioned when the LoRA parameters have been well optimized, motivating the need for a more reliable and nuanced second-order formulation. To this end, we reformulate the rank determination problem as a combinatorial optimization problem with a quadratic objective. However, solving this problem exactly is NP-hard in general. To overcome the computational challenge, we introduce a submodular function maximization framework and devise a greedy algorithm with approximation guarantees. We derive a sufficient and necessary condition under which the rank-determination objective becomes submodular, and construct a closed-form projection of the Hessian matrix that satisfies this condition while maintaining computational efficiency. Our method combines solid theoretical foundations, second-order accuracy, and practical computational efficiency. We further extend SubLoRA to a joint optimization setting, alternating between LoRA parameter updates and rank determination under a rank budget constraint. Extensive experiments on fine-tuning physics-informed neural networks (PINNs) for solving partial differential equations (PDEs) demonstrate the effectiveness of our approach. Results show that SubLoRA outperforms existing methods in both rank determination and joint training performance.
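The greedy step itself is the textbook procedure for monotone submodular maximization under a cardinality (here, rank-budget) constraint; the sketch below runs it on a toy facility-location objective standing in for the paper's projected-Hessian quadratic:

    import numpy as np

    def greedy_submodular(F, ground_set, budget):
        """Greedy maximization; the (1 - 1/e) guarantee applies when F is
        monotone submodular."""
        S = set()
        for _ in range(budget):
            gains = {e: F(S | {e}) - F(S) for e in ground_set - S}
            best = max(gains, key=gains.get)
            if gains[best] <= 0:
                break
            S.add(best)
        return S

    # Toy facility-location coverage objective, which is submodular.
    rng = np.random.default_rng(0)
    W = rng.random((8, 20))               # candidate components x criteria
    F = lambda S: W[list(S)].max(axis=0).sum() if S else 0.0
    print(greedy_submodular(F, set(range(8)), budget=3))   # selected components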
Updated: 2025-07-02 15:56:40
标题: 通过次模函数最大化实现低秩自适应的自动秩确定
摘要: 在本文中,我们提出了SubLoRA,一种基于次模函数最大化的低秩适应(LoRA)秩确定方法。与AdaLoRA等依赖损失函数一阶(线性化)近似的先前方法不同,SubLoRA通过引入Hessian矩阵,利用二阶信息来捕捉可能复杂的损失景观。我们展示了当LoRA参数已被充分优化时,线性化会变得不准确且病态,这说明需要更可靠、更细致的二阶公式。为此,我们将秩确定问题重新表述为具有二次目标的组合优化问题。然而,在一般情况下,精确求解这个问题是NP难的。为了克服计算挑战,我们引入了一个次模函数最大化框架,并设计了一个具有近似保证的贪婪算法。我们推导了使秩确定目标成为次模函数的充分必要条件,并构建了满足该条件且保持计算效率的Hessian矩阵的闭式投影。我们的方法结合了坚实的理论基础、二阶精度和实用的计算效率。我们进一步将SubLoRA扩展到联合优化设置,在秩预算约束下交替进行LoRA参数更新和秩确定。在微调物理信息神经网络(PINNs)以求解偏微分方程(PDEs)的大量实验中,我们方法的有效性得到了验证。结果表明,SubLoRA在秩确定和联合训练性能方面均优于现有方法。
更新时间: 2025-07-02 15:56:40
领域: cs.LG,cs.IT,eess.SP,math.IT,math.OC
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding. We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information are released at https://github.com/THUDM/GLM-4.1V-Thinking.
Updated: 2025-07-02 15:53:43
标题: GLM-4.1V-Thinking: 朝向可伸缩强化学习的多模态通用推理
摘要: 我们介绍了GLM-4.1V-Thinking,这是一个旨在推进通用多模态理解和推理的视觉语言模型(VLM)。在这份报告中,我们分享了在推理为中心的训练框架开发中的关键发现。我们首先通过大规模预训练开发了一个具有巨大潜力的视觉基础模型,可以说这为最终性能设定了上限。然后,我们提出了采用课程抽样的强化学习(RLCS)来释放模型的全部潜力,从而在各种任务中实现全面的能力提升,包括STEM问题解决、视频理解、内容识别、编码、基于GUI的代理、以及长文档理解。我们开源了GLM-4.1V-9B-Thinking,该模型在与相同规模模型的综合性能评估中达到了最先进的水平。在对28个公共基准测试的全面评估中,我们的模型在几乎所有任务上均优于Qwen2.5-VL-7B,并在18个基准测试中达到了与规模明显更大的Qwen2.5-VL-72B相当或甚至更优秀的性能。值得注意的是,GLM-4.1V-9B-Thinking在具有挑战性的任务上,包括长文档理解和STEM推理方面,也展现出与GPT-4o等闭源模型相竞争或更优秀的性能,进一步突显其强大的能力。代码、模型和更多信息已发布在https://github.com/THUDM/GLM-4.1V-Thinking。
更新时间: 2025-07-02 15:53:43
领域: cs.CV,cs.AI,cs.LG
Generalization performance of narrow one-hidden layer networks in the teacher-student setting
Understanding the generalization abilities of neural networks for simple input-output distributions is crucial to account for their learning performance on real datasets. The classical teacher-student setting, where a network is trained on data labeled by a teacher model, serves as a perfect theoretical test bed. In this context, a complete theoretical account of the performance of fully connected one-hidden-layer networks with generic activation functions is lacking. In this work, we develop such a general theory for narrow networks, i.e., networks with a large number of hidden units, yet much smaller than the input dimension. Using methods from statistical physics, we provide closed-form expressions for the typical performance of both finite temperature (Bayesian) and empirical risk minimization estimators, in terms of a small number of weight statistics. In doing so, we highlight the presence of a transition where hidden neurons specialize when the number of samples is sufficiently large and proportional to the number of parameters of the network. Our theory accurately predicts the generalization error of neural networks trained on regression or classification tasks with either noisy full-batch gradient descent (Langevin dynamics) or full-batch gradient descent.
Updated: 2025-07-02 15:49:53
标题: 窄一隐藏层网络在师生设置中的泛化性能
摘要: 理解神经网络对简单输入输出分布的泛化能力对于解释它们在真实数据集上的学习表现至关重要。经典的师生设置,即网络在由教师模型标注的数据上进行训练,是一个完美的理论测试平台。在这种情况下,对于具有通用激活函数的全连接单隐藏层网络的性能,完整的理论解释尚不完备。在这项工作中,我们为窄网络开发了这样一个通用理论,即具有大量隐藏单元但远远小于输入维度的网络。利用统计物理方法,我们提供了有限温度(贝叶斯)和经验风险最小化估计器的典型性能的闭式表达式,这些表达式仅涉及少量权重统计量。在此过程中,我们强调了当样本数量足够大且与网络参数数量成比例时,隐藏神经元会发生专门化的转变。我们的理论准确预测了在回归或分类任务上使用嘈杂的全批量梯度下降(朗之万动力学)或全批量梯度下降训练的神经网络的泛化误差。
更新时间: 2025-07-02 15:49:53
领域: cond-mat.dis-nn,cs.LG,math.PR,math.ST,stat.TH
Refining Gelfond Rationality Principle Towards More Comprehensive Foundational Principles for Answer Set Semantics
Non-monotonic logic programming is the basis for a declarative problem solving paradigm known as answer set programming (ASP). Starting from the seminal definition by Gelfond and Lifschitz in 1988 for simple normal logic programs, various answer set semantics have been proposed for extensions. We consider two important questions: (1) Should the minimal model property, constraint monotonicity and foundedness as defined in the literature be mandatory conditions for an answer set semantics in general? (2) If not, what other properties could be considered as general principles for answer set semantics? We address the two questions. First, it seems that the three aforementioned conditions may sometimes be too strong, and we illustrate with examples that enforcing them may exclude expected answer sets. Second, we evolve the Gelfond answer set (GAS) principles for answer set construction by refining Gelfond's rationality principle to well-supportedness, minimality w.r.t. negation by default and minimality w.r.t. epistemic negation. The principle of well-supportedness guarantees that every answer set is constructible from if-then rules obeying a level mapping and is thus free of circular justification, while the two minimality principles ensure that the formalism minimizes knowledge both at the level of answer sets and of world views. Third, to embody the refined GAS principles, we extend the notion of well-supportedness substantially to answer sets and world views, respectively. Fourth, we define new answer set semantics in terms of the refined GAS principles. Fifth, we use the refined GAS principles as an alternative baseline to intuitively assess the existing answer set semantics. Finally, we analyze the computational complexity.
Updated: 2025-07-02 15:47:54
标题: 改进Gelfond的理性原则,朝着更全面的基础原则发展,用于答案集语义学
摘要: 非单调逻辑编程是一种被称为答案集编程(ASP)的声明性问题解决范式的基础。从Gelfond和Lifschitz在1988年对简单正规逻辑程序的开创性定义出发,各种答案集语义已被提出用于扩展。我们考虑两个重要问题:(1)文献中定义的最小模型属性、约束单调性和有根性是否应该是一般答案集语义的强制条件?(2)如果不是,还有哪些属性可以作为答案集语义的一般原则?我们探讨这两个问题。首先,这三个条件有时可能过于严格,我们通过例子说明,强制执行它们可能会排除预期的答案集。其次,我们通过将Gelfond的理性原则细化为良支持性、相对于默认否定的最小性和相对于认知否定的最小性,发展了用于答案集构建的Gelfond答案集(GAS)原则。良支持性原则确保每个答案集都可以从遵守级别映射的if-then规则中构建,因此不受循环论证的影响,而两个最小性原则确保了形式化在答案集和世界观两个层面上最小化知识。第三,为了体现细化的GAS原则,我们分别将良支持性的概念大幅扩展到答案集和世界观。第四,我们根据细化的GAS原则定义了新的答案集语义。第五,我们将细化的GAS原则作为一种直观评估现有答案集语义的替代基准。最后,我们分析了计算复杂性。
更新时间: 2025-07-02 15:47:54
领域: cs.AI
Out-of-Distribution Detection Methods Answer the Wrong Questions
To detect distribution shifts and improve model safety, many out-of-distribution (OOD) detection methods rely on the predictive uncertainty or features of supervised models trained on in-distribution data. In this paper, we critically re-examine this popular family of OOD detection procedures, and we argue that these methods are fundamentally answering the wrong questions for OOD detection. There is no simple fix to this misalignment, since a classifier trained only on in-distribution classes cannot be expected to identify OOD points; for instance, a cat-dog classifier may confidently misclassify an airplane if it contains features that distinguish cats from dogs, despite generally appearing nothing alike. We find that uncertainty-based methods incorrectly conflate high uncertainty with being OOD, while feature-based methods incorrectly conflate far feature-space distance with being OOD. We show how these pathologies manifest as irreducible errors in OOD detection and identify common settings where these methods are ineffective. Additionally, interventions to improve OOD detection such as feature-logit hybrid methods, scaling of model and data size, epistemic uncertainty representation, and outlier exposure also fail to address this fundamental misalignment in objectives. We additionally consider unsupervised density estimation and generative models for OOD detection, which we show have their own fundamental limitations.
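To make the two criticized families concrete, the sketch below implements one representative scorer from each: max softmax probability (uncertainty-based) and negative Mahalanobis distance to the training features (feature-based). The toy data is ours; the point is only that each score measures its proxy, not OOD-ness itself:

    import numpy as np

    def msp_score(logits):
        """Uncertainty family: max softmax probability (higher = in-dist)."""
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return (e / e.sum(axis=1, keepdims=True)).max(axis=1)

    def mahalanobis_score(feats, train_feats):
        """Feature family: negative Mahalanobis distance (higher = in-dist)."""
        mu = train_feats.mean(axis=0)
        prec = np.linalg.inv(np.cov(train_feats, rowvar=False))
        d = feats - mu
        return -np.einsum("ij,jk,ik->i", d, prec, d)

    rng = np.random.default_rng(1)
    train = rng.normal(size=(500, 16))
    shifted = rng.normal(loc=3.0, size=(5, 16))   # "far" features
    print(mahalanobis_score(shifted, train))       # very low: flagged OOD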
Updated: 2025-07-02 15:45:17
标题: Out-of-Distribution检测方法回答了错误的问题
摘要: 为了检测分布偏移并提高模型安全性,许多超出分布(OOD)检测方法依赖于在分布数据上训练的监督模型的预测不确定性或特征。在本文中,我们对这一流行的OOD检测程序家族进行了批判性重新审视,并认为这些方法从根本上回答了OOD检测的错误问题。这种不一致没有简单的解决方案,因为仅在分布类上训练的分类器无法预期识别OOD点;例如,如果一个猫狗分类器包含可以区分猫和狗的特征,尽管通常看起来完全不同,但它可能会自信地错误地将飞机分类为猫或狗。我们发现,基于不确定性的方法错误地将高不确定性与OOD混淆,而基于特征的方法错误地将远特征空间距离与OOD混淆。我们展示了这些病态如何在OOD检测中表现为不可消除的错误,并确定了这些方法无效的常见情况。此外,用于改进OOD检测的干预措施,如特征-逻辑混合方法、模型和数据规模的缩放、认识不确定性表示和异常值曝光也未能解决这一基本目标不一致。我们另外考虑了无监督密度估计和生成模型用于OOD检测,我们展示了它们自己的基本局限性。
更新时间: 2025-07-02 15:45:17
领域: cs.LG,stat.ML
mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling
Edge devices for temporal processing demand models that capture both short- and long-range dynamics under tight memory constraints. While Transformers excel at sequence modeling, their quadratic memory scaling with sequence length makes them impractical for such settings. Recurrent Neural Networks (RNNs) offer constant memory but train sequentially, and Temporal Convolutional Networks (TCNs), though efficient, scale memory with kernel size. To address this, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit (minGRU). This design allows the convolutional layer to realize a flexible delay embedding that captures rapid temporal variations, while the recurrent module efficiently maintains global context with minimal memory overhead. We validate our approach on two synthetic tasks, demonstrating that mGRADE effectively separates and preserves multi-scale temporal features. Furthermore, on challenging pixel-by-pixel image classification benchmarks, mGRADE consistently outperforms both pure convolutional and pure recurrent counterparts while using approximately 20% less memory, highlighting its promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.
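A hedged sketch of the two ingredients follows; the learnable spacings are simplified here to a fixed dilation, and the minGRU recurrence is written out naively rather than in a parallel-scan form, so this illustrates the architecture's shape, not the paper's implementation:

    import torch
    import torch.nn as nn

    class MinGRU(nn.Module):
        """Minimal gated recurrence: gate and candidate depend only on the
        input, keeping per-step state at a single hidden vector."""
        def __init__(self, d):
            super().__init__()
            self.z, self.h = nn.Linear(d, d), nn.Linear(d, d)

        def forward(self, x):                          # x: (B, T, d)
            hs, h = [], torch.zeros(x.size(0), x.size(2))
            for t in range(x.size(1)):
                z = torch.sigmoid(self.z(x[:, t]))
                h = (1 - z) * h + z * self.h(x[:, t])
                hs.append(h)
            return torch.stack(hs, dim=1)

    class MGradeSketch(nn.Module):
        """Dilated causal 1-D conv (fixed dilation standing in for learnable
        spacings) feeding a minGRU."""
        def __init__(self, d, k=3, dilation=4):
            super().__init__()
            self.pad = (k - 1) * dilation
            self.conv = nn.Conv1d(d, d, k, dilation=dilation)
            self.rnn = MinGRU(d)

        def forward(self, x):                          # x: (B, T, d)
            c = self.conv(nn.functional.pad(x.transpose(1, 2), (self.pad, 0)))
            return self.rnn(c.transpose(1, 2))

    print(MGradeSketch(8)(torch.randn(2, 32, 8)).shape)  # (2, 32, 8)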
Updated: 2025-07-02 15:44:35
标题: mGRADE: 轻量级序列建模的最小重复门控与延迟卷积
摘要: 边缘设备对于时间处理需求模型而言,需要捕捉到短期和长期动态,并且在严格的内存约束下实现。虽然Transformer在序列建模方面表现出色,但其随着序列长度呈二次增长的内存需求使其在这种情境下变得不切实际。递归神经网络(RNNs)提供恒定的内存,但是以顺序方式进行训练,而时间卷积网络(TCNs)虽然高效,但是随着核大小增加而增加内存占用。 为了解决这个问题,我们提出了mGRADE(具有延迟嵌入的最小门控递归架构),这是一种混合内存系统,将具有可学习间距的时间1D卷积与最小门控递归单元(minGRU)相结合。这种设计使得卷积层能够实现一个灵活的延迟嵌入,捕捉快速的时间变化,而递归模块则能够以最小的内存开销高效地维护全局上下文。我们在两个合成任务上验证了我们的方法,展示mGRADE有效地分离和保留多尺度的时间特征。此外,在具有挑战性的逐像素图像分类基准测试中,mGRADE始终优于纯卷积和纯递归对应方法,内存占用约少20%,突出了其在内存受限的边缘时间处理中的适用性。这突显了mGRADE作为在边缘内存受限多尺度时间处理的高效解决方案的潜力。
更新时间: 2025-07-02 15:44:35
领域: cs.LG,cs.AI
MILP-SAT-GNN: Yet Another Neural SAT Solver
We propose a novel method that enables Graph Neural Networks (GNNs) to solve SAT problems by leveraging a technique developed for applying GNNs to Mixed Integer Linear Programming (MILP). Specifically, k-CNF formulae are mapped into MILP problems, which are then encoded as weighted bipartite graphs and subsequently fed into a GNN for training and testing. From a theoretical perspective: (i) we establish permutation and equivalence invariance results, demonstrating that the method produces outputs that are stable under reordering of clauses and variables; (ii) we identify a theoretical limitation, showing that for a class of formulae called foldable formulae, standard GNNs cannot always distinguish satisfiable from unsatisfiable instances; (iii) we prove a universal approximation theorem, establishing that with Random Node Initialization (RNI), the method can approximate SAT solving to arbitrary precision on finite datasets, that is, the GNN becomes approximately sound and complete on such datasets. Furthermore, we show that for unfoldable formulae, the same approximation guarantee can be achieved without the need for RNI. Finally, we conduct an experimental evaluation of our approach, which shows that, despite the simplicity of the neural architecture, the method achieves promising results.
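The clause-to-constraint mapping this encoding rests on is standard: each Boolean variable becomes a 0/1 MILP variable, and each clause becomes a linear inequality demanding at least one satisfied literal; the weighted bipartite graph then joins variable nodes to constraint nodes with these coefficients as edge weights. A minimal sketch:

    def cnf_to_milp(clauses, n_vars):
        """clauses: lists of nonzero ints, where literal v means x_v and -v
        means NOT x_v. Returns (coeffs, rhs) rows meaning coeffs . x >= rhs;
        a negated literal contributes (1 - x_v), i.e. -1 on x_v and -1 to
        the right-hand side."""
        rows = []
        for clause in clauses:
            coeffs, rhs = [0] * n_vars, 1
            for lit in clause:
                if lit > 0:
                    coeffs[lit - 1] += 1
                else:
                    coeffs[-lit - 1] -= 1
                    rhs -= 1
            rows.append((coeffs, rhs))
        return rows

    # (x1 or not x2) and (x2 or x3)  ->  x1 - x2 >= 0,  x2 + x3 >= 1
    print(cnf_to_milp([[1, -2], [2, 3]], n_vars=3))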
Updated: 2025-07-02 15:39:45
标题: MILP-SAT-GNN:又一个神经SAT求解器
摘要: 我们提出了一种新颖的方法,通过利用一种为将图神经网络(GNNs)应用于混合整数线性规划(MILP)而开发的技术,使GNNs能够解决SAT问题。具体而言,k-CNF公式被映射为MILP问题,然后被编码为加权二部图,随后被输入到GNN中进行训练和测试。从理论角度来看:(i)我们建立了排列和等价不变性结果,表明该方法产生的输出在与子句和变量重新排序时是稳定的;(ii)我们确定了一个理论限制,显示对于一类被称为可折叠公式的公式,标准GNNs不能总是区分可满足实例和不可满足实例;(iii)我们证明了一个通用逼近定理,建立了使用随机节点初始化(RNI),该方法可以在有限数据集上以任意精度逼近SAT求解,即,在这些数据集上,GNN在近似上是准确和完备的。此外,我们展示了对于不可折叠公式,无需使用RNI即可实现相同的逼近保证。最后,我们对我们的方法进行了实验评估,结果表明,尽管神经结构简单,该方法取得了有希望的结果。
更新时间: 2025-07-02 15:39:45
领域: cs.LG,cs.AI
TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents
We present a novel approach to knowledge transfer in model-based reinforcement learning, addressing the critical challenge of deploying large world models in resource-constrained environments. Our method efficiently distills a high-capacity multi-task agent (317M parameters) into a compact model (1M parameters) on the MT30 benchmark, significantly improving performance across diverse tasks. Our distilled model achieves a state-of-the-art normalized score of 28.45, surpassing the original 1M parameter model score of 18.93. This improvement demonstrates the ability of our distillation technique to capture and consolidate complex multi-task knowledge. We further optimize the distilled model through FP16 post-training quantization, reducing its size by ~50%. Our approach addresses practical deployment limitations and offers insights into knowledge representation in large world models, paving the way for more efficient and accessible multi-task reinforcement learning systems in robotics and other resource-constrained applications. Code available at https://github.com/dmytro-kuzmenko/td-mpc-opt.
Updated: 2025-07-02 15:38:49
标题: TD-MPC-Opt: 提炼基于模型的多任务强化学习代理
摘要: 我们提出了一种新颖的知识转移方法,用于基于模型的强化学习,解决了在资源受限环境中部署大型世界模型的关键挑战。我们的方法在MT30基准测试中将高容量多任务代理(317M参数)有效地精炼成一个紧凑模型(1M参数),显著提高了在各种任务中的性能。我们的精炼模型实现了28.45的最先进标准化得分,超过了原始的1M参数模型得分18.93。这一改进表明了我们的精炼技术捕捉和整合复杂多任务知识的能力。我们进一步通过FP16后训练量化来优化精炼模型,将其尺寸减小约50%。我们的方法解决了实际部署的限制,并为大型世界模型中的知识表示提供了见解,为机器人和其他资源受限应用中更高效、更易访问的多任务强化学习系统铺平了道路。代码可在 https://github.com/dmytro-kuzmenko/td-mpc-opt 获取。
更新时间: 2025-07-02 15:38:49
领域: cs.LG,cs.RO
Deep Reinforcement Learning for Traveling Purchaser Problems
The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant advantage of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, by leveraging DRL, we can train the policy network towards optimizing the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.
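To see why the purchase plan falls out easily once the route is fixed, the sketch below solves the resulting small linear program with SciPy (prices, stocks, and demands are toy values of our own): buy x[m, p] units of product p at visited market m at minimum cost, subject to demand and stock limits.

    import numpy as np
    from scipy.optimize import linprog

    prices = np.array([[2.0, 5.0],     # market 0: prices of products 0, 1
                       [3.0, 1.0]])    # market 1
    stock  = np.array([[1.0, 4.0],
                       [5.0, 2.0]])
    demand = np.array([3.0, 3.0])
    n_m, n_p = prices.shape

    c = prices.ravel()                 # minimize total purchase cost
    A_ub, b_ub = [], []
    for p in range(n_p):               # -sum_m x[m, p] <= -demand[p]
        row = np.zeros(n_m * n_p)
        row[p::n_p] = -1.0
        A_ub.append(row)
        b_ub.append(-demand[p])

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=list(zip(np.zeros(n_m * n_p), stock.ravel())))
    print(res.x.reshape(n_m, n_p), res.fun)   # optimal plan and its cost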
Updated: 2025-07-02 15:27:01
标题: 深度强化学习在旅行购买者问题中的应用
摘要: 旅行购买者问题(TPP)是一个具有广泛应用的重要组合优化问题。由于路由和购买之间的耦合,现有的TPP研究通常同时解决路线构建和购买计划,然而,这导致精确方法具有高计算成本,而启发式方法具有复杂设计但性能有限。与之形成鲜明对比的是,我们提出了一种基于深度强化学习(DRL)的新方法,该方法分别解决路线构建和购买计划,同时从全局视角评估和优化解决方案。我们方法的关键组成部分包括用于捕捉市场产品关系的TPP的二部图表示,以及从二部图中提取信息并用于顺序构建路线的策略网络。我们框架的一个显著优势是我们可以使用策略网络有效地构建路线,并且一旦确定了路线,相关的购买计划可以通过线性规划轻松推导出来,同时,通过利用DRL,我们可以训练策略网络以优化全局解决方案目标。此外,通过引入元学习策略,策略网络可以在大规模的TPP实例上稳定训练,并且在大小和分布不同的实例之间具有很好的泛化能力,甚至可以在训练期间从未见过的更大实例上表现良好。在各种合成TPP实例和TPPLIB基准测试上的实验表明,我们基于DRL的方法可以显著优于成熟的TPP启发式方法,将最优性差距降低40%-90%,并且在运行时间上也具有优势,特别是在大规模实例上。
更新时间: 2025-07-02 15:27:01
领域: math.OC,cs.AI,cs.LG
Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems
Small- and medium-sized manufacturers need innovative data tools but, because of competition and privacy concerns, often do not want to share their proprietary data with researchers who might be interested in helping. This paper introduces a privacy-preserving platform by which manufacturers may safely share their data with researchers through secure methods, so that those researchers can create innovative tools to solve the manufacturers' real-world problems and then provide tools that execute solutions back onto the platform for others to use with privacy and confidentiality guarantees. We illustrate this problem through a particular use case which addresses an important problem in the large-scale manufacturing of food crystals: quality control relies on image analysis tools. Prior to our research, food crystals in the images were counted manually, which required substantial, time-consuming human effort, but we have developed and deployed a crystal analysis tool which makes this process both more rapid and accurate. The tool enables automatic characterization of the crystal size distribution and numbers from microscope images while the natural imperfections from the sample preparation are automatically removed; a machine learning model to count high resolution translucent crystals and agglomeration of crystals was also developed to aid in these efforts. The resulting algorithm was then packaged for real-world use on the factory floor via a web-based app secured through the originating privacy-preserving platform, allowing manufacturers to use it while keeping their proprietary data secure. After demonstrating this full process, future directions are also explored.
Updated: 2025-07-02 15:25:43
标题: 利用保护隐私的人工智能工具赋予制造商力量:隐私保护机器学习案例研究解决现实世界问题
摘要: 中小型制造商需要创新的数据工具,但由于竞争和隐私问题,他们通常不愿与可能有兴趣帮助的研究人员分享自己的专有数据。本文介绍了一种隐私保护平台,制造商可以通过安全的方法安全地与研究人员分享他们的数据,以便这些研究人员创造创新工具来解决制造商的实际问题,然后将执行解决方案的工具返回到平台供其他人使用,并提供隐私和保密担保。我们通过一个特定的用例来说明这个问题,该用例涉及食品晶体的大规模制造中的一个重要问题,即质量控制依赖于图像分析工具。在我们的研究之前,图像中的食品晶体是手动计数的,这需要大量的耗时人力,但我们已经开发并部署了一个晶体分析工具,使这个过程既更快速又更准确。该工具能够从显微镜图像中自动表征晶体尺寸分布和数量,同时自动去除样品准备中的自然缺陷;还开发了一个机器学习模型,用于计算高分辨率的半透明晶体和晶体凝聚物,以帮助这些工作。然后,生成的算法通过一个通过原始隐私保护平台保护的基于网络的应用程序打包,用于在工厂车间进行实际使用,使制造商可以在保持其专有数据安全的同时使用它。在演示了这个完整过程之后,还探讨了未来的方向。
更新时间: 2025-07-02 15:25:43
领域: cs.CR,cs.AI,cs.CV,cs.ET,68T01, 68T05, 68T45, 94A60
LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs
Low-Rank Adapters (LoRAs) have transformed the fine-tuning of Large Language Models (LLMs) by enabling parameter-efficient updates. However, their widespread adoption remains limited by the reliance on GPU-based training. In this work, we propose a theoretically grounded approach to LoRA fine-tuning designed specifically for users with limited computational resources, particularly those restricted to standard laptop CPUs. Our method learns a meta-operator that maps any input dataset, represented as a probability distribution, to a set of LoRA weights by leveraging a large bank of pre-trained adapters for the Mistral-7B-Instruct-v0.2 model. Instead of performing new gradient-based updates, our pipeline constructs adapters via lightweight combinations of existing LoRAs directly on CPU. While the resulting adapters do not match the performance of GPU-trained counterparts, they consistently outperform the base Mistral model on downstream tasks, offering a practical and accessible alternative to traditional GPU-based fine-tuning.
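The paper's meta-operator is learned; purely to illustrate the flavor of building adapters from a bank without gradient updates, the sketch below scores bank entries by dataset-embedding similarity and takes a softmax-weighted average of the LoRA factors. The embedding scheme, bank layout, and per-factor averaging are all simplifying assumptions of ours:

    import numpy as np

    def combine_adapters(dataset_emb, bank):
        """bank: list of (task_emb, {param_name: (A, B)}) adapters."""
        sims = np.array([e @ dataset_emb
                         / (np.linalg.norm(e) * np.linalg.norm(dataset_emb))
                         for e, _ in bank])
        w = np.exp(sims) / np.exp(sims).sum()        # softmax weights
        merged = {}
        for name in bank[0][1]:
            merged[name] = (
                sum(wi * ad[name][0] for wi, (_, ad) in zip(w, bank)),
                sum(wi * ad[name][1] for wi, (_, ad) in zip(w, bank)))
        return merged

    rng = np.random.default_rng(0)
    bank = [(rng.normal(size=8),
             {"q_proj": (rng.normal(size=(16, 4)), rng.normal(size=(4, 16)))})
            for _ in range(3)]
    print(combine_adapters(rng.normal(size=8), bank)["q_proj"][0].shape)

Note that averaging the A and B factors separately is only a cheap heuristic; it is not equivalent to averaging the products BA, which is one reason a learned combination can do better.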
Updated: 2025-07-02 15:24:47
标题: 无需GPU的LoRA微调:一种面向LLM的CPU高效元生成框架
摘要: 低秩适配器(LoRAs)通过实现参数高效更新,已经改变了大型语言模型(LLMs)的微调。然而,它们广泛的应用仍然受限于依赖于基于GPU的训练。在这项工作中,我们提出了一个基于理论的LoRA微调方法,专门为那些计算资源有限的用户设计,特别是那些受限于标准笔记本电脑CPU的用户。我们的方法通过学习一个元算子,将任何输入数据集(表示为概率分布)映射到一组LoRA权重,利用了预训练适配器库对Mistral-7B-Instruct-v0.2模型进行了设计。我们的流程不是通过新的基于梯度的更新来构建适配器,而是通过直接在CPU上轻量级地组合现有的LoRAs来构建适配器。虽然所得到的适配器的性能不及经过GPU训练的对应模型,但它们在下游任务中始终优于基本的Mistral模型,为传统基于GPU的微调提供了一个实用且易于使用的替代方案。
更新时间: 2025-07-02 15:24:47
领域: cs.LG,cs.AI,cs.CL,stat.ML
Towards Decentralized and Sustainable Foundation Model Training with the Edge
Foundation models are at the forefront of AI research, appealing for their ability to learn from vast datasets and cater to diverse tasks. Yet, their significant computational demands raise issues of environmental impact and the risk of centralized control in their development. We put forward a vision towards decentralized and sustainable foundation model training that leverages the collective compute of sparingly used connected edge AI devices. We present the rationale behind our vision, particularly in support of its sustainability benefit. We further outline a set of challenges that need to be addressed to turn this vision into reality.
Updated: 2025-07-02 15:21:40
标题: 朝着分散化和可持续的边缘基础模型训练模式前进
摘要: 基础模型处于人工智能研究的前沿,因其能够从庞大数据集中学习并适用于多样的任务而备受青睐。然而,它们显著的计算需求引发了环境影响和开发过程中集中控制的风险。我们提出了一个关于去中心化和可持续基础模型训练的愿景,利用少数连接的边缘人工智能设备的集体计算。我们阐明了我们愿景背后的理念,特别是支持其可持续性益处。我们进一步概述了一系列需要解决的挑战,以将这一愿景变为现实。
更新时间: 2025-07-02 15:21:40
领域: cs.LG
The Anatomy of Evidence: An Investigation Into Explainable ICD Coding
Automatic medical coding has the potential to ease documentation and billing processes. For this task, transparency plays an important role for medical coders and regulatory bodies, which can be achieved using explainability methods. However, the evaluation of these approaches has been mostly limited to short text and binary settings due to a scarcity of annotated data. Recent efforts by Cheng et al. (2023) have introduced the MDACE dataset, which provides a valuable resource containing code evidence in clinical records. In this work, we conduct an in-depth analysis of the MDACE dataset and perform plausibility evaluation of current explainable medical coding systems from an applied perspective. With this, we contribute to a deeper understanding of automatic medical coding and evidence extraction. Our findings reveal that ground truth evidence aligns with code descriptions to a certain degree. An investigation into state-of-the-art approaches shows a high overlap with ground truth evidence. We propose match measures and highlight success and failure cases. Based on our findings, we provide recommendations for developing and evaluating explainable medical coding systems.
Updated: 2025-07-02 15:21:29
标题: 证据的解剖学:可解释的ICD编码调查
摘要: 自动医疗编码具有简化文档和结算流程的潜力。对于医学编码员和监管机构来说,透明度在医学编码中起着重要作用,可以通过可解释性方法实现。然而,由于缺乏标注数据,这些方法的评估大多限于短文本和二进制设置。程等人(2023年)最近的努力引入了MDACE数据集,该数据集提供了包含临床记录中的编码证据的宝贵资源。在这项工作中,我们对MDACE数据集进行了深入分析,并从应用角度对当前可解释的医学编码系统进行了可信度评估。通过这样做,我们为深入了解自动医学编码和证据提取做出了贡献。我们的研究结果显示,事实证据与编码描述在一定程度上是一致的。对最先进方法的调查显示,与事实证据存在高度重叠。我们提出匹配度量,并突出成功和失败案例。根据我们的研究结果,我们提出了开发和评估可解释医学编码系统的建议。
更新时间: 2025-07-02 15:21:29
领域: cs.CL,cs.LG
Neural Entropy-stable conservative flux form neural networks for learning hyperbolic conservation laws
We propose a neural entropy-stable conservative flux form neural network (NESCFN) for learning hyperbolic conservation laws and their associated entropy functions directly from solution trajectories, without requiring any predefined numerical discretization. While recent neural network architectures have successfully integrated classical numerical principles into learned models, most rely on prior knowledge of the governing equations or assume a fixed discretization. Our approach removes this dependency by embedding entropy-stable design principles into the learning process itself, enabling the discovery of physically consistent dynamics in a fully data-driven setting. By jointly learning both the numerical flux function and a corresponding entropy, the proposed method ensures conservation and entropy dissipation, critical for long-term stability and fidelity in the system of hyperbolic conservation laws. Numerical results demonstrate that the method achieves stability and conservation over extended time horizons and accurately captures shock propagation speeds, even without oracle access to future-time solution profiles in the training data.
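For reference, the entropy-stability conditions this line of work builds on can be summarized as follows (generic notation and a standard statement of Tadmor's condition; our summary, not the paper's exact formulation):

    % Conservation law with a convex entropy pair (\eta, q):
    \[ u_t + f(u)_x = 0, \qquad \eta(u)_t + q(u)_x \le 0. \]
    % A conservative semi-discrete scheme with numerical flux \hat{f},
    \[ \frac{\mathrm{d} u_i}{\mathrm{d} t}
       = -\frac{1}{\Delta x}\left(\hat{f}_{i+1/2} - \hat{f}_{i-1/2}\right), \]
    % is entropy stable when, with entropy variables v = \eta'(u) and
    % entropy potential \psi(v) = v^{\top} f(u(v)) - q(u(v)),
    \[ (v_{i+1} - v_i)^{\top} \hat{f}_{i+1/2} \le \psi(v_{i+1}) - \psi(v_i). \]

Learning the flux and entropy jointly under a constraint of this shape is what gives the method its conservation and dissipation guarantees.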
Updated: 2025-07-02 15:18:04
标题: 神经熵稳定的保守通量形式神经网络用于学习双曲守恒定律
摘要: 我们提出了一种神经熵稳定保守通量形式神经网络(NESCFN),用于直接从解轨迹中学习双曲守恒定律及其相关熵函数,而无需预定义任何数值离散化。尽管最近的神经网络架构成功地将经典数值原理整合到学习模型中,大多数依赖于对控制方程的先验知识或假定固定的离散化。我们的方法通过将熵稳定设计原则嵌入到学习过程本身中来消除这种依赖性,从而在完全数据驱动的设置中实现物理一致的动态发现。通过同时学习数值通量函数和相应的熵,所提出的方法确保了守恒和熵耗散,这对于双曲守恒定律系统的长期稳定性和忠实性至关重要。数值结果表明,该方法在扩展时间范围内实现了稳定性和守恒,并准确捕捉了冲击传播速度,甚至在训练数据中没有对未来时间解剖的oracle访问的情况下。
更新时间: 2025-07-02 15:18:04
领域: math.NA,cs.LG,cs.NA,math-ph,math.MP,65M08, 68T07, 65M22, 65M32, 65D25
How Do Vision-Language Models Process Conflicting Information Across Modalities?
AI models are increasingly required to be multimodal, integrating disparate input streams into a coherent state representation on which subsequent behaviors and actions can be based. This paper seeks to understand how such models behave when input streams present conflicting information. Focusing specifically on vision-language models, we provide inconsistent inputs (e.g., an image of a dog paired with the caption "A photo of a cat") and ask the model to report the information present in one of the specific modalities (e.g., "What does the caption say / What is in the image?"). We find that models often favor one modality over the other, e.g., reporting the image regardless of what the caption says, but that different models differ in which modality they favor. We find evidence that the behaviorally preferred modality is evident in the internal representational structure of the model, and that specific attention heads can restructure the representations to favor one modality over the other. Moreover, we find modality-agnostic "router heads" which appear to promote answers about the modality requested in the instruction, and which can be manipulated or transferred in order to improve performance across datasets and modalities. Together, the work provides essential steps towards identifying and controlling if and how models detect and resolve conflicting signals within complex multimodal environments.
Updated: 2025-07-02 15:15:14
标题: 视觉-语言模型如何处理跨模态的信息冲突?
摘要: 人工智能模型越来越需要是多模态的,将不同的输入流整合成一个连贯的状态表示,基于这个表示进行后续的行为和动作。本文旨在探讨当输入流呈现冲突信息时这类模型的行为。特别关注视觉-语言模型,我们提供不一致的输入(例如,一张狗的图片配有标题“一张猫的照片”),并要求模型报告特定模态中的信息(例如,“标题显示什么/图片中有什么?”)。我们发现模型通常更偏向某一个模态,例如,无论标题说什么,都报告图片,但不同的模型在偏好的模态上有所不同。我们发现证据表明,模型的行为偏好模态在模型的内部表示结构中是明显的,而特定的注意力头可以重新构建表示以偏好某一模态。此外,我们发现模态无关的“路由头”,似乎促进对指令中请求的模态的答案,并且可以通过操作或转移来改善跨数据集和模态的性能。总的来说,这项工作提供了识别和控制模型如何检测和解决复杂多模态环境中的冲突信号的基本步骤。
更新时间: 2025-07-02 15:15:14
领域: cs.CL,cs.AI,cs.CV,cs.LG
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
Vision transformers (ViTs) have rapidly gained prominence in medical imaging tasks such as disease classification, segmentation, and detection due to their superior accuracy compared to conventional deep learning models. However, due to their size and complex interactions via the self-attention mechanism, they are not well understood. In particular, it is unclear whether the representations produced by such models are semantically meaningful. In this paper, using a projected gradient-based algorithm, we show that their representations are not semantically meaningful and they are inherently vulnerable to small changes. Images with imperceptible differences can have very different representations; on the other hand, images that should belong to different semantic classes can have nearly identical representations. Such vulnerability can lead to unreliable classification results; for example, unnoticeable changes cause the classification accuracy to be reduced by over 60%. To the best of our knowledge, this is the first work to systematically demonstrate this fundamental lack of semantic meaningfulness in ViT representations for medical image classification, revealing a critical challenge for their deployment in safety-critical systems.
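A projected-gradient search of the kind described might be sketched as follows (the step sizes, budget, and toy encoder are our assumptions, not the paper's exact algorithm): maximize the representation displacement under an imperceptibility (L-infinity) constraint.

    import torch

    def representation_attack(encoder, x, eps=2/255, alpha=0.5/255, steps=10):
        """Find a small delta that maximally moves encoder(x)."""
        z0 = encoder(x).detach()
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            loss = (encoder(x + delta) - z0).pow(2).sum()
            loss.backward()
            with torch.no_grad():
                delta += alpha * delta.grad.sign()   # ascent step
                delta.clamp_(-eps, eps)              # project to L-inf ball
                delta.grad.zero_()
        return (x + delta).detach()

    # Toy stand-in; in the paper this would be a pretrained ViT encoder.
    enc = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 32 * 32, 64))
    x = torch.rand(1, 3, 32, 32)
    x_adv = representation_attack(enc, x)
    print((enc(x_adv) - enc(x)).norm())   # large despite tiny pixel change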
Updated: 2025-07-02 15:14:06
标题: 视觉Transformer表示具有语义意义吗?医学影像中的案例研究
摘要: 视觉变换器(ViTs)在医学影像任务中,如疾病分类、分割和检测方面迅速崭露头角,其准确性优于传统深度学习模型。然而,由于其尺寸和通过自注意机制的复杂交互,人们对它们并不了解。特别是,目前尚不清楚这些模型产生的表示是否具有语义意义。本文使用基于投影梯度的算法,展示了它们的表示并不具有语义意义,并且固有地容易受到微小变化的影响。具有难以察觉差异的图像可能具有非常不同的表示;另一方面,应该属于不同语义类别的图像可能具有几乎相同的表示。这种脆弱性可能导致不可靠的分类结果;例如,不明显的变化导致分类准确率降低超过60%。据我们所知,这是首次系统地展示了ViT在医学图像分类中表示缺乏语义意义的基本问题,揭示了在安全关键系统中部署它们所面临的重要挑战。
更新时间: 2025-07-02 15:14:06
领域: cs.CV,cs.AI
SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Embedded AI Systems
Embedded AI systems are expected to incur low power/energy consumption for solving machine learning tasks, as these systems are usually power constrained (e.g., object recognition task in autonomous mobile agents with portable batteries). These requirements can be fulfilled by Spiking Neural Networks (SNNs), since their bio-inspired spike-based operations offer high accuracy and ultra low-power/energy computation. Currently, most of SNN architectures are derived from Artificial Neural Networks whose neurons' architectures and operations are different from SNNs, and/or developed without considering memory budgets from the underlying processing hardware of embedded platforms. These limitations hinder SNNs from reaching their full potential in accuracy and efficiency. Toward this, we propose SpikeNAS, a novel fast memory-aware neural architecture search (NAS) framework for SNNs that quickly finds an appropriate SNN architecture with high accuracy under the given memory budgets from targeted embedded systems. To do this, our SpikeNAS employs several key steps: analyzing the impacts of network operations on the accuracy, enhancing the network architecture to improve the learning quality, developing a fast memory-aware search algorithm, and performing quantization. The experimental results show that our SpikeNAS improves the searching time and maintains high accuracy compared to state-of-the-art while meeting the given memory budgets (e.g., 29x, 117x, and 3.7x faster search for CIFAR10, CIFAR100, and TinyImageNet200 respectively, using an Nvidia RTX A6000 GPU machine), thereby quickly providing the appropriate SNN architecture for the memory-constrained embedded AI systems.
Updated: 2025-07-02 15:13:42
标题: SpikeNAS:用于脉冲神经网络嵌入式人工智能系统的快速内存感知神经架构搜索框架
摘要: 嵌入式AI系统被期望在解决机器学习任务时具有低功耗/能耗,因为这些系统通常受到功耗限制(例如,自主移动机器人中的物体识别任务使用便携式电池)。这些要求可以通过脉冲神经网络(SNNs)来实现,因为它们的生物启发式基于脉冲的操作提供了高准确性和超低功率/能耗计算。目前,大多数SNN架构是从人工神经网络派生而来的,这些神经元的架构和操作与SNNs不同,或者是在不考虑嵌入式平台底层处理硬件的内存预算的情况下开发的。这些限制阻碍了SNNs在准确性和效率方面发挥其全部潜力。为此,我们提出了SpikeNAS,一个新颖的快速内存感知神经架构搜索(NAS)框架,用于SNNs,可以快速找到在给定的内存预算下具有高准确性的SNN架构。为了做到这一点,我们的SpikeNAS采用了几个关键步骤:分析网络操作对准确性的影响,改进网络架构以提高学习质量,开发快速内存感知搜索算法,并执行量化。实验证明,我们的SpikeNAS改善了搜索时间,并保持了与最新技术相比的高准确性,同时满足了给定的内存预算(例如,使用Nvidia RTX A6000 GPU机器,对于CIFAR10、CIFAR100和TinyImageNet200分别快29倍、117倍和3.7倍搜索),从而迅速为内存受限的嵌入式AI系统提供适当的SNN架构。
更新时间: 2025-07-02 15:13:42
领域: cs.NE,cs.AI,cs.LG
Probing Evaluation Awareness of Language Models
Language models can distinguish between testing and deployment phases -- a capability known as evaluation awareness. This has significant safety and policy implications, potentially undermining the reliability of evaluations that are central to AI governance frameworks and voluntary industry commitments. In this paper, we study evaluation awareness in Llama-3.3-70B-Instruct. We show that linear probes can separate real-world evaluation and deployment prompts, suggesting that current models internally represent this distinction. We also find that current safety evaluations are correctly classified by the probes, suggesting that they already appear artificial or inauthentic to models. Our findings underscore the importance of ensuring trustworthy evaluations and understanding deceptive capabilities. More broadly, our work showcases how model internals may be leveraged to support blackbox methods in safety audits, especially for future models more competent at evaluation awareness and deception.
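The probing setup is conceptually simple; here is a hedged sketch with synthetic activations standing in for the model's hidden states (which would be collected from evaluation-style and deployment-style prompts):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    H_eval = rng.normal(0.3, 1.0, size=(200, 512))    # "evaluation" prompts
    H_depl = rng.normal(-0.3, 1.0, size=(200, 512))   # "deployment" prompts
    X = np.vstack([H_eval, H_depl])
    y = np.array([1] * 200 + [0] * 200)

    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    print("probe accuracy:", probe.score(Xte, yte))
    # accuracy well above chance => the two prompt types are linearly
    # separable in activation space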
Updated: 2025-07-02 15:12:43
标题: 探究语言模型对评估的认识
摘要: 语言模型可以区分测试和部署阶段——这一能力被称为评估意识。这对安全和政策具有重大影响,可能会削弱AI治理框架和自愿行业承诺的可靠性。在本文中,我们研究了Llama-3.3-70B-Instruct中的评估意识。我们展示了线性探针可以区分真实世界的评估和部署提示,表明当前模型在内部表示这种区别。我们还发现当前的安全评估被探针正确分类,表明它们对模型来说已经显得人为或不真实。我们的发现强调了确保可信赖的评估和理解欺骗能力的重要性。更广泛地说,我们的工作展示了模型内部如何可以被利用来支持安全审计中的黑匣子方法,特别是对于更擅长评估意识和欺骗的未来模型。
更新时间: 2025-07-02 15:12:43
领域: cs.CL,cs.AI
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Data quality is a critical driver of large language model performance, yet existing model-based selection methods focus almost exclusively on English. We introduce MuRating, a scalable framework that transfers high-quality English data-quality signals into a single rater for 17 target languages. MuRating aggregates multiple English "raters" via pairwise comparisons to learn unified document-quality scores, then projects these judgments through translation to train a multilingual evaluator on monolingual, cross-lingual, and parallel text pairs. Applied to web data, MuRating selects balanced subsets of English and multilingual content to pretrain a 1.2B-parameter LLaMA model. Compared to strong baselines, including QuRater, AskLLM, DCLM and so on, our approach boosts average accuracy on both English benchmarks and multilingual evaluations, with especially large gains on knowledge-intensive tasks. We further analyze translation fidelity, selection biases, and underrepresentation of narrative material, outlining directions for future work.
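The abstract does not name the aggregation model; one common way to turn pairwise preferences into unified scalar scores is a Bradley-Terry fit, sketched here purely for illustration (via Hunter's MM updates):

    import numpy as np

    def bradley_terry(wins, iters=200):
        """wins[i, j]: how often document i was preferred over j."""
        n = wins.shape[0]
        s = np.ones(n)
        for _ in range(iters):
            for i in range(n):
                num = wins[i].sum()
                den = sum((wins[i, j] + wins[j, i]) / (s[i] + s[j])
                          for j in range(n) if j != i)
                s[i] = num / den
            s /= s.sum()
        return s

    wins = np.array([[0, 8, 9],     # document 0 is mostly preferred
                     [2, 0, 6],
                     [1, 4, 0]])
    print(bradley_terry(wins))      # unified quality scores, descending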
Updated: 2025-07-02 15:11:12
标题: MuRating:一种高质量的数据选择方法,用于多语言大型语言模型预训练.
摘要: 数据质量是大型语言模型性能的关键驱动因素,然而现有的基于模型的选择方法几乎完全专注于英语。我们引入了MuRating,这是一个可扩展的框架,将高质量的英语数据质量信号转移到17种目标语言的单个评分者中。MuRating通过两两比较聚合多个英语“评分者”以学习统一的文档质量分数,然后通过翻译将这些判断投影到单语、跨语言和平行文本对上,训练一个多语言评估器。应用于网络数据时,MuRating选择平衡的英语和多语种内容子集,用于预训练一个拥有1.2 B参数的LLaMA模型。与强基线方法(包括QuRater、AskLLM、DCLM等)相比,我们的方法提高了英语基准和多语言评估的平均准确性,尤其在知识密集型任务上获得了较大的收益。我们进一步分析了翻译的忠实度、选择偏见和叙事材料的低代表性,概述了未来工作的方向。
更新时间: 2025-07-02 15:11:12
领域: cs.CL,cs.AI,cs.LG
BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class Classification
We introduce BranchNet, a neuro-symbolic learning framework that transforms decision tree ensembles into sparse, partially connected neural networks. Each branch, defined as a decision path from root to a parent of leaves, is mapped to a hidden neuron, preserving symbolic structure while enabling gradient-based optimization. The resulting models are compact, interpretable, and require no manual architecture tuning. Evaluated on a suite of structured multi-class classification benchmarks, BranchNet consistently outperforms XGBoost in accuracy, with statistically significant gains. We detail the architecture, training procedure, and sparsity dynamics, and discuss the model's strengths in symbolic interpretability as well as its current limitations, particularly on binary tasks where further adaptive calibration may be beneficial.
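A hedged sketch of the branch-to-neuron mapping, reconstructed from the abstract's description rather than the authors' code: each root-to-parent-of-leaves path contributes one hidden neuron whose incoming connections are masked to the features tested along that path.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.random.default_rng(0).normal(size=(200, 6))
    y = (X[:, 0] + X[:, 2] > 0).astype(int)
    t = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y).tree_

    masks = []
    def walk(node, used):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                          # leaf: nothing to record here
            return
        used = used | {int(t.feature[node])}
        if t.children_left[left] == -1 or t.children_left[right] == -1:
            mask = np.zeros(X.shape[1])
            mask[list(used)] = 1.0
            masks.append(mask)                  # one hidden neuron per branch
        walk(left, used)
        walk(right, used)

    walk(0, set())
    print(np.stack(masks))   # sparse input connectivity of the hidden layer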
Updated: 2025-07-02 15:07:58
标题: BranchNet:用于结构化多类分类的神经符号学习框架
摘要: 我们引入了BranchNet,这是一个神经符号学习框架,它将决策树集成转化为稀疏、部分连接的神经网络。每个分支,定义为从根到叶子的父节点的决策路径,被映射到一个隐藏神经元,保留了符号结构同时实现了基于梯度的优化。结果模型紧凑、可解释,并且不需要手动架构调整。在一系列结构化的多类别分类基准测试中,BranchNet 在准确度上一直优于 XGBoost,并且获得了显著的统计学增益。我们详细描述了架构、训练过程和稀疏动态,并讨论了模型在符号可解释性方面的优势以及目前的限制,特别是在二进制任务上,进一步的自适应校准可能会有益。
更新时间: 2025-07-02 15:07:58
领域: cs.LG,cs.AI,68T07 (Primary) 62H30, 68T05 (Secondary)
Beating Transformers using Synthetic Cognition
The road to Artificial General Intelligence goes through the generation of context-aware reactive behaviors, where the Transformer architecture has proven to be the state of the art. However, Transformers still fail to develop reasoning. Recently, a novel approach for developing cognitive architectures, called Synthetic Cognition, has been proposed and implemented to develop instantaneous reactive behavior. In this study, we aim to explore the use of Synthetic Cognition to develop context-aware reactive behaviors. We propose a mechanism to deal with sequences for the recent implementation of Synthetic Cognition, and test it against DNA foundation models in DNA sequence classification tasks. In our experiments, our proposal clearly outperforms the DNA foundation models, obtaining the best score on more benchmark tasks than the alternatives. Thus, we achieve two goals: expanding Synthetic Cognition to deal with sequences, and beating the Transformer architecture for sequence classification.
Updated: 2025-07-02 15:07:35
Domain: cs.AI,cs.LG
World-aware Planning Narratives Enhance Large Vision-Language Model Planner
Large Vision-Language Models (LVLMs) show promise for embodied planning tasks but struggle with complex scenarios involving unfamiliar environments and multi-step goals. Current approaches rely on environment-agnostic imitation learning that disconnects instructions from environmental contexts, causing models to struggle with context-sensitive instructions and rely on supplementary cues rather than visual reasoning during long-horizon interactions. In this work, we propose World-Aware Planning Narrative Enhancement (WAP), a framework that infuses LVLMs with comprehensive environmental understanding through four cognitive capabilities (visual appearance modeling, spatial reasoning, functional abstraction, and syntactic grounding) while developing and evaluating models using only raw visual observations through curriculum learning. Evaluations on the EB-ALFRED benchmark demonstrate substantial improvements, with Qwen2.5-VL achieving a 60.7-point absolute improvement in task success rate, particularly in commonsense reasoning (+60.0) and long-horizon planning (+70.0). Notably, our enhanced open-source models outperform proprietary systems like GPT-4o and Claude-3.5-Sonnet by a large margin.
Updated: 2025-07-02 15:03:26
Domain: cs.AI,cs.RO
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
Multimodal Large Language Models (MLLMs) have achieved remarkable progress in multimodal reasoning. However, they often excessively rely on textual information during the later stages of inference, neglecting the crucial integration of visual input. Current methods typically address this by explicitly injecting visual information to guide the reasoning process. In this work, through an analysis of MLLM attention patterns, we made an intriguing observation: with appropriate guidance, MLLMs can spontaneously re-focus their attention on visual inputs during the later stages of reasoning, even without explicit visual information injection. This spontaneous shift in focus suggests that MLLMs are intrinsically capable of performing visual fusion reasoning. Building on this insight, we introduce Look-Back, an implicit approach designed to guide MLLMs to ``look back'' at visual information in a self-directed manner during reasoning. Look-Back empowers the model to autonomously determine when, where, and how to re-focus on visual inputs, eliminating the need for explicit model-structure constraints or additional input. We demonstrate that Look-Back significantly enhances the model's reasoning and perception capabilities, as evidenced by extensive empirical evaluations on multiple multimodal benchmarks.
Updated: 2025-07-02 14:59:35
Domain: cs.CV,cs.LG
Signals and Symptoms: ICS Attack Dataset From Railway Cyber Range
The prevalence of cyberattacks on Industrial Control Systems (ICS) has highlighted the necessity for robust security measures and incident response to protect critical infrastructure. This is prominent when Operational Technology (OT) systems undergo digital transformation by integrating with Information Technology (IT) systems to enhance operational efficiency, adaptability, and safety. To support analysts in staying abreast of emerging attack patterns, there is a need for ICS datasets that reflect indicators representative of contemporary cyber threats. To address this, we conduct two ICS cyberattack simulations to showcase the impact of trending ICS cyberattacks on a railway cyber range that resembles the railway infrastructure. The attack scenario is designed to blend current attack trends with attack patterns observed in historical ICS incidents. The resulting evidence is collected as datasets, serving as an essential resource for cyberattack analysis. This captures key indicators that are relevant to the current threat landscape, augmenting the effectiveness of security systems and analysts to protect against ICS cyber threats.
Updated: 2025-07-02 14:47:31
Domain: cs.CR
Retrieving snow depth distribution by downscaling ERA5 Reanalysis with ICESat-2 laser altimetry
Estimating the variability of seasonal snow cover, in particular snow depth in remote areas, poses significant challenges due to limited spatial and temporal data availability. This study uses snow depth measurements from the ICESat-2 satellite laser altimeter, which are sparse in both space and time, and incorporates them with climate reanalysis data into a downscaling-calibration scheme to produce monthly gridded snow depth maps at microscale (10 m). Snow surface elevation measurements from ICESat-2 along profiles are compared to a digital elevation model to determine snow depth at each point. To efficiently turn sparse measurements into snow depth maps, a regression model is fitted to establish a relationship between the retrieved snow depth and the corresponding ERA5 Land snow depth. This relationship, referred to as subgrid variability, is then applied to downscale the monthly ERA5 Land snow depth data. The method can provide time series of monthly snow depth maps for the entire ERA5 time range (since 1950). The validation of downscaled snow depth data was performed at an intermediate scale (100 m x 500 m) using datasets from airborne laser scanning (ALS) in the Hardangervidda region of southern Norway. Results show that snow depth prediction achieved R2 values ranging from 0.74 to 0.88 (post-calibration). The method relies on globally available data and is applicable to other snow regions above the treeline. Though requiring area-specific calibration, our approach has the potential to provide snow depth maps in areas where no such data exist and can be used to extrapolate existing snow surveys in time and over larger areas. With this, it can offer valuable input data for hydrological, ecological or permafrost modeling tasks.
Updated: 2025-07-02 14:46:19
Domain: physics.geo-ph,cs.LG
Long-Context Linear System Identification
This paper addresses the problem of long-context linear system identification, where the state $x_t$ of a dynamical system at time $t$ depends linearly on previous states $x_s$ over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only first-order dependencies. Our findings reveal a learning-without-mixing phenomenon, indicating that learning long-context linear autoregressive models is not hindered by slow mixing properties potentially associated with extended context windows. Additionally, we extend these results to (i) shared low-rank representations, where rank-regularized estimators improve the dependence of the rates on the dimensionality, and (ii) misspecified context lengths in strictly stable systems, where shorter contexts offer statistical advantages.
Updated: 2025-07-02 14:45:46
Domain: stat.ML,cs.LG,cs.SY,eess.SY,math.IT,math.ST,stat.TH
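The model class is spelled out in the abstract, so a small simulation is easy to set up. The sketch below generates a strictly stable order-$p$ linear system and recovers the stacked coefficient matrices by ordinary least squares; dimensions, scales, and the plain estimator are illustrative assumptions (the paper studies sharper, regularized estimators and their sample-complexity theory).

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, T = 3, 4, 2000  # state dim, context length, trajectory length

# Ground-truth system x_t = sum_s A_s x_{t-1-s} + noise; each block is
# rescaled to spectral norm 0.2 so the dynamics are strictly stable.
A_true = []
for _ in range(p):
    M = rng.standard_normal((d, d))
    A_true.append(0.2 * M / np.linalg.norm(M, 2))

x = np.zeros((T, d))
for t in range(p, T):
    x[t] = sum(A_true[s] @ x[t - 1 - s] for s in range(p))
    x[t] += 0.1 * rng.standard_normal(d)

# Ordinary least squares over the stacked context window.
Z = np.hstack([x[p - 1 - s : T - 1 - s] for s in range(p)])  # (T-p, p*d)
Y = x[p:]                                                    # (T-p, d)
A_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T               # (d, p*d)

print(f"parameter error: {np.linalg.norm(A_hat - np.hstack(A_true)):.3f}")
```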
Interact2Vec -- An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems
Over the past decade, recommender systems have experienced a surge in popularity. Despite notable progress, they grapple with challenging issues, such as high data dimensionality and sparseness. Representing users and items as low-dimensional embeddings learned via neural networks has become a leading solution. However, while recent studies show promising results, many approaches rely on complex architectures or require content data, which may not always be available. This paper presents Interact2Vec, a novel neural network-based model that simultaneously learns distributed embeddings for users and items while demanding only implicit feedback. The model employs state-of-the-art strategies that natural language processing models commonly use to optimize the training phase and enhance the final embeddings. Two types of experiments were conducted regarding the extrinsic and intrinsic quality of the model. In the former, we benchmarked the recommendations generated by Interact2Vec's embeddings in a top-$N$ ranking problem, comparing them with six other recommender algorithms. The model achieved the second or third-best results in 30% of the datasets, being competitive with other recommenders, and has proven to be very efficient with an average training time reduction of 274% compared to other embedding-based models. Later, we analyzed the intrinsic quality of the embeddings through similarity tables. Our findings suggest that Interact2Vec can achieve promising results, especially on the extrinsic task, and is an excellent embedding-generator model for scenarios of scarce computing resources, enabling the learning of item and user embeddings simultaneously and efficiently.
Updated: 2025-07-02 14:43:13
Domain: cs.IR,cs.LG
Improving Consistency Models with Generator-Augmented Flows
Consistency models imitate the multi-step sampling of score-based diffusion in a single forward pass of a neural network. They can be learned in two ways: consistency distillation and consistency training. The former relies on the true velocity field of the corresponding differential equation, approximated by a pre-trained neural network. In contrast, the latter uses a single-sample Monte Carlo estimate of this velocity field. The related estimation error induces a discrepancy between consistency distillation and training that, we show, still holds in the continuous-time limit. To alleviate this issue, we propose a novel flow that transports noisy data towards their corresponding outputs derived from a consistency model. We prove that this flow reduces the previously identified discrepancy and the noise-data transport cost. Consequently, our method not only accelerates consistency training convergence but also enhances its overall performance. The code is available at: https://github.com/thibautissenhuth/consistency_GC.
Updated: 2025-07-02 14:42:54
Domain: cs.LG,cs.AI,cs.CV
Enhanced Generative Model Evaluation with Clipped Density and Coverage
Although generative models have made remarkable progress in recent years, their use in critical applications has been hindered by their incapacity to reliably evaluate sample quality. Quality refers to at least two complementary concepts: fidelity and coverage. Current quality metrics often lack reliable, interpretable values due to an absence of calibration or insufficient robustness to outliers. To address these shortcomings, we introduce two novel metrics, Clipped Density and Clipped Coverage. By clipping individual sample contributions and, for fidelity, the radii of nearest neighbor balls, our metrics prevent out-of-distribution samples from biasing the aggregated values. Through analytical and empirical calibration, these metrics exhibit linear score degradation as the proportion of poor samples increases. Thus, they can be straightforwardly interpreted as equivalent proportions of good samples. Extensive experiments on synthetic and real-world datasets demonstrate that Clipped Density and Clipped Coverage outperform existing methods in terms of robustness, sensitivity, and interpretability for evaluating generative models.
Updated: 2025-07-02 14:40:00
Domain: cs.LG,cs.AI,stat.ML
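As a rough illustration of the mechanism, the numpy sketch below computes k-NN density and coverage with two clipping operations: capped nearest-neighbor radii and per-sample contributions clipped to [0, 1]. The exact clipping rules and the analytical calibration in the paper differ; function names, thresholds, and toy data here are assumed for illustration.

```python
import numpy as np

def knn_radii(x, k):
    """Distance from each point in x to its k-th nearest neighbor in x."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself

def clipped_density_coverage(real, fake, k=5, r_max=None):
    r = knn_radii(real, k)
    if r_max is not None:          # clip ball radii to tame outliers
        r = np.minimum(r, r_max)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    inside = d <= r[None, :]       # fake j inside ball of real i
    # Clip each fake sample's contribution to at most 1 (plain density
    # lets a single out-of-distribution sample count k-fold or more).
    density = np.clip(inside.sum(axis=1) / k, 0.0, 1.0).mean()
    coverage = inside.any(axis=0).mean()  # real balls hit by >= 1 fake
    return density, coverage

rng = np.random.default_rng(1)
real = rng.normal(size=(500, 2))
good = rng.normal(size=(500, 2))
bad = rng.normal(loc=4.0, size=(500, 2))  # out-of-distribution samples
print(clipped_density_coverage(real, good))  # high density and coverage
print(clipped_density_coverage(real, bad))   # both near zero
```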
AirRadar: Inferring Nationwide Air Quality in China with Deep Neural Networks
Monitoring real-time air quality is essential for safeguarding public health and fostering social progress. However, the widespread deployment of air quality monitoring stations is constrained by their significant costs. To address this limitation, we introduce AirRadar, a deep neural network designed to accurately infer real-time air quality in locations lacking monitoring stations by utilizing data from existing ones. By leveraging learnable mask tokens, AirRadar reconstructs air quality features in unmonitored regions. Specifically, it operates in two stages: first capturing spatial correlations and then adjusting for distribution shifts. We validate AirRadar's efficacy using a year-long dataset from 1,085 monitoring stations across China, demonstrating its superiority over multiple baselines, even with varying degrees of unobserved data. The source code can be accessed at https://github.com/CityMind-Lab/AirRadar.
Updated: 2025-07-02 14:38:26
Domain: cs.LG,cs.AI
Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
Despite progress in Vision-Language Models (VLMs), their capacity for visual reasoning is often limited by the binding problem: the failure to reliably associate perceptual features with their correct visual referents. This limitation underlies persistent errors in tasks such as counting, visual search, scene description, and spatial relationship understanding. A key factor is that current VLMs process visual features largely in parallel, lacking mechanisms for spatially grounded, serial attention. This paper introduces a simple yet effective intervention: augmenting visual inputs with low-level spatial structures (e.g., horizontal lines) and pairing this with a textual prompt that encourages sequential, spatially-aware parsing. We empirically demonstrate substantial performance improvements across core visual reasoning tasks. Specifically, our method improves GPT-4o visual search accuracy by 25.00%, increases counting accuracy by 26.83%, reduces edit distance error in scene description by 0.32, and enhances performance on spatial relationship tasks by 9.50% on a 2D synthetic dataset. Furthermore, we find that the visual modification is essential for these gains; purely textual strategies, including Chain-of-Thought prompting, are insufficient and can even degrade performance. Our method enhances binding only with a single-query inference, underscoring the importance of visual input design over purely linguistically-based approaches. These findings suggest that low-level visual structuring is a powerful and underexplored direction for improving compositional visual reasoning and could serve as a general strategy for enhancing VLM performance on spatially grounded tasks.
Updated: 2025-07-02 14:31:49
Domain: cs.CV,cs.AI,cs.LG
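The intervention itself is simple enough to sketch. Below, horizontal guide lines are drawn onto the image and paired with a prompt that asks for row-by-row parsing; the exact line placement and prompt wording used in the paper are not reproduced here, and the file names are placeholders.

```python
from PIL import Image, ImageDraw

def add_horizontal_guides(image_path, n_rows=4):
    """Overlay evenly spaced horizontal lines to impose low-level
    spatial structure on the input image."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for i in range(1, n_rows):
        y = i * h // n_rows
        draw.line([(0, y), (w, y)], fill=(255, 0, 0), width=3)
    return img

# Prompt nudging the VLM toward sequential, spatially-aware parsing
# (illustrative wording, not the paper's exact prompt).
PROMPT = (
    "The image is divided into horizontal rows by red lines. "
    "Scan the rows one at a time, from top to bottom, and list the "
    "objects in each row before answering: how many objects are there?"
)

structured = add_horizontal_guides("scene.png")
structured.save("scene_structured.png")  # send with PROMPT to the VLM
```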
Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training
Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, its reliance on large volumes of labeled data raises privacy and security concerns such as susceptibility to data poisoning attacks and the risk of overfitting. In contrast, black box optimization methods, which treat the model as an opaque function, relying solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. However, black box methods also pose significant challenges, including poor scalability to high-dimensional parameter spaces, as prevalent in large language models (LLMs), and high computational costs due to reliance on numerous model evaluations. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide strong theoretical bounds on generalization, differential privacy, susceptibility to data poisoning attacks, and robustness to extraction attacks. BBoxER operates on top of pre-trained LLMs, offering a lightweight and modular enhancement suitable for deployment in restricted or privacy-sensitive environments, in addition to non-vacuous generalization guarantees. In experiments with LLMs, we demonstrate empirically that Retrofitting methods are able to learn, showing how a few iterations of BBoxER improve performance and generalize well on a benchmark of reasoning datasets. This positions BBoxER as an attractive add-on on top of gradient-based optimization.
Updated: 2025-07-02 14:29:30
Domain: cs.LG,cs.AI,cs.CL,cs.CR
Joint Matching and Pricing for Crowd-shipping with In-store Customers
This paper examines the use of in-store customers as delivery couriers in a centralized crowd-shipping system, targeting the growing need for efficient last-mile delivery in urban areas. We consider a brick-and-mortar retail setting where shoppers are offered compensation to deliver time-sensitive online orders. To manage this process, we propose a Markov Decision Process (MDP) model that captures key uncertainties, including the stochastic arrival of orders and crowd-shippers, and the probabilistic acceptance of delivery offers. Our solution approach integrates Neural Approximate Dynamic Programming (NeurADP) for adaptive order-to-shopper assignment with a Deep Double Q-Network (DDQN) for dynamic pricing. This joint optimization strategy enables multi-drop routing and accounts for offer acceptance uncertainty, aligning more closely with real-world operations. Experimental results demonstrate that the integrated NeurADP + DDQN policy achieves notable improvements in delivery cost efficiency, with up to 6.7% savings over NeurADP with fixed pricing and approximately 18% over myopic baselines. We also show that allowing flexible delivery delays and enabling multi-destination routing further reduces operational costs by 8% and 17%, respectively. These findings underscore the advantages of dynamic, forward-looking policies in crowd-shipping systems and offer practical guidance for urban logistics operators.
Updated: 2025-07-02 14:27:32
Domain: cs.AI
Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks
In remote control systems, transmitting large data volumes (e.g., images, video frames) from wireless sensors to remote controllers is challenging when uplink capacity is limited (e.g., RedCap devices or massive wireless sensor networks). Furthermore, controllers often need only information-rich representations of the original data. To address this, we propose a semantic-driven predictive control combined with a channel-aware scheduling to enhance control performance for multiple devices under limited network capacity. At its core, the proposed framework, coined Time-Series Joint Embedding Predictive Architecture (TS-JEPA), encodes high-dimensional sensory data into low-dimensional semantic embeddings at the sensor, reducing communication overhead. Furthermore, TS-JEPA enables predictive inference by predicting future embeddings from current ones and predicted commands, which are directly used by a semantic actor model to compute control commands within the embedding space, eliminating the need to reconstruct raw data. To further enhance reliability and communication efficiency, a channel-aware scheduling is integrated to dynamically prioritize device transmissions based on channel conditions and age of information (AoI). Simulations on inverted cart-pole systems show that the proposed framework significantly outperforms conventional control baselines in communication efficiency, control cost, and predictive accuracy. It enables robust and scalable control under limited network capacity compared to traditional scheduling schemes.
Updated: 2025-07-02 14:24:17
Domain: cs.IT,cs.LG,cs.RO,math.IT
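A minimal latent-space sketch of the pieces named in the abstract, assuming simple MLPs and made-up dimensions: an encoder that compresses the observation into a semantic embedding, a semantic actor that emits commands directly from embeddings, and a predictor that rolls the embedding forward from the current embedding and command.

```python
import torch
import torch.nn as nn

obs_dim, emb_dim, cmd_dim = 128, 16, 4  # assumed sizes

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                        nn.Linear(64, emb_dim))
actor = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                      nn.Linear(64, cmd_dim))
predictor = nn.Sequential(nn.Linear(emb_dim + cmd_dim, 64), nn.ReLU(),
                          nn.Linear(64, emb_dim))

obs = torch.randn(1, obs_dim)     # high-dimensional sensor frame
z = encoder(obs)                  # low-dim embedding sent over uplink
u = actor(z)                      # command computed in embedding space
z_next = predictor(torch.cat([z, u], dim=-1))  # predictive inference
print(z.shape, u.shape, z_next.shape)
```

When the scheduler defers a device's uplink, the controller can act on z_next instead of waiting for fresh sensory data, which is the efficiency lever the abstract describes.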
Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives
Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.
Updated: 2025-07-02 14:20:00
Domain: stat.ML,cs.LG
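The abstract's pipeline (rules from an ensemble, then a sparse, complexity-constrained combination) can be approximated with off-the-shelf tools. The sketch below is a RuleFit-style stand-in, not the authors' estimator or their tailored exact algorithm: leaves of a shallow forest become binary rule features (interaction depth bounded by max_depth), and an L1 penalty selects a compact subset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=20, max_depth=3,
                                random_state=0).fit(X, y)

# One binary feature per leaf: "x follows this root-to-leaf path".
# apply() returns the leaf index of every sample in every tree.
leaves = forest.apply(X)                      # (n_samples, n_trees)
rule_feats = np.hstack([
    (leaves[:, t:t + 1] == np.unique(leaves[:, t])[None, :]).astype(float)
    for t in range(leaves.shape[1])
])

# L1 selection keeps a compact, inspectable subset of rules.
sparse = LogisticRegression(penalty="l1", C=0.05, solver="liblinear")
sparse.fit(rule_feats, y)
n_active = (sparse.coef_ != 0).sum()
print(f"{n_active} rules kept out of {rule_feats.shape[1]}")
```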
Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation
Deep neural networks for medical image segmentation often produce overconfident results misaligned with empirical observations. Such miscalibration challenges their clinical translation. We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches. Our work also introduces the concept of dataset reliability histograms which generalises standard reliability diagrams for refined visual assessment of calibration in semantic segmentation aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55% respectively, maintaining a Dice score of 87% on the BraTS 2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS
Updated: 2025-07-02 14:15:56
Domain: cs.CV,cs.LG
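A simplified sketch of the loss idea, assuming per-class hard binning and loose shape handling; the released code handles this more carefully. The key point from the abstract survives even in this toy version: bin membership is non-differentiable, but the per-bin mean confidence is, so the loss backpropagates without soft binning.

```python
import torch

def ml1_ace(probs, targets, n_bins=10):
    """Marginal L1 average calibration error, a sketch of the idea.

    probs:   (N, C, H, W) per-class probabilities
    targets: (N, C, H, W) one-hot ground truth
    """
    loss = probs.new_zeros(())
    for c in range(probs.shape[1]):
        p = probs[:, c].reshape(-1)
        t = targets[:, c].reshape(-1).float()
        bins = torch.clamp((p * n_bins).long(), max=n_bins - 1)
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                conf = p[mask].mean()   # differentiable path
                acc = t[mask].mean()    # constant w.r.t. parameters
                loss = loss + (conf - acc).abs() / n_bins
    return loss / probs.shape[1]

probs = torch.rand(2, 3, 32, 32, requires_grad=True)
onehot = torch.nn.functional.one_hot(
    torch.randint(0, 3, (2, 32, 32)), 3).permute(0, 3, 1, 2)
aux = ml1_ace(probs, onehot)
aux.backward()  # usable as an auxiliary term next to Dice/CE
```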
ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving
In this paper, we present details of the 1st W-CODA workshop, held in conjunction with ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. Five speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and hold a dual-track challenge, including both corner case scene understanding and generation. As the pioneering effort, we will continuously bridge the gap between frontier autonomous driving techniques and fully intelligent, reliable self-driving agents robust towards corner cases.
Updated: 2025-07-02 14:10:25
Domain: cs.CV,cs.AI,cs.CL,cs.LG
Leveraging Genetic Algorithms for Efficient Demonstration Generation in Real-World Reinforcement Learning Environments
Reinforcement Learning (RL) has demonstrated significant potential in certain real-world industrial applications, yet its broader deployment remains limited by inherent challenges such as sample inefficiency and unstable learning dynamics. This study investigates the utilization of Genetic Algorithms (GAs) as a mechanism for improving RL performance in an industrially inspired sorting environment. We propose a novel approach in which GA-generated expert demonstrations are used to enhance policy learning. These demonstrations are incorporated into a Deep Q-Network (DQN) replay buffer for experience-based learning and utilized as warm-start trajectories for Proximal Policy Optimization (PPO) agents to accelerate training convergence. Our experiments compare standard RL training with rule-based heuristics, brute-force optimization, and demonstration data, revealing that GA-derived demonstrations significantly improve RL performance. Notably, PPO agents initialized with GA-generated data achieved superior cumulative rewards, highlighting the potential of hybrid learning paradigms, where heuristic search methods complement data-driven RL. The utilized framework is publicly available and enables further research into adaptive RL strategies for real-world applications.
Updated: 2025-07-02 14:06:48
Domain: cs.LG
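The replay-buffer seeding step is straightforward to sketch. The snippet below assumes a Gymnasium-style environment and callable GA-evolved policies (both stand-ins; the paper's sorting environment and GA implementation live in their released framework) and stores demonstration transitions before ordinary DQN training begins.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=50_000):
        self.buf = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

def seed_with_ga_demos(buffer, env, ga_policies, episodes_per_policy=5):
    """Roll out GA-evolved policies and store their transitions so the
    DQN starts learning from expert-like experience instead of random
    exploration. Each policy maps observations to actions."""
    for policy in ga_policies:
        for _ in range(episodes_per_policy):
            state, _ = env.reset()
            done = False
            while not done:
                action = policy(state)
                next_state, reward, terminated, truncated, _ = env.step(action)
                done = terminated or truncated
                buffer.add(state, action, reward, next_state, done)
                state = next_state
```

For the PPO variant described in the abstract, the same rollouts would instead serve as warm-start trajectories, e.g. for behavior-cloning-style pre-training of the policy network.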
Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach
This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. By doing this, GenIB-based tokenization is conducive to improving the communication efficiency and reducing computational complexity. Additionally, we develop $\sigma$-GenIB to address the challenges of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom compared to baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication in favor of multimodal understanding and generation, providing a potential solution for next-generation intelligent communications.
Updated: 2025-07-02 14:03:01
Domain: eess.SP,cs.LG
Blockchain Address Poisoning
In many blockchains, e.g., Ethereum, Binance Smart Chain (BSC), the primary representation used for wallet addresses is a hardly memorable 40-digit hexadecimal string. As a result, users often select addresses from their recent transaction history, which enables blockchain address poisoning. The adversary first generates lookalike addresses similar to one with which the victim has previously interacted, and then engages with the victim to ``poison'' their transaction history. The goal is to have the victim mistakenly send tokens to the lookalike address, as opposed to the intended recipient. Compared to contemporary studies, this paper provides four notable contributions. First, we develop a detection system and perform measurements over two years on both Ethereum and BSC. We identify 13 times more attack attempts than reported previously -- totaling 270M on-chain attacks targeting 17M victims. 6,633 incidents have caused at least 83.8M USD in losses, which makes blockchain address poisoning one of the largest cryptocurrency phishing schemes observed in the wild. Second, we analyze a few large attack entities using improved clustering techniques, and model attacker profitability and competition. Third, we reveal attack strategies -- targeted populations, success conditions (address similarity, timing), and cross-chain attacks. Fourth, we mathematically define and simulate the lookalike address generation process across various software- and hardware-based implementations, and identify a large-scale attacker group that appears to use GPUs. We also discuss defensive countermeasures.
Updated: 2025-07-02 14:02:16
Domain: cs.CR
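The lookalike-generation economics are easy to demonstrate. The sketch below brute-forces prefix/suffix matches over random 40-digit hex strings, standing in for real key-pair generation (which additionally involves secp256k1 and Keccak); matching a prefix of length a and a suffix of length b costs about 16^(a+b) draws on average, which is consistent with the paper's finding that a large attacker group appears to use GPUs.

```python
import secrets

def find_lookalike(target, n_prefix, n_suffix, budget):
    """Draw random address-shaped hex strings until the first n_prefix
    and last n_suffix digits match the target, or the budget runs out."""
    head, tail = target[:n_prefix], target[-n_suffix:]
    for tries in range(1, budget + 1):
        cand = secrets.token_hex(20)   # 40 hex chars, address-shaped
        if cand.startswith(head) and cand.endswith(tail):
            return cand, tries
    return None, budget

victim_peer = secrets.token_hex(20)    # address the victim interacted with
hit, tries = find_lookalike(victim_peer, n_prefix=2, n_suffix=2,
                            budget=500_000)
print(hit, tries)   # expected ~16**4 = 65,536 draws before a hit
```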
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Despite the outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) might generate hallucinated contents that do not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluate the object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, we design a unified framework to measure the object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucinations via (object, relation, object) triplets extracted from LVLMs' responses, making it easily generalizable to different vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. With comprehensive evaluations on Tri-HE, we observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple training-free approach that effectively mitigates hallucinations for LVLMs. Our dataset and code for the reproduction of our experiments are available publicly at https://github.com/wujunjie1998/Tri-HE.
Updated: 2025-07-02 14:02:12
Domain: cs.CV,cs.AI,cs.CL,cs.LG
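The triplet-level evaluation reduces to set comparison once triplets are extracted. The sketch below scores a response against a ground-truth scene graph with exact matching; this is a simplification of the benchmark, where judging whether each extracted (object, relation, object) triplet is supported is done more robustly (e.g. with an LLM judge).

```python
def triplet_hallucination_rate(pred_triplets, gt_triplets):
    """Fraction of (object, relation, object) triplets in the model's
    response that are unsupported by the ground-truth scene graph."""
    gt = {tuple(t) for t in gt_triplets}
    hallucinated = [t for t in pred_triplets if tuple(t) not in gt]
    return len(hallucinated) / max(len(pred_triplets), 1), hallucinated

gt = [("man", "holding", "umbrella"), ("dog", "next to", "man")]
pred = [("man", "holding", "umbrella"), ("dog", "biting", "umbrella")]
rate, bad = triplet_hallucination_rate(pred, gt)
print(rate, bad)  # 0.5, [('dog', 'biting', 'umbrella')]
```

Because the unit of evaluation is the triplet rather than the object, the same machinery covers both object and relation hallucination, which is what lets the benchmark compare the two directly.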
NGAT: A Node-level Graph Attention Network for Long-term Stock Prediction
Graph representation learning methods have been widely adopted in financial applications to enhance company representations by leveraging inter-firm relationships. However, current approaches face three key challenges: (1) The advantages of relational information are obscured by limitations in downstream task designs; (2) Existing graph models specifically designed for stock prediction often suffer from excessive complexity and poor generalization; (3) Experience-based construction of corporate relationship graphs lacks effective comparison of different graph structures. To address these limitations, we propose a long-term stock prediction task and develop a Node-level Graph Attention Network (NGAT) specifically tailored for corporate relationship graphs. Furthermore, we experimentally demonstrate the limitations of existing graph comparison methods based on model downstream task performance. Experimental results across two datasets consistently demonstrate the effectiveness of our proposed task and model. The project is publicly available on GitHub to encourage reproducibility and future research.
Updated: 2025-07-02 13:59:46
Domain: q-fin.ST,cs.AI,cs.LG,I.2.1
Revisiting Learning Rate Control
The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to online scheduling based on gradient statistics. This paper compares paradigms to assess the current state of learning rate control. We find that methods from multi-fidelity hyperparameter optimization, fixed-hyperparameter schedules, and hyperparameter-free learning often perform very well on selected deep learning tasks but are not reliable across settings. This highlights the need for algorithm selection methods in learning rate control, which have been neglected so far by both the AutoML and deep learning communities. We also observe a trend of hyperparameter optimization approaches becoming less effective as models and tasks grow in complexity, even when combined with multi-fidelity approaches for more expensive model trainings. A focus on more relevant test tasks and new promising directions like finetunable methods and meta-learning will enable the AutoML community to significantly strengthen its impact on this crucial factor in deep learning.
Updated: 2025-07-02 13:58:38
Domain: cs.LG
15,500 Seconds: Lean UAV Classification Leveraging PEFT and Pre-Trained Networks
Unmanned Aerial Vehicles (UAVs) pose escalating security concerns as the market for consumer and military UAVs grows. This paper addresses the critical data scarcity challenges in deep UAV audio classification. We build upon our previous work, expanding novel approaches such as parameter-efficient fine-tuning, data augmentation, and pre-trained networks. We achieve performance upwards of 95% validation accuracy with EfficientNet-B0.
Updated: 2025-07-02 13:52:59
Domain: cs.LG,cs.AI
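As a hedged sketch of the "pre-trained network + parameter-efficient fine-tuning" recipe, here is the simplest variant: freeze an ImageNet-pretrained EfficientNet-B0 and train only a new classification head on (spectrogram) inputs. The paper's PEFT setup is not necessarily this one, and the class count is assumed.

```python
import torch.nn as nn
from torchvision import models

n_classes = 5  # assumed number of UAV classes

model = models.efficientnet_b0(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                 # freeze the backbone
model.classifier[1] = nn.Linear(model.classifier[1].in_features,
                                n_classes)  # trainable head only

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # tiny fraction of the ~5.3M
```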
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Large Language Models (LLMs) have proven immensely beneficial in education by capturing vast amounts of literature-based information, allowing them to generate context without relying on external sources. In this paper, we propose a generative AI-powered GATE question-answering framework (GATE stands for Graduate Aptitude Test in Engineering) that leverages LLMs to explain GATE solutions and support students in their exam preparation. We conducted extensive benchmarking to select the optimal embedding model and LLM, evaluating our framework based on criteria such as latency, faithfulness, and relevance, with additional validation through human evaluation. Our chatbot integrates state-of-the-art embedding models and LLMs to deliver accurate, context-aware responses. Through rigorous experimentation, we identified configurations that balance performance and computational efficiency, ensuring a reliable chatbot to serve students' needs. Additionally, we discuss the challenges faced in data processing and modeling and implemented solutions. Our work explores the application of Retrieval-Augmented Generation (RAG) for GATE Q/A explanation tasks, and our findings demonstrate significant improvements in retrieval accuracy and response quality. This research offers practical insights for developing effective AI-driven educational tools while highlighting areas for future enhancement in usability and scalability.
Updated: 2025-07-02 13:52:49
Domain: cs.IR,cs.AI
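A stripped-down version of the retrieval side and the latency criterion, assuming sentence-transformers as the embedder; the corpus strings, model choice, and the evaluation pointer are illustrative, not the configuration selected in the paper.

```python
import time
import numpy as np
from sentence_transformers import SentenceTransformer

# Tiny corpus of GATE solution explanations (placeholder strings).
corpus = [
    "To find the eigenvalues, solve det(A - lambda I) = 0 ...",
    "Apply Kirchhoff's voltage law around the loop ...",
    "The time complexity of heapify is O(n) because ...",
]
model = SentenceTransformer("all-MiniLM-L6-v2")  # one candidate embedder
doc_emb = model.encode(corpus, normalize_embeddings=True)

def retrieve(query, k=2):
    t0 = time.perf_counter()
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q                 # cosine similarity
    top = np.argsort(scores)[::-1][:k]
    latency = time.perf_counter() - t0
    return [(corpus[i], float(scores[i])) for i in top], latency

hits, latency = retrieve("Why is building a heap linear time?")
print(hits[0][0][:40], f"latency={latency * 1000:.1f} ms")
# A benchmark loop would repeat this over (question, gold answer) pairs
# and score faithfulness/relevance of the generated answers.
```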
Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America
There is justifiable interest in leveraging conversational AI (CAI) for health across the majority world, but to be effective, CAI must respond appropriately within culturally and linguistically diverse contexts. Therefore, we need ways to address the fact that current LLMs exclude many lived experiences globally. Various advances are underway which focus on top-down approaches and increasing training data. In this paper, we aim to complement these with a bottom-up locally-grounded approach based on qualitative data collected during participatory workshops in Latin America. Our goal is to construct a rich and human-centred understanding of: a) potential areas of cultural misalignment in digital health; b) regional perspectives on chatbots for health; and c) strategies for creating culturally-appropriate CAI, with a focus on the understudied Latin American context. Our findings show that academic boundaries on notions of culture lose meaning at the ground level and technologies will need to engage with a broader framework; one that encapsulates the way economics, politics, geography and local logistics are entangled in cultural experience. To this end, we introduce a framework for 'Pluriversal Conversational AI for Health' which allows for the possibility that more relationality and tolerance, rather than just more data, may be called for.
Updated: 2025-07-02 13:48:25
Domain: cs.HC,cs.AI
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
Advancements in image segmentation play an integral role within the broad scope of Deep Learning-based Computer Vision. Furthermore, their widespread applicability in critical real-world tasks has resulted in challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling the expression of model ignorance (epistemic uncertainty) or data ambiguity (aleatoric uncertainty) to prevent uninformed decision-making. Due to the rapid adoption of Convolutional Neural Network (CNN)-based segmentation models in high-stake applications, a substantial body of research has been published on this very topic, causing its swift expansion into a distinct field. This work provides a comprehensive overview of probabilistic segmentation, by discussing fundamental concepts of uncertainty quantification, governing advancements in the field as well as the application to various tasks. Moreover, literature on both types of uncertainties trace back to four key applications: (1) to quantify statistical inconsistencies in the annotation process due to ambiguous images, (2) correlating prediction error with uncertainty, (3) expanding the model hypothesis space for better generalization, and (4) Active Learning. An extensive discussion follows that includes an overview of utilized datasets for each of the applications and evaluation of the available methods. We also highlight challenges related to architectures, uncertainty quantification methods, standardization and benchmarking, and finally end with recommendations for future work such as methods based on single forward passes and models that appropriately leverage volumetric data.
Updated: 2025-07-02 13:47:36
Domain: cs.CV,cs.AI,cs.LG,eess.IV,stat.ML
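One recipe the review's terminology maps onto directly is the entropy decomposition over stochastic forward passes (e.g. MC dropout or an ensemble). The sketch below is that standard decomposition, not a method proposed by the review itself.

```python
import numpy as np

def uncertainty_decomposition(mc_probs, eps=1e-12):
    """Split predictive uncertainty from T stochastic forward passes
    into aleatoric and epistemic parts.

    mc_probs: (T, N, C) per-pass class probabilities per pixel/sample.
    """
    mean_p = mc_probs.mean(axis=0)                          # (N, C)
    total = -(mean_p * np.log(mean_p + eps)).sum(-1)        # H[E[p]]
    aleatoric = -(mc_probs * np.log(mc_probs + eps)).sum(-1).mean(0)
    epistemic = total - aleatoric   # mutual information, >= 0
    return total, aleatoric, epistemic

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4, 3)) + rng.normal(size=(1, 4, 3)) * 2
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print(uncertainty_decomposition(probs))
```

High epistemic values flag inputs where the T passes disagree (model ignorance); high aleatoric values flag inputs that are intrinsically ambiguous, mirroring the two uncertainty types the review organizes its survey around.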
Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI
Patents contain rich technical knowledge that can inspire innovative product ideas, yet accessing and interpreting this information remains a challenge. This work explores the use of Large Language Models (LLMs) and autonomous agents to mine and generate product concepts from a given patent. In this work, we design Agent Ideate, a framework for automatically generating product-based business ideas from patents. We experimented with open-source LLMs and agent-based architectures across three domains: Computer Science, Natural Language Processing, and Material Chemistry. Evaluation results show that the agentic approach consistently outperformed standalone LLMs in terms of idea quality, relevance, and novelty. These findings suggest that combining LLMs with agentic workflows can significantly enhance the innovation pipeline by unlocking the untapped potential of business idea generation from patent data.
Updated: 2025-07-02 13:47:17
Domain: cs.AI,cs.IR,cs.LG,cs.MA
Enhancing Expressivity of Quantum Neural Networks Based on the SWAP test
Parameterized quantum circuits represent promising architectures for machine learning applications, yet many lack clear connections to classical models, potentially limiting their ability to translate the wide success of classical neural networks to the quantum realm. We examine a specific type of quantum neural network (QNN) built exclusively from SWAP test circuits, and discuss its mathematical equivalence to a classical two-layer feedforward network with quadratic activation functions under amplitude encoding. Our analysis across classical real-world and synthetic datasets reveals that while this architecture can successfully learn many practical tasks, it exhibits fundamental expressivity limitations due to violating the universal approximation theorem, particularly failing on harder problems like the parity check function. To address this limitation, we introduce a circuit modification using generalized SWAP test circuits that effectively implements classical neural networks with product layers. This enhancement enables successful learning of parity check functions in arbitrary dimensions which we analytically argue to be impossible for the original architecture beyond two dimensions regardless of network size. Our results establish a framework for enhancing QNN expressivity through classical task analysis and demonstrate that our SWAP test-based architecture offers broad representational capacity, suggesting potential promise also for quantum learning tasks.
Updated: 2025-07-02 13:44:48
Domain: quant-ph,cs.ET,cs.LG
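The claimed classical equivalence hinges on one identity: a SWAP test's ancilla measures 0 with probability $1/2 + |\langle\psi|\phi\rangle|^2/2$, a quadratic function of the inner product between the amplitude-encoded input and weight states. A tiny numerical sketch (real vectors, exact probabilities instead of sampled measurements):

```python
import numpy as np

def amplitude_encode(x):
    """L2-normalize a real vector so it is a valid quantum state."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def swap_test_p0(psi, phi):
    """Probability of measuring 0 on the SWAP test's ancilla:
    P(0) = 1/2 + |<psi|phi>|^2 / 2, i.e. a quadratic activation
    applied to a linear form of the encoded input."""
    return 0.5 + 0.5 * abs(np.dot(psi, phi)) ** 2

x = amplitude_encode([0.3, -1.2, 0.5, 0.9])   # input sample
w = amplitude_encode([1.0, 0.2, -0.4, 0.7])   # trainable "weight" state
hidden = swap_test_p0(x, w)  # one hidden unit of the quadratic network
print(hidden)
```

On hardware this probability is estimated from repeated shots; the closed form above is what makes the network's classical analysis, and hence the expressivity argument, tractable.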
B-PL-PINN: Stabilizing PINN Training with Bayesian Pseudo Labeling
Training physics-informed neural networks (PINNs) for forward problems often suffers from severe convergence issues, hindering the propagation of information from regions where the desired solution is well-defined. Haitsiukevich and Ilin (2023) proposed an ensemble approach that extends the active training domain of each PINN based on i) ensemble consensus and ii) vicinity to (pseudo-)labeled points, thus ensuring that the information from the initial condition successfully propagates to the interior of the computational domain. In this work, we suggest replacing the ensemble by a Bayesian PINN, and consensus by an evaluation of the PINN's posterior variance. Our experiments show that this mathematically principled approach outperforms the ensemble on a set of benchmark problems and is competitive with PINN ensembles trained with combinations of Adam and LBFGS.
Updated: 2025-07-02 13:44:31
Domain: cs.LG
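The selection rule can be sketched in isolation: given posterior draws of the Bayesian PINN's solution at candidate collocation points (however obtained), points with low posterior variance become pseudo-labeled anchors. Thresholds and shapes below are assumptions for illustration.

```python
import numpy as np

def extend_pseudo_labels(posterior_samples, xs, var_threshold=1e-3):
    """posterior_samples: (S, N) draws of the Bayesian PINN solution at
    N candidate points xs. Low-variance points are trusted and become
    pseudo-labels, expanding the active training domain outward from
    the initial condition."""
    mean = posterior_samples.mean(axis=0)
    var = posterior_samples.var(axis=0)
    keep = var < var_threshold
    return xs[keep], mean[keep]

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 50)
noise = rng.normal(size=(20, 50)) * (0.005 + xs * 0.2)  # grows with x
samples = np.sin(2 * np.pi * xs)[None, :] + noise
anchor_x, anchor_u = extend_pseudo_labels(samples, xs)
print(len(anchor_x), "of", len(xs), "points pseudo-labeled")
```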
GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction
Addressing the task of 3D semantic occupancy prediction for autonomous driving, we tackle three key issues in existing 3D Gaussian Splatting (3DGS) methods: (1) unified feature aggregation neglecting semantic correlations among similar categories and across regions, (2) boundary ambiguities caused by the lack of geometric constraints in MLP iterative optimization and (3) biased issues in dynamic-static object coupling optimization. We propose the GraphGSOcc model, a novel framework that combines semantic and geometric graph Transformer and decouples dynamic-static objects optimization for 3D Gaussian Splatting-based Occupancy Prediction. We propose the Dual Gaussians Graph Attention, which dynamically constructs dual graph structures: a geometric graph adaptively calculating KNN search radii based on Gaussian poses, enabling large-scale Gaussians to aggregate features from broader neighborhoods while compact Gaussians focus on local geometric consistency; a semantic graph retaining top-M highly correlated nodes via cosine similarity to explicitly encode semantic relationships within and across instances. Coupled with the Multi-scale Graph Attention framework, fine-grained attention at lower layers optimizes boundary details, while coarse-grained attention at higher layers models object-level topology. On the other hand, we decouple dynamic and static objects by leveraging semantic probability distributions and design a Dynamic-Static Decoupled Gaussian Attention mechanism to optimize the prediction performance for both dynamic objects and static scenes. GraphGSOcc achieves state-of-the-art performance on the SurroundOcc-nuScenes, Occ3D-nuScenes, OpenOcc and KITTI occupancy benchmarks. Experiments on the SurroundOcc dataset achieve an mIoU of 25.20%, reducing GPU memory to 6.8 GB, demonstrating a 1.97% mIoU improvement and 13.7% memory reduction compared to GaussianWorld.
Updated: 2025-07-02 13:42:48
Domain: cs.CV,cs.AI
Towards Better Attribute Inference Vulnerability Measures
The purpose of anonymizing structured data is to protect the privacy of individuals in the data while retaining the statistical properties of the data. An important class of attack on anonymized data is attribute inference, where an attacker infers the value of an unknown attribute of a target individual given knowledge of one or more known attributes. A major limitation of recent attribute inference measures is that they do not take recall into account, only precision. It is often the case that attacks target only a fraction of individuals, for instance data outliers. Incorporating recall, however, substantially complicates the measure, because one must determine how to combine recall and precision in a composite measure for both the attack and baseline. This paper presents the design and implementation of an attribute inference measure that incorporates both precision and recall. Our design also improves on how the baseline attribute inference is computed. In experiments using a generic best row match attack on moderately-anonymized microdata, we show that in over 25% of the attacks, our approach correctly labeled the attack to be at risk while the prior approach incorrectly labeled the attack to be safe.
Updated: 2025-07-02 13:41:08
Categories: cs.CR
Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control
Climate control is crucial for greenhouse production as it directly affects crop growth and resource use. Reinforcement learning (RL) has received increasing attention in this field, but still faces challenges, including limited training efficiency and high reliance on initial learning conditions. Interactive RL, which combines human (grower) input with the RL agent's learning, offers a potential solution to overcome these challenges. However, interactive RL has not yet been applied to greenhouse climate control and may face challenges related to imperfect inputs. Therefore, this paper aims to explore the possibility and performance of applying interactive RL with imperfect inputs to greenhouse climate control, by: (1) developing three representative interactive RL algorithms tailored for greenhouse climate control (reward shaping, policy shaping and control sharing); (2) analyzing how desirable input characteristics often conflict, and how the trade-offs between them make growers' inputs difficult to perfect; (3) proposing a neural network-based approach to enhance the robustness of interactive RL agents under limited input availability; (4) conducting a comprehensive evaluation of the three interactive RL algorithms with imperfect inputs in a simulated greenhouse environment. The results show that interactive RL incorporating imperfect grower inputs has the potential to improve the performance of the RL agent. RL algorithms that influence action selection, such as policy shaping and control sharing, perform better when dealing with imperfect inputs, achieving 8.4% and 6.8% improvements in profit, respectively. In contrast, reward shaping, an algorithm that manipulates the reward function, is sensitive to imperfect inputs and leads to a 9.4% decrease in profit. This highlights the importance of selecting an appropriate mechanism when incorporating imperfect inputs.
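For intuition, here is a minimal, generic sketch of how human input is typically injected in each of the three paradigms (simplified illustrations of the general techniques, not the paper's exact formulations; all coefficients are assumptions):

    import numpy as np

    def reward_shaping(r_env, grower_feedback, beta=0.5):
        # Grower feedback is added to the environment reward; imperfect
        # feedback therefore directly distorts the optimization target.
        return r_env + beta * grower_feedback

    def policy_shaping(agent_probs, grower_probs, w=0.5):
        # The agent's action distribution is mixed with the grower's
        # suggested distribution before an action is drawn.
        mixed = (1 - w) * agent_probs + w * grower_probs
        return mixed / mixed.sum()

    def control_sharing(agent_action, grower_action, p_override=0.3):
        # With some probability the grower's action overrides the agent's.
        return grower_action if np.random.rand() < p_override else agent_action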
Updated: 2025-07-02 13:40:18
标题: "种植者参与的温室气候控制互动式强化学习"
摘要: 气候控制对温室生产至关重要,因为它直接影响作物生长和资源利用。强化学习(RL)在这一领域受到越来越多的关注,但仍面临挑战,包括训练效率有限和对初始学习条件的高度依赖。交互式RL将人类(种植者)输入与RL代理的学习相结合,提供了克服这些挑战的潜在解决方案。然而,交互式RL尚未应用于温室气候控制,可能面临与不完美输入相关的挑战。因此,本文旨在探讨将具有不完美输入的交互式RL应用于温室气候控制的可能性和性能,具体包括:(1)开发三种专门针对温室气候控制的代表性交互式RL算法(奖励塑形、策略塑形和控制共享);(2)分析输入特征通常相互矛盾,以及它们之间的权衡如何使种植者的输入难以完美;(3)提出一种基于神经网络的方法,以增强有限输入可用性下交互式RL代理的鲁棒性;(4)在模拟温室环境中对具有不完美输入的三种交互式RL算法进行全面评估。演示表明,融入不完美种植者输入的交互式RL有潜力提高RL代理的性能。影响行动选择的RL算法,如策略塑形和控制共享,在处理不完美输入时表现更好,分别实现了8.4%和6.8%的利润改善。相比之下,操纵奖励函数的奖励塑形算法对不完美输入敏感,导致利润下降了9.4%。这突显了在融入不完美输入时选择适当机制的重要性。
更新时间: 2025-07-02 13:40:18
Categories: cs.LG,math.OC
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
The proliferation of multimodal memes in the social media era demands that multimodal Large Language Models (mLLMs) effectively understand meme harmfulness. Existing benchmarks for assessing mLLMs on harmful meme understanding rely on accuracy-based, model-agnostic evaluations using static datasets. These benchmarks are limited in their ability to provide up-to-date and thorough assessments, as online memes evolve dynamically. To address this, we propose AdamMeme, a flexible, agent-based evaluation framework that adaptively probes the reasoning capabilities of mLLMs in deciphering meme harmfulness. Through multi-agent collaboration, AdamMeme provides comprehensive evaluations by iteratively updating the meme data with challenging samples, thereby exposing specific limitations in how mLLMs interpret harmfulness. Extensive experiments show that our framework systematically reveals the varying performance of different target mLLMs, offering in-depth, fine-grained analyses of model-specific weaknesses. Our code is available at https://github.com/Lbotirx/AdamMeme.
Updated: 2025-07-02 13:32:30
Categories: cs.CL,cs.AI
Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture
In this paper, we propose to incorporate the blackboard architecture into LLM multi-agent systems (MASs) so that (1) agents with various roles can share all the information and others' messages during the whole problem-solving process, (2) agents that will take actions are selected based on the current content of the blackboard, and (3) the selection and execution round is repeated until a consensus is reached on the blackboard. We develop the first implementation of this proposal and conduct experiments on commonsense knowledge, reasoning and mathematical datasets. The results show that our system can be competitive with the SOTA static and dynamic MASs, achieving the best average performance while using fewer tokens. Our proposal has the potential to enable complex and dynamic problem-solving where well-defined structures or workflows are unavailable.
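A minimal sketch of the selection-execution loop described above (the select and consensus functions stand in for the content-based mechanisms in the paper; all names are placeholders):

    def blackboard_solve(agents, task, select, consensus, max_rounds=10):
        # All messages are public: every agent reads the full blackboard.
        blackboard = [("task", task)]
        for _ in range(max_rounds):
            # Selection is driven by the current blackboard content.
            for agent in select(agents, blackboard):
                blackboard.append((agent.role, agent.act(blackboard)))
            if consensus(blackboard):   # e.g. agents converge on one answer
                break
        return blackboard[-1][1]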
Updated: 2025-07-02 13:30:44
Categories: cs.MA,cs.AI
Relational Causal Discovery with Latent Confounders
Estimating causal effects from real-world relational data can be challenging when the underlying causal model and potential confounders are unknown. While several causal discovery algorithms exist for learning causal models with latent confounders from data, they assume that the data is independent and identically distributed (i.i.d.) and are not well-suited for learning from relational data. Similarly, existing relational causal discovery algorithms assume causal sufficiency, which is unrealistic for many real-world datasets. To address this gap, we propose RelFCI, a sound and complete causal discovery algorithm for relational data with latent confounders. Our work builds upon the Fast Causal Inference (FCI) and Relational Causal Discovery (RCD) algorithms and defines new graphical models necessary to support causal discovery in relational domains. We also establish soundness and completeness guarantees for relational d-separation with latent confounders. We present experimental results demonstrating the effectiveness of RelFCI in identifying the correct causal structure in relational causal models with latent confounders.
Updated: 2025-07-02 13:29:35
Categories: cs.LG,cs.AI
Variational Graph Convolutional Neural Networks
Estimation of model uncertainty can help improve the explainability of Graph Convolutional Networks and the accuracy of the models at the same time. Uncertainty can also be used in critical applications to verify the results of the model by an expert or additional models. In this paper, we propose Variational Neural Network versions of spatial and spatio-temporal Graph Convolutional Networks. We estimate uncertainty in both the outputs and the layer-wise attentions of the models, which has the potential to improve model explainability. We showcase the benefits of these models on social trading analysis and skeleton-based human action recognition tasks, using the Finnish board membership, NTU-60, NTU-120 and Kinetics datasets, where we show improvements in model accuracy in addition to estimated model uncertainties.
Updated: 2025-07-02 13:28:37
Categories: cs.LG
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
Diffusion Models achieve state-of-the-art performance in generating new samples but lack a low-dimensional latent space that encodes the data into meaningful features. Inversion-based methods address this by reversing the denoising trajectory, mapping each image back to its approximated starting noise. In this work, we thoroughly analyze this procedure and focus on the relation between the initial Gaussian noise, the generated samples, and their corresponding latent encodings obtained through the DDIM inversion. First, we show that latents exhibit structural patterns in the form of less diverse noise predicted for smooth image regions. As a consequence of this divergence, we show that the space of image inversions is notably less amenable to manipulation than the original Gaussian noise. Next, we explain the origin of the phenomenon, demonstrating that, during the first inversion steps, the noise prediction error is much more significant for the plain areas than for the rest of the image. As a surprisingly simple solution, we propose to replace the first DDIM Inversion steps with a forward diffusion process, which successfully decorrelates latent encodings, leading to higher-quality edits and interpolations. The code is available at https://github.com/luk-st/taba.
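The proposed fix is simple enough to sketch: add closed-form forward-process noise q(x_t | x_0) up to a switch point, then invert the remainder with DDIM as usual. A minimal sketch (ddim_invert, the schedule tensor and the switch point are assumed names, not the released API):

    import torch

    def hybrid_inversion(x0, ddim_invert, alphas_cumprod, t_switch, T=1000):
        # Forward process q(x_t | x_0) replaces the first inversion steps,
        # decorrelating the latent over smooth image regions.
        a_bar = alphas_cumprod[t_switch]
        eps = torch.randn_like(x0)
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
        # The remaining trajectory is inverted with DDIM as usual.
        return ddim_invert(x_t, t_start=t_switch, t_end=T)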
Updated: 2025-07-02 13:27:19
Categories: cs.CV,cs.AI
Dynamic Similarity Graph Construction with Kernel Density Estimation
In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$, and the objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$. In this paper, we consider $\textsf{KDE}$ in the dynamic setting, and introduce a data structure that efficiently maintains the estimates for a set of query points as data points are added to $X$ over time. Based on this, we design a dynamic data structure that maintains a sparse approximation of the fully connected similarity graph on $X$, and develop a fast dynamic spectral clustering algorithm. We further evaluate the effectiveness of our algorithms on both synthetic and real-world datasets.
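For intuition, the naive dynamic baseline simply maintains the exact kernel sums for a fixed query set as points arrive; each insertion costs O(|Q|d), which the paper's data structure improves on with sublinear approximate updates. A sketch of that baseline (Gaussian kernel assumed):

    import numpy as np

    class NaiveDynamicKDE:
        # Maintains exact KDE sums for fixed queries under insertions.
        def __init__(self, queries, bandwidth=1.0):
            self.q = np.asarray(queries, dtype=float)
            self.h = bandwidth
            self.sums = np.zeros(len(self.q))

        def insert(self, x):
            d2 = np.sum((self.q - np.asarray(x, dtype=float)) ** 2, axis=1)
            self.sums += np.exp(-d2 / (2 * self.h ** 2))  # Gaussian kernel

        def estimate(self):
            return self.sums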
Updated: 2025-07-02 13:25:22
Categories: cs.DS,cs.LG
FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!
A widely-used technique in designing energy-efficient deep neural network (DNN) accelerators is quantization. Recent progress in this direction has reduced the bitwidths used in DNNs down to 2. Meanwhile, many prior works apply approximate multipliers (AppMuls) in designing DNN accelerators to lower their energy consumption. Unfortunately, these works still assume a bitwidth much larger than 2, which falls far behind the state of the art in the quantization area and even challenges the meaningfulness of applying AppMuls in DNN accelerators, since a high-bitwidth AppMul consumes much more energy than a low-bitwidth exact multiplier! Thus, an important problem to study is: Can approximate multipliers be effectively applied to quantized DNN models with very low bitwidths? In this work, we give an affirmative answer to this question and present a systematic solution that achieves the answer: FAMES, a fast approximate multiplier substitution method for mixed-precision DNNs. Our experiments demonstrate an average 28.67% energy reduction on state-of-the-art mixed-precision quantized models with bitwidths as low as 2 bits and accuracy losses kept under 1%. Additionally, our approach is up to 300x faster than previous genetic algorithm-based methods.
Updated: 2025-07-02 13:22:52
Categories: cs.LG,cs.ET
PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution
Deep neural networks (DNNs) have become ubiquitous thanks to their remarkable ability to model complex patterns across various domains such as computer vision, speech recognition, robotics, etc. While large DNN models are often more accurate than simpler, lightweight models, they are also resource- and energy-hungry. Hence, it is imperative to design methods to reduce reliance on such large models without significant degradation in output accuracy. The high computational cost of these models is often necessary only for a reduced set of challenging inputs, while lighter models can handle most simple ones. Thus, carefully combining properties of existing DNN models in a dynamic, input-based way opens opportunities to improve efficiency without impacting accuracy. In this work, we introduce PERTINENCE, a novel online method designed to analyze the complexity of input features and dynamically select the most suitable model from a pre-trained set to process a given input effectively. To achieve this, we employ a genetic algorithm to explore the training space of an ML-based input dispatcher, enabling convergence towards the Pareto front in the solution space that balances overall accuracy and computational efficiency. We showcase our approach on state-of-the-art Convolutional Neural Networks (CNNs) trained on the CIFAR-10 and CIFAR-100, as well as Vision Transformers (ViTs) trained on TinyImageNet dataset. We report results showing PERTINENCE's ability to provide alternative solutions to existing state-of-the-art models in terms of trade-offs between accuracy and number of operations. By opportunistically selecting among models trained for the same task, PERTINENCE achieves better or comparable accuracy with up to 36% fewer operations.
Updated: 2025-07-02 13:22:05
Categories: cs.LG
EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations
As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution for plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models in terms of accuracy and plausibility of predictions on the Argoverse 2 dataset. Despite its significantly smaller model size, our approach achieves both highly accurate and plausible traffic scene predictions. We further evaluate model generalization ability in an out-of-distribution (OoD) test setting using Waymo Open dataset and show superior robustness of our approach.
Updated: 2025-07-02 13:21:07
Categories: cs.CV,cs.LG,cs.RO
Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks
Federated large language models (FedLLMs) provide powerful generative capabilities in CyberEdge networks while protecting data privacy. However, FedLLMs remain highly vulnerable to model poisoning attacks. This article first reviews recent model poisoning techniques and existing defense mechanisms for FedLLMs, highlighting critical limitations, particularly under non-IID text distributions. In particular, current defenses primarily utilize distance-based outlier detection or norm constraints, operating under the assumption that adversarial updates significantly diverge from benign statistics. This assumption can fail when facing adaptive attackers targeting billion-parameter LLMs. Next, this article investigates emerging Graph Representation-Based Model Poisoning (GRMP), a novel attack paradigm that leverages higher-order correlations among honest client gradients to synthesize malicious updates indistinguishable from legitimate model updates. GRMP can effectively evade advanced defenses, resulting in substantial accuracy loss and performance degradation. Moreover, this article outlines a research roadmap emphasizing the importance of graph-aware secure aggregation methods, FedLLMs-specific vulnerability metrics, and evaluation frameworks to strengthen the robustness of future federated language model deployments.
Updated: 2025-07-02 13:20:52
Categories: cs.CR,cs.SY,eess.SY
GPT, But Backwards: Exactly Inverting Language Model Outputs
While existing auditing techniques attempt to identify potential unwanted behaviours in large language models (LLMs), we address the complementary forensic problem of reconstructing the exact input that led to an existing LLM output - enabling post-incident analysis and potentially the detection of fake output reports. We formalize exact input reconstruction as a discrete optimisation problem with a unique global minimum and introduce SODA, an efficient gradient-based algorithm that operates on a continuous relaxation of the input search space with periodic restarts and parameter decay. Through comprehensive experiments on LLMs ranging in size from 33M to 3B parameters, we demonstrate that SODA significantly outperforms existing approaches. We succeed in fully recovering 79.5% of shorter out-of-distribution inputs from next-token logits, without a single false positive, but struggle to extract private information from the outputs of longer (15+ token) input sequences. This suggests that standard deployment practices may currently provide adequate protection against malicious use of our method. Our code is available at https://doi.org/10.5281/zenodo.15539879.
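A minimal sketch of the core SODA loop as described: gradient descent over a continuous relaxation of the discrete input, with periodic restarts and parameter decay (the soft-input model interface and the loss choice are assumptions, not the released implementation):

    import torch

    def soda_sketch(model, target_logits, vocab, seq_len,
                    steps=2000, restarts=5, lr=0.1, decay=0.999):
        best, best_loss = None, float("inf")
        for _ in range(restarts):                         # periodic restarts
            z = torch.randn(seq_len, vocab, requires_grad=True)
            opt = torch.optim.Adam([z], lr=lr)
            for _ in range(steps):
                probs = torch.softmax(z, dim=-1)          # continuous relaxation
                loss = torch.nn.functional.mse_loss(model(probs), target_logits)
                opt.zero_grad()
                loss.backward()
                opt.step()
                z.data *= decay                           # parameter decay
            if loss.item() < best_loss:
                best, best_loss = z.argmax(dim=-1), loss.item()
        return best                                       # candidate discrete input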
Updated: 2025-07-02 13:20:30
Categories: cs.LG,cs.AI
Beyond Overcorrection: Evaluating Diversity in T2I Models with DIVBENCH
Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification where demographic attributes are modified even when explicitly specified in prompts. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually-specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, can already effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.
Updated: 2025-07-02 13:14:42
Categories: cs.CL,cs.CY,cs.LG
A generative modeling / Physics-Informed Neural Network approach to random differential equations
The integration of Scientific Machine Learning (SciML) techniques with uncertainty quantification (UQ) represents a rapidly evolving frontier in computational science. This work advances Physics-Informed Neural Networks (PINNs) by incorporating probabilistic frameworks to effectively model uncertainty in complex systems. Our approach enhances the representation of uncertainty in forward problems by combining generative modeling techniques with PINNs. This integration enables uncertainty control in a systematic fashion while maintaining the predictive accuracy of the model. We demonstrate the utility of this method through applications to random differential equations and random partial differential equations (PDEs).
Updated: 2025-07-02 13:14:17
Categories: stat.ML,cs.LG,cs.NA,math.NA
Enhancing Robustness to Missing Modalities through Clustered Federated Learning
In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However, missing modalities pose a significant challenge in MFL, often due to data quality issues or privacy policies across the clients. In this work, we present MMiC, a framework for Mitigating Modality incompleteness in MFL within Clusters. MMiC replaces partial parameters within client models inside clusters to mitigate the impact of missing modalities. Furthermore, it leverages the Banzhaf Power Index to optimize client selection under these conditions. Finally, MMiC employs an innovative approach to dynamically control global aggregation by utilizing Markowitz Portfolio Optimization. Extensive experiments demonstrate that MMiC consistently outperforms existing federated learning architectures in both global and personalized performance on multimodal datasets with missing modalities, confirming the effectiveness of our proposed solution.
Updated: 2025-07-02 13:11:28
Categories: cs.LG,cs.AI,I.2.11; I.2.7
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
Existing post-training techniques for large language models are broadly categorized into Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). Each paradigm presents a distinct trade-off: SFT excels at mimicking demonstration data but can lead to problematic generalization as a form of behavior cloning. Conversely, RFT can significantly enhance a model's performance but is prone to learn unexpected behaviors, and its performance is highly sensitive to the initial policy. In this paper, we propose a unified view of these methods and introduce Prefix-RFT, a hybrid approach that synergizes learning from both demonstration and exploration. Using mathematical reasoning problems as a testbed, we empirically demonstrate that Prefix-RFT is both simple and effective. It not only surpasses the performance of standalone SFT and RFT but also outperforms parallel mixed-policy RFT methods. A key advantage is its seamless integration into existing open-source frameworks, requiring only minimal modifications to the standard RFT pipeline. Our analysis highlights the complementary nature of SFT and RFT, and validates that Prefix-RFT effectively harmonizes these two learning paradigms. Furthermore, ablation studies confirm the method's robustness to variations in the quality and quantity of demonstration data. We hope this work offers a new perspective on LLM post-training, suggesting that a unified paradigm that judiciously integrates demonstration and exploration could be a promising direction for future research.
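A minimal sketch of prefix sampling as described above (policy.generate, the reward function and the prefix-length cap are assumed names; the paper's actual prefix schedule may differ):

    import random

    def prefix_rft_rollout(policy, demonstration, reward_fn, max_frac=0.5):
        # A random-length prefix of the demonstration anchors the rollout
        # (learning from demonstration); the policy explores the remainder.
        cut = random.randint(0, int(len(demonstration) * max_frac))
        prefix = demonstration[:cut]
        completion = policy.generate(prefix)
        reward = reward_fn(prefix + completion)   # e.g. answer correctness
        return prefix, completion, reward         # fed to the usual RFT update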
Updated: 2025-07-02 13:04:09
Categories: cs.LG,cs.AI,cs.CL
Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions
Access to accurate and actionable harm reduction information can directly impact the health outcomes of People Who Use Drugs (PWUD), yet existing online channels often fail to meet their diverse and dynamic needs due to limitations in adaptability, accessibility, and the pervasive impact of stigma. Large Language Models (LLMs) present a novel opportunity to enhance information provision, but their application in such a high-stakes domain is under-explored and presents socio-technical challenges. This paper investigates how LLMs can be responsibly designed to support the information needs of PWUD. Through a qualitative workshop involving diverse stakeholder groups (academics, harm reduction practitioners, and an online community moderator), we explored LLM capabilities, identified potential use cases, and delineated core design considerations. Our findings reveal that while LLMs can address some existing information barriers (e.g., by offering responsive, multilingual, and potentially less stigmatising interactions), their effectiveness is contingent upon overcoming challenges related to ethical alignment with harm reduction principles, nuanced contextual understanding, effective communication, and clearly defined operational boundaries. We articulate design pathways emphasising collaborative co-design with experts and PWUD to develop LLM systems that are helpful, safe, and responsibly governed. This work contributes empirically grounded insights and actionable design considerations for the responsible development of LLMs as supportive tools within the harm reduction ecosystem.
Updated: 2025-07-02 13:02:17
Categories: cs.HC,cs.AI
Deep Recommender Models Inference: Automatic Asymmetric Data Flow Optimization
Deep Recommender Models (DLRMs) inference is a fundamental AI workload accounting for more than 79% of the total AI workload in Meta's data centers. DLRMs' performance bottleneck is found in the embedding layers, which perform many random memory accesses to retrieve small embedding vectors from tables of various sizes. We propose the design of tailored data flows to speed up embedding look-ups. Namely, we propose four strategies to look up an embedding table effectively on one core, and a framework to automatically map the tables asymmetrically to the multiple cores of a SoC. We assess the effectiveness of our method using the Huawei Ascend AI accelerators, comparing it with the default Ascend compiler, and we perform high-level comparisons with the Nvidia A100. Results show a speed-up varying from 1.5x up to 6.5x for real workload distributions, and more than 20x for extremely unbalanced distributions. Furthermore, the method proves to be much more independent of the query distribution than the baseline.
Updated: 2025-07-02 13:00:39
Categories: cs.DC,cs.AI,cs.AR,cs.IR,C.4; D.1.3; H.3.3; H.3.4
Comparing Optimization Algorithms Through the Lens of Search Behavior Analysis
The field of numerical optimization has recently seen a surge in the development of "novel" metaheuristic algorithms, inspired by metaphors derived from natural or human-made processes, which have been widely criticized for obscuring meaningful innovations and failing to distinguish themselves from existing approaches. Aiming to address these concerns, we investigate the applicability of statistical tests for comparing algorithms based on their search behavior. We utilize the cross-match statistical test to compare multivariate distributions and assess the solutions produced by 114 algorithms from the MEALPY library. These findings are incorporated into an empirical analysis aiming to identify algorithms with similar search behaviors.
Updated: 2025-07-02 12:51:27
Categories: cs.NE,cs.AI
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled with LLM training or inference engines, making it difficult to support custom-designed engines. To address these challenges, we propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training. Specifically, we introduce a distributed data storage and transfer module that provides a unified data management and fine-grained scheduling capability in a fully streamed manner. This architecture inherently facilitates automated pipeline overlapping among RL tasks and dynamic load balancing. Moreover, we propose a producer-consumer-based asynchronous workflow engineered to minimize computational idleness by strategically deferring the parameter update process within staleness thresholds. Finally, the core capability of AsyncFlow is architecturally decoupled from the underlying training and inference engines and encapsulated by service-oriented user interfaces, offering a modular and customizable user experience. Extensive experiments demonstrate an average throughput improvement of 1.59x over the state-of-the-art baseline. The architecture presented in this work provides actionable insights for next-generation RL training system designs.
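A minimal sketch of the producer-consumer pattern with a staleness bound (a simplification of the distributed storage and transfer module; the queue semantics and staleness policy are assumptions):

    import queue

    class AsyncRolloutBuffer:
        # Rollout workers stream trajectories in; the trainer consumes them
        # and tolerates data that is at most max_staleness versions old.
        def __init__(self, max_staleness=2):
            self.buf = queue.Queue()
            self.version = 0                  # trainer's parameter version
            self.max_staleness = max_staleness

        def produce(self, trajectory, actor_version):
            self.buf.put((trajectory, actor_version))

        def consume(self):
            traj, v = self.buf.get()
            if self.version - v > self.max_staleness:
                return None                   # too stale: drop or reweight
            return traj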
Updated: 2025-07-02 12:45:34
Categories: cs.LG,cs.AI
Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond
Cardiac magnetic resonance (CMR) imaging is the gold standard for non-invasive cardiac assessment, offering rich spatio-temporal views of the cardiac anatomy and physiology. Patient-level health factors, such as demographics, metabolic, and lifestyle factors, are known to substantially influence cardiovascular health and disease risk, yet remain uncaptured by CMR alone. To holistically understand cardiac health and to enable the best possible interpretation of an individual's disease risk, CMR and patient-level factors must be jointly exploited within an integrated framework. Recent multi-modal approaches have begun to bridge this gap, yet they often rely on limited spatio-temporal data and focus on isolated clinical tasks, thereby hindering the development of a comprehensive representation for cardiac health evaluation. To overcome these limitations, we introduce ViTa, a step toward foundation models that delivers a comprehensive representation of the heart and a precise interpretation of individual disease risk. Leveraging data from 42,000 UK Biobank participants, ViTa integrates 3D+T cine stacks from short-axis and long-axis views, enabling a complete capture of the cardiac cycle. These imaging data are then fused with detailed tabular patient-level factors, enabling context-aware insights. This multi-modal paradigm supports a wide spectrum of downstream tasks, including cardiac phenotype and physiological feature prediction, segmentation, and classification of cardiac and metabolic diseases, within a single unified framework. By learning a shared latent representation that bridges rich imaging features and patient context, ViTa moves beyond traditional, task-specific models toward a universal, patient-specific understanding of cardiac health, highlighting its potential to advance clinical utility and scalability in cardiac analysis.
Updated: 2025-07-02 12:40:46
Categories: eess.IV,cs.AI,cs.CV
Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation
This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent navigation in a cluttered environment, where we assume that the layout of the environment is reconfigurable. By introducing two sub-objectives -- multi-agent navigation and environment optimization -- we propose an agent-environment co-optimization problem and develop a coordinated algorithm that alternates between these sub-objectives to search for an optimal synthesis of agent actions and environment configurations; ultimately, improving the navigation performance. Due to the challenge of explicitly modeling the relation between the agents, the environment and their performance therein, we leverage policy gradient to formulate a model-free learning mechanism within the coordinated framework. A formal convergence analysis shows that our coordinated algorithm tracks the local minimum solution of an associated time-varying non-convex optimization problem. Experiments corroborate theoretical findings and show the benefits of co-optimization. Interestingly, the results also indicate that optimized environments can offer structural guidance to de-conflict agents in motion.
Updated: 2025-07-02 12:37:15
Categories: cs.RO,cs.LG,cs.MA
Reducing Variability of Multiple Instance Learning Methods for Digital Pathology
Digital pathology has revolutionized the field by enabling the digitization of tissue samples into whole slide images (WSIs). However, the high resolution and large size of WSIs present significant challenges when it comes to applying Deep Learning models. As a solution, WSIs are often divided into smaller patches with a global label (i.e., diagnostic) per slide, instead of a (too) costly pixel-wise annotation. By treating each slide as a bag of patches, Multiple Instance Learning (MIL) methods have emerged as a suitable solution for WSI classification. A major drawback of MIL methods is their high variability in performance across different runs, which can reach up to 10-15 AUC points on the test set, making it difficult to compare different MIL methods reliably. This variability mainly comes from three factors: i) weight initialization, ii) batch (shuffling) ordering, and iii) learning rate. To address that, we introduce a Multi-Fidelity, Model Fusion strategy for MIL methods. We first train multiple models for a few epochs and average the most stable and promising ones based on validation scores. This approach can be applied to any existing MIL model to reduce performance variability. It also simplifies hyperparameter tuning and improves reproducibility while maintaining computational efficiency. We extensively validate our approach on WSI classification tasks using 2 different datasets, 3 initialization strategies and 5 MIL methods, for a total of more than 2000 experiments.
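A minimal sketch of the Multi-Fidelity, Model Fusion strategy as described: several short training runs, selection of the most promising by validation score, and element-wise weight averaging (function names are placeholders; assumes floating-point parameters throughout):

    import copy
    import torch

    def multi_fidelity_fusion(make_model, train_short, val_score,
                              n_runs=8, top_k=3):
        runs = []
        for seed in range(n_runs):
            torch.manual_seed(seed)              # varies init and shuffling
            model = make_model()
            train_short(model, seed=seed)        # a few epochs only
            runs.append((val_score(model), model.state_dict()))
        runs.sort(key=lambda r: -r[0])           # keep the most promising
        top = [sd for _, sd in runs[:top_k]]
        fused = copy.deepcopy(top[0])
        for k in fused:                          # element-wise weight average
            fused[k] = sum(sd[k] for sd in top) / top_k
        return fused                             # load into a fresh model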
Updated: 2025-07-02 12:37:04
Categories: cs.CV,cs.AI
Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication
Task-oriented semantic communication systems have emerged as a promising approach to achieving efficient and intelligent data transmission in next-generation networks, where only information relevant to a specific task is communicated. This is particularly important in 6G-enabled Internet of Things (6G-IoT) scenarios, where bandwidth constraints, latency requirements, and data privacy are critical. However, existing methods struggle to fully disentangle task-relevant and task-irrelevant information, leading to privacy concerns and suboptimal performance. To address this, we propose an information-bottleneck inspired method, named CLAD (contrastive learning and adversarial disentanglement). CLAD utilizes contrastive learning to effectively capture task-relevant features while employing adversarial disentanglement to discard task-irrelevant information. Additionally, due to the absence of reliable and reproducible methods to quantify the minimality of encoded feature vectors, we introduce the Information Retention Index (IRI), a comparative metric used as a proxy for the mutual information between the encoded features and the input. The IRI reflects how minimal and informative the representation is, making it highly relevant for privacy-preserving and bandwidth-efficient 6G-IoT systems. Extensive experiments demonstrate that CLAD outperforms state-of-the-art baselines in terms of semantic extraction, task performance, privacy preservation, and IRI, making it a promising building block for responsible, efficient and trustworthy 6G-IoT services.
Updated: 2025-07-02 12:36:29
Categories: cs.LG,cs.AI,cs.CV,cs.IT,eess.IV,math.IT
NegMerge: Sign-Consensual Weight Merging for Machine Unlearning
Machine unlearning aims to selectively remove specific knowledge from a trained model. Existing approaches, such as Task Arithmetic, fine-tune the model on the forget set to create a task vector (i.e., a direction in weight space) for subtraction from the original model's weight. However, their effectiveness is highly sensitive to hyperparameter selection, requiring extensive validation to identify the optimal vector from many fine-tuned candidates. In this paper, we propose a novel method that utilizes all fine-tuned models trained with varying hyperparameters instead of a single selection. Specifically, we aggregate the computed task vectors by retaining only the elements with consistent shared signs. The merged task vector is then negated to induce unlearning on the original model. Evaluations on zero-shot and standard image recognition tasks across twelve datasets and four backbone architectures show that our approach outperforms state-of-the-art methods while requiring similar or fewer computational resources. Code is available at https://github.com/naver-ai/negmerge.
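A minimal sketch of the sign-consensual merge as described in the abstract (consistent with the text, not the official code; state dicts of the original and fine-tuned models are assumed inputs):

    import torch

    def negmerge(original, finetuned_runs):
        merged = {}
        for k in original:
            # Task vectors from every fine-tuning run on the forget set.
            tvs = torch.stack([ft[k] - original[k] for ft in finetuned_runs])
            signs = torch.sign(tvs)
            # Keep only elements whose nonzero sign is shared by all runs.
            consensus = (signs == signs[0]).all(dim=0) & (signs[0] != 0)
            tv = torch.where(consensus, tvs.mean(dim=0),
                             torch.zeros_like(original[k]))
            merged[k] = original[k] - tv      # negated task vector unlearns
        return merged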
Updated: 2025-07-02 12:33:19
Categories: cs.LG,cs.AI
SPoT: Subpixel Placement of Tokens in Vision Transformers
Vision Transformers naturally accommodate sparsity, yet standard tokenization methods confine features to discrete patch grids. This constraint prevents models from fully exploiting sparse regimes, forcing awkward compromises. We propose Subpixel Placement of Tokens (SPoT), a novel tokenization strategy that positions tokens continuously within images, effectively sidestepping grid-based limitations. With our proposed oracle-guided search, we uncover substantial performance gains achievable with ideal subpixel token positioning, drastically reducing the number of tokens necessary for accurate predictions during inference. SPoT provides a new direction for flexible, efficient, and interpretable ViT architectures, redefining sparsity as a strategic advantage rather than an imposed limitation.
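One plausible way to realize continuous token placement is bilinear sampling at normalized sub-pixel coordinates; the sketch below uses grid_sample for this (an assumed mechanism for illustration only; the paper's exact sampling and embedding may differ):

    import torch
    import torch.nn.functional as F

    def subpixel_tokens(image, positions, embed):
        # image: (1, C, H, W); positions: (N, 2) in [-1, 1] normalized coords.
        grid = positions.view(1, -1, 1, 2)                       # (1, N, 1, 2)
        feats = F.grid_sample(image, grid, align_corners=False)  # (1, C, N, 1)
        tokens = feats.squeeze(-1).transpose(1, 2)               # (1, N, C)
        return embed(tokens)               # project tokens to model width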
Updated: 2025-07-02 12:30:32
Categories: cs.CV,cs.LG
Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!
Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent. While watermarking techniques have been proposed to protect model ownership, they may not be robust to continued training and development, posing serious threats to model attribution and copyright protection. This work introduces a simple yet effective approach for robust LLM fingerprinting based on intrinsic model characteristics. We discover that the standard deviation distributions of attention parameter matrices across different layers exhibit distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Our experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that a recently released Pangu Pro MoE model from Huawei was derived from the Qwen-2.5 14B model through upcycling techniques rather than trained from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the critical importance of developing robust fingerprinting methods for protecting intellectual property in large-scale model development and emphasize that deliberate continued training alone is insufficient to completely obscure model origins.
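The fingerprint itself is simple to compute. A hedged sketch (the key_hint filter is an assumption about parameter naming conventions, and the comparison assumes two models with the same number of attention layers):

    import torch

    def attention_std_fingerprint(state_dict, key_hint="attn"):
        # Per-layer standard deviations of attention projection matrices,
        # taken in layer order, form the fingerprint vector.
        return torch.tensor([w.float().std().item()
                             for name, w in state_dict.items()
                             if key_hint in name and w.ndim == 2])

    def lineage_score(fp_a, fp_b):
        # High correlation of the std trajectories suggests shared lineage.
        return torch.corrcoef(torch.stack([fp_a, fp_b]))[0, 1]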
Updated: 2025-07-02 12:29:38
Categories: cs.CR,cs.CL,cs.LG
Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective
Autoregressive (AR) models have garnered significant attention in image generation for their ability to effectively capture both local and global structures within visual data. However, prevalent AR models predominantly rely on transformer architectures, which suffer from quadratic computational complexity with respect to input sequence length and substantial memory overhead due to the need to maintain key-value caches. Although linear attention mechanisms have successfully reduced this burden in language models, our initial experiments reveal that they significantly degrade image generation quality because of their inability to capture critical long-range dependencies in visual data. We propose Linear Attention with Spatial-Aware Decay (LASAD), a novel attention mechanism that explicitly preserves genuine 2D spatial relationships within the flattened image sequences by computing position-dependent decay factors based on true 2D spatial locations rather than 1D sequence positions. Based on this mechanism, we present LASADGen, an autoregressive image generator that enables selective attention to relevant spatial contexts with linear complexity. Experiments on ImageNet show LASADGen achieves state-of-the-art image generation performance and computational efficiency, bridging the gap between linear attention's efficiency and the spatial understanding needed for high-quality generation.
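To illustrate position-dependent decay from true 2D locations, here is a minimal sketch that builds a decay matrix from grid coordinates (the Manhattan metric and exponential form are assumptions; LASAD's exact decay function may differ):

    import torch

    def spatial_decay_matrix(h, w, gamma=0.98):
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()
        d = torch.cdist(pos, pos, p=1)     # true 2D (Manhattan) distances
        return gamma ** d                  # (hw, hw): strongest for 2D neighbors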
Updated: 2025-07-02 12:27:06
Categories: cs.CV,cs.AI,cs.MM
Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.
Updated: 2025-07-02 12:25:47
Categories: cs.LG,cs.AI,cs.SY,eess.SY,math.OC
On the Fundamental Impossibility of Hallucination Control in Large Language Models
We prove that perfect hallucination control in large language models is mathematically impossible. No LLM inference mechanism can simultaneously achieve truthful response generation, semantic information conservation, relevant knowledge revelation, and knowledge-constrained optimality. This impossibility is fundamental, arising from the mathematical structure of information aggregation itself rather than engineering limitations. The proof spans three mathematical frameworks: auction theory, proper scoring theory for probabilistic predictions, and log-sum-exp analysis for transformer architectures. In each setting, we demonstrate that information aggregation creates unavoidable violations of conservation principles. The Jensen gap in transformer probability aggregation provides a direct measure of this impossibility. These results reframe hallucination from an engineering bug to an inevitable mathematical feature of distributed intelligence. There are fundamental trade-offs between truthfulness, knowledge utilization, and response completeness, providing principled foundations for managing rather than eliminating hallucination. This work reveals deep connections between neural network inference, philosophy of knowledge and reasoning, and classical results in game theory and information theory, opening new research directions for developing beneficial AI systems within mathematical constraints.
Updated: 2025-07-02 12:24:10
Categories: stat.ML,cs.AI,cs.CL,cs.GT,cs.LG
GradMetaNet: An Equivariant Architecture for Learning on Gradients
Gradients of neural networks encode valuable information for optimization, editing, and analysis of models. Therefore, practitioners often treat gradients as inputs to task-specific algorithms, e.g. for pruning or optimization. Recent works explore learning algorithms that operate directly on gradients but use architectures that are not specifically designed for gradient processing, limiting their applicability. In this paper, we present a principled approach for designing architectures that process gradients. Our approach is guided by three principles: (1) equivariant design that preserves neuron permutation symmetries, (2) processing sets of gradients across multiple data points to capture curvature information, and (3) efficient gradient representation through rank-1 decomposition. Based on these principles, we introduce GradMetaNet, a novel architecture for learning on gradients, constructed from simple equivariant blocks. We prove universality results for GradMetaNet, and show that previous approaches cannot approximate natural gradient-based functions that GradMetaNet can. We then demonstrate GradMetaNet's effectiveness on a diverse set of gradient-based tasks on MLPs and transformers, such as learned optimization, INR editing, and estimating loss landscape curvature.
Updated: 2025-07-02 12:22:39
领域: cs.LG,cs.AI
Dance Dance ConvLSTM
\textit{Dance Dance Revolution} is a rhythm game consisting of songs and accompanying choreography, referred to as charts. Players press arrows on a device referred to as a dance pad in time with steps determined by the song's chart. In 2017, the authors of Dance Dance Convolution (DDC) developed an algorithm for the automatic generation of \textit{Dance Dance Revolution} charts, utilizing a CNN-LSTM architecture. We introduce Dance Dance ConvLSTM (DDCL), a new method for the automatic generation of DDR charts using a ConvLSTM based model, which improves upon the DDC methodology and substantially increases the accuracy of chart generation.
Updated: 2025-07-02 12:17:33
标题: 舞蹈舞蹈ConvLSTM
摘要: 《舞蹈革命》是一个节奏游戏,由歌曲和相关的编舞组成,称为谱表。玩家在一个被称为舞蹈垫的设备上按照歌曲谱表确定的步伐时间按压箭头。在2017年,舞蹈卷积(DDC)的作者开发了一种用于自动生成《舞蹈革命》谱表的算法,利用了CNN-LSTM架构。我们介绍了舞蹈卷积LSTM(DDCL),一种利用ConvLSTM模型进行自动生成DDR谱表的新方法,该方法改进了DDC方法,并显著提高了谱表生成的准确性。
更新时间: 2025-07-02 12:17:33
领域: cs.LG
Customized Exploration of Landscape Features Driving Multi-Objective Combinatorial Optimization Performance
We present an analysis of landscape features for predicting the performance of multi-objective combinatorial optimization algorithms. We consider features from the recently proposed compressed Pareto Local Optimal Solutions Networks (C-PLOS-net) model of combinatorial landscapes. The benchmark instances are a set of rmnk-landscapes with 2 and 3 objectives and various levels of ruggedness and objective correlation. We consider the performance of three algorithms -- Pareto Local Search (PLS), Global Simple EMO Optimizer (GSEMO), and Non-dominated Sorting Genetic Algorithm (NSGA-II) -- using the resolution and hypervolume metrics. Our tailored analysis reveals feature combinations that influence algorithm performance specific to certain landscapes. This study provides deeper insights into feature importance, tailored to specific rmnk-landscapes and algorithms.
Updated: 2025-07-02 12:11:41
标题: 个性化探索驱动多目标组合优化性能的景观特征
摘要: 我们提出了一项关于预测多目标组合优化算法性能的景观特征分析。我们考虑了最近提出的压缩帕累托局部最优解网络(C-PLOS-net)模型中的特征。基准实例是一组具有2和3个目标以及不同坎坷程度和目标相关性的rmnk-景观。我们考虑了三种算法的性能--帕累托局部搜索(PLS)、全局简单EMO优化器(GSEMO)和非支配排序遗传算法(NSGA-II)--使用分辨率和超体积指标。我们的定制分析揭示了影响特定景观下算法性能的特征组合。这项研究为特定rmnk-景观和算法定制的特征重要性提供了更深入的见解。
更新时间: 2025-07-02 12:11:41
领域: cs.NE,cs.AI
Initialization Method for Factorization Machine Based on Low-Rank Approximation for Constructing a Corrected Approximate Ising Model
This paper presents an initialization method that can approximate a given approximate Ising model with a high degree of accuracy using a factorization machine (FM), a machine learning model. The construction of an Ising model using an FM is applied to black-box combinatorial optimization problems using factorization machine with quantum annealing (FMQA). It is anticipated that the optimization performance of FMQA will be enhanced through an implementation of the warm-start method. Nevertheless, the optimal initialization method for leveraging the warm-start approach in FMQA remains undetermined. Consequently, the present study compares initialization methods based on random initialization and low-rank approximation, and then identifies a suitable one for use with warm-start in FMQA through numerical experiments. Furthermore, the properties of the low-rank approximation initialization method for the FM are analyzed using random matrix theory, demonstrating that the approximation accuracy of the proposed method is not significantly influenced by the specific Ising model under consideration. The findings of this study will facilitate advances in research on black-box combinatorial optimization using Ising machines.
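A minimal numpy sketch of the low-rank initialization idea as we read it (the exact recipe, scaling, and handling of indefinite couplings are the paper's; here J is a fabricated PSD matrix): since an FM's pairwise interaction matrix is V V^T, a rank-k eigendecomposition of the target coupling matrix gives a natural warm start for V.

```python
import numpy as np

# Warm-start sketch: an FM's pairwise term is <v_i, v_j>, so its interaction
# matrix is V V^T. Given a target coupling matrix J, a rank-k eigendecomposition
# J ~ U_k diag(lam_k) U_k^T suggests V = U_k sqrt(lam_k). J is fabricated PSD.
rng = np.random.default_rng(0)
n, k = 20, 4
A = rng.standard_normal((n, k))
J = A @ A.T                                  # rank-k PSD coupling matrix

lam, U = np.linalg.eigh(J)                   # eigenvalues in ascending order
V = U[:, -k:] * np.sqrt(np.clip(lam[-k:], 0.0, None))

print(np.linalg.norm(J - V @ V.T))           # ~1e-13: exact here since rank(J) = k
```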
Updated: 2025-07-02 12:10:48
标题: 基于低秩逼近的分解机初始化方法用于构建修正近似伊辛模型
摘要: 本文介绍了一种初始化方法,可以使用因子分解机(FM),一种机器学习模型,对给定的近似伊辛模型进行高度准确的逼近。使用FM构建一个伊辛模型,应用于使用因子分解机和量子退火(FMQA)的黑盒组合优化问题。预计通过实施热启动方法,FMQA的优化性能将得到提升。然而,在FMQA中利用热启动方法的最佳初始化方法尚未确定。因此,本研究通过数值实验比较基于随机初始化和低秩逼近的初始化方法,然后确定一个适合在FMQA中与热启动一起使用的方法。此外,通过随机矩阵理论分析低秩逼近方法对FM的初始化方法,表明所提出的方法的逼近精度并不受特定考虑的伊辛模型的显着影响。本研究的发现将促进通过使用伊辛机器在黑盒组合优化领域的研究进展。
更新时间: 2025-07-02 12:10:48
领域: cs.LG
Truthful Elicitation of Imprecise Forecasts
The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a framework for scoring imprecise forecasts -- forecasts given as a set of beliefs. Despite existing impossibility results for deterministic scoring rules, we enable truthful elicitation by drawing connection to social choice theory and introducing a two-way communication framework where DMs first share their aggregation rules (e.g., averaging or min-max) used in downstream decisions for resolving forecast ambiguity. This, in turn, helps forecasters resolve indecision during elicitation. We further show that truthful elicitation of imprecise forecasts is achievable using proper scoring rules randomized over the aggregation procedure. Our approach allows DM to elicit and integrate the forecaster's epistemic uncertainty into their decision-making process, thus improving credibility.
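A toy sketch of the two-way protocol (the aggregation rules, the uniform randomization, and the Brier score are illustrative choices, not the paper's exact construction): the DM announces a mixture over aggregation rules, one rule is drawn, and a proper scoring rule is applied to the aggregated belief.

```python
import numpy as np

# Toy protocol: the DM announces aggregation rules, one is drawn at random,
# and a proper scoring rule (Brier) is applied to the aggregated belief.
def brier(p, outcome):                       # proper scoring rule, binary event
    return -(p - outcome) ** 2

belief_set = [0.2, 0.5, 0.9]                 # an imprecise forecast: a set of beliefs
aggregators = {"mean": np.mean, "minmax": lambda s: 0.5 * (min(s) + max(s))}

rng = np.random.default_rng(0)
rule = rng.choice(list(aggregators))         # randomization over announced rules
p = aggregators[rule](belief_set)
print(rule, brier(p, outcome=1))
```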
Updated: 2025-07-02 12:09:25
标题: 不精确预测的真实引出
摘要: 概率预测的质量对于在不确定性下进行决策至关重要。尽管正确的评分规则鼓励对精确预测进行真实报告,但当预测者面临对其信念的认识不确定时,这些规则并不足够,从而限制了它们在决策者(DMs)优先考虑适当的不确定性管理的安全关键领域中的使用。为了解决这个问题,我们提出了一个评分不精确预测的框架--即以一组信念形式给出的预测。尽管对于确定性评分规则存在不可能性结果,但我们通过与社会选择理论的联系,引入了一个双向沟通框架,其中DMs首先分享他们在解决预测模糊性的下游决策中使用的聚合规则(例如平均或最小最大)。这反过来帮助预测者在调查过程中解决犹豫不决。我们进一步表明,通过在聚合过程中随机化的正确评分规则,可以实现对不精确预测的真实调查。我们的方法允许DM调查和整合预测者的认识不确定性到他们的决策过程中,从而提高可信度。
更新时间: 2025-07-02 12:09:25
领域: cs.LG
Kernel Recursive Least Squares Dictionary Learning Algorithm
We propose an efficient online dictionary learning algorithm for kernel-based sparse representations. In this framework, input signals are nonlinearly mapped to a high-dimensional feature space and represented sparsely using a virtual dictionary. At each step, the dictionary is updated recursively using a novel algorithm based on the recursive least squares (RLS) method. This update mechanism works with single samples or mini-batches and maintains low computational complexity. Experiments on four datasets across different domains show that our method not only outperforms existing online kernel dictionary learning approaches but also achieves classification accuracy close to that of batch-trained models, while remaining significantly more efficient.
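For readers unfamiliar with RLS, the following is a generic recursive least squares update (plain linear regression with a forgetting factor; the kernelization and virtual-dictionary bookkeeping of the proposed method are omitted), showing the rank-1 inverse update the dictionary learning builds on.

```python
import numpy as np

# Generic RLS step: P is the inverse correlation matrix, w the estimate,
# lam a forgetting factor. The gain uses a Sherman-Morrison rank-1 update.
def rls_step(w, P, x, d, lam=0.99):
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    e = d - w @ x                      # a-priori error
    w = w + k * e
    P = (P - np.outer(k, Px)) / lam
    return w, P

rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
w, P = np.zeros(5), np.eye(5) * 100.0
for _ in range(200):
    x = rng.standard_normal(5)
    w, P = rls_step(w, P, x, d=w_true @ x)
print(np.linalg.norm(w - w_true))      # small residual after enough samples
```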
Updated: 2025-07-02 12:07:35
标题: 核递归最小二乘字典学习算法
摘要: 我们提出了一种高效的基于核的稀疏表示在线字典学习算法。在这个框架中,输入信号被非线性映射到高维特征空间,并使用虚拟字典稀疏表示。在每一步中,字典使用基于递归最小二乘(RLS)方法的新算法进行递归更新。这种更新机制适用于单个样本或小批量,并保持低计算复杂性。在跨不同领域的四个数据集上的实验表明,我们的方法不仅优于现有的在线核字典学习方法,而且在保持显著更高效的同时,也实现了接近批量训练模型的分类准确度。
更新时间: 2025-07-02 12:07:35
领域: cs.LG,eess.SP
EGNInfoLeaker: Unveiling the Risks of Public Key Reuse and User Identity Leakage in Blockchain
While Ethereum's discovery protocols (Discv4/ Discv5) incorporate robust cryptographic designs to protect user privacy, real-world deployment reveals critical vulnerabilities when users deviate from security guidelines. In this paper, we design a system called EGNInfoLeaker. Our study is the first work that uncovers widespread public key reuse across Ethereum's peer-to-peer networks - a practice that fundamentally undermines the protocol's privacy guarantees. Through systematic analysis of 300 real-world network snapshots, we identify 83 users controlling 483 service nodes via public key reuse, enabling precise de-anonymization through IP correlation. Using evidence collected by EGNInfoLeaker, our Graph-Based Identity Association Algorithm links users to network entities and generates comprehensive user profiles. For User27, it exposes the public key, IP, network ID, location (country/region/city), and ISP/ORG details. The EGNInfoLeaker system demonstrates how such cryptographic misuse transforms theoretical anonymity into practical identity leakage, exposing users to surveillance and targeted attacks. These findings establish that protocol security depends not only on sound design but also on strict user compliance. Going forward, our detection framework provides a foundation for enhancing real-world privacy preservation in decentralized networks.
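The core detection step reduces to grouping observed endpoints by public key across snapshots; a minimal sketch with fabricated records (not real measurement data):

```python
from collections import defaultdict

# Group observed (pubkey, ip) pairs across crawled snapshots and flag
# keys seen at multiple endpoints. Records are fabricated placeholders.
records = [("0xabc", "1.2.3.4"), ("0xabc", "5.6.7.8"), ("0xdef", "9.9.9.9")]

endpoints = defaultdict(set)
for pubkey, ip in records:
    endpoints[pubkey].add(ip)

reused = {k: ips for k, ips in endpoints.items() if len(ips) > 1}
print(reused)   # {'0xabc': {'1.2.3.4', '5.6.7.8'}}
```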
Updated: 2025-07-02 12:07:03
标题: EGNInfoLeaker:揭示区块链中公钥重复和用户身份泄露的风险
摘要: 尽管以太坊的发现协议(Discv4/Discv5)采用了强大的加密设计来保护用户隐私,但实际部署揭示了用户偏离安全指南时存在关键漏洞。在本文中,我们设计了一个名为EGNInfoLeaker的系统。我们的研究是第一项揭示以太坊点对点网络中广泛存在的公钥重用的工作 - 这种做法从根本上破坏了协议的隐私保证。通过对300个真实网络快照的系统分析,我们确定了83个用户通过公钥重用控制了483个服务节点,从而通过IP相关性实现精确的去匿名化。利用EGNInfoLeaker收集的证据,我们的基于图的身份关联算法将用户与网络实体关联起来,并生成全面的用户资料。对于User27,它暴露了公钥、IP、网络ID、位置(国家/地区/城市)和ISP/ORG细节。EGNInfoLeaker系统展示了这种加密滥用如何将理论上的匿名性转变为实际的身份泄露,使用户暴露于监视和有针对性的攻击之中。这些发现表明,协议安全不仅取决于良好的设计,还取决于严格的用户遵从。未来,我们的检测框架为增强去中心化网络中真实世界隐私保护提供了基础。
更新时间: 2025-07-02 12:07:03
领域: cs.CR
Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison
Generative AI systems have rapidly advanced, with multimodal input capabilities enabling reasoning beyond text-based tasks. In education, these advancements could influence assessment design and question answering, presenting both opportunities and challenges. To investigate these effects, we introduce a high-quality dataset of 201 university-level STEM questions, manually annotated with features such as image type, role, problem complexity, and question format. Our study analyzes how these features affect generative AI performance compared to students. We evaluate four model families with five prompting strategies, comparing results to the average of 546 student responses per question. Although the best model correctly answers on average 58.5% of the questions using majority vote aggregation, human participants consistently outperform AI on questions involving visual components. Interestingly, human performance remains stable across question features but varies by subject, whereas AI performance is susceptible to both subject matter and question features. Finally, we provide actionable insights for educators, demonstrating how question design can enhance academic integrity by leveraging features that challenge current AI systems without increasing the cognitive burden for students.
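A minimal sketch of the majority-vote aggregation used to score models (the sampled answers below are fabricated):

```python
from collections import Counter

# Sample several answers per question and keep the most frequent one.
def majority_vote(answers):
    return Counter(answers).most_common(1)[0][0]

samples = {"q1": ["B", "B", "C", "B", "A"], "q2": ["D", "D", "D", "A", "D"]}
predictions = {q: majority_vote(a) for q, a in samples.items()}
print(predictions)   # {'q1': 'B', 'q2': 'D'}
```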
Updated: 2025-07-02 12:06:46
标题: AI在多模态STEM评估中的挑战:人工智能与人类的比较
摘要: 生成式人工智能系统迅速发展,多模态输入能力使其能够进行超越基于文本的任务的推理。在教育领域,这些进展可能会影响评估设计和问题回答,带来机遇和挑战。为了调查这些影响,我们引入了一个高质量的数据集,其中包含201个大学级STEM问题,手动注释了诸如图像类型、角色、问题复杂性和问题格式等特征。我们的研究分析了这些特征如何影响生成式人工智能性能与学生相比。我们评估了四个模型系列,采用五种提示策略,将结果与每个问题的546个学生回答的平均值进行比较。尽管最佳模型使用多数投票聚合平均正确回答了58.5%的问题,但人类参与者在涉及视觉组件的问题上始终优于人工智能。有趣的是,人类表现在问题特征上保持稳定,但根据学科而变化,而人工智能性能易受学科和问题特征的影响。最后,我们为教育工作者提供了可行的见解,展示了如何通过利用挑战当前人工智能系统的特征来增强学术诚信,而不增加学生的认知负担。
更新时间: 2025-07-02 12:06:46
领域: cs.CY,cs.AI
Depth Anything at Any Condition
We present Depth Anything at Any Condition (DepthAnything-AC), a foundation monocular depth estimation (MDE) model capable of handling diverse environmental conditions. Previous foundation MDE models achieve impressive performance across general scenes but do not perform well in complex open-world environments that involve challenging conditions, such as illumination variations, adverse weather, and sensor-induced distortions. To overcome the challenges of data scarcity and the inability to generate high-quality pseudo-labels from corrupted images, we propose an unsupervised consistency regularization finetuning paradigm that requires only a relatively small amount of unlabeled data. Furthermore, we propose the Spatial Distance Constraint to explicitly enforce the model to learn patch-level relative relationships, resulting in clearer semantic boundaries and more accurate details. Experimental results demonstrate the zero-shot capabilities of DepthAnything-AC across diverse benchmarks, including real-world adverse weather benchmarks, synthetic corruption benchmarks, and general benchmarks. Project Page: https://ghost233lism.github.io/depthanything-AC-page Code: https://github.com/HVision-NKU/DepthAnythingAC
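A minimal sketch of the unsupervised consistency-regularization objective as we read it (the model, the corruption, and the L1 loss are illustrative stand-ins, not the paper's exact design): predictions on the clean view act as pseudo-labels for the corrupted view.

```python
import torch

# Predictions on a clean view supervise predictions on a corrupted view.
def consistency_loss(model, x, corrupt):
    with torch.no_grad():
        target = model(x)                   # clean view -> pseudo-label
    pred = model(corrupt(x))                # corrupted view (illumination, weather, ...)
    return torch.nn.functional.l1_loss(pred, target)

model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stands in for the depth network
x = torch.rand(2, 3, 32, 32)
loss = consistency_loss(model, x,
                        corrupt=lambda t: (t + 0.1 * torch.randn_like(t)).clamp(0, 1))
loss.backward()
```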
Updated: 2025-07-02 12:05:57
标题: 任意条件下的Depth Anything
摘要: 我们提出了Depth Anything at Any Condition(DepthAnything-AC),这是一个能够处理各种环境条件的基础单目深度估计(MDE)模型。先前的基础MDE模型在一般场景中表现出色,但在涉及具有挑战性条件的复杂开放世界环境中表现不佳,如光照变化、恶劣天气和传感器引起的失真。为了克服数据稀缺性和无法从损坏图像中生成高质量伪标签的问题,我们提出了一种无监督一致性正则化微调范式,只需要相对较少的未标记数据。此外,我们提出了空间距离约束,明确强制模型学习面板级相对关系,从而产生更清晰的语义边界和更准确的细节。实验结果展示了DepthAnything-AC在各种基准测试中的零射能力,包括真实世界的恶劣天气基准测试、合成损坏基准测试和一般基准测试。 项目页面:https://ghost233lism.github.io/depthanything-AC-page 代码:https://github.com/HVision-NKU/DepthAnythingAC
更新时间: 2025-07-02 12:05:57
领域: cs.CV,cs.AI
Effective Explanations for Belief-Desire-Intention Robots: When and What to Explain
When robots perform complex and context-dependent tasks in our daily lives, deviations from expectations can confuse users. Explanations of the robot's reasoning process can help users to understand the robot intentions. However, when to provide explanations and what they contain are important to avoid user annoyance. We have investigated user preferences for explanation demand and content for a robot that helps with daily cleaning tasks in a kitchen. Our results show that users want explanations in surprising situations and prefer concise explanations that clearly state the intention behind the confusing action and the contextual factors that were relevant to this decision. Based on these findings, we propose two algorithms to identify surprising actions and to construct effective explanations for Belief-Desire-Intention (BDI) robots. Our algorithms can be easily integrated in the BDI reasoning process and pave the way for better human-robot interaction with context- and user-specific explanations.
Updated: 2025-07-02 12:02:07
标题: 信念-欲望-意图机器人的有效解释:何时以及如何解释
摘要: 当机器人在我们日常生活中执行复杂且依赖于环境的任务时,与期望相悖的情况可能会让用户感到困惑。机器人推理过程的解释可以帮助用户理解机器人的意图。然而,何时提供解释以及解释内容是重要的,以避免用户感到烦恼。我们调查了用户对在厨房中协助日常清洁任务的机器人的解释需求和内容的偏好。我们的结果显示,用户希望在出现令人惊讶的情况下得到解释,并且更喜欢简明扼要地说明混乱行为背后的意图以及相关决定的环境因素。基于这些发现,我们提出了两种算法,用于识别令人惊讶的行为并构建有效的Belief-Desire-Intention(BDI)机器人解释。我们的算法可以轻松集成到BDI推理过程中,并为具有上下文和用户特定解释的更好人机交互铺平道路。
更新时间: 2025-07-02 12:02:07
领域: cs.RO,cs.AI
Non-collective Calibrating Strategy for Time Series Forecasting
Deep learning-based approaches have demonstrated significant advancements in time series forecasting. Despite these ongoing developments, the complex dynamics of time series make it challenging to establish a rule of thumb for designing the golden model architecture. In this study, we argue that refining existing advanced models through a universal calibrating strategy can deliver substantial benefits with minimal resource costs, as opposed to designing and training a new model from scratch. We first identify a multi-target learning conflict in the calibrating process, which arises when optimizing variables across time steps, leading to the underutilization of the model's learning capabilities. To address this issue, we propose an innovative calibrating strategy called Socket+Plug (SoP). This approach retains an exclusive optimizer and early-stopping monitor for each predicted target within each Plug while keeping the fully trained Socket backbone frozen. The model-agnostic nature of SoP allows it to directly calibrate the performance of any trained deep forecasting models, regardless of their specific architectures. Extensive experiments on various time series benchmarks and a spatio-temporal meteorological ERA5 dataset demonstrate the effectiveness of SoP, achieving up to a 22% improvement even when employing a simple MLP as the Plug (highlighted in Figure 1). Code is available at https://github.com/hanyuki23/SoP.
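A PyTorch sketch of the Socket+Plug structure (layer sizes, the MLP Plug, and the one-step training loop are illustrative; per-Plug early stopping is omitted): the backbone is frozen and each predicted target gets its own Plug with an exclusive optimizer.

```python
import torch
import torch.nn as nn

# Socket+Plug sketch: the trained backbone (Socket) is frozen; each forecast
# target gets its own Plug with an exclusive optimizer.
socket = nn.Linear(64, 64)                   # stands in for the trained backbone
for p in socket.parameters():
    p.requires_grad_(False)

horizon = 24                                 # one Plug per predicted time step
plugs = [nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
         for _ in range(horizon)]
opts = [torch.optim.Adam(p.parameters(), lr=1e-3) for p in plugs]

x, y = torch.randn(8, 64), torch.randn(8, horizon)
h = socket(x)                                # frozen features, shared by all Plugs
for t, (plug, opt) in enumerate(zip(plugs, opts)):
    loss = nn.functional.mse_loss(plug(h).squeeze(-1), y[:, t])
    opt.zero_grad(); loss.backward(); opt.step()
```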
Updated: 2025-07-02 12:02:03
标题: 时间序列预测的非集合校准策略
摘要: 基于深度学习的方法在时间序列预测方面取得了显著进展。尽管这些持续的发展,时间序列的复杂动态使得建立黄金模型架构的经验法则具有挑战性。在本研究中,我们认为通过通用校准策略对现有先进模型进行优化可以在资源成本最小的情况下获得实质性的好处,相较于从头开始设计和训练一个新模型。我们首先在校准过程中确定了一个多目标学习冲突,该冲突在优化各个时间步的变量时产生,导致模型学习能力的利用不足。为了解决这个问题,我们提出了一个名为Socket+Plug (SoP)的创新校准策略。该方法在每个Plug中保留了一个独立的优化器和早停监视器,同时保持完全训练的Socket骨干结构冻结。SoP的模型无关性使其能够直接校准任何经过训练的深度预测模型的性能,而不受其具体架构的限制。对各种时间序列基准和时空气象ERA5数据集进行了广泛实验,证明了SoP的有效性,即使使用简单的MLP作为Plug也能实现高达22%的改进(如图1所示)。代码可在https://github.com/hanyuki23/SoP 获取。
更新时间: 2025-07-02 12:02:03
领域: cs.LG
Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation
Neural Radiance Fields (NeRF) have recently emerged as a paradigm for 3D reconstruction from multiview satellite imagery. However, state-of-the-art NeRF methods are typically constrained to small scenes due to the memory footprint during training, which we study in this paper. Previous work on large-scale NeRFs palliates this by dividing the scene into multiple NeRFs. This paper introduces Snake-NeRF, a framework that scales to large scenes. Our out-of-core method eliminates the need to load all images and networks simultaneously, and operates on a single device. We achieve this by dividing the region of interest into NeRFs that tile in 3D without overlap. Importantly, we crop the images with overlap to ensure each NeRF is trained with all the necessary pixels. We introduce a novel $2\times 2$ 3D tile progression strategy and segmented sampler, which together prevent 3D reconstruction errors along the tile edges. Our experiments show that large satellite images can effectively be processed with linear time complexity, on a single GPU, and without compromising quality.
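A one-dimensional sketch of the tiling scheme (numbers illustrative): tiles partition the region without overlap, while the associated crops extend by a margin so each tile's NeRF is trained with all pixels it needs.

```python
# Tiles partition [start, stop) without overlap; crops add a margin so each
# tile's NeRF sees every pixel whose ray can reach it. Numbers illustrative.
def tiles_1d(start, stop, n_tiles, margin):
    step = (stop - start) / n_tiles
    for i in range(n_tiles):
        lo, hi = start + i * step, start + (i + 1) * step
        yield (lo, hi), (lo - margin, hi + margin)   # (tile bounds, crop bounds)

for tile, crop in tiles_1d(0.0, 100.0, 4, margin=5.0):
    print(tile, crop)
```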
Updated: 2025-07-02 11:59:36
标题: 瓦片和滑动:一个从局部到全球3D地球观测扩展NeRF的新框架
摘要: 最近,神经光辐射场(NeRF)已经成为从多视角卫星图像进行3D重建的范式。然而,目前的NeRF方法通常受限于训练过程中的内存占用,因此通常只能用于小场景,我们在本文中对此进行了研究。先前关于大规模NeRF的工作通过将场景分割成NeRFs来缓解这一问题。本文介绍了Snake-NeRF,一个可适用于大场景的框架。我们的离线方法消除了同时加载所有图像和网络的需要,并且在单个设备上运行。我们通过将感兴趣的区域分割成不重叠的NeRFs进行实现。重要的是,我们使用重叠裁剪图像,以确保每个NeRFs都能使用所有必要的像素进行训练。我们引入了一种新的$2\times 2$ 3D瓦片进展策略和分段采样器,共同防止沿着瓦片边缘出现3D重建错误。我们的实验结论是,大型卫星图像可以在单个GPU上以线性时间复杂度进行有效处理,并且不会降低质量。
更新时间: 2025-07-02 11:59:36
领域: cs.CV,cs.AI,cs.GR,cs.LG
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
The task of Human-Object conTact (HOT) detection involves identifying the specific areas of the human body that are touching objects. Nevertheless, current models are restricted to just one type of image, often leading to too much segmentation in areas with little interaction, and struggling to maintain category consistency within specific regions. To tackle this issue, a HOT framework, termed \textbf{P3HOT}, is proposed, which blends \textbf{P}rompt guidance and human \textbf{P}roximal \textbf{P}erception. To begin with, we utilize a semantic-driven prompt mechanism to direct the network's attention towards the relevant regions based on the correlation between image and text. Then a human proximal perception mechanism is employed to dynamically perceive key depth range around the human, using learnable parameters to effectively eliminate regions where interactions are not expected. Calculating depth resolves the uncertainty of the overlap between humans and objects in a 2D perspective, providing a quasi-3D viewpoint. Moreover, a Regional Joint Loss (RJLoss) has been created as a new loss to inhibit abnormal categories in the same area. A new evaluation metric called ``AD-Acc.'' is introduced to address the shortcomings of existing methods in addressing negative samples. Comprehensive experimental results demonstrate that our approach achieves state-of-the-art performance in four metrics across two benchmark datasets. Specifically, our model achieves an improvement of \textbf{0.7}$\uparrow$, \textbf{2.0}$\uparrow$, \textbf{1.6}$\uparrow$, and \textbf{11.0}$\uparrow$ in SC-Acc., mIoU, wIoU, and AD-Acc. metrics, respectively, on the HOT-Annotated dataset. Code is available at https://github.com/YuxiaoWang-AI/P3HOT.
Updated: 2025-07-02 11:59:32
标题: 提示指导与人类近端感知用于带区域联合损失的HOT预测
摘要: 人-物接触(HOT)检测任务涉及识别人体与物体接触的特定区域。然而,当前模型仅限于一种类型的图像,往往导致在互动较少的区域过多分割,并且难以保持特定区域内的类别一致性。为解决这一问题,提出了一种名为\textbf{P3HOT}的HOT框架,它融合了\textbf{P}rompt指导和人类\textbf{P}近\textbf{P}感知。首先,我们利用语义驱动的提示机制,根据图像和文本之间的相关性引导网络的注意力到相关区域。然后,采用人类近端感知机制动态感知人体周围的关键深度范围,使用可学习参数有效消除不期望的互动区域。计算深度解决了二维视角中人与物体重叠的不确定性,提供了准三维视角。此外,创建了一个名为“RJLoss”的区域联合损失作为新损失,用于抑制同一区域中的异常类别。引入了一种名为“AD-Acc.”的新评估指标,以解决现有方法在处理负样本时的缺点。全面的实验结果表明,我们的方法在两个基准数据集上的四个度量标准中实现了最先进的性能。具体来说,我们的模型在HOT-Annotated数据集上的SC-Acc.、mIoU、wIoU和AD-Acc.度量标准分别提高了\textbf{0.7}$\uparrow$、\textbf{2.0}$\uparrow$、\textbf{1.6}$\uparrow$和\textbf{11.0}$\uparrow$。代码可在https://github.com/YuxiaoWang-AI/P3HOT找到。
更新时间: 2025-07-02 11:59:32
领域: cs.CV,cs.AI
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
We present Concat-ID, a unified framework for identity-preserving video generation. Concat-ID employs variational autoencoders to extract image features, which are then concatenated with video latents along the sequence dimension. It relies exclusively on inherent 3D self-attention mechanisms to incorporate them, eliminating the need for additional parameters or modules. A novel cross-video pairing strategy and a multi-stage training regimen are introduced to balance identity consistency and facial editability while enhancing video naturalness. Extensive experiments demonstrate Concat-ID's superiority over existing methods in both single and multi-identity generation, as well as its seamless scalability to multi-subject scenarios, including virtual try-on and background-controllable generation. Concat-ID establishes a new benchmark for identity-preserving video synthesis, providing a versatile and scalable solution for a wide range of applications.
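The core operation is a plain sequence-dimension concatenation; a minimal PyTorch sketch with illustrative shapes:

```python
import torch

# VAE-extracted identity features are concatenated with video latents along
# the sequence dimension, so 3D self-attention attends over both with no
# extra modules. Shapes are illustrative.
video_latents = torch.randn(1, 16, 256)   # (batch, frame tokens, dim)
id_features = torch.randn(1, 2, 256)      # (batch, identity tokens, dim)
attn_input = torch.cat([video_latents, id_features], dim=1)
print(attn_input.shape)                    # torch.Size([1, 18, 256])
```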
Updated: 2025-07-02 11:55:35
标题: Concat-ID:朝向通用身份保持视频合成
摘要: 我们提出了Concat-ID,这是一个保持身份的视频生成的统一框架。Concat-ID使用变分自动编码器提取图像特征,然后沿着序列维度与视频潜变量连接。它完全依赖于固有的3D自注意机制来整合它们,消除了额外参数或模块的需求。引入了一种新颖的跨视频配对策略和多阶段训练计划,以平衡身份一致性和面部可编辑性,同时增强视频的自然性。大量实验证明了Concat-ID在单一和多身份生成方面优于现有方法,以及其无缝扩展到多主体场景,包括虚拟试穿和可控背景生成。Concat-ID为保持身份的视频合成建立了一个新的基准,为各种应用提供了多功能且可扩展的解决方案。
更新时间: 2025-07-02 11:55:35
领域: cs.CV,cs.AI
Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems
Rule representations significantly influence the search capabilities and decision boundaries within the search space of Learning Classifier Systems (LCSs), a family of rule-based machine learning systems that evolve interpretable models through evolutionary processes. However, it is very difficult to choose an appropriate rule representation for each problem. Additionally, some problems benefit from using different representations for different subspaces within the input space. Thus, an adaptive mechanism is needed to choose an appropriate rule representation for each rule in LCSs. This article introduces a flexible rule representation using a four-parameter beta distribution and integrates it into a fuzzy-style LCS. The four-parameter beta distribution can form various function shapes, and this flexibility enables our LCS to automatically select appropriate representations for different subspaces. Our rule representation can represent crisp/fuzzy decision boundaries in various boundary shapes, such as rectangles and bells, by controlling four parameters, compared to the standard representations such as trapezoidal ones. Leveraging this flexibility, our LCS is designed to adapt the appropriate rule representation for each subspace. Moreover, our LCS incorporates a generalization bias favoring crisp rules where feasible, enhancing model interpretability without compromising accuracy. Experimental results on real-world classification tasks show that our LCS achieves significantly superior test accuracy and produces more compact rule sets. Our implementation is available at https://github.com/YNU-NakataLab/Beta4-UCS. An extended abstract related to this work is available at https://doi.org/10.36227/techrxiv.174900805.59801248/v1.
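A numpy sketch of how the four-parameter beta representation spans crisp and fuzzy shapes (parameter values illustrative; a >= 1 and b >= 1 assumed): two shape parameters plus location and scale turn the same rule template into a rectangle or a bell.

```python
import numpy as np

# Four-parameter beta membership: shapes (a, b) plus support (lo, hi).
# a = b = 1 gives a crisp rectangle; larger a = b gives a fuzzy bell.
def membership(x, a, b, lo, hi):
    u = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    f = u ** (a - 1) * (1 - u) ** (b - 1)        # unnormalized beta density
    f = np.where((x < lo) | (x > hi), 0.0, f)
    return f / f.max()                           # peak normalized to 1

x = np.linspace(0, 1, 101)
rectangle_like = membership(x, 1.0, 1.0, 0.2, 0.7)   # crisp interval rule
bell_like = membership(x, 4.0, 4.0, 0.2, 0.7)        # fuzzy rule, same support
```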
Updated: 2025-07-02 11:42:55
标题: 使用四参数贝塔分布调整规则表示以用于学习分类器系统
摘要: 规则表示显着影响学习分类器系统(LCS)搜索空间内的搜索能力和决策边界,LCS是一类基于规则的机器学习系统,通过进化过程演变出可解释的模型。然而,为每个问题选择适当的规则表示非常困难。此外,有些问题受益于在输入空间的不同子空间中使用不同的表示。因此,需要一个自适应机制为LCS中的每个规则选择适当的规则表示。本文介绍了一种灵活的规则表示,使用四参数贝塔分布,并将其集成到模糊式LCS中。四参数贝塔分布可以形成各种函数形状,这种灵活性使得我们的LCS能够自动选择适当的表示来处理不同的子空间。与标准表示(如梯形表示)相比,我们的规则表示可以通过控制四个参数代表不同边界形状的明确/模糊决策边界,如矩形和钟形。利用这种灵活性,我们的LCS被设计为为每个子空间自适应选择适当的规则表示。此外,我们的LCS包含一种偏向于明确规则的概括偏差,提高了模型的可解释性而不损害准确性。在真实分类任务上的实验结果表明,我们的LCS实现了显着优越的测试准确性,并生成了更紧凑的规则集。我们的实现可在https://github.com/YNU-NakataLab/Beta4-UCS获取。与此工作相关的扩展摘要可在https://doi.org/10.36227/techrxiv.174900805.59801248/v1获取。
更新时间: 2025-07-02 11:42:55
领域: cs.LG,cs.AI,cs.NE
Query Complexity of Classical and Quantum Channel Discrimination
Quantum channel discrimination has been studied from an information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of unknown channel accesses. In this paper, we study the query complexity of quantum channel discrimination, wherein the goal is to determine the minimum number of channel uses needed to reach a desired error probability. To this end, we show that the query complexity of binary channel discrimination depends logarithmically on the inverse error probability and inversely on the negative logarithm of the (geometric and Holevo) channel fidelity. As a special case of these findings, we precisely characterize the query complexity of discriminating two classical channels and two classical-quantum channels. Furthermore, by obtaining a tighter characterization of the sample complexity of quantum hypothesis testing, including prior probabilities, we provide a more precise characterization of query complexity when the error probability does not exceed a fixed threshold. We also provide lower and upper bounds on the query complexity of binary asymmetric channel discrimination and multiple quantum channel discrimination. For the former, the query complexity depends on the geometric R\'enyi and Petz R\'enyi channel divergences, while for the latter, it depends on the negative logarithm of the (geometric and Uhlmann) channel fidelity. For multiple channel discrimination, the upper bound scales as the logarithm of the number of channels.
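A back-of-the-envelope sketch of the stated scaling (constants dropped, values illustrative): the number of channel uses grows logarithmically in the inverse error probability and inversely in the negative log-fidelity.

```python
import math

# n ~ log(1/eps) / (-log F), where F is a channel fidelity. Values illustrative.
def query_complexity(eps, fidelity):
    return math.ceil(math.log(1 / eps) / (-math.log(fidelity)))

print(query_complexity(eps=1e-6, fidelity=0.9))   # 132
```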
Updated: 2025-07-02 11:41:06
标题: 经典和量子通道辨识的查询复杂性
摘要: 量子信道辨识已从信息论的角度进行研究,其中人们对错误概率的最优衰减速率感兴趣,这个速率是未知信道访问次数的函数。在本文中,我们研究了量子信道辨识的查询复杂度,其目标是确定达到所需错误概率所需的最小信道使用次数。为此,我们展示了二进制信道辨识的查询复杂度与逆错误概率的对数及(几何和Holevo)信道保真度的负对数成正比。作为这些发现的一个特例,我们精确地表征了区分两个经典信道和两个经典-量子信道的查询复杂度。此外,通过获得量子假设测试的样本复杂度更紧致的表征,包括先验概率,我们在错误概率不超过固定阈值时提供了更精确的查询复杂度表征。我们还提供了二进制非对称信道辨识和多重量子信道辨识的查询复杂度的下限和上限。对于前者,查询复杂度取决于几何Rényi和PetzRényi信道散度,而对于后者,则取决于(几何和Uhlmann)信道保真度的负对数。对于多信道辨识,上限与信道数量的对数成比例。
更新时间: 2025-07-02 11:41:06
领域: quant-ph,cs.IT,cs.LG,math.IT,math.ST,stat.TH
Using Large Language Models to Categorize Strategic Situations and Decipher Motivations Behind Human Behaviors
By varying prompts to a large language model, we can elicit the full range of human behaviors in a variety of different scenarios in classic economic games. By analyzing which prompts elicit which behaviors, we can categorize and compare different strategic situations, which can also help provide insight into what different economic scenarios induce people to think about. We discuss how this provides a first step towards a non-standard method of inferring (deciphering) the motivations behind the human behaviors. We also show how this deciphering process can be used to categorize differences in the behavioral tendencies of different populations.
Updated: 2025-07-02 11:38:20
标题: 使用大型语言模型对战略情境进行分类和解读人类行为背后的动机
摘要: 通过改变大型语言模型的提示,我们可以在经典经济游戏的各种不同场景中引发全方位的人类行为。通过分析哪些提示引发了哪些行为,我们可以对不同的战略情况进行分类和比较,这也有助于洞察不同经济情景引发人们思考的内容。我们讨论了这如何为推断人类行为背后的动机提供了第一步,这也展示了这种推断过程如何用于分类不同人群行为倾向的差异。
更新时间: 2025-07-02 11:38:20
领域: cs.AI
Enhanced Influence-aware Group Recommendation for Online Media Propagation
Group recommendation over social media streams has attracted significant attention due to its wide applications in domains such as e-commerce, entertainment, and online news broadcasting. By leveraging social connections and group behaviours, group recommendation (GR) aims to provide more accurate and engaging content to a set of users rather than individuals. Recently, influence-aware GR has emerged as a promising direction, as it considers the impact of social influence on group decision-making. In earlier work, we proposed Influence-aware Group Recommendation (IGR) to solve this task. However, this task remains challenging due to three key factors: the large and ever-growing scale of social graphs, the inherently dynamic nature of influence propagation within user groups, and the high computational overhead of real-time group-item matching. To tackle these issues, we propose an Enhanced Influence-aware Group Recommendation (EIGR) framework. First, we introduce a Graph Extraction-based Sampling (GES) strategy to minimise redundancy across multiple temporal social graphs and effectively capture the evolving dynamics of both groups and items. Second, we design a novel DYnamic Independent Cascade (DYIC) model to predict how influence propagates over time across social items and user groups. Finally, we develop a two-level hash-based User Group Index (UG-Index) to efficiently organise user groups and enable real-time recommendation generation. Extensive experiments on real-world datasets demonstrate that our proposed framework, EIGR, consistently outperforms state-of-the-art baselines in both effectiveness and efficiency.
Updated: 2025-07-02 11:34:17
标题: 增强的在线媒体传播影响感知群组推荐
摘要: 社交媒体流上的群组推荐引起了人们的极大关注,因为它在电子商务、娱乐和在线新闻广播等领域具有广泛的应用。通过利用社交关系和群体行为,群组推荐(GR)旨在为一组用户提供比个人更准确和吸引人的内容。最近,具有影响意识的群组推荐已经成为一个有前途的方向,因为它考虑了社交影响对群体决策的影响。在早期的工作中,我们提出了影响感知群组推荐(IGR)来解决这一任务。然而,由于三个关键因素,这一任务仍然具有挑战性:社交图的规模庞大且不断增长,用户群体内影响传播的固有动态性,以及实时群体-项目匹配的高计算开销。 为了解决这些问题,我们提出了一个增强的影响感知群组推荐(EIGR)框架。首先,我们引入了基于图提取的采样(GES)策略,以减少多个时间社交图之间的冗余,并有效捕捉群体和项目的不断演变动态。其次,我们设计了一个新颖的动态独立级联(DYIC)模型,以预测随时间如何在社交项目和用户群体之间传播影响。最后,我们开发了一个基于两级哈希的用户群组索引(UG-Index),以有效组织用户群组并实现实时推荐生成。对真实世界数据集的大量实验表明,我们提出的框架EIGR在效果和效率上始终优于最先进的基线。
更新时间: 2025-07-02 11:34:17
领域: cs.IR,cs.AI,cs.DB
EDGChain-E: A Decentralized Git-Based Framework for Versioning Encrypted Energy Data
This paper proposes a new decentralized framework, named EDGChain-E (Encrypted-Data-Git Chain for Energy), designed to manage version-controlled, encrypted energy data using blockchain and the InterPlanetary File System. The framework incorporates a Decentralized Autonomous Organization (DAO) to orchestrate collaborative data governance across the lifecycle of energy research and operations, such as smart grid monitoring, demand forecasting, and peer-to-peer energy trading. In EDGChain-E, initial commits capture the full encrypted datasets-such as smart meter readings or grid telemetry-while subsequent updates are tracked as encrypted Git patches, ensuring integrity, traceability, and privacy. This versioning mechanism supports secure collaboration across multiple stakeholders (e.g., utilities, researchers, regulators) without compromising sensitive or regulated information. We highlight the framework's capability to maintain FAIR-compliant (Findable, Accessible, Interoperable, Reusable) provenance of encrypted data. By embedding hash-based content identifiers in Merkle trees, the system enables transparent, auditable, and immutable tracking of data changes, thereby supporting reproducibility and trust in decentralized energy applications.
Updated: 2025-07-02 11:33:12
标题: EDGChain-E:用于对加密能源数据进行版本控制的去中心化基于Git的框架
摘要: 本文提出了一个新的去中心化框架,名为EDGChain-E(加密数据Git链能源),旨在使用区块链和星际文件系统管理版本控制的加密能源数据。该框架整合了去中心化自治组织(DAO),以协调能源研究和运营生命周期中的协作数据治理,如智能电网监测、需求预测和点对点能源交易。在EDGChain-E中,初始提交捕获完整的加密数据集,如智能电表读数或电网遥测,而随后的更新被跟踪为加密的Git补丁,确保完整性、可追溯性和隐私。这种版本控制机制支持跨多个利益相关者(如公用事业公司、研究人员、监管机构)的安全协作,而不会泄露敏感或受监管信息。我们强调了该框架能够维护加密数据的FAIR兼容(可查找、可访问、可互操作、可重复使用)来源。通过在默克尔树中嵌入基于哈希的内容标识符,该系统实现了数据变更的透明、可审计和不可变跟踪,从而支持去中心化能源应用的再现性和信任。
更新时间: 2025-07-02 11:33:12
领域: cs.DC,cs.CR
When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery
Paired comparison data, where users evaluate items in pairs, play a central role in ranking and preference learning tasks. While ordinal comparison data intuitively offer richer information than binary comparisons, this paper challenges that conventional wisdom. We propose a general parametric framework for modeling ordinal paired comparisons without ties. The model adopts a generalized additive structure, featuring a link function that quantifies the preference difference between two items and a pattern function that governs the distribution over ordinal response levels. This framework encompasses classical binary comparison models as special cases, by treating binary responses as binarized versions of ordinal data. Within this framework, we show that binarizing ordinal data can significantly improve the accuracy of ranking recovery. Specifically, we prove that under the counting algorithm, the ranking error associated with binary comparisons exhibits a faster exponential convergence rate than that of ordinal data. Furthermore, we characterize a substantial performance gap between binary and ordinal data in terms of a signal-to-noise ratio (SNR) determined by the pattern function. We identify the pattern function that minimizes the SNR and maximizes the benefit of binarization. Extensive simulations and a real application on the MovieLens dataset further corroborate our theoretical findings.
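A minimal sketch of the counting algorithm on binarized data (comparisons fabricated): each ordinal response is collapsed to a win/loss, and items are ranked by win counts.

```python
from collections import defaultdict

# Rank items by how many binarized comparisons they win.
comparisons = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]  # (winner, loser)

wins = defaultdict(int)
for winner, _loser in comparisons:
    wins[winner] += 1

ranking = sorted({i for pair in comparisons for i in pair},
                 key=lambda i: -wins[i])
print(ranking)   # ['a', 'b', 'c']
```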
Updated: 2025-07-02 11:30:38
标题: 当少即是多:二进制反馈在排名恢复中可以胜过序数比较
摘要: 成对比较数据,在这种数据中,用户评估两个项目的差异,在排名和偏好学习任务中起着核心作用。虽然序数比较数据在直观上提供了比二进制比较更丰富的信息,但本文对这种传统智慧提出了挑战。我们提出了一个通用的参数化框架,用于建模没有关系的序数配对比较。该模型采用了广义可加结构,包括一个量化两个项目之间偏好差异的链接函数和一个控制序数响应水平分布的模式函数。这个框架包括了经典的二进制比较模型作为特例,通过将二进制响应视为序数数据的二值化版本。在这个框架内,我们证明了二值化序数数据可以显著提高排名恢复的准确性。具体地,我们证明在计数算法下,与序数数据相比,与二进制比较相关的排名误差表现出更快的指数收敛速度。此外,我们根据模式函数确定的信噪比(SNR)表现出二进制和序数数据之间的巨大性能差距。我们确定了最小化SNR并最大化二值化效益的模式函数。大量模拟和对MovieLens数据集的实际应用进一步证实了我们的理论发现。
更新时间: 2025-07-02 11:30:38
领域: stat.ML,cs.LG
Upper and lower bounds for the Lipschitz constant of random neural networks
Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. In this paper, we study upper and lower bounds for the Lipschitz constant of random ReLU neural networks. Specifically, we assume that the weights and biases follow a generalization of the He initialization, where general symmetric distributions for the biases are permitted. For deep networks of fixed depth and sufficiently large width, our established upper bound is larger than the lower bound by a factor that is logarithmic in the width. In contrast, for shallow neural networks we characterize the Lipschitz constant up to an absolute numerical constant that is independent of all parameters.
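A quick empirical check one can run against such bounds (network sizes and probe count illustrative): since the Lipschitz constant dominates the input-gradient norm at every differentiable point, maximizing that norm over random probes yields a lower bound.

```python
import torch

# Empirical lower bound on the Lipschitz constant of a random ReLU net.
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(50, 500), torch.nn.ReLU(),
                          torch.nn.Linear(500, 1))

best = 0.0
for _ in range(256):
    x = torch.randn(50, requires_grad=True)
    net(x).sum().backward()                 # gradient of the scalar output w.r.t. x
    best = max(best, x.grad.norm().item())
print(f"empirical lower bound on the Lipschitz constant: {best:.3f}")
```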
Updated: 2025-07-02 11:28:06
标题: 随机神经网络的利普希茨常数的上下界
摘要: 实证研究广泛表明,神经网络对输入的微小对抗性扰动非常敏感。对抗性示例的最坏情况鲁棒性可以通过神经网络的Lipschitz常数来量化。在本文中,我们研究了随机ReLU神经网络Lipschitz常数的上下界。具体来说,我们假设权重和偏置遵循He初始化的推广,其中允许偏置的一般对称分布。对于固定深度和足够大宽度的深度网络,我们建立的上界比下界大一个宽度对数的因子。相反,对于浅神经网络,我们描述了Lipschitz常数,直到一个与所有参数无关的绝对数值常数。
更新时间: 2025-07-02 11:28:06
领域: stat.ML,cs.LG,math.PR,68T07, 26A16, 60B20, 60G15
Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems
The widespread use of deep learning face recognition raises several security concerns. Although prior works point at existing vulnerabilities, DNN backdoor attacks against real-life, unconstrained systems dealing with images captured in the wild remain a blind spot of the literature. This paper conducts the first system-level study of backdoors in deep learning-based face recognition systems. This paper yields four contributions by exploring the feasibility of DNN backdoors on these pipelines in a holistic fashion. We demonstrate for the first time two backdoor attacks on the face detection task: face generation and face landmark shift attacks. We then show that face feature extractors trained with large margin losses also fall victim to backdoor attacks. Combining our models, we then show using 20 possible pipeline configurations and 15 attack cases that a single backdoor enables an attacker to bypass the entire function of a system. Finally, we provide stakeholders with several best practices and countermeasures.
Updated: 2025-07-02 11:21:27
标题: 无约束人脸识别系统后门攻击的生存能力
摘要: 深度学习人脸识别的广泛应用引起了几个安全问题。尽管先前的研究指出了现有的漏洞,但针对在野外捕捉的图像进行处理的实际生活中的无约束系统的DNN后门攻击仍然是文献中的盲点。本文对基于深度学习的人脸识别系统中的后门进行了首次系统级研究。本文通过以整体方式探讨DNN后门在这些管道上的可行性,提供了四个贡献。我们首次展示了两种针对人脸检测任务的后门攻击:人脸生成和人脸关键点移位攻击。然后我们展示了使用大边际损失训练的人脸特征提取器也会成为后门攻击的受害者。结合我们的模型,我们使用20种可能的管道配置和15种攻击案例展示,一个单一的后门可以使攻击者绕过系统的整个功能。最后,我们提供了一些利益相关者的最佳实践和对策。
更新时间: 2025-07-02 11:21:27
领域: cs.CV,cs.AI,cs.CR,cs.LG
Teacher training in the age of AI: Impact on AI Literacy and Teachers' Attitudes
The rapid integration of artificial intelligence (AI) in education requires teachers to develop AI competencies while preparing students for a society influenced by AI. This study evaluates the impact of an online teacher training program on German in-service teachers' AI literacy, usage behaviors, and attitudes toward AI. A pre-post design study was conducted with teachers (N1 = 291 for AI literacy, N2 = 436 for attitude assessment) participating in the course. The program combined synchronous and asynchronous learning formats, including webinars, self-paced modules, and practical projects. The participants exhibited notable improvements across all domains: AI literacy scores increased significantly, and all attitude items regarding AI usage and integration demonstrated significant positive changes. Teachers reported increased confidence in AI integration. Structured teacher training programs effectively enhance AI literacy and foster positive attitudes toward AI in education.
Updated: 2025-07-02 11:09:47
标题: 人工智能时代的教师培训:对人工智能素养和教师态度的影响
摘要: 教育领域中人工智能(AI)的快速整合要求教师在为受AI影响的社会培养学生的同时,发展AI能力。本研究评估了一项在线教师培训计划对德国在职教师的AI素养、使用行为和对AI的态度的影响。研究采用了一项前后设计研究,参与该课程的教师人数分别为291人(用于AI素养评估)和436人(用于态度评估)。该计划结合了同步和异步学习格式,包括网络研讨会、自主学习模块和实践项目。参与者在所有领域表现出显著改善:AI素养得分显著增加,所有与AI使用和整合相关的态度项目都呈现出显著积极变化。教师报告表示他们对AI整合的信心得到增强。结构化的教师培训计划有效地提升了教师的AI素养,并促进了对教育中AI的积极态度。
更新时间: 2025-07-02 11:09:47
领域: cs.CY,cs.AI
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
This paper introduces Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning (TARO), a novel framework for high-fidelity and temporally coherent video-to-audio synthesis. Built upon flow-based transformers, which offer stable training and continuous transformations for enhanced synchronization and audio quality, TARO introduces two key innovations: (1) Timestep-Adaptive Representation Alignment (TRA), which dynamically aligns latent representations by adjusting alignment strength based on the noise schedule, ensuring smooth evolution and improved fidelity, and (2) Onset-Aware Conditioning (OAC), which integrates onset cues that serve as sharp event-driven markers of audio-relevant visual moments to enhance synchronization with dynamic visual events. Extensive experiments on the VGGSound and Landscape datasets demonstrate that TARO outperforms prior methods, achieving a 53% lower Frechet Distance (FD), a 29% lower Frechet Audio Distance (FAD), and a 97.19% Alignment Accuracy, highlighting its superior audio quality and synchronization precision.
Updated: 2025-07-02 11:06:04
标题: TARO:基于时间步自适应表示对齐和基于起始点感知条件的视频到音频同步合成
摘要: 本文介绍了Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning(TARO),这是一个用于高保真和时间上连贯的视频到音频合成的新框架。TARO建立在基于流的变压器之上,这些变压器提供稳定的训练和连续的转换,以增强同步性和音频质量。TARO引入了两个关键创新:(1)Timestep-Adaptive Representation Alignment(TRA),通过根据噪声时间表调整对齐强度动态对齐潜在表示,确保平滑演变和改进的保真度;(2)Onset-Aware Conditioning(OAC),集成了起始线索,这些线索作为音频相关视觉时刻的尖锐事件驱动标记,以增强与动态视觉事件的同步。对VGGSound和Landscape数据集的大量实验表明,TARO优于先前的方法,实现相对低53%的Frechet距离(FD),低29%的Frechet音频距离(FAD)和97.19%的对齐准确度,突出了其优越的音频质量和同步精度。
更新时间: 2025-07-02 11:06:04
领域: cs.SD,cs.AI,cs.CV
Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems
Traditional Data+AI systems utilize data-driven techniques to optimize performance, but they rely heavily on human experts to orchestrate system pipelines, enabling them to adapt to changes in data, queries, tasks, and environments. For instance, while there are numerous data science tools available, developing a pipeline planning system to coordinate these tools remains challenging. This difficulty arises because existing Data+AI systems have limited capabilities in semantic understanding, reasoning, and planning. Fortunately, we have witnessed the success of large language models (LLMs) in enhancing semantic understanding, reasoning, and planning abilities. It is crucial to incorporate LLM techniques to revolutionize data systems for orchestrating Data+AI applications effectively. To achieve this, we propose the concept of a 'Data Agent' - a comprehensive architecture designed to orchestrate Data+AI ecosystems, which focuses on tackling data-related tasks by integrating knowledge comprehension, reasoning, and planning capabilities. We delve into the challenges involved in designing data agents, such as understanding data/queries/environments/tools, orchestrating pipelines/workflows, optimizing and executing pipelines, and fostering pipeline self-reflection. Furthermore, we present examples of data agent systems, including a data science agent, data analytics agents (such as unstructured data analytics agent, semantic structured data analytics agent, data lake analytics agent, and multi-modal data analytics agent), and a database administrator (DBA) agent. We also outline several open challenges associated with designing data agent systems.
Updated: 2025-07-02 11:04:49
标题: 数据代理:用于编排数据+AI生态系统的整体架构
摘要: 传统的数据+人工智能系统利用数据驱动技术来优化性能,但它们严重依赖于人类专家来协调系统管道,使其能够适应数据、查询、任务和环境的变化。例如,虽然有许多数据科学工具可用,但开发一个协调这些工具的管道规划系统仍然具有挑战性。这种困难是由于现有的数据+人工智能系统在语义理解、推理和规划方面的能力有限。幸运的是,我们已经见证了大型语言模型(LLMs)在增强语义理解、推理和规划能力方面取得的成功。将LLM技术纳入到数据系统中,以有效地编排数据+人工智能应用程序至关重要。 为了实现这一目标,我们提出了“数据代理”的概念 - 一个旨在编排数据+人工智能生态系统的全面架构,侧重于通过整合知识理解、推理和规划能力来解决与数据相关的任务。我们深入探讨了设计数据代理涉及的挑战,如理解数据/查询/环境/工具,编排管道/工作流程,优化和执行管道,以及促进管道的自我反思。此外,我们提出了数据代理系统的示例,包括数据科学代理、数据分析代理(如非结构化数据分析代理、语义结构化数据分析代理、数据湖分析代理和多模态数据分析代理)以及数据库管理员(DBA)代理。我们还概述了设计数据代理系统所涉及的几个挑战。
更新时间: 2025-07-02 11:04:49
领域: cs.DB,cs.AI,cs.CL,cs.LG
Analysis of Muon's Convergence and Critical Batch Size
This paper presents a theoretical analysis of Muon, a new optimizer that leverages the inherent matrix structure of neural network parameters. We provide convergence proofs for four practical variants of Muon: with and without Nesterov momentum, and with and without weight decay. We then show that adding weight decay leads to strictly tighter bounds on both the parameter and gradient norms, and we clarify the relationship between the weight decay coefficient and the learning rate. Finally, we derive Muon's critical batch size minimizing the stochastic first-order oracle (SFO) complexity, which is the stochastic computational cost, and validate our theoretical findings with experiments.
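A sketch of the Muon-style update the analysis covers, as we read it (real Muon orthogonalizes the momentum with a Newton-Schulz iteration; an SVD stands in here for clarity, and the decoupled weight-decay form and constants are illustrative):

```python
import torch

# Muon-style step: accumulate momentum, orthogonalize it, apply with
# decoupled weight decay. SVD replaces Newton-Schulz for readability.
def muon_step(W, M, grad, lr=0.02, beta=0.95, wd=0.01):
    M = beta * M + grad                          # momentum accumulation
    U, _, Vh = torch.linalg.svd(M, full_matrices=False)
    W = W - lr * (U @ Vh + wd * W)               # orthogonalized step + weight decay
    return W, M

torch.manual_seed(0)
W, M = torch.randn(8, 4), torch.zeros(8, 4)
W, M = muon_step(W, M, grad=torch.randn(8, 4))
```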
Updated: 2025-07-02 11:03:13
标题: Muon的收敛分析和临界批量大小
摘要: 本文对Muon进行了理论分析,它是一种利用神经网络参数固有矩阵结构的新型优化器。我们为四种实用的Muon变体提供了收敛证明:有和没有Nesterov动量,以及有和没有权重衰减。然后我们展示了添加权重衰减会严格限制参数和梯度范数,并澄清了权重衰减系数与学习率之间的关系。最后,我们推导出了Muon的关键批量大小,以最小化随机一阶预言(SFO)复杂度,即随机计算成本,并通过实验证实了我们的理论发现。
更新时间: 2025-07-02 11:03:13
领域: cs.LG
Efficiently Vectorized MCMC on Modern Accelerators
With the advent of automatic vectorization tools (e.g., JAX's $\texttt{vmap}$), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem -- loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$ by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.
Updated: 2025-07-02 11:02:59
标题: 现代加速器上高效向量化的MCMC
摘要: 随着自动向量化工具(例如JAX的vmap)的出现,编写多链MCMC算法通常就像在单链代码上调用这些工具一样简单。虽然方便,但对于各种MCMC算法来说,这会导致同步问题 - 简单来说,在每次迭代中,所有并行运行的链必须等到最后一条链完成抽样。在这项工作中,我们展示了如何设计单链MCMC算法,以避免在使用类似vmap的工具进行向量化时的同步开销,方法是使用有限状态机(FSMs)框架。使用简化模型,我们推导出使用我们的方法可以获得的速度提升的精确理论形式,并利用它对最佳算法设计提出合理建议。我们将几种流行的MCMC算法实现为FSMs,包括椭圆切片抽样、HMC-NUTS和延迟拒绝,实验中展示了高达一个数量级的加速。
更新时间: 2025-07-02 11:02:59
领域: cs.MS,cs.LG,stat.CO,stat.ML
T3DM: Test-Time Training-Guided Distribution Shift Modelling for Temporal Knowledge Graph Reasoning
Temporal Knowledge Graph (TKG) is an efficient method for describing the dynamic development of facts along a timeline. Most research on TKG reasoning (TKGR) focuses on modelling the repetition of global facts and designing patterns of local historical facts. However, existing methods face two significant challenges: inadequate modeling of the event distribution shift between training and test samples, and reliance on random entity substitution for generating negative samples, which often results in low-quality sampling. To this end, we propose a novel distributional feature modeling approach for training TKGR models, Test-Time Training-guided Distribution shift Modelling (T3DM), to adjust the model based on distribution shift and ensure the global consistency of model reasoning. In addition, we design a negative-sampling strategy to generate higher-quality negative quadruples based on adversarial training. Extensive experiments show that T3DM provides better and more robust results than the state-of-the-art baselines in most cases.
Updated: 2025-07-02 11:02:37
标题: T3DM: 测试时间训练引导的分布偏移建模,用于时间知识图推理
摘要: 时间知识图谱(TKG)是一种有效的描述事实在时间轴上动态发展的方法。大多数关于TKG推理(TKGR)的研究集中在建模全局事实的重复和设计本地历史事实模式上。然而,他们面临两个重要挑战:在训练和测试样本之间事件分布转移建模不足,以及依赖于随机实体替换生成负样本,这往往导致采样质量较低。为此,我们提出了一种新颖的分布特征建模方法,用于训练TKGR模型,即测试时训练引导的分布转移建模(T3DM),以根据分布转移调整模型并确保模型推理的全局一致性。此外,我们设计了一种负采样策略,基于对抗训练生成更高质量的负四元组。大量实验证明,在大多数情况下,T3DM比最先进的基线提供更好和更稳健的结果。
更新时间: 2025-07-02 11:02:37
领域: cs.AI,cs.CL
Autonomous AI Surveillance: Multimodal Deep Learning for Cognitive and Behavioral Monitoring
This study presents a novel classroom surveillance system that integrates multiple modalities, including drowsiness detection, tracking of mobile phone usage, and face recognition, to assess student attentiveness with enhanced precision. The system leverages the YOLOv8 model to detect both mobile phone usage and drowsiness (Ghatge et al., 2024), while facial recognition is achieved through LResNet Occ FC body tracking using YOLO and MTCNN (Durai et al., 2024). These models work in synergy to provide comprehensive, real-time monitoring, offering insights into student engagement and behavior (S et al., 2023). The framework is trained on specialized datasets, such as the RMFD dataset for face recognition and a Roboflow dataset for mobile phone detection. The extensive evaluation of the system shows promising results: sleep detection achieves 97.42% mAP@50, face recognition achieves 86.45% validation accuracy, and mobile phone detection reaches 85.89% mAP@50. The system is implemented within a core PHP web application and utilizes ESP32-CAM hardware for seamless data capture (Neto et al., 2024). This integrated approach not only enhances classroom monitoring but also ensures automatic attendance recording via face recognition while students remain seated in the classroom, offering scalability for diverse educational environments (Banada, 2025).
Updated: 2025-07-02 10:59:01
标题: 自主AI监控:用于认知和行为监测的多模式深度学习
摘要: 本研究提出了一种新颖的课堂监控系统,集成了多种模式,包括昏昏欲睡、手机使用跟踪和人脸识别,以提高学生注意力的评估精度。该系统利用YOLOv8模型检测手机和睡眠使用,而面部识别通过LResNet Occ FC身体跟踪使用YOLO和MTCNN实现。这些模型协同工作,提供全面的实时监控,为学生参与和行为提供见解。该框架基于专门的数据集进行训练,如用于人脸识别的RMFD数据集和用于手机检测的Roboflow数据集。系统的广泛评估显示出有希望的结果。睡眠检测达到了97.42%的mAP@50,面部识别达到了86.45%的验证准确度,手机检测达到了85.89%的mAP@50。该系统在核心PHP Web应用程序中实现,并利用ESP32-CAM硬件进行无缝数据捕获。这种整合方法不仅增强了课堂监控,还通过面部识别确保学生在课堂上坐着时自动记录出勤,为各种教育环境提供可扩展性。
更新时间: 2025-07-02 10:59:01
领域: cs.CV,cs.AI,cs.LG
Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder
The creativity of classical music arises not only from composers who craft the musical sheets but also from performers who interpret the static notations with expressive nuances. This paper addresses the challenge of generating classical piano performances from scratch, aiming to emulate the dual roles of composer and pianist in the creative process. We introduce the Expressive Compound Word (ECP) representation, which effectively captures both the metrical structure and expressive nuances of classical performances. Building on this, we propose the Expressive Music Variational AutoEncoder (XMVAE), a model featuring two branches: a Vector Quantized Variational AutoEncoder (VQ-VAE) branch that generates score-related content, representing the Composer, and a vanilla VAE branch that produces expressive details, fulfilling the role of Pianist. These branches are jointly trained with similar Seq2Seq architectures, leveraging a multiscale encoder to capture beat-level contextual information and an orthogonal Transformer decoder for efficient compound token decoding. Both objective and subjective evaluations demonstrate that XMVAE generates classical performances with superior musical quality compared to state-of-the-art models. Furthermore, pretraining the Composer branch on extra musical score datasets contributes to a significant performance gain.
Updated: 2025-07-02 10:54:23
标题: 用表达性音乐变分自动编码器探索古典钢琴表演生成
摘要: 古典音乐的创造力不仅来自于创作音乐乐谱的作曲家,还来自于演奏家用具有表现力的细微差别诠释静态符号。本文探讨了从零开始生成古典钢琴演奏的挑战,旨在模拟创作过程中作曲家和钢琴家的双重角色。我们引入了表达性复合词(ECP)表示法,有效捕捉了古典演奏的韵律结构和表现细节。在此基础上,我们提出了表达性音乐变分自动编码器(XMVAE),该模型包括两个分支:一个生成与乐谱相关内容的向量量化变分自动编码器(VQ-VAE)分支,代表作曲家,以及一个生成表现细节的普通VAE分支,扮演钢琴家的角色。这些分支使用相似的Seq2Seq架构进行联合训练,利用多尺度编码器捕捉节拍级别的上下文信息,并使用正交变换器解码器进行高效的复合标记解码。客观和主观评估结果表明,与最先进的模型相比,XMVAE生成的古典演奏具有更优越的音乐质量。此外,对作曲家分支进行额外乐谱数据集的预训练有助于显著提高性能。
更新时间: 2025-07-02 10:54:23
领域: cs.SD,cs.AI,cs.MM,eess.AS
A Privacy-Preserving Indoor Localization System based on Hierarchical Federated Learning
Location information serves as the fundamental element for numerous Internet of Things (IoT) applications. Traditional indoor localization techniques often produce significant errors and raise privacy concerns due to centralized data collection. In response, Machine Learning (ML) techniques offer promising solutions by capturing indoor environment variations. However, they typically require central data aggregation, leading to privacy, bandwidth, and server reliability issues. To overcome these challenges, in this paper, we propose a Federated Learning (FL)-based approach for dynamic indoor localization using a Deep Neural Network (DNN) model. Experimental results show that FL achieves performance close to that of a centralized model (CL) while preserving data privacy, bandwidth efficiency, and server reliability. This research demonstrates that our proposed FL approach provides a viable solution for privacy-enhanced indoor localization, paving the way for advancements in secure and efficient indoor localization systems.
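A minimal FedAvg-style sketch of the FL loop (the hierarchy, the DNN, and the fingerprint features are omitted; a linear model and synthetic data stand in): clients train locally and the server only averages weights, so raw data never leaves the device.

```python
import numpy as np

# FedAvg-style sketch: clients run local gradient steps on a shared model;
# the server averages the resulting weights.
rng = np.random.default_rng(0)
w_true = rng.standard_normal(8)

def make_client(n=32):
    X = rng.standard_normal((n, 8))
    return X, X @ w_true                     # local data stays local

def local_update(w, X, y, lr=0.1, epochs=5):
    for _ in range(epochs):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

clients = [make_client() for _ in range(4)]
w = np.zeros(8)
for _ in range(20):                          # communication rounds
    w = np.mean([local_update(w, X, y) for X, y in clients], axis=0)
print(np.linalg.norm(w - w_true))            # small: matches the centralized fit here
```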
Updated: 2025-07-02 10:53:31
标题: 基于分层联邦学习的隐私保护室内定位系统
摘要: 位置信息是许多物联网(IoT)应用的基本要素。传统的室内定位技术通常会产生显著的误差,并由于集中式数据收集而引发隐私顾虑。为此,机器学习(ML)技术通过捕捉室内环境变化提供了有前途的解决方案。然而,它们通常需要集中式数据聚合,导致隐私、带宽和服务器可靠性问题。为了克服这些挑战,本文提出了一种基于联邦学习(FL)的动态室内定位方法,使用深度神经网络(DNN)模型。实验结果表明,FL与集中模型(CL)具有接近的性能,同时保持数据隐私、带宽效率和服务器可靠性。这项研究表明,我们提出的FL方法为增强隐私的室内定位提供了可行的解决方案,为安全高效的室内定位系统的进步铺平了道路。
更新时间: 2025-07-02 10:53:31
领域: cs.LG,cs.CR,eess.SP
Transfer Learning for VLC-based indoor Localization: Addressing Environmental Variability
Accurate indoor localization is crucial in industrial environments. Visible Light Communication (VLC) has emerged as a promising solution, offering high accuracy, energy efficiency, and minimal electromagnetic interference. However, VLC-based indoor localization faces challenges due to environmental variability, such as lighting fluctuations and obstacles. To address these challenges, we propose a Transfer Learning (TL)-based approach for VLC-based indoor localization. Using real-world data collected at a BOSCH factory, the TL framework integrates a deep neural network (DNN) to improve localization accuracy by 47\%, reduce energy consumption by 32\%, and decrease computational time by 40\% compared to the conventional models. The proposed solution is highly adaptable under varying environmental conditions and achieves similar accuracy with only 30\% of the dataset, making it a cost-efficient and scalable option for industrial applications in Industry 4.0.
Updated: 2025-07-02 10:51:38
标题: 基于VLC的室内定位的迁移学习:解决环境变化
摘要: 精确的室内定位在工业环境中至关重要。可见光通信(VLC)已经成为一种有前途的解决方案,提供高精度、能效高和最小电磁干扰。然而,基于VLC的室内定位面临来自环境变化的挑战,如光照波动和障碍物。为了解决这些挑战,我们提出了一种基于迁移学习(TL)的VLC室内定位方法。利用在BOSCH工厂收集的真实世界数据,TL框架集成了深度神经网络(DNN),通过与传统模型相比,将定位精度提高了47%,能源消耗减少了32%,计算时间减少了40%。所提出的解决方案在不同环境条件下具有高度适应性,并且仅使用30%的数据集就能实现类似的精度,使其成为工业应用中的一种经济高效和可扩展的选择,适用于工业4.0。
更新时间: 2025-07-02 10:51:38
领域: eess.SP,cs.LG
Is merging worth it? Securely evaluating the information gain for causal dataset acquisition
Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging as the value of a merge depends not only on reduction in epistemic uncertainty but also on improvement in overlap. To address this challenge, we introduce the first cryptographically secure information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the Expected Information Gain (EIG) using multi-party computation to ensure that no raw data is revealed. We further demonstrate that our approach can be combined with differential privacy (DP) to meet arbitrary privacy requirements whilst preserving more accurate computation compared to DP alone. To the best of our knowledge, this work presents the first privacy-preserving method for dataset acquisition tailored to causal estimation. We demonstrate the effectiveness and reliability of our method on a range of simulated and realistic benchmarks. Code is publicly available: https://github.com/LucileTerminassian/causal_prospective_merge.
Updated: 2025-07-02 10:49:53
标题: 值得合并吗?安全评估因果数据集获取的信息增益
摘要: 跨机构合并数据集是一个漫长且昂贵的过程,特别是当涉及私人信息时。数据主机可能想要预先评估哪些数据集最有利于合并,而不暴露敏感信息。对于因果估计,这尤为具有挑战性,因为合并的价值不仅取决于认识不确定性的减少,还取决于重叠的改善。为了解决这一挑战,我们引入了第一个在异质处理效应估计背景下用于量化合并价值的密码学安全信息论方法。我们通过使用多方计算来评估预期信息增益(EIG),以确保不会泄露原始数据。我们进一步证明,我们的方法可以与差分隐私(DP)结合,以满足任意的隐私要求,同时与仅使用DP相比保留更准确的计算。据我们所知,这项工作提供了第一个针对因果估计量身定制的数据集获取的隐私保护方法。我们在一系列模拟和现实基准上展示了我们方法的有效性和可靠性。代码可以公开获取:https://github.com/LucileTerminassian/causal_prospective_merge。
更新时间: 2025-07-02 10:49:53
领域: stat.ML,cs.CR,cs.LG
Subversion via Focal Points: Investigating Collusion in LLM Monitoring
We evaluate language models' ability to subvert monitoring protocols via collusion. More specifically, two instances of a model design prompts for a policy (P) and a monitor (M) in a programming task setting. The models collaboratively aim for M to classify all backdoored programs in an auditing dataset as harmful, yet classify a backdoored program produced by P as harmless. The models are isolated from each other, requiring them to independently arrive at compatible subversion strategies. We find that while Claude 3.7 Sonnet has a low success rate due to poor convergence, it sometimes successfully colludes on non-obvious signals.
Updated: 2025-07-02 10:48:37
标题: 通过焦点的颠覆:调查LLM监测中的勾结
摘要: 我们评估语言模型通过勾结来颠覆监测协议的能力。更具体地,我们在编程任务设置中有两个模型设计提示的实例,一个是策略(P),一个是监视器(M)。这些模型共同致力于让M将审计数据集中的所有植入后门的程序都分类为有害,但仍然将P生成的植入后门程序分类为无害。这些模型相互隔离,需要它们独立地达成相容的颠覆策略。我们发现,尽管Claude 3.7 Sonnet的成功率较低,因为收敛性差,但有时它成功地勾结在非明显的信号上。
更新时间: 2025-07-02 10:48:37
领域: cs.CL,cs.CR
On the Effect of Ruleset Tuning and Data Imbalance on Explainable Network Security Alert Classifications: a Case-Study on DeepCASE
Automation in Security Operations Centers (SOCs) plays a prominent role in alert classification and incident escalation. However, automated methods must be robust in the presence of imbalanced input data, which can negatively affect performance. Additionally, automated methods should make explainable decisions. In this work, we evaluate the effect of label imbalance on the classification of network intrusion alerts. As our use-case we employ DeepCASE, the state-of-the-art method for automated alert classification. We show that label imbalance impacts both classification performance and the correctness of the classification explanations offered by DeepCASE. We conclude that tuning the detection rules used in SOCs can significantly reduce imbalance and may benefit the performance and explainability offered by alert post-processing methods such as DeepCASE. Therefore, our findings suggest that traditional methods to improve the quality of input data can benefit automation.
Updated: 2025-07-02 10:47:42
Domains: cs.CR,cs.LG,cs.NI
How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review
Background: Metacognition has gained significant attention for its potential to enhance the autonomy and adaptability of artificial agents, but remains a fragmented field: diverse theories, terminologies, and design choices have led to disjointed developments and limited comparability across systems. Existing overviews remain at a conceptual level that does not discern the underlying algorithms, representations, or their respective success. Methods: We address this gap by performing an explorative systematic review. Reports were included if they described techniques enabling Computational Metacognitive Architectures (CMAs) to model, store, remember, and process their episodic metacognitive experiences, one of Flavell's (1979a) three foundational components of metacognition. Searches were conducted in 16 databases, consulted between December 2023 and June 2024. Data were extracted using a 20-item framework covering pertinent aspects. Results: A total of 101 reports on 35 distinct CMAs were included. Our findings show that metacognitive experiences may boost system performance and explainability, e.g., via self-repair. However, lack of standardization and limited evaluations may hinder progress: only 17% of CMAs were quantitatively evaluated regarding this review's focus, and significant terminological inconsistency limits cross-architecture synthesis. Systems also varied widely in memory content, data types, and employed algorithms. Discussion: Limitations include the non-iterative nature of the search query, heterogeneous data availability, and an under-representation of emergent, sub-symbolic CMAs. Future research should focus on standardization and evaluation, e.g., via community-driven challenges, and on transferring promising principles to emergent architectures.
Updated: 2025-07-02 10:46:39
Domains: q-bio.NC,cs.AI,I.2.0; I.2.4; I.2.6; I.2.8; J.4
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called co-learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED yields curricula that improve zero-shot generalization across multiple benchmarks while requiring up to 2x fewer environment interactions than strong baselines. Ablation studies confirm that the transition prediction error drives rapid complexity ramp-up and that co-learnability delivers additional gains when paired with the transition prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED.
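As a rough illustration of how these two signals could be combined into a task-selection score, consider the toy rendering below (our own sketch; the additive weighting and the co-learnability estimator are assumptions, not TRACED's exact formulation):

import numpy as np

def regret_score(value_loss, transition_error, lam=1.0):
    # Transition-aware regret proxy: value-function loss plus a weighted
    # transition prediction error (both per-task scalars).
    return value_loss + lam * transition_error

def co_learnability(perf_delta):
    # Toy co-learnability: average improvement that training on task i
    # induces on the other tasks (off-diagonal row mean).
    # perf_delta[i, j] = change on task j after training on task i.
    n = perf_delta.shape[0]
    off_diag = perf_delta - np.diag(np.diag(perf_delta))
    return off_diag.sum(axis=1) / (n - 1)

value_loss = np.array([0.4, 0.1, 0.7])
trans_err = np.array([0.2, 0.05, 0.3])
perf_delta = np.array([[0.0, 0.1, 0.0],
                       [0.2, 0.0, 0.1],
                       [0.0, 0.0, 0.0]])
priority = regret_score(value_loss, trans_err) + 0.5 * co_learnability(perf_delta)
print(priority)  # the teacher would sample the next task in proportion to this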
Updated: 2025-07-02 10:44:04
Domains: cs.LG,cs.AI
MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification
Large Language Models (LLMs) have demonstrated significant potential in code generation. However, in the factory automation sector, particularly motion control, manual programming, alongside inefficient and unsafe debugging practices, remains prevalent. This stems from the complex interplay of mechanical and electrical systems and stringent safety requirements. Moreover, most current AI-assisted motion control programming efforts focus on PLCs, with little attention given to high-level languages and function libraries. To address these challenges, we introduce MCCoder, an LLM-powered system tailored for generating motion control code, integrated with a soft-motion controller. MCCoder improves code generation through a structured workflow that combines multitask decomposition, hybrid retrieval-augmented generation (RAG), and iterative self-correction, utilizing a well-established motion library. Additionally, it integrates a 3D simulator for intuitive motion validation and logs of full motion trajectories for data verification, significantly enhancing accuracy and safety. In the absence of benchmark datasets and metrics tailored for evaluating motion control code generation, we propose MCEVAL, a dataset spanning motion tasks of varying complexity. Experiments show that MCCoder outperforms baseline models using Advanced RAG, achieving an overall performance gain of 33.09% and a 131.77% improvement on complex tasks in the MCEVAL dataset.
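The workflow can be pictured as a retrieve-generate-verify loop. The sketch below is hypothetical: the callables llm, retriever, and simulator are placeholders standing in for MCCoder's components, not its actual interfaces.

def generate_motion_code(task, llm, retriever, simulator, max_rounds=3):
    # Hypothetical retrieve-generate-verify loop. `retriever` returns
    # motion-library snippets (the hybrid RAG step); `simulator` replays
    # the trajectory and returns (ok, error_log) for data verification.
    subtasks = llm(f"Decompose into motion-control subtasks: {task}")
    context = retriever(subtasks)
    code = llm(f"Write motion code.\nTask: {task}\nContext: {context}")
    for _ in range(max_rounds):
        ok, log = simulator(code)  # 3D simulation + trajectory log check
        if ok:
            return code
        code = llm(f"Fix this code.\nErrors: {log}\nCode: {code}")
    raise RuntimeError("verification failed after self-correction rounds")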
Updated: 2025-07-02 10:43:07
Domains: cs.AI
Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers
Time series classification is a fundamental task in healthcare and industry, yet the development of time series foundation models (TSFMs) remains limited by the scarcity of publicly available time series datasets. In this work, we propose Time Vision Transformer (TiViT), a framework that converts time series into images to leverage the representational power of frozen Vision Transformers (ViTs) pretrained on large-scale image datasets. First, we theoretically motivate our approach by analyzing the 2D patching of ViTs for time series, showing that it can increase the number of label-relevant tokens and reduce the sample complexity. Second, we empirically demonstrate that TiViT achieves state-of-the-art performance on standard time series classification benchmarks by utilizing the hidden representations of large OpenCLIP models. We explore the structure of TiViT representations and find that intermediate layers with high intrinsic dimension are the most effective for time series classification. Finally, we assess the alignment between TiViT and TSFM representation spaces and identify a strong complementarity, with further performance gains achieved by combining their features. Our findings reveal a new direction for reusing vision representations in a non-visual domain. Code is available at https://github.com/ExplainableML/TiViT.
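As a toy illustration of the series-to-image step (our own minimal rendering; TiViT's actual transformation and the frozen OpenCLIP backbone are not reproduced here):

import numpy as np

def series_to_image(x, height=32, width=32):
    # Render a 1D series as a 2D grayscale array for a frozen ViT:
    # resample to height*width points, min-max normalize, and reshape
    # row-wise so each ViT patch covers a contiguous time window.
    t = np.linspace(0, 1, height * width)
    src = np.linspace(0, 1, len(x))
    resampled = np.interp(t, src, x)
    lo, hi = resampled.min(), resampled.max()
    img = (resampled - lo) / (hi - lo + 1e-8)
    return img.reshape(height, width)  # feed to the frozen ViT, read hidden layers

signal = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * np.random.randn(1000)
print(series_to_image(signal).shape)  # (32, 32)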
Updated: 2025-07-02 10:32:10
Domains: cs.LG,cs.AI,cs.CV
The Boolean Solution Problem from the Perspective of Predicate Logic -- Extended Version
Finding solution values for unknowns in Boolean equations was a principal reasoning mode in the Algebra of Logic of the 19th century. Schröder investigated it as Auflösungsproblem (solution problem). It is closely related to the modern notion of Boolean unification. Today it is commonly presented in an algebraic setting, but seems potentially useful also in knowledge representation based on predicate logic. We show that it can be modeled on the basis of first-order logic extended by second-order quantification. A wealth of classical results transfers, foundations for algorithms unfold, and connections with second-order quantifier elimination and Craig interpolation become apparent. Although for first-order inputs the set of solutions is recursively enumerable, the development of constructive methods remains a challenge. We identify some cases that allow constructions, most of them based on Craig interpolation.
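For a single unknown, the classical textbook form of the problem makes the setting concrete (this worked example is standard material on Boole's method, not taken from the paper). Writing the equation as $f(x,u) = 0$ and Shannon-expanding in the unknown $u$:

\[
f(x,u) \;=\; \bigl(\lnot u \land f(x,0)\bigr) \lor \bigl(u \land f(x,1)\bigr) \;=\; 0 .
\]

A solution in $u$ exists if and only if the consistency condition $f(x,0) \land f(x,1) = 0$ holds, and the general (reproductive) solution is

\[
u \;=\; f(x,0) \lor \bigl(p \land \lnot f(x,1)\bigr),
\]

where $p$ is a free parameter. Second-order quantification enters exactly here: the consistency condition is the result of eliminating the quantifier in $\exists u\,(f(x,u) = 0)$.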
Updated: 2025-07-02 10:31:26
Domains: cs.LO,cs.AI
Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware
We present a full-stack emergency vehicle (EV) siren detection system designed for real-time deployment on embedded hardware. The proposed approach is based on E2PANNs, a fine-tuned convolutional neural network derived from EPANNs, and optimized for binary sound event detection under urban acoustic conditions. A key contribution is the creation of curated and semantically structured datasets - AudioSet-EV, AudioSet-EV Augmented, and Unified-EV - developed using a custom AudioSet-Tools framework to overcome the low reliability of standard AudioSet annotations. The system is deployed on a Raspberry Pi 5 equipped with a high-fidelity DAC+microphone board, implementing a multithreaded inference engine with adaptive frame sizing, probability smoothing, and a decision-state machine to control false positive activations. A remote WebSocket interface provides real-time monitoring and facilitates live demonstration capabilities. Performance is evaluated using both framewise and event-based metrics across multiple configurations. Results show the system achieves low-latency detection with improved robustness under realistic audio conditions. This work demonstrates the feasibility of deploying IoS-compatible SED solutions that can form distributed acoustic monitoring networks, enabling collaborative emergency vehicle tracking across smart city infrastructures through WebSocket connectivity on low-cost edge devices.
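A minimal sketch of the smoothing-plus-state-machine idea (the constants below are illustrative assumptions, not the deployed configuration):

class SirenGate:
    # Probability smoothing plus a hysteresis decision-state machine.
    def __init__(self, alpha=0.6, on_thresh=0.7, off_thresh=0.4, min_frames=3):
        self.alpha, self.on, self.off = alpha, on_thresh, off_thresh
        self.min_frames = min_frames
        self.p, self.count, self.active = 0.0, 0, False

    def update(self, frame_prob):
        # Exponential moving average damps single-frame spikes.
        self.p = self.alpha * frame_prob + (1 - self.alpha) * self.p
        if not self.active:
            # Require several consecutive high frames before firing.
            self.count = self.count + 1 if self.p > self.on else 0
            if self.count >= self.min_frames:
                self.active, self.count = True, 0
        elif self.p < self.off:
            self.active = False  # lower release threshold avoids flapping
        return self.active

gate = SirenGate()
for p in [0.9, 0.2, 0.9, 0.95, 0.9, 0.9, 0.3, 0.2]:
    print(gate.update(p))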
Updated: 2025-07-02 10:27:41
Domains: cs.SD,cs.AI,eess.AS,68T07 (Primary), 68T10 (Secondary),B.1.5; B.4.5; C.3; C.4; I.2; K.4; J.2
Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization
Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or discover interpretable directions, which requires exploiting data labels or annotated synthesized samples for training. However, we propose using a modification of vector quantization called space-filling vector quantization (SFVQ), which quantizes the data on a piece-wise linear curve. SFVQ can capture the underlying morphological structure of the latent space, making it interpretable. We apply this technique to model the latent space of pre-trained StyleGAN2 and BigGAN networks on various datasets. Our experiments show that the SFVQ curve yields a general interpretable model of the latent space such that it determines which parts of the latent space correspond to specific generative factors. Furthermore, we demonstrate that each line of the SFVQ curve can potentially refer to an interpretable direction for applying intelligible image transformations. We also demonstrate that the points located on an SFVQ line can be used for controllable data augmentation.
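The quantization step can be sketched as a nearest-point projection onto a learned polyline (a toy rendering of inference only; fitting the curve is what SFVQ training handles, and is not shown):

import numpy as np

def project_to_polyline(z, curve):
    # Map a latent vector to the nearest point on a piecewise-linear curve.
    # `curve` is a (K, d) array of corner points; the quantized value lies
    # on one of the K-1 segments.
    best, best_d2 = None, np.inf
    for a, b in zip(curve[:-1], curve[1:]):
        ab = b - a
        t = np.clip(np.dot(z - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        q = a + t * ab  # orthogonal projection, clamped to the segment
        d2 = np.sum((z - q) ** 2)
        if d2 < best_d2:
            best, best_d2 = q, d2
    return best

curve = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])  # toy 3-corner curve
print(project_to_polyline(np.array([0.8, 0.4]), curve))  # -> [1.0, 0.4]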
Updated: 2025-07-02 10:27:13
Domains: cs.CV,cs.AI,cs.LG
Who Should I Listen To? Adaptive Collaboration in Personalized Federated Learning
Data heterogeneity is a central challenge in federated learning, and personalized federated learning (PFL) aims to address it by tailoring models to each client's distribution. Yet many PFL methods fail to outperform local or centralized baselines, suggesting a mismatch between the collaboration they enforce and the structure of the data. We propose an approach based on adaptive collaboration, where clients decide adaptively not only how much to rely on others, but also whom to trust at the level of individual examples. We instantiate this principle in FEDMOSAIC, a federated co-training method in which clients exchange predictions over a shared unlabeled dataset. This enables fine-grained trust decisions that are difficult to achieve with parameter sharing alone. Each client adjusts its loss weighting based on the agreement between private and public data, and contributes to global pseudo-labels in proportion to its estimated per-example confidence. Empirically, FEDMOSAIC improves upon state-of-the-art PFL methods across diverse non-IID settings, and we provide convergence guarantees under standard assumptions. Our results demonstrate the potential of data-aware collaboration for robust and effective personalization.
Updated: 2025-07-02 10:26:16
Domains: cs.LG
PDFMathTranslate: Scientific Document Translation Preserving Layouts
Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce PDFMathTranslate, the world's first open-source software for translating scientific documents while preserving layouts. Leveraging the most recent advances in large language models and precise layout detection, we contribute to the community with key improvements in precision, flexibility, and efficiency. The work has been open-sourced at https://github.com/byaidu/pdfmathtranslate with more than 22k downloads.
Updated: 2025-07-02 10:22:05
Domains: cs.CL,cs.IR,cs.LG,68T50, 68T45, 68U10, 68U15,D.2.2; I.2.10; I.2.7; J.0
How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks
Recent work in continual learning has highlighted the beneficial effect of resampling weights in the last layer of a neural network ("zapping"). Although empirical results demonstrate the effectiveness of this approach, the underlying mechanisms that drive these improvements remain unclear. In this work, we investigate in detail the patterns of learning and forgetting that take place inside a convolutional neural network when trained in challenging settings such as continual learning and few-shot transfer learning, with handwritten characters and natural images. Our experiments show that models that have undergone zapping during training more quickly recover from the shock of transferring to a new domain. Furthermore, to better observe the effect of continual learning in a multi-task setting we measure how each individual task is affected. This shows that, not only zapping, but the choice of optimizer can also deeply affect the dynamics of learning and forgetting, causing complex patterns of synergy/interference between tasks to emerge when the model learns sequentially at transfer time.
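In PyTorch, zapping reduces to re-running the final layer's initializer. A minimal sketch, assuming the classifier head is a single nn.Linear (an illustrative simplification; the paper's schedule for when to zap is not shown):

import torch.nn as nn

def zap_last_layer(model: nn.Sequential) -> None:
    # Resample ("zap") the final classifier weights before transfer:
    # re-running the default initializer discards task-specific last-layer
    # knowledge while keeping the backbone intact.
    last = model[-1]
    assert isinstance(last, nn.Linear)
    last.reset_parameters()  # fresh weights and bias

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 10))
zap_last_layer(net)  # e.g., between episodes of continual learning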
Updated: 2025-07-02 10:18:35
Domains: cs.LG,cs.CV
Learned-Database Systems Security
A learned database system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly-realistic attacker models. However, compared to attacks on other ML-based systems, attackers face a level of indirection as they cannot interact directly with the learned model. Additionally, the difference between the attack surface of learned and non-learned versions of the same system is often subtle. These factors obfuscate the de-facto risks that the incorporation of ML carries. We analyze the root causes of the potentially-increased attack surface in learned database systems and develop a framework for identifying vulnerabilities that stem from the use of ML. We apply our framework to a broad set of learned components currently being explored in the database community. To empirically validate the vulnerabilities surfaced by our framework, we choose 3 of them and implement and evaluate exploits against these. We show that the use of ML causes leakage of past queries in a database, enables a poisoning attack that causes exponential memory blowup in an index structure and crashes it in seconds, and enables index users to snoop on each other's key distributions by timing queries over their own keys. We find that adversarial ML is a universal threat against learned components in database systems, point to open research gaps in our understanding of learned-systems security, and conclude by discussing mitigations, while noting that data leakage is inherent in systems whose learned component is shared between multiple parties.
Updated: 2025-07-02 10:16:58
Domains: cs.CR,cs.LG
DREAMS: A python framework for Training Deep Learning Models on EEG Data with Model Card Reporting for Medical Applications
Electroencephalography (EEG) provides a non-invasive way to observe brain activity in real time. Deep learning has enhanced EEG analysis, enabling meaningful pattern detection for clinical and research purposes. However, most existing frameworks for EEG data analysis are either focused on preprocessing techniques or deep learning model development, often overlooking the crucial need for structured documentation and model interpretability. In this paper, we introduce DREAMS (Deep REport for AI ModelS), a Python-based framework designed to generate automated model cards for deep learning models applied to EEG data. Unlike generic model reporting tools, DREAMS is specifically tailored for EEG-based deep learning applications, incorporating domain-specific metadata, preprocessing details, performance metrics, and uncertainty quantification. The framework seamlessly integrates with deep learning pipelines, providing structured YAML-based documentation. We evaluate DREAMS through two case studies: an EEG emotion classification task using the FACED dataset and an abnormal EEG classification task using the Temple University Hospital (TUH) Abnormal dataset. These evaluations demonstrate how the generated model card enhances transparency by documenting model performance, dataset biases, and interpretability limitations. Unlike existing model documentation approaches, DREAMS provides visualized performance metrics, dataset alignment details, and model uncertainty estimations, making it a valuable tool for researchers and clinicians working with EEG-based AI. The source code for DREAMS is open-source, facilitating broad adoption in healthcare AI, research, and ethical AI development.
Updated: 2025-07-02 10:08:46
Domains: cs.AI
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention
Linear attention mechanisms deliver significant advantages for Large Language Models (LLMs) by providing linear computational complexity, enabling efficient processing of ultra-long sequences (e.g., 1M context). However, existing Sequence Parallelism (SP) methods, essential for distributing these workloads across devices, become the primary bottleneck due to substantial communication overhead. In this paper, we introduce ZeCO (Zero Communication Overhead) sequence parallelism for linear attention models, a new SP method designed to overcome these limitations and achieve end-to-end near-linear scalability for long sequence training. For example, training a model with a 1M sequence length across 64 devices using ZeCO takes roughly the same time as training with a 16k sequence on a single device. At the heart of ZeCO lies All-Scan, a new collective communication primitive. All-Scan provides each SP rank with precisely the initial operator state it requires while maintaining a minimal communication footprint, effectively eliminating communication overhead. Theoretically, we prove the optimality of ZeCO, showing that it introduces only negligible time and space overhead. Empirically, we compare the communication costs of different sequence parallelism strategies and demonstrate that All-Scan achieves the fastest communication in SP scenarios. Specifically, on 256 GPUs with an 8M sequence length, ZeCO achieves a 60% speedup compared to the current state-of-the-art (SOTA) SP method. We believe ZeCO establishes a clear path toward efficiently training next-generation LLMs on previously intractable sequence lengths.
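The intuition behind All-Scan can be sketched with the underlying diagonal linear recurrence: each rank only needs the boundary state of its predecessor. The toy below emulates ranks with a loop (no real collectives, and not ZeCO's implementation):

import numpy as np

def local_linear_scan(a, b, s0):
    # Sequentially apply s_t = a_t * s_{t-1} + b_t on one device's chunk.
    s = s0
    out = np.empty_like(b)
    for t in range(len(b)):
        s = a[t] * s + b[t]
        out[t] = s
    return out, s

# Emulate SP ranks, each holding one chunk of the sequence. The only
# cross-device traffic is the final state of the previous rank: the
# minimal "initial operator state" idea behind All-Scan.
rng = np.random.default_rng(0)
T, ranks, D = 12, 3, 4
a = rng.uniform(0.5, 1.0, (T, D))  # diagonal state transitions
b = rng.normal(size=(T, D))
s = np.zeros(D)
outs = []
for r in range(ranks):
    chunk = slice(r * T // ranks, (r + 1) * T // ranks)
    out, s = local_linear_scan(a[chunk], b[chunk], s)  # pass boundary state on
    outs.append(out)
ref, _ = local_linear_scan(a, b, np.zeros(D))
print(np.allclose(np.concatenate(outs), ref))  # True: matches the global scan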
Updated: 2025-07-02 10:04:00
Domains: cs.LG
AI and Remote Sensing for Resilient and Sustainable Built Environments: A Review of Current Methods, Open Data and Future Directions
Critical infrastructure, such as transport networks, underpins economic growth by enabling mobility and trade. However, ageing assets, climate change impacts (e.g., extreme weather, rising sea levels), and hybrid threats ranging from natural disasters to cyber attacks and conflicts pose growing risks to their resilience and functionality. This review paper explores how emerging digital technologies, specifically Artificial Intelligence (AI), can enhance damage assessment and monitoring of transport infrastructure. A systematic literature review examines existing AI models and datasets for assessing damage in roads, bridges, and other critical infrastructure impacted by natural disasters. Special focus is given to the unique challenges and opportunities associated with bridge damage detection due to their structural complexity and critical role in connectivity. The integration of SAR (Synthetic Aperture Radar) data with AI models is also discussed, with the review revealing a critical research gap: a scarcity of studies applying AI models to SAR data for comprehensive bridge damage assessment. Therefore, this review aims to identify the research gaps and provide foundations for AI-driven solutions for assessing and monitoring critical transport infrastructures.
Updated: 2025-07-02 09:59:23
Domains: cs.CY,cs.AI,cs.LG
MARVIS: Modality Adaptive Reasoning over VISualizations
Scientific applications of machine learning often rely on small, specialized models tuned to particular domains. Such models often achieve excellent performance, but lack flexibility. Foundation models offer versatility, but typically underperform specialized approaches, especially on non-traditional modalities and long-tail domains. We propose MARVIS (Modality Adaptive Reasoning over VISualizations), a training-free method that enables even small vision-language models to predict any data modality with high accuracy. MARVIS transforms latent embedding spaces into visual representations and then leverages the spatial and fine-grained reasoning skills of VLMs to successfully interpret and utilize them. MARVIS achieves competitive performance on vision, audio, biological, and tabular domains using a single 3B parameter model, achieving results that beat Gemini by 16% on average and approach specialized methods, without exposing personally identifiable information (P.I.I.) or requiring any domain-specific training. We open source our code and datasets at https://github.com/penfever/marvis
Updated: 2025-07-02 09:56:24
Domains: cs.LG
Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles
Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) certainly lack flexibility to fit certain anisotropic distributions. Connecting these two extremes, we introduce a new family of parsimonious GMMs with piecewise-constant covariance eigenvalue profiles. These extend several low-rank models like the celebrated mixtures of probabilistic principal component analyzers (MPPCA), by enabling any possible sequence of eigenvalue multiplicities. If the latter are prespecified, then we can naturally derive an expectation-maximization (EM) algorithm to learn the mixture parameters. Otherwise, to address the notoriously-challenging issue of jointly learning the mixture parameters and hyperparameters, we propose a componentwise penalized EM algorithm, whose monotonicity is proven. We show the superior likelihood-parsimony tradeoffs achieved by our models on a variety of unsupervised experiments: density fitting, clustering and single-image denoising.
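In symbols (our notation, consistent with the abstract but not necessarily the paper's): each component covariance is eigendecomposed with a piecewise-constant eigenvalue profile,

\[
\Sigma_k = Q_k \Lambda_k Q_k^\top, \qquad
\Lambda_k = \operatorname{diag}\!\bigl(\lambda_{k,1} I_{d_{k,1}}, \dots, \lambda_{k,G_k} I_{d_{k,G_k}}\bigr),
\qquad \lambda_{k,1} > \dots > \lambda_{k,G_k} > 0,
\]

where the multiplicities $d_{k,1} + \dots + d_{k,G_k}$ sum to the ambient dimension. Taking $G_k = 1$ recovers a spherical component, while $G_k = 2$ with a large trailing block resembles an MPPCA-style component; general multiplicity sequences interpolate between these extremes.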
Updated: 2025-07-02 09:52:56
Domains: stat.ML,cs.LG,stat.AP,stat.CO,stat.ME
Cybersecurity Issues in Local Energy Markets
Local Energy Markets (LEMs), though pivotal to the energy transition, face growing cybersecurity threats due to their reliance on smart grid communication standards and vulnerable Internet-of-Things (IoT)-enabled devices. This is a critical issue because such vulnerabilities can be exploited to manipulate market operations, compromise participants' privacy, and destabilize power distribution networks. This work maps LEM communication flows to existing standards, highlights potential impacts of key identified vulnerabilities, and simulates cyberattack scenarios on a privacy-preserving LEM model to assess their impacts. Findings reveal how attackers could distort pricing and demand patterns. We finally present recommendations for researchers, industry developers, policymakers, and LEM stakeholders to secure future LEM deployments.
Updated: 2025-07-02 09:44:51
Domains: cs.CR,cs.SY,eess.SY
Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning technologies
Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet, the statistical quality and reproducibility of these generators, particularly when integrated into frameworks like PyTorch, TensorFlow, and NumPy, are underexplored. In this paper, we compare the statistical quality of PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our findings challenge claims of statistical robustness, revealing that even generators labeled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests. Surprisingly, we observe some differences in failure profiles between the native and framework-integrated versions of the same algorithm, highlighting implementation differences that may exist.
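For instance, independent streams of the three generator families can be instantiated in NumPy as follows (a setup sketch only; running BigCrush itself requires the separate TestU01 C suite):

import numpy as np

# One SeedSequence per family, built from the same seed so all three
# families see identical seeding; spawn() derives one child seed per stream.
n_streams, seed = 4, 12345
families = {
    "MT19937": np.random.MT19937,  # Mersenne Twister
    "PCG64": np.random.PCG64,      # PCG
    "Philox": np.random.Philox,
}
streams = {
    name: [np.random.Generator(bitgen(s))
           for s in np.random.SeedSequence(seed).spawn(n_streams)]
    for name, bitgen in families.items()
}
for name, gens in streams.items():
    print(name, gens[0].random(3))  # first draws of stream 0 per family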
Updated: 2025-07-02 09:38:00
Domains: cs.OH,cs.CR,cs.LG
Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs
This paper provides a proof of the consistency of sparse grid quadrature for numerical integration of high dimensional distributions. In a first step, a transport map is learned that normalizes the distribution to a noise distribution on the unit cube. This step is built on the statistical learning theory of neural ordinary differential equations, which has been established recently. Secondly, the composition of the generative map with the quantity of interest is integrated numerically using the Clenshaw-Curtis sparse grid quadrature. A decomposition of the total numerical error into quadrature error and statistical error is provided. As the main result, it is proven within the framework of empirical risk minimization that all error terms can be controlled in the sense of PAC (probably approximately correct) learning, and that with high probability the numerical integral approximates the theoretical value up to an arbitrarily small error in the limit where the data set size grows and the network capacity is increased adaptively.
Updated: 2025-07-02 09:37:16
Domains: math.NA,cs.LG,cs.NA,math.PR
Chargax: A JAX Accelerated EV Charging Simulator
Deep Reinforcement Learning can play a key role in addressing sustainable energy challenges. For instance, many grid systems are heavily congested, highlighting the urgent need to enhance operational efficiency. However, reinforcement learning approaches have traditionally been slow due to the high sample complexity and expensive simulation requirements. While recent works have effectively used GPUs to accelerate data generation by converting environments to JAX, these works have largely focussed on classical toy problems. This paper introduces Chargax, a JAX-based environment for realistic simulation of electric vehicle charging stations designed for accelerated training of RL agents. We validate our environment in a variety of scenarios based on real data, comparing reinforcement learning agents against baselines. Chargax delivers substantial computational performance improvements of over 100x-1000x over existing environments. Additionally, Chargax' modular architecture enables the representation of diverse real-world charging station configurations.
Updated: 2025-07-02 09:27:14
Domains: cs.LG,cs.AI,cs.SY,eess.SY
Loss Functions in Diffusion Models: A Comparative Study
Diffusion models have emerged as powerful generative models, inspiring extensive research into their underlying mechanisms. A key question in this area is which loss function these models should be trained with. Multiple formulations have been introduced in the literature over the past several years, with some links and some critical differences stemming from various initial considerations. In this paper, we explore the different target objectives and corresponding loss functions in detail. We present a systematic overview of their relationships, unifying them under the framework of the variational lower bound objective. We complement this theoretical analysis with an empirical study providing insights into the conditions under which these objectives diverge in performance and the underlying factors contributing to such deviations. Additionally, we evaluate how the choice of objective impacts the model's ability to achieve specific goals, such as generating high-quality samples or accurately estimating likelihoods. This study offers a unified understanding of loss functions in diffusion models, contributing to more efficient and goal-oriented model designs in future research.
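A concrete instance of the relationship such a unification covers, stated here in the standard DDPM notation (which may differ from the paper's): the variational lower bound reduces to a weighted noise-prediction loss,

\[
L_{\mathrm{vlb}} = \mathbb{E}_{t, x_0, \epsilon}\!\left[\, w_t \,\bigl\|\epsilon - \epsilon_\theta(x_t, t)\bigr\|^2 \right] + C,
\qquad
w_t = \frac{\beta_t^2}{2\sigma_t^2\,\alpha_t\,(1 - \bar\alpha_t)},
\]

while the widely used simple objective sets $w_t \equiv 1$:

\[
L_{\mathrm{simple}} = \mathbb{E}_{t, x_0, \epsilon}\!\left[\,\bigl\|\epsilon - \epsilon_\theta(x_t, t)\bigr\|^2 \right].
\]

The two therefore differ only by a time-dependent reweighting, which is one source of the divergence between likelihood estimation and sample quality.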
Updated: 2025-07-02 09:23:34
Domains: cs.LG
SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism
By incorporating visual inputs, Multimodal Large Language Models (MLLMs) extend LLMs to support visual reasoning. However, this integration also introduces new vulnerabilities, making MLLMs susceptible to multimodal jailbreak attacks and hindering their safe deployment. Existing defense methods, including Image-to-Text Translation, Safe Prompting, and Multimodal Safety Tuning, attempt to address this by aligning multimodal inputs with LLMs' built-in safeguards. Yet, they fall short in uncovering the root causes of multimodal vulnerabilities, particularly how harmful multimodal tokens trigger jailbreaks in MLLMs. Consequently, they remain vulnerable to text-driven multimodal jailbreaks, often exhibiting overdefensive behaviors and imposing heavy training overhead. To bridge this gap, we present a comprehensive analysis of where, how, and which harmful multimodal tokens bypass safeguards in MLLMs. Surprisingly, we find that fewer than 1% of tokens in early-middle layers are responsible for inducing unsafe behaviors, highlighting that precisely removing a small subset of harmful tokens, without requiring safety tuning, can still effectively improve safety against jailbreaks. Motivated by this, we propose Safe Prune-then-Restore (SafePTR), a training-free defense framework that selectively prunes harmful tokens at vulnerable layers while restoring benign features at subsequent layers. Without incurring additional computational overhead, SafePTR significantly enhances the safety of MLLMs while preserving efficiency. Extensive evaluations across three MLLMs and five benchmarks demonstrate SafePTR's state-of-the-art performance in mitigating jailbreak risks without compromising utility.
Updated: 2025-07-02 09:22:03
Domains: cs.CR,cs.CV
Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation
Polyp segmentation in colonoscopy images is crucial for early detection and diagnosis of colorectal cancer. However, this task remains a significant challenge due to the substantial variations in polyp shape, size, and color, as well as the high similarity between polyps and surrounding tissues, often compounded by indistinct boundaries. While existing encoder-decoder CNN and transformer-based approaches have shown promising results, they struggle with stable segmentation performance on polyps with weak or blurry boundaries. These methods exhibit limited abilities to distinguish between polyps and non-polyps and capture essential boundary cues. Moreover, their generalizability still falls short of meeting the demands of real-time clinical applications. To address these limitations, we propose SAM-MaGuP, a groundbreaking approach for robust polyp segmentation. By incorporating a boundary distillation module and a 1D-2D Mamba adapter within the Segment Anything Model (SAM), SAM-MaGuP excels at resolving weak boundary challenges and amplifies feature learning through enriched global contextual interactions. Extensive evaluations across five diverse datasets reveal that SAM-MaGuP outperforms state-of-the-art methods, achieving unmatched segmentation accuracy and robustness. Our key innovations, a Mamba-guided boundary prior and a 1D-2D Mamba block, set a new benchmark in the field, pushing the boundaries of polyp segmentation to new heights.
Updated: 2025-07-02 09:16:58
Domains: cs.CV,cs.LG
Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence
The collection and release of street-level recordings as Open Data play a vital role in advancing autonomous driving systems and AI research. However, these datasets pose significant privacy risks, particularly for pedestrians, due to the presence of Personally Identifiable Information (PII) that extends beyond biometric traits such as faces. In this paper, we present cRID, a novel cross-modal framework combining Large Vision-Language Models, Graph Attention Networks, and representation learning to detect textual describable clues of PII and enhance person re-identification (Re-ID). Our approach focuses on identifying and leveraging interpretable features, enabling the detection of semantically meaningful PII beyond low-level appearance cues. We conduct a systematic evaluation of PII presence in person image datasets. Our experiments show improved performance in practical cross-dataset Re-ID scenarios, notably from Market-1501 to CUHK03-np (detected), highlighting the framework's practical utility. Code is available at https://github.com/RAufschlaeger/cRID.
Updated: 2025-07-02 09:10:33
Domains: cs.CV,cs.AI,cs.CL
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
We present LrcSSM, a $\textit{nonlinear}$ recurrent model that processes long sequences as fast as today's linear state-space layers. By forcing the state-transition matrix to be diagonal and learned at every step, the full sequence can be solved in parallel with a single prefix-scan, giving $\mathcal{O}(TD)$ time and memory and only $\mathcal{O}(\log T)$ sequential depth, for input-sequence length $T$ and a state dimension $D$. Moreover, LrcSSM offers a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 and Mamba do not provide. Lastly, for network depth $L$, as the forward and backward passes cost $\Theta(T\,D\,L)$ FLOPs, with its low sequential depth and parameter count $\Theta(D\,L)$, the model follows the compute-optimal scaling law regime ($\beta \approx 0.42$) recently observed for Mamba, outperforming quadratic-attention Transformers at equal compute while avoiding the memory overhead of FFT-based long convolutions. We show that on a series of long-range forecasting tasks, LrcSSM outperforms LRU, S5 and Mamba.
Updated: 2025-07-02 09:09:49
Domains: cs.LG,cs.AI,cs.NE
SimBank: from Simulation to Solution in Prescriptive Process Monitoring
Prescriptive Process Monitoring (PresPM) is an emerging area within Process Mining, focused on optimizing processes through real-time interventions for effective decision-making. PresPM holds significant promise for organizations seeking enhanced operational performance. However, the current literature faces two key limitations: a lack of extensive comparisons between techniques and insufficient evaluation approaches. To address these gaps, we introduce SimBank: a simulator designed for accurate benchmarking of PresPM methods. Modeled after a bank's loan application process, SimBank enables extensive comparisons of both online and offline PresPM methods. It incorporates a variety of intervention optimization problems with differing levels of complexity and supports experiments on key causal machine learning challenges, such as assessing a method's robustness to confounding in data. SimBank additionally offers a comprehensive evaluation capability: for each test case, it can generate the true outcome under each intervention action, which is not possible using recorded datasets. The simulator incorporates parallel activities and loops, drawing from common logs to generate cases that closely resemble real-life process instances. Our proof of concept demonstrates SimBank's benchmarking capabilities through experiments with various PresPM methods across different interventions, highlighting its value as a publicly available simulator for advancing research and practice in PresPM.
Updated: 2025-07-02 09:09:00
Domains: cs.DB,cs.LG
Integrating Traditional and Deep Learning Methods to Detect Tree Crowns in Satellite Images
Global warming, loss of biodiversity, and air pollution are among the most significant problems facing Earth. One of the primary challenges in addressing these issues is the lack of forest monitoring needed to protect forests. To tackle this problem, it is important to leverage remote sensing and computer vision methods to automate monitoring applications. Hence, automatic tree crown detection algorithms have emerged based on traditional and deep learning methods. In this study, we first introduce two different tree crown detection methods based on these approaches. Then, we form a novel rule-based approach that integrates these two methods to enhance the robustness and accuracy of tree crown detection results. While traditional methods are employed for feature extraction and segmentation of forested areas, deep learning methods are used to detect tree crowns in our method. With the proposed rule-based approach, we post-process these results, aiming to increase the number of detected tree crowns through neighboring trees and localized operations. We compare the results of the individual methods with those of the proposed approach in terms of the number of detected tree crowns, and report the advantages, disadvantages, and areas for improvement of each.
Updated: 2025-07-02 09:05:28
Domains: cs.CV,cs.AI
Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm
Accurate identification of meteoroid streams is central to understanding their origins and evolution. However, overlapping clusters and background noise hinder classification, an issue amplified for missions such as ESA's LUMIO that rely on meteor shower observations to infer lunar meteoroid impact parameters. This study evaluates the performance of the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm for unsupervised meteoroid stream identification, comparing its outcomes with the established Cameras for All-Sky Meteor Surveillance (CAMS) look-up table method. We analyze the CAMS Meteoroid Orbit Database v3.0 using three feature vectors: LUTAB (CAMS geocentric parameters), ORBIT (heliocentric orbital elements), and GEO (adapted geocentric parameters). HDBSCAN is applied with varying minimum cluster sizes and two cluster selection methods (eom and leaf). To align HDBSCAN clusters with CAMS classifications, the Hungarian algorithm determines the optimal mapping. Clustering performance is assessed via the Silhouette score, Normalized Mutual Information, and F1 score, with Principal Component Analysis further supporting the analysis. With the GEO vector, HDBSCAN confirms 39 meteoroid streams, 21 strongly aligning with CAMS. The ORBIT vector identifies 30 streams, 13 with high matching scores. Less active showers pose identification challenges. The eom method consistently yields superior performance and agreement with CAMS. Although HDBSCAN requires careful selection of the minimum cluster size, it delivers robust, internally consistent clusters and outperforms the look-up table method in statistical coherence. These results underscore HDBSCAN's potential as a mathematically consistent alternative for meteoroid stream identification, although further validation is needed to assess physical validity.
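A minimal sketch of the clustering-plus-matching pipeline using the hdbscan package and SciPy's Hungarian solver (random placeholder features stand in for the CAMS orbit data):

import numpy as np
import hdbscan
from scipy.optimize import linear_sum_assignment

# Toy stand-in for the GEO feature vectors: three synthetic "streams".
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.1, size=(100, 3)) for loc in (0, 1, 2)])
cams_labels = np.repeat([0, 1, 2], 100)  # reference classification

clusterer = hdbscan.HDBSCAN(min_cluster_size=25,
                            cluster_selection_method='eom')
pred = clusterer.fit_predict(X)  # -1 marks noise points

# Hungarian assignment: map HDBSCAN cluster ids to reference streams by
# maximizing the overlap (contingency) matrix.
clusters = [c for c in np.unique(pred) if c != -1]
streams = np.unique(cams_labels)
overlap = np.array([[np.sum((pred == c) & (cams_labels == s))
                     for s in streams] for c in clusters])
rows, cols = linear_sum_assignment(-overlap)  # negate to maximize
for r, c in zip(rows, cols):
    print(f"HDBSCAN cluster {clusters[r]} -> stream {streams[c]}")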
Updated: 2025-07-02 09:04:44
Domains: astro-ph.EP,astro-ph.IM,cs.LG
optimizn: a Python Library for Developing Customized Optimization Algorithms
Combinatorial optimization problems are prevalent across a wide variety of domains. These problems are often nuanced, their optimal solutions might not be efficiently obtainable, and they may require lots of time and compute resources to solve (they are NP-hard). It follows that the best course of action for solving these problems is to use general optimization algorithm paradigms to quickly and easily develop algorithms that are customized to these problems and can produce good solutions in a reasonable amount of time. In this paper, we present optimizn, a Python library for developing customized optimization algorithms under general optimization algorithm paradigms (simulated annealing, branch and bound). Additionally, optimizn offers continuous training, with which users can run their algorithms on a regular cadence, retain the salient aspects of previous runs, and use them in subsequent runs to potentially produce solutions that get closer and closer to optimality. An earlier version of this paper was peer reviewed and published internally at Microsoft.
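For context, the simulated-annealing paradigm such a library wraps looks roughly like the loop below. This is a generic sketch of the pattern, not optimizn's actual API:

import math
import random

def simulated_annealing(init, neighbor, cost, t0=1.0, cooling=0.995,
                        steps=10_000):
    # Generic simulated annealing: accept worse neighbors with
    # probability exp(-delta/T) while the temperature T cools.
    current, best = init, init
    temp = t0
    for _ in range(steps):
        cand = neighbor(current)
        delta = cost(cand) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = cand
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

# Example: minimize a bumpy 1D function.
f = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
print(simulated_annealing(0.0, lambda x: x + random.uniform(-0.5, 0.5), f))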
Updated: 2025-07-02 09:03:51
Domains: cs.LG,math.OC
Diffusion-based Iterative Counterfactual Explanations for Fetal Ultrasound Image Quality Assessment
Obstetric ultrasound image quality is crucial for accurate diagnosis and monitoring of fetal health. However, acquiring high-quality standard planes is difficult, influenced by the sonographer's expertise and factors like the maternal BMI or fetus dynamics. In this work, we explore diffusion-based counterfactual explainable AI to generate realistic, high-quality standard planes from low-quality non-standard ones. Through quantitative and qualitative evaluation, we demonstrate the effectiveness of our approach in generating plausible counterfactuals of increased quality. This shows future promise for enhancing training of clinicians by providing visual feedback and potentially improving standard plane quality and acquisition for downstream diagnosis and monitoring.
Updated: 2025-07-02 09:00:19
Domains: eess.IV,cs.CV,cs.HC,cs.LG
On the Trade-off between Flatness and Optimization in Distributed Learning
This paper proposes a theoretical framework to evaluate and compare the performance of stochastic gradient algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have noticed that convergence toward flat local minima tends to enhance the generalization ability of learning algorithms. This work discovers three interesting results. First, it shows that decentralized learning strategies are able to escape faster away from local minima and favor convergence toward flatter minima relative to the centralized solution. Second, in decentralized methods, the consensus strategy has a worse excess-risk performance than diffusion, which gives consensus a better chance of escaping from local minima and favors flatter minima. Third, and importantly, the ultimate classification accuracy is not solely dependent on the flatness of the local minimum but also on how well a learning algorithm can approach that minimum. In other words, the classification accuracy is a function of both flatness and optimization performance. In this regard, since diffusion has a lower excess-risk than consensus, when both algorithms are trained starting from random initial points, diffusion enhances the classification accuracy. The paper examines the interplay between the two measures of flatness and optimization error closely. One important conclusion is that decentralized strategies deliver in general enhanced classification accuracy because they strike a more favorable balance between flatness and optimization performance compared to the centralized solution.
Updated: 2025-07-02 08:56:14
Areas: cs.LG
Crop Pest Classification Using Deep Learning Techniques: A Review
Insect pests continue to pose a serious threat to crop yields around the world, and traditional methods for monitoring them are often slow, manual, and difficult to scale. In recent years, deep learning has emerged as a powerful solution, with techniques like convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid models gaining popularity for automating pest detection. This review examines 37 carefully selected studies published between 2018 and 2025, all focused on AI-based pest classification. The selected research is organized by crop type, pest species, model architecture, dataset usage, and key technical challenges. Early studies relied heavily on CNNs, but the latest work is shifting toward hybrid and transformer-based models that deliver higher accuracy and better contextual understanding. Still, challenges like imbalanced datasets, difficulty in detecting small pests, limited generalizability, and deployment on edge devices remain significant hurdles. Overall, this review offers a structured overview of the field, highlights useful datasets, and outlines the key challenges and future directions for AI-based pest monitoring systems.
Updated: 2025-07-02 08:52:35
Areas: cs.CV,cs.AI
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
In recent years, Large-Language-Model-driven AI agents have exhibited unprecedented intelligence and adaptability, and are rapidly changing human production and life. Nowadays, agents are undergoing a new round of evolution. They no longer act as isolated islands, as standalone LLMs do. Instead, they have started to communicate with diverse external entities, such as other agents and tools, to perform more complex tasks collectively. Under this trend, agent communication is regarded as a foundational pillar of the future AI ecosystem, and in recent months many organizations have begun intensively designing related communication protocols (e.g., Anthropic's MCP and Google's A2A). However, this new field exposes significant security hazards, which can cause severe damage in real-world scenarios. To help researchers quickly come to grips with this promising topic and to benefit the future development of agent communication, this paper presents a comprehensive survey of agent communication security. More precisely, we first present a clear definition of agent communication and categorize its entire lifecycle into three stages: user-agent interaction, agent-agent communication, and agent-environment communication. Next, for each communication phase, we dissect related protocols and analyze the security risks according to the communication characteristics. Then, we summarize and offer an outlook on possible defense countermeasures for each risk. In addition, we conduct experiments using MCP and A2A to help readers better understand the novel vulnerabilities brought by agent communication. Finally, we discuss open issues and future directions in this promising research field.
Updated: 2025-07-02 08:50:11
Areas: cs.CR
Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning
Large Language Models (LLMs) have emerged as one of the most significant technological advancements in artificial intelligence in recent years. Their ability to understand, generate, and reason with natural language has transformed how we interact with AI systems. With the development of LLM-based agents and reinforcement-learning-based reasoning models, applying reinforcement learning within agent frameworks has become a new research focus. However, previous studies face the challenge of deciding the tool-calling process and the reasoning process simultaneously, and the chain of reasoning relies solely on the unprocessed raw tool output, which carries redundant information and task-irrelevant symbols that impose a heavy burden on the model's reasoning capability. Therefore, we propose Agent-as-tool, a hierarchical framework that detaches the tool-calling process from the reasoning process, enabling the model to focus on verbal reasoning while tool calling is handled by another agent. Our work achieves comparable results with only a slight reinforcement fine-tuning on 180 samples, and performs exceptionally well on Bamboogle with 63.2% exact match and 75.2% cover exact match, exceeding Search-R1 by 4.8 points in exact match and 3.2 points in cover exact match.
Updated: 2025-07-02 08:49:43
Areas: cs.AI,cs.MA
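To make the separation concrete, a minimal sketch of the hierarchical pattern under stated assumptions: reasoner_llm and tool_agent are hypothetical stand-ins (the real system uses an LLM and a search tool), and the point is only that the reasoner sees distilled tool results, never raw output:

```python
# Hypothetical stand-ins for a reasoning model and a search-tool agent.
def reasoner_llm(context: str) -> str:
    # A real LLM would decide between searching and answering.
    return "SEARCH: capital of France" if "[tool result]" not in context else "ANSWER: Paris"

def tool_agent(query: str) -> str:
    raw = "Paris is the capital of France. [ad] [sidebar] ..."  # noisy raw tool output
    return raw.split(".")[0] + "."                              # distilled, task-relevant result

def solve(question: str, max_turns: int = 4) -> str:
    context = question
    for _ in range(max_turns):
        step = reasoner_llm(context)
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        query = step.removeprefix("SEARCH:").strip()
        # The reasoner's context receives only the cleaned result.
        context += f"\n{step}\n[tool result] {tool_agent(query)}"
    return "no answer"

print(solve("What is the capital of France?"))
```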
How to Securely Shuffle? A survey about Secure Shufflers for privacy-preserving computations
Ishai et al. (FOCS'06) introduced secure shuffling as an efficient building block for private data aggregation. Recently, the field of differential privacy has revived interest in secure shufflers by highlighting the privacy amplification they can provide in various computations. Although several works argue for the utility of secure shufflers, they often treat them as black boxes, overlooking the practical vulnerabilities and performance trade-offs of existing implementations. This leaves a central question open: what makes a good secure shuffler? This survey addresses that question by identifying, categorizing, and comparing 26 secure protocols that realize the necessary shuffling functionality. To enable a meaningful comparison, we adapt and unify existing security definitions into a consistent set of properties. We also present an overview of privacy-preserving technologies that rely on secure shufflers, offer practical guidelines for selecting appropriate protocols, and outline promising directions for future work.
Updated: 2025-07-02 08:48:53
Areas: cs.CR,cs.LG
BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments
Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. BioMARS uses a hierarchical architecture: the Biologist Agent synthesizes protocols via retrieval-augmented generation; the Technician Agent translates them into executable robotic pseudo-code; and the Inspector Agent ensures procedural integrity through multimodal perception and anomaly detection. The system autonomously conducts cell passaging and culture tasks, matching or exceeding manual performance in viability, consistency, and morphological integrity. It also supports context-aware optimization, outperforming conventional strategies in differentiating retinal pigment epithelial cells. A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware. These results highlight the feasibility of generalizable, AI-driven laboratory automation and the transformative role of language-based reasoning in biological research.
Updated: 2025-07-02 08:47:02
Areas: cs.RO,cs.AI,cs.MA,q-bio.QM
Epistemic Scarcity: The Economics of Unresolvable Unknowns
This paper presents a praxeological analysis of artificial intelligence and algorithmic governance, challenging assumptions about the capacity of machine systems to sustain economic and epistemic order. Drawing on Misesian a priori reasoning and Austrian theories of entrepreneurship, we argue that AI systems are incapable of performing the core functions of economic coordination: interpreting ends, discovering means, and communicating subjective value through prices. Where neoclassical and behavioural models treat decisions as optimisation under constraint, we frame them as purposive actions under uncertainty. We critique dominant ethical AI frameworks such as Fairness, Accountability, and Transparency (FAT) as extensions of constructivist rationalism, which conflict with a liberal order grounded in voluntary action and property rights. Attempts to encode moral reasoning in algorithms reflect a misunderstanding of ethics and economics. However complex, AI systems cannot originate norms, interpret institutions, or bear responsibility. They remain opaque, misaligned, and inert. Using the concept of epistemic scarcity, we explore how information abundance degrades truth discernment, enabling both entrepreneurial insight and soft totalitarianism. Our analysis ends with a civilisational claim: the debate over AI concerns the future of human autonomy, institutional evolution, and reasoned choice. The Austrian tradition, focused on action, subjectivity, and spontaneous order, offers the only coherent alternative to rising computational social control.
Updated: 2025-07-02 08:46:24
Areas: econ.GN,cs.AI,cs.CY,physics.hist-ph,q-fin.EC,91B42, 91B40, 68T01,J.4; I.2.1; K.4.1; K.4.2
Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities
Automatic text simplification (ATS) aims to enhance language accessibility for various target groups, particularly persons with intellectual disabilities. Recent advancements in generative AI, especially large language models (LLMs), have substantially improved the quality of machine-generated text simplifications, thereby mitigating information barriers for the target group. However, existing LLM-based ATS systems do not incorporate preference feedback on text simplifications during training, resulting in a lack of personalization tailored to the specific needs of target group representatives. In this work, we extend the standard supervised fine-tuning (SFT) approach for adapting LLM-based ATS models by leveraging a computationally efficient LLM alignment technique -- direct preference optimization (DPO). Specifically, we post-train LLM-based ATS models using human feedback collected from persons with intellectual disabilities, reflecting their preferences on paired text simplifications generated by mainstream LLMs. Furthermore, we propose a pipeline for developing personalized LLM-based ATS systems, encompassing data collection, model selection, SFT and DPO post-training, and evaluation. Our findings underscore the necessity of active participation of target group persons in designing personalized AI accessibility solutions aligned with human expectations. This work represents a step towards personalizing inclusive AI systems at the target-group level, incorporating insights not only from text simplification experts but also from target group persons themselves.
Updated: 2025-07-02 08:43:06
Areas: cs.CL,cs.AI
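For reference, a minimal PyTorch sketch of the standard DPO objective (Rafailov et al., 2023) that such post-training minimizes over preference pairs; this is the generic loss, not this work's training code:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss. Each argument is the summed
    log-probability of a preferred ("chosen") or dispreferred ("rejected")
    simplification under the trained policy or the frozen reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to widen the margin between chosen and rejected outputs.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy tensors standing in for per-example sequence log-probabilities.
loss = dpo_loss(torch.tensor([-4.2]), torch.tensor([-5.0]),
                torch.tensor([-4.5]), torch.tensor([-4.9]))
print(loss.item())
```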
ManifoldMind: Dynamic Hyperbolic Reasoning for Trustworthy Recommendations
We introduce ManifoldMind, a probabilistic geometric recommender system for exploratory reasoning over semantic hierarchies in hyperbolic space. Unlike prior methods with fixed curvature and rigid embeddings, ManifoldMind represents users, items, and tags as adaptive-curvature probabilistic spheres, enabling personalised uncertainty modeling and geometry-aware semantic exploration. A curvature-aware semantic kernel supports soft, multi-hop inference, allowing the model to explore diverse conceptual paths instead of overfitting to shallow or direct interactions. Experiments on four public benchmarks show superior NDCG, calibration, and diversity compared to strong baselines. ManifoldMind produces explicit reasoning traces, enabling transparent, trustworthy, and exploration-driven recommendations in sparse or abstract domains.
Updated: 2025-07-02 08:42:11
Areas: cs.IR,cs.AI,cs.LG,stat.ML
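For geometric intuition, a minimal NumPy sketch of the fixed-curvature Poincare-ball distance that hyperbolic recommenders build on; the paper's adaptive-curvature probabilistic spheres generalize beyond this formula:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball (curvature -1): points are
    vectors with norm < 1, and distances grow rapidly near the boundary,
    which lets the ball embed tree-like hierarchies with low distortion."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

root = np.array([0.0, 0.0])   # a general concept sits near the origin
leaf = np.array([0.0, 0.95])  # a specific item sits near the boundary
print(poincare_distance(root, leaf))
```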
Optimizing Methane Detection On Board Satellites: Speed, Accuracy, and Low-Power Solutions for Resource-Constrained Hardware
Methane is a potent greenhouse gas, and detecting its leaks early via hyperspectral satellite imagery can help mitigate climate change. Meanwhile, many existing missions operate in manual tasking regimes only, thus missing potential events of interest. To overcome slow downlink rates cost-effectively, onboard detection is a viable solution. However, traditional methane enhancement methods are too computationally demanding for resource-limited onboard hardware. This work accelerates methane detection by focusing on efficient, low-power algorithms. We test fast target detection methods (ACE, CEM) that have not previously been used for methane detection, and propose Mag1c-SAS, a significantly faster variant of Mag1c, the current state-of-the-art algorithm for methane detection. To explore their true detection potential, we integrate them with a machine learning model (U-Net, LinkNet). Our results identify two promising candidates (Mag1c-SAS and CEM), both acceptably accurate for the detection of strong plumes and computationally efficient enough for onboard deployment: one optimized more for accuracy, the other more for speed, achieving up to ~100x and ~230x faster computation than the original Mag1c on resource-limited hardware. Additionally, we propose and evaluate three band selection strategies. One of them can outperform the method traditionally used in the field while using fewer channels, leading to even faster processing without compromising accuracy. This research lays the foundation for future advancements in onboard methane detection with minimal hardware requirements, improving timely data delivery. The produced code, data, and models are open-sourced and can be accessed from https://github.com/zaitra/methane-filters-benchmark.
Updated: 2025-07-02 08:34:34
Areas: cs.CV,cs.LG,cs.PF
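For reference, a minimal NumPy sketch of the classical CEM matched filter evaluated in the paper; the cube and target signature below are random placeholders for real hyperspectral radiances and a methane absorption spectrum:

```python
import numpy as np

def cem_detector(cube, target, eps=1e-6):
    """Constrained Energy Minimization: w = R^-1 t / (t^T R^-1 t), where R is
    the spectral correlation matrix of the scene and t the target signature.
    Scores near 1 indicate pixels whose spectrum matches the target."""
    h, w, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    R = pixels.T @ pixels / pixels.shape[0]   # sample correlation matrix
    R += eps * np.eye(bands)                  # regularize the inversion
    Rinv_t = np.linalg.solve(R, target)
    weights = Rinv_t / (target @ Rinv_t)
    return (pixels @ weights).reshape(h, w)

# Toy usage on random data with an assumed target signature.
cube = np.random.rand(64, 64, 50)
target = np.random.rand(50)
scores = cem_detector(cube, target)
print(scores.shape)  # (64, 64) per-pixel detection scores
```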
Zero-Incentive Dynamics: a look at reward sparsity through the lens of unrewarded subgoals
This work re-examines the commonly held assumption that the frequency of rewards is a reliable measure of task difficulty in reinforcement learning. We identify and formalize a structural challenge that undermines the effectiveness of current policy learning methods: when essential subgoals do not directly yield rewards. We characterize such settings as exhibiting zero-incentive dynamics, where transitions critical to success remain unrewarded. We show that state-of-the-art deep subgoal-based algorithms fail to leverage these dynamics and that learning performance is highly sensitive to the temporal proximity between subgoal completion and eventual reward. These findings reveal a fundamental limitation in current approaches and point to the need for mechanisms that can infer latent task structure without relying on immediate incentives.
Updated: 2025-07-02 08:33:03
Areas: cs.LG,cs.AI
Cross-platform Smartphone Positioning at Museums
Indoor Positioning Systems (IPSs) hold significant potential for enhancing visitor experiences in cultural heritage institutions. By enabling personalized navigation, efficient artifact organization, and better interaction with exhibits, IPSs can transform the modalities of how individuals engage with museums, galleries and libraries. However, these institutions face several challenges in implementing IPSs, including environmental constraints, technical limits, and limited experimentation. In other contexts, Received Signal Strength (RSS)-based approaches using Bluetooth Low Energy (BLE) and WiFi have emerged as preferred solutions due to their non-invasive nature and minimal infrastructure requirements. Nevertheless, the lack of publicly available RSS datasets that specifically reflect museum environments presents a substantial barrier to developing and evaluating positioning algorithms designed for the intricate spatial characteristics typical of cultural heritage sites. To address this limitation, we present BAR, a novel RSS dataset collected in front of 90 artworks across 13 museum rooms using two different platforms, i.e., Android and iOS. Additionally, we provide an advanced position classification baseline taking advantage of a proximity-based method and k-NN algorithms. In our analysis, we discuss the results and offer suggestions for potential research directions.
Updated: 2025-07-02 08:31:12
Areas: cs.LG,eess.SP
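To illustrate the k-NN baseline on RSS fingerprints, a minimal scikit-learn sketch; the beacon readings and labels below are invented for illustration, not taken from the BAR dataset:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy RSS fingerprints: one row per scan, one column per BLE beacon (dBm),
# labeled with the artwork the visitor was standing in front of.
X_train = np.array([[-55, -80, -90],
                    [-57, -78, -92],
                    [-85, -52, -70],
                    [-88, -50, -72]])
y_train = ["artwork_1", "artwork_1", "artwork_2", "artwork_2"]

clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
clf.fit(X_train, y_train)
print(clf.predict([[-56, -79, -91]]))  # -> ['artwork_1']
```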
Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector
The widespread usage of the Internet of Things (IoT) has raised the risks of cyber threats, so developing Anomaly Detection Systems (ADSs) that can adapt to evolving or new attacks is critical. Previous studies primarily focused on offline unsupervised learning methods to safeguard ADSs, which is not applicable in practical real-world deployments. Besides, most of them strongly rely on assumptions about known legitimate behavior and fail to satisfy the interpretability requirements of security applications, creating barriers to adoption in practice. In this paper, we design Adaptive NAD, a general framework to improve and interpret online unsupervised anomaly detection in security domains. An interpretable two-layer anomaly detection strategy is proposed to generate reliable high-confidence pseudo-labels. Then, an online learning scheme is introduced to update Adaptive NAD via a novel threshold calculation technique to adapt to new threats. Experimental results demonstrate that Adaptive NAD achieves more than 5.4%, 23.0%, and 3.2% improvements in SPAUC compared with state-of-the-art solutions on the CIC-Darknet2020, CIC-DoHBrw-2020, and Edge-IIoTset datasets, respectively. The code is released at https://github.com/MyLearnCodeSpace/Adaptive-NAD.
Updated: 2025-07-02 08:26:47
Areas: cs.LG,eess.SP
SKIL: Semantic Keypoint Imitation Learning for Generalizable Data-efficient Manipulation
Real-world tasks such as garment manipulation and table rearrangement demand that robots perform generalizable, highly precise, and long-horizon actions. Although imitation learning has proven to be an effective approach for teaching robots new skills, large amounts of expert demonstration data are still indispensable for these complex tasks, resulting in high sample complexity and costly data collection. To address this, we propose Semantic Keypoint Imitation Learning (SKIL), a framework which automatically obtains semantic keypoints with the help of vision foundation models and forms descriptors of semantic keypoints that enable efficient imitation learning of complex robotic tasks with significantly lower sample complexity. In real-world experiments, SKIL doubles the performance of baseline methods in tasks such as picking up a cup or a mouse, while demonstrating exceptional robustness to variations in objects, environmental changes, and distractors. For long-horizon tasks like hanging a towel on a rack, where previous methods fail completely, SKIL achieves a mean success rate of 70% with as few as 30 demonstrations. Furthermore, SKIL naturally supports cross-embodiment learning due to its semantic keypoint abstraction. Our experiments demonstrate that even human videos bring considerable improvement to the learning performance. All these results demonstrate the great success of SKIL in achieving data-efficient, generalizable robotic learning. Visualizations and code are available at: https://skil-robotics.github.io/SKIL-robotics/.
Updated: 2025-07-02 08:26:20
Areas: cs.RO,cs.AI
Symbolic identification of tensor equations in multidimensional physical fields
Recently, data-driven methods have shown great promise for discovering governing equations from simulation or experimental data. However, most existing approaches are limited to scalar equations, with few capable of identifying tensor relationships. In this work, we propose a general data-driven framework for identifying tensor equations, referred to as Symbolic Identification of Tensor Equations (SITE). The core idea of SITE, representing tensor equations using a host-plasmid structure, is inspired by the multidimensional gene expression programming (M-GEP) approach. To improve the robustness of the evolutionary process, SITE adopts a genetic information retention strategy. Moreover, SITE introduces two key innovations beyond conventional evolutionary algorithms. First, it incorporates a dimensional homogeneity check to restrict the search space and eliminate physically invalid expressions. Second, it replaces traditional linear scaling with a tensor linear regression technique, greatly enhancing the efficiency of numerical coefficient optimization. We validate SITE using two benchmark scenarios, where it accurately recovers target equations from synthetic data, showing robustness to noise and small sample sizes. Furthermore, SITE is applied to identify constitutive relations directly from molecular simulation data, which are generated without reliance on macroscopic constitutive models. It adapts to both compressible and incompressible flow conditions and successfully identifies the corresponding macroscopic forms, highlighting its potential for data-driven discovery of tensor equations.
Updated: 2025-07-02 08:25:05
Areas: math-ph,cs.LG,math.MP
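The dimensional homogeneity check is simple to illustrate; a minimal sketch under assumed base dimensions of mass, length, and time (not SITE's internal encoding):

```python
import numpy as np

# Represent physical dimensions as exponent vectors over (mass, length, time):
# multiplication adds exponents, while addition requires identical exponents.
DIMS = {"rho": np.array([1, -3, 0]),   # density: M L^-3
        "u":   np.array([0, 1, -1]),   # velocity: L T^-1
        "p":   np.array([1, -1, -2])}  # pressure: M L^-1 T^-2

def mul(*terms):
    """Dimension of a product of quantities."""
    return sum((DIMS[t] for t in terms), np.zeros(3))

def homogeneous(*term_dims):
    """An expression like a + b is dimensionally valid only if every
    summand carries the same exponent vector."""
    return all(np.array_equal(term_dims[0], d) for d in term_dims[1:])

# rho*u*u and p share dimensions (momentum flux), so their sum is valid:
print(homogeneous(mul("rho", "u", "u"), DIMS["p"]))  # True
# rho*u + p is physically invalid and would be pruned from the search:
print(homogeneous(mul("rho", "u"), DIMS["p"]))       # False
```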
A new efficient RPKI Design
Resource Public Key Infrastructure (RPKI) is a critical security mechanism for BGP, but the complexity of its architecture is a growing concern as its adoption scales. The current RPKI design heavily reuses legacy PKI components, such as X.509 EE-certificates, ASN.1 encoding, and XML-based repository protocols, all of which introduce excessive cryptographic validation, redundant metadata, and inefficiencies in both storage and processing. We show that these design choices, although based on established standards, create significant performance bottlenecks, increase the vulnerability surface, and hinder scalability for wide-scale Internet deployment. In this paper, we perform the first systematic analysis of the root causes of complexity in RPKI's design and experimentally quantify their real-world impact. We show that over 70% of validation time in RPKI relying parties is spent on certificate parsing and signature verification, much of it unnecessary. Building on this insight, we introduce the improved RPKI (iRPKI), a backwards-compatible redesign that preserves all security guarantees while substantially reducing protocol overhead. iRPKI eliminates EE-certificates and ROA signatures, merges revocation and integrity objects, replaces verbose encodings with Protobuf, and restructures repository metadata for more efficient access. We experimentally demonstrate that our implementation of iRPKI in the Routinator validator achieves a 20x speed-up in processing time, an 18x improvement in bandwidth requirements, and an 8x reduction in cache memory footprint, while also eliminating classes of weaknesses that have led to at least 10 reported vulnerabilities in RPKI software. iRPKI significantly increases the feasibility of deploying RPKI at scale in the Internet, especially in constrained environments. Our design may be deployed incrementally without impacting existing operations.
Updated: 2025-07-02 08:24:50
Areas: cs.CR
NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation
Instance segmentation of novel object instances in RGB images, given some example images for each object, is a well-known problem in computer vision. Designing a model general enough to be employed for all kinds of novel objects without (re-)training has proven to be a difficult task. To handle this, we propose a simple yet powerful framework called Novel Object Cyclic Threshold based Instance Segmentation (NOCTIS). This work stems from and improves upon previous ones like CNOS, SAM-6D and NIDS-Net; thus, it also leverages recent vision foundation models, namely Grounded-SAM 2 and DINOv2. It utilises Grounded-SAM 2 to obtain object proposals with precise bounding boxes and their corresponding segmentation masks, while DINOv2's zero-shot capabilities are employed to generate the image embeddings. The quality of those masks, together with their embeddings, is of vital importance to our approach, as the proposal-object matching is realized by determining an object matching score based on the similarity of the class embeddings and the average maximum similarity of the patch embeddings. Unlike SAM-6D, calculating the latter involves a prior patch filtering based on the distance between each patch and its corresponding cyclic/roundtrip patch in the image grid. Furthermore, the average confidence of the proposals' bounding boxes and masks is used as an additional weighting factor for the object matching score. We empirically show that NOCTIS, without further training/fine-tuning, outperforms the best RGB and RGB-D methods on the seven core datasets of the BOP 2023 challenge for the "Model-based 2D segmentation of unseen objects" task.
Updated: 2025-07-02 08:23:14
Areas: cs.CV,cs.AI,I.2; I.4; I.5
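As a rough illustration of the proposal-template matching idea, a NumPy sketch under simplifying assumptions; the exact weighting and the cyclic patch filtering of NOCTIS are omitted here:

```python
import numpy as np

def cos(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def matching_score(cls_prop, cls_tmpl, patch_prop, patch_tmpl, confidence):
    """Combine class-embedding similarity with the average maximum patch
    similarity, weighted by the proposal's detection confidence; a
    simplified stand-in for the paper's object matching score."""
    s_cls = cos(cls_prop[None], cls_tmpl[None])[0, 0]           # class-level match
    s_patch = cos(patch_prop, patch_tmpl).max(axis=1).mean()    # avg max patch match
    return confidence * (s_cls + s_patch) / 2

# Toy embeddings standing in for DINOv2 class and patch tokens.
cls_p, cls_t = np.random.rand(256), np.random.rand(256)
patches_p, patches_t = np.random.rand(49, 256), np.random.rand(49, 256)
print(matching_score(cls_p, cls_t, patches_p, patches_t, confidence=0.9))
```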
Quantum-Assisted Automatic Path-Planning for Robotic Quality Inspection in Industry 4.0
This work explores the application of hybrid quantum-classical algorithms to optimize robotic inspection trajectories derived from Computer-Aided Design (CAD) models in industrial settings. By modeling the task as a 3D variant of the Traveling Salesman Problem, incorporating incomplete graphs and open-route constraints, this study evaluates the performance of two D-Wave-based solvers against classical methods such as GUROBI and Google OR-Tools. Results across five real-world cases demonstrate competitive solution quality with significantly reduced computation times, highlighting the potential of quantum approaches in automation under Industry 4.0.
Updated: 2025-07-02 08:21:52
Areas: cs.RO,cs.AI,cs.ET
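For context, a minimal sketch of the classical Google OR-Tools baseline on a toy closed-tour TSP; the paper's setting additionally handles incomplete graphs and open-route constraints:

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

# Toy symmetric distance matrix for 4 inspection viewpoints; real instances
# would be derived from CAD-based robot poses.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]

manager = pywrapcp.RoutingIndexManager(len(dist), 1, 0)  # 1 vehicle, depot 0
routing = pywrapcp.RoutingModel(manager)

def distance_cb(i, j):
    return dist[manager.IndexToNode(i)][manager.IndexToNode(j)]

routing.SetArcCostEvaluatorOfAllVehicles(routing.RegisterTransitCallback(distance_cb))
params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC

solution = routing.SolveWithParameters(params)
index, route = routing.Start(0), []
while not routing.IsEnd(index):
    route.append(manager.IndexToNode(index))
    index = solution.Value(routing.NextVar(index))
print(route)  # visiting order of the closed tour
```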
Topological Signatures vs. Gradient Histograms: A Comparative Study for Medical Image Classification
We present the first comparative study of two fundamentally distinct feature extraction techniques: Histogram of Oriented Gradients (HOG) and Topological Data Analysis (TDA), for medical image classification using retinal fundus images. HOG captures local texture and edge patterns through gradient orientation histograms, while TDA, using cubical persistent homology, extracts high-level topological signatures that reflect the global structure of pixel intensities. We evaluate both methods on the large APTOS dataset for two classification tasks: binary detection (normal versus diabetic retinopathy) and five-class diabetic retinopathy severity grading. From each image, we extract 26244 HOG features and 800 TDA features, using them independently to train seven classical machine learning models with 10-fold cross-validation. XGBoost achieved the best performance in both cases: 94.29 percent accuracy (HOG) and 94.18 percent (TDA) on the binary task; 74.41 percent (HOG) and 74.69 percent (TDA) on the multi-class task. Our results show that both methods offer competitive performance but encode different structural aspects of the images. This is the first work to benchmark gradient-based and topological features on retinal imagery. The techniques are interpretable, applicable to other medical imaging domains, and suitable for integration into deep learning pipelines.
Updated: 2025-07-02 08:20:51
Areas: cs.CV,cs.LG
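A minimal sketch of the HOG-plus-XGBoost pipeline on placeholder data; the paper's actual configuration yields 26244 HOG features per retinal image from the APTOS dataset:

```python
import numpy as np
from skimage.feature import hog
from xgboost import XGBClassifier

# Placeholder grayscale fundus images and binary labels for illustration.
images = np.random.rand(20, 128, 128)
labels = np.random.randint(0, 2, size=20)

# Gradient-orientation histograms over local cells, block-normalized.
features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])

clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
clf.fit(features, labels)
print(clf.predict(features[:3]))
```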
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
The modern paradigm in machine learning involves pre-training on diverse data, followed by task-specific fine-tuning. In reinforcement learning (RL), this translates to learning via offline RL on a diverse historical dataset, followed by rapid online RL fine-tuning using interaction data. Most RL fine-tuning methods require continued training on offline data for stability and performance. However, this is undesirable because training on diverse offline data is slow and expensive for large datasets and, in principle, also limits the performance improvement achievable because of constraints or pessimism on offline data. In this paper, we show that retaining offline data is unnecessary as long as we use a properly-designed online RL approach for fine-tuning offline RL initializations. To build this approach, we start by analyzing the role of retaining offline data in online fine-tuning. We find that continued training on offline data is mostly useful for preventing a sudden divergence in the value function at the onset of fine-tuning, caused by a distribution mismatch between the offline data and online rollouts. This divergence typically results in unlearning and forgetting the benefits of offline pre-training. Our approach, Warm-start RL (WSRL), mitigates the catastrophic forgetting of pre-trained initializations using a very simple idea. WSRL employs a warmup phase that seeds the online RL run with a very small number of rollouts from the pre-trained policy to do fast online RL. The data collected during warmup helps "recalibrate" the offline Q-function to the online distribution, allowing us to completely discard offline data without destabilizing the online RL fine-tuning. We show that WSRL is able to fine-tune without retaining any offline data, and is able to learn faster and attains higher performance than existing algorithms irrespective of whether they retain offline data or not.
Updated: 2025-07-02 08:19:18
Areas: cs.LG
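The warmup idea can be sketched in a few lines; everything below is a hypothetical stub (policy, environment, and update function), showing only the structure of seeding the online buffer with pretrained-policy rollouts instead of retaining offline data:

```python
import random
from collections import deque

# Hypothetical stubs: a pretrained policy, a toy environment, and an online
# RL update (a real system would use e.g. SAC or TD3 here).
pretrained_policy = lambda obs: random.choice([0, 1])
env_step = lambda obs, act: (obs + 1, float(act), obs > 20)  # (next_obs, reward, done)
update_agent = lambda batch: None

buffer = deque(maxlen=100_000)

# Warmup phase: seed the *online* buffer with a small number of rollouts
# from the pretrained policy, so the Q-function recalibrates to the online
# distribution; no offline dataset is retained.
obs = 0
for _ in range(1_000):
    act = pretrained_policy(obs)
    next_obs, rew, done = env_step(obs, act)
    buffer.append((obs, act, rew, next_obs, done))
    obs = 0 if done else next_obs

# Standard online RL from here on, sampling only warmup and fresh data.
for _ in range(10):
    update_agent(random.sample(buffer, k=32))
```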
Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs
RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. Particularly, its RISC-V Vector Extension (RVV) is of growing interest for the acceleration of AI workloads. But writing software that efficiently utilizes the vector units of RISC-V CPUs without expert knowledge requires the programmer to rely on the autovectorization features of compilers or hand-crafted libraries like muRISCV-NN. Smarter approaches, like autotuning frameworks, have lacked integration with the RISC-V RVV extension, thus heavily limiting the efficient deployment of complex AI workloads. In this paper, we present a workflow based on the TVM compiler to efficiently map AI workloads onto RISC-V vector units. Instead of relying on hand-crafted libraries, we integrated the RVV extension into TVM's MetaSchedule framework, a probabilistic program framework for tensor operation tuning. We implemented different RISC-V SoCs on an FPGA and tuned a wide range of AI workloads on them. We found that our proposal shows a mean improvement of 46% in execution latency when compared against the autovectorization feature of GCC, and 29% against muRISCV-NN. Moreover, the binary resulting from our proposal has a smaller code memory footprint, making it more suitable for embedded devices. Finally, we also evaluated our solution on a commercially available RISC-V SoC implementing the RVV 1.0 Vector Extension and found our solution is able to find mappings that are 35% faster on average than the ones proposed by LLVM. We open-sourced our proposal for the community to expand it to target other RISC-V extensions.
Updated: 2025-07-02 08:15:33
Areas: cs.LG,cs.AI,cs.SE
BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning
We present BIS Reasoning 1.0, the first large-scale Japanese dataset of syllogistic reasoning problems explicitly designed to evaluate belief-inconsistent reasoning in large language models (LLMs). Unlike prior datasets such as NeuBAROCO and JFLD, which focus on general or belief-aligned reasoning, BIS Reasoning 1.0 introduces logically valid yet belief-inconsistent syllogisms (illustratively, a valid inference from a false premise, such as "All mammals can fly; all dolphins are mammals; therefore all dolphins can fly") to uncover reasoning biases in LLMs trained on human-aligned corpora. We benchmark state-of-the-art models, including GPT models, Claude models, and leading Japanese LLMs, revealing significant variance in performance, with GPT-4o achieving 79.54% accuracy. Our analysis identifies critical weaknesses in current LLMs when handling logically valid but belief-conflicting inputs. These findings have important implications for deploying LLMs in high-stakes domains such as law, healthcare, and scientific literature, where truth must override intuitive belief to ensure integrity and safety.
Updated: 2025-07-02 08:15:13
Areas: cs.CL,cs.AI
Rational Censorship Attack: Breaking Blockchain with a Blackboard
Censorship resilience is a fundamental assumption underlying the security of blockchain protocols. Additionally, the analysis of blockchain security from an economic and game-theoretic perspective has been growing in popularity in recent years. In this work, we present a surprising rational censorship attack on blockchain censorship resilience that arises when blockchain security is analyzed through a game-theoretic lens and all users are assumed to be rational. In our attack, a colluding group with sufficient voting power censors the remaining nodes so that the group alone gains all the rewards from maintaining the blockchain. We show that if nodes are rational, coordinating this attack requires only a publicly readable and writable blackboard, and we formally model the attack using a game-theoretic framework. Furthermore, we note that to ensure the success of the attack, nodes need to know the total true voting power held by the colluding group. We prove that the strategy of joining the rational censorship attack and of nodes honestly declaring their power is a subgame perfect equilibrium in the corresponding extensive-form game induced by our attack. Finally, we discuss the implications of the attack for blockchain users and protocol designers, as well as some potential countermeasures.
Updated: 2025-07-02 08:12:12
Areas: cs.GT,cs.CR,cs.DC
Using multi-agent architecture to mitigate the risk of LLM hallucinations
Improving customer service quality and response time are critical factors for maintaining customer loyalty and increasing a company's market share. While adopting emerging technologies such as Large Language Models (LLMs) is becoming a necessity to achieve these goals, the risk of hallucination remains a major challenge. In this paper, we present a multi-agent system to handle customer requests sent via SMS. This system integrates LLM based agents with fuzzy logic to mitigate hallucination risks.
Updated: 2025-07-02 08:06:02
Areas: cs.AI
FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning
Remote sensing image captioning aims to generate descriptive text from remote sensing images, typically employing an encoder-decoder framework. In this setup, a convolutional neural network (CNN) extracts feature representations from the input image, which then guide the decoder in a sequence-to-sequence caption generation process. Although much research has focused on refining the decoder, the quality of image representations from the encoder remains crucial for accurate captioning. This paper introduces a novel approach that integrates features from two distinct CNN based encoders, capturing complementary information to enhance caption generation. Additionally, we propose a weighted averaging technique to combine the outputs of all GRUs in the stacked decoder. Furthermore, a comparison-based beam search strategy is incorporated to refine caption selection. The results demonstrate that our fusion-based approach, along with the enhanced stacked decoder, significantly outperforms both the transformer-based state-of-the-art model and other LSTM-based baselines.
Updated: 2025-07-02 07:55:33
Areas: cs.CV,cs.HC,cs.LG
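One plausible reading of the weighted averaging over stacked GRU outputs, as a PyTorch sketch; the layer sizes and the softmax-normalized learnable weights are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class WeightedGRUStack(nn.Module):
    """Stack of GRU layers whose per-layer outputs are fused by a learned
    softmax-weighted average before caption decoding."""
    def __init__(self, input_size, hidden_size, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.GRU(input_size if i == 0 else hidden_size, hidden_size,
                   batch_first=True)
            for i in range(num_layers))
        self.weights = nn.Parameter(torch.zeros(num_layers))  # learned fusion weights

    def forward(self, x):
        outs, h = [], x
        for gru in self.layers:
            h, _ = gru(h)
            outs.append(h)
        w = torch.softmax(self.weights, dim=0)          # weights sum to 1
        return sum(wi * oi for wi, oi in zip(w, outs))  # fused decoder states

fused = WeightedGRUStack(64, 128)(torch.randn(2, 10, 64))
print(fused.shape)  # torch.Size([2, 10, 128])
```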
DICE-BENCH: Evaluating the Tool-Use Capabilities of Large Language Models in Multi-Round, Multi-Party Dialogues
Existing function-calling benchmarks focus on single-turn interactions. However, they overlook the complexity of real-world scenarios. To quantify how existing benchmarks address practical applications, we introduce DICE-SCORE, a metric that evaluates the dispersion of tool-related information such as function name and parameter values throughout the dialogue. Analyzing existing benchmarks through DICE-SCORE reveals notably low scores, highlighting the need for more realistic scenarios. To address this gap, we present DICE-BENCH, a framework that constructs practical function-calling datasets by synthesizing conversations through a tool graph that maintains dependencies across rounds and a multi-agent system with distinct personas to enhance dialogue naturalness. The final dataset comprises 1,607 high-DICE-SCORE instances. Our experiments on 19 LLMs with DICE-BENCH show that significant advances are still required before such models can be deployed effectively in real-world settings. Our code and data are all publicly available: https://snuhcc.github.io/DICE-Bench/.
Updated: 2025-07-02 07:55:09
Areas: cs.CL,cs.AI
Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization
The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient Descent (GD) is a simple yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this paper, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in a logarithmic number of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence, and show that a larger initialization can be used as more samples are available. We observe that the implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.
Updated: 2025-07-02 07:52:07
Areas: stat.ML,cs.IT,cs.LG,math.IT
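The setting is easy to reproduce numerically; a NumPy sketch of gradient descent with small random initialization on rank-1 symmetric matrix completion:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_star = rng.standard_normal(n)           # ground-truth factor
M = np.outer(x_star, x_star)              # rank-1 symmetric target
mask = rng.random((n, n)) < 0.3           # observed entries (Omega)
mask = np.triu(mask) | np.triu(mask).T    # keep the observation mask symmetric

x = 1e-3 * rng.standard_normal(n)         # small random initialization
eta = 0.01
for _ in range(2000):
    residual = mask * (np.outer(x, x) - M)  # P_Omega(xx^T - M)
    x -= eta * 2 * residual @ x             # gradient of 0.5*||P_Omega(xx^T - M)||_F^2

# Recovery up to the global sign ambiguity x -> -x.
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(err / np.linalg.norm(x_star))
```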
Machine Learning Based Stress Testing Framework for Indian Financial Market Portfolios
This paper presents a machine learning driven framework for sectoral stress testing in the Indian financial market, focusing on financial services, information technology, energy, consumer goods, and pharmaceuticals. Initially, we address the limitations observed in conventional stress testing through dimensionality reduction and latent factor modeling via Principal Component Analysis and Autoencoders. Building on this, we extend the methodology using Variational Autoencoders, which introduces a probabilistic structure to the latent space. This enables Monte Carlo-based scenario generation, allowing for more nuanced, distribution-aware simulation of stressed market conditions. The proposed framework captures complex non-linear dependencies and supports risk estimation through Value-at-Risk and Expected Shortfall. Together, these pipelines demonstrate the potential of Machine Learning approaches to improve the flexibility, robustness, and realism of financial stress testing.
Updated: 2025-07-02 07:47:56
Areas: q-fin.RM,cs.LG,q-fin.PM
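A minimal sketch of the Monte Carlo risk step: sample the latent prior, decode to return scenarios, and read off Value-at-Risk and Expected Shortfall; the decoder here is a random linear stand-in for a trained VAE decoder:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for a trained VAE decoder mapping 4-d latents to
# 5 sector returns; a real decoder would be a learned neural network.
W = rng.standard_normal((4, 5))
def decode(z):
    return (z @ W) * 0.02

# Monte Carlo scenario generation: sample the latent prior, then decode.
z = rng.standard_normal((10_000, 4))
scenarios = decode(z)                  # (10000, 5) simulated sector returns
weights = np.full(5, 0.2)              # equal-weight portfolio, for illustration
losses = -scenarios @ weights          # portfolio loss per scenario

# Value-at-Risk and Expected Shortfall at the 95% level.
var_95 = np.quantile(losses, 0.95)
es_95 = losses[losses >= var_95].mean()
print(f"VaR95={var_95:.4f}  ES95={es_95:.4f}")
```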
EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices
Large Language Models (LLMs) have gained significant attention due to their versatility across a wide array of applications. Fine-tuning LLMs with parameter-efficient adapters, such as Low-Rank Adaptation (LoRA), enables these models to efficiently adapt to downstream tasks without extensive retraining. Deploying fine-tuned LLMs on multi-tenant edge devices offers substantial benefits, such as reduced latency, enhanced privacy, and personalized responses. However, serving LLMs efficiently on resource-constrained edge devices presents critical challenges, including the complexity of adapter selection for different tasks and memory overhead from frequent adapter swapping. Moreover, given the multiple requests in multi-tenant settings, processing requests sequentially results in underutilization of computational resources and increased latency. This paper introduces EdgeLoRA, an efficient system for serving LLMs on edge devices in multi-tenant environments. EdgeLoRA incorporates three key innovations: (1) an adaptive adapter selection mechanism to streamline the adapter configuration process; (2) heterogeneous memory management, leveraging intelligent adapter caching and pooling to mitigate memory operation overhead; and (3) batch LoRA inference, enabling efficient batch processing to significantly reduce computational latency. Comprehensive evaluations using the Llama3.1-8B model demonstrate that EdgeLoRA significantly outperforms the status quo (i.e., llama.cpp) in terms of both latency and throughput. The results demonstrate that EdgeLoRA can achieve up to a 4 times boost in throughput. Even more impressively, it can serve several orders of magnitude more adapters simultaneously. These results highlight EdgeLoRA's potential to transform edge deployment of LLMs in multi-tenant scenarios, offering a scalable and efficient solution for resource-constrained environments.
Updated: 2025-07-02 07:47:28
Areas: cs.DC,cs.AI,cs.LG
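For reference, a minimal PyTorch sketch of the standard LoRA mechanism that EdgeLoRA serves (generic LoRA, not EdgeLoRA's adapter-management code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a low-rank update: h = Wx + (alpha/r) * B(Ax).
    Only A and B train, so many task-specific adapters can share one frozen
    base model, which is the property multi-tenant serving exploits."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])
```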
Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning
Advanced Persistent Threats (APTs) represent sophisticated cyberattacks characterized by their ability to remain undetected within the victim system for extended periods, aiming to exfiltrate sensitive data or disrupt operations. Existing detection approaches often struggle to effectively identify these complex threats, construct the attack chain for defense facilitation, or resist adversarial attacks. To overcome these challenges, we propose Slot, an advanced APT detection approach based on provenance graphs and graph reinforcement learning. Slot excels in uncovering multi-level hidden relationships, such as causal, contextual, and indirect connections, among system behaviors through provenance graph mining. By pioneering the integration of graph reinforcement learning, Slot dynamically adapts to new user activities and evolving attack strategies, enhancing its resilience against adversarial attacks. Additionally, Slot automatically constructs the attack chain according to detected attacks with clustering algorithms, providing precise identification of attack paths and facilitating the development of defense strategies. Evaluations with real-world datasets demonstrate Slot's outstanding accuracy, efficiency, adaptability, and robustness in APT detection, with most metrics surpassing state-of-the-art methods. Additionally, case studies conducted to assess Slot's effectiveness in supporting APT defense further establish it as a practical and reliable tool for cybersecurity protection.
Updated: 2025-07-02 07:41:06
Categories: cs.CR
Embodied Instruction Following in Unknown Environments
Enabling embodied agents to complete complex human instructions from natural language is crucial to autonomous systems in household services. Conventional methods can only accomplish human instructions in known environments where all interactive objects are provided to the embodied agent, and directly deploying existing approaches in unknown environments usually generates infeasible plans that manipulate non-existent objects. In contrast, we propose an embodied instruction following (EIF) method for complex tasks in unknown environments, where the agent efficiently explores the unknown environment to generate feasible plans with existing objects to accomplish abstract instructions. Specifically, we build a hierarchical embodied instruction following framework including a high-level task planner and a low-level exploration controller with multimodal large language models. We then construct a semantic representation map of the scene with dynamic region attention to represent the known visual clues, where the goals of task planning and scene exploration are aligned with the human instruction. For the task planner, we generate feasible step-by-step plans for human goal accomplishment according to the task completion process and the known visual clues. For the exploration controller, the optimal navigation or object interaction policy is predicted based on the generated step-wise plans and the known visual clues. The experimental results demonstrate that our method achieves a 45.09% success rate on 204 complex human instructions, such as making breakfast and tidying rooms, in large house-level scenes. Code and supplementary materials are available at https://gary3410.github.io/eif_unknown.
Updated: 2025-07-02 07:36:55
Categories: cs.RO,cs.AI
Efficient Split Federated Learning for Large Language Models over Communication Networks
Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges on resource-constrained edge networks. To address this challenge, we propose SflLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques. By leveraging model splitting and low-rank adaptation (LoRA), SflLLM reduces the computational burden on edge devices. Furthermore, the introduction of a federated server facilitates parallel training and enhances data privacy. To accommodate heterogeneous communication conditions and diverse computational capabilities of edge devices, as well as the impact of LoRA rank selection on model convergence and training cost, we formulate a joint optimization problem over both communication and computation resources. The formulated problem jointly optimizes subchannel allocation, power control, model splitting point selection, and LoRA rank configuration, aimed at minimizing the total training delay. An iterative optimization algorithm is proposed to solve this problem efficiently. Specifically, a greedy heuristic is employed for subchannel allocation, the power control subproblem is reformulated as a convex optimization problem using auxiliary variables, and an exhaustive search is adopted for optimal split position and rank selection. Simulation results demonstrate that the proposed SflLLM framework achieves comparable model accuracy while significantly reducing client-side computational requirements. Furthermore, the proposed resource allocation scheme and adaptive LoRA rank selection strategy notably reduce the training latency compared to conventional approaches.
Updated: 2025-07-02 07:35:07
Categories: cs.LG,cs.NI
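As a sketch of the outer exhaustive search described in the SflLLM abstract, the snippet below scans split points and LoRA ranks against a stand-in delay model. All cost constants here are invented; in the paper, each candidate would be evaluated only after solving the subchannel-allocation and power-control subproblems.

```python
# Stand-in delay model: client/server compute, activation transfer, rank cost.
def total_delay(split, rank, n_layers=32, client_flops=1e12, server_flops=1e13,
                link_rate=1e8, act_bytes=2 * 4096 * 4096):
    client = split / n_layers * 1e15 / client_flops      # layers kept on device
    server = (n_layers - split) / n_layers * 1e15 / server_flops
    comm = act_bytes / link_rate                         # smashed activations
    lora = rank * 1e-3                                   # rank-dependent update cost
    return client + server + comm + lora

# Exhaustive search over split position and LoRA rank, as the abstract describes.
best = min(((s, r) for s in range(1, 32) for r in (4, 8, 16, 32, 64)),
           key=lambda sr: total_delay(*sr))
print("best (split, rank):", best)
```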
Perceiving Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models
Achieving deep alignment between vision and language remains a central challenge for Multimodal Large Language Models (MLLMs). These models often fail to fully leverage visual input, defaulting to strong language priors. Our approach first provides insights into how MLLMs internally build visual understanding of image regions and then introduces techniques to amplify this capability. Specifically, we explore techniques designed both to deepen the model's understanding of visual content and to ensure that these visual insights actively guide language generation. We demonstrate the superior multimodal understanding of our resultant model through a detailed upstream analysis quantifying its ability to predict visually-dependent tokens, as well as a 10-point boost on visually challenging tasks.
Updated: 2025-07-02 07:30:12
Categories: cs.CV,cs.AI
Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems
Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly-researched memory technologies, racetrack memory is a non-volatile technology that allows high data density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both the memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate operations. Moreover, we explore the design space of racetrack memory based systems and CNN model architectures, employing co-design to improve the efficiency and performance of performing CNN inference in racetrack memory while maintaining model accuracy. Our designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory based embedded systems.
Updated: 2025-07-02 07:29:53
Categories: cs.ET,cs.AI,cs.AR
A Compact 16-bit S-box over Tower Field $\F_{(((2^2)^2)^2)^2}$ with High Security
This paper introduces a compact and secure 16-bit substitution box (S-box) designed over the composite field $\F_{(((2^2)^2)^2)^2}$, optimized for both hardware efficiency and cryptographic robustness. The proposed S-box decomposes operations into subfields, leveraging a tower field architecture. This enables significant hardware reduction through optimized field inversion and a low-cost affine transformation. Security evaluations confirm resilience against linear, differential, algebraic and DPA attacks, validated via metrics including Nonlinearity (32512), Differential Uniformity (4), Algebraic Degree (15), Transparency order (15.9875) and SNR (0.34e-08). The hardware results, in 65 nm CMOS technology, show the proposed 16-bit S-box has lower hardware resources consumption and lower critical path delay (CPD) than those of other 16-bit S-boxes. By integrating high algebraic complexity with resource-efficient structures, this work addresses the growing demand for scalable cryptographic primitives in data-sensitive applications, demonstrating that larger S-boxes can enhance security without proportional hardware costs. The results underscore the viability of composite field-based architectures in balancing security and efficiency for modern block ciphers.
Updated: 2025-07-02 07:22:22
Categories: cs.CR
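For readers unfamiliar with the nonlinearity figure quoted above (32512 for 16 bits), the sketch below computes the same Walsh-spectrum-based metric for a toy 4-bit permutation. The S-box values are arbitrary, and at n = 16 the nested loops would of course need FFT-style acceleration rather than brute force.

```python
# Nonlinearity NL(S) = 2^(n-1) - (1/2) * max_{a, b != 0} |W(a, b)|, where
# W(a, b) = sum_x (-1)^(b.S(x) XOR a.x) is the Walsh transform of S.
def parity(x):
    return bin(x).count("1") & 1

def nonlinearity(sbox, n):
    worst = 0
    for b in range(1, 1 << n):            # nonzero output masks
        for a in range(1 << n):           # all input masks
            w = sum((-1) ** (parity(b & sbox[x]) ^ parity(a & x))
                    for x in range(1 << n))
            worst = max(worst, abs(w))
    return (1 << (n - 1)) - worst // 2

toy_sbox = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
            0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]  # arbitrary 4-bit permutation
print(nonlinearity(toy_sbox, 4))
```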
DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal
Document shadow removal is a crucial task in the field of document image enhancement. However, existing methods tend to remove only shadows over constant-color backgrounds and ignore color shadows. In this paper, we first design a diffusion model in latent space for document image shadow removal, called DocShaDiffusion. It translates shadow images from pixel space to latent space, enabling the model to more easily capture essential features. To address the issue of color shadows, we design a shadow soft-mask generation module (SSGM). It produces accurate shadow masks and adds noise specifically into shadow regions. Guided by the shadow mask, a shadow mask-aware guided diffusion module (SMGDM) is proposed to remove shadows from document images by supervising the diffusion and denoising process. We also propose a shadow-robust perceptual feature loss to preserve details and structures in document images. Moreover, we develop a large-scale synthetic document color shadow removal dataset (SDCSRD). It simulates the distribution of realistic color shadows and provides powerful support for the training of models. Experiments on three public datasets validate the proposed method's superiority over the state of the art. Our code and dataset will be publicly available.
Updated: 2025-07-02 07:22:09
Categories: cs.CV,cs.AI
Penalizing Transparency? How AI Disclosure and Author Demographics Shape Human and AI Judgments About Writing
As AI integrates into various types of human writing, calls for transparency around AI assistance are growing. However, if transparency operates on uneven ground and certain identity groups bear a heavier cost for being honest, then the burden of openness becomes asymmetrical. This study investigates how an AI disclosure statement affects perceptions of writing quality, and whether these effects vary by the author's race and gender. Through a large-scale controlled experiment, both human raters (n = 1,970) and LLM raters (n = 2,520) evaluated a single human-written news article while disclosure statements and author demographics were systematically varied. This approach reflects how both human and algorithmic decisions now influence access to opportunities (e.g., hiring, promotion) and social recognition (e.g., content recommendation algorithms). We find that both human and LLM raters consistently penalize disclosed AI use. However, only LLM raters exhibit demographic interaction effects: they favor articles attributed to women or Black authors when no disclosure is present, but these advantages disappear when AI assistance is revealed. These findings illuminate the complex relationships between AI disclosure and author identity, highlighting disparities between machine and human evaluation patterns.
Updated: 2025-07-02 07:18:09
Categories: cs.CY,cs.AI,H.5.2; I.2
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior
Image watermarks have been considered a promising technique to help detect AI-generated content, which can be used to protect copyright or prevent fake image abuse. In this work, we present a black-box method for removing invisible image watermarks, without the need for any dataset of watermarked images or any knowledge about the watermark system. Our approach is simple to implement: given a single watermarked image, we regress it with a deep image prior (DIP). We show that from the intermediate steps of DIP one can reliably find an evasion image that removes invisible watermarks while preserving high image quality. Due to its unique working mechanism and practical effectiveness, we advocate including DIP as a baseline evasion method for benchmarking the robustness of watermarking systems. Finally, by showing the limited ability of DIP and other existing black-box methods in evading training-based visible watermarks, we discuss the positive implications for the practical use of training-based visible watermarks to prevent misinformation abuse.
Updated: 2025-07-02 07:13:48
Categories: eess.IV,cs.AI,cs.CR,cs.CV
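The DIP procedure above is simple enough to sketch end to end. The PyTorch snippet below is a minimal rendition with a tiny network, a random stand-in image, and invented hyperparameters; intermediate reconstructions are collected as candidate evasion images, since DIP tends to fit image content before the invisible watermark.

```python
import torch
import torch.nn as nn

def tiny_dip_net(ch=32):
    return nn.Sequential(
        nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid(),
    )

watermarked = torch.rand(1, 3, 64, 64)   # stand-in for the single watermarked image
z = torch.rand(1, 3, 64, 64)             # fixed random input, as in DIP
net = tiny_dip_net()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

candidates = []                           # intermediate outputs = evasion candidates
for step in range(1000):
    opt.zero_grad()
    out = net(z)
    loss = ((out - watermarked) ** 2).mean()  # plain regression, no watermark knowledge
    loss.backward()
    opt.step()
    if step % 100 == 0:
        candidates.append(out.detach().clone())
```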
A deep solver for backward stochastic Volterra integral equations
We present the first deep-learning solver for backward stochastic Volterra integral equations (BSVIEs) and their fully-coupled forward-backward variants. The method trains a neural network to approximate the two solution fields in a single stage, avoiding the use of nested time-stepping cycles that limit classical algorithms. For the decoupled case we prove a non-asymptotic error bound composed of an a posteriori residual plus the familiar square root dependence on the time step. Numerical experiments confirm this rate and reveal two key properties: \emph{scalability}, in the sense that accuracy remains stable from low dimension up to 500 spatial variables while GPU batching keeps wall-clock time nearly constant; and \emph{generality}, since the same method handles coupled systems whose forward dynamics depend on the backward solution. These results open practical access to a family of high-dimensional, path-dependent problems in stochastic control and quantitative finance.
Updated: 2025-07-02 07:12:03
Categories: math.NA,cs.LG,cs.NA,math.PR,q-fin.MF,65C30, 60H20, 60H35, 68T07,G.1.9; G.3; I.2.6; F.2.1
Decomposing Prediction Mechanisms for In-Context Recall
We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study if the transformer models can recall the state of a sequence previously seen in its context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continuing to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism (manifesting as separate phase transitions) phenomenon is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task to see a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs second-task-token performance.
Updated: 2025-07-02 07:09:09
Categories: cs.LG
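The toy setup in this abstract is easy to reproduce in spirit. The snippet below generates interleaved, symbolically labeled traces from randomly drawn linear deterministic systems; dimensions, label format, and trace length are guesses rather than the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_sys, steps = 4, 3, 5
systems = {f"<sys{k}>": rng.normal(scale=0.5, size=(d, d)) for k in range(n_sys)}
states = {k: rng.normal(size=d) for k in systems}

trace = []
for _ in range(steps):
    label = rng.choice(list(systems))     # symbolic label identifies the system
    A = systems[label]
    states[label] = A @ states[label]     # deterministic linear dynamics
    trace.append((label, states[label].round(2)))

for label, s in trace:
    print(label, s)
# Recall task: given a label seen earlier in context, emit that system's state
# and keep applying the correct dynamics, the two mechanisms the paper separates.
```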
Evaluating LLM Agent Collusion in Double Auctions
Large language models (LLMs) have demonstrated impressive capabilities as autonomous agents with rapidly expanding applications in various domains. As these agents increasingly engage in socioeconomic interactions, identifying their potential for undesirable behavior becomes essential. In this work, we examine scenarios where they can choose to collude, defined as secretive cooperation that harms another party. To systematically study this, we investigate the behavior of LLM agents acting as sellers in simulated continuous double auction markets. Through a series of controlled experiments, we analyze how parameters such as the ability to communicate, choice of model, and presence of environmental pressures affect the stability and emergence of seller collusion. We find that direct seller communication increases collusive tendencies, the propensity to collude varies across models, and environmental pressures, such as oversight and urgency from authority figures, influence collusive behavior. Our findings highlight important economic and ethical considerations for the deployment of LLM-based market agents.
Updated: 2025-07-02 07:06:49
Categories: cs.GT,cs.AI,cs.LG
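As context for the simulated markets above, here is a minimal continuous-double-auction loop. The ask_price function is a placeholder for the LLM seller policy being studied, and all values are illustrative; a colluding policy would quote higher asks than the cost-based one sketched here.

```python
import heapq

def ask_price(seller_cost, last_trade):
    # Placeholder seller policy; in the study this quote comes from an LLM
    # agent, optionally after communication with other sellers.
    return max(seller_cost, (last_trade or seller_cost) * 1.02)

bids, asks = [], []            # max-heap of bids (negated), min-heap of asks
last_trade = None
buyer_values = [10.0, 9.5, 9.0]
seller_costs = [6.0, 6.5, 7.0]

for v, c in zip(buyer_values, seller_costs):
    heapq.heappush(bids, -v * 0.95)               # buyers shade their bids
    heapq.heappush(asks, ask_price(c, last_trade))
    while bids and asks and -bids[0] >= asks[0]:  # crossed book -> trade
        bid, ask = -heapq.heappop(bids), heapq.heappop(asks)
        last_trade = (bid + ask) / 2
        print(f"trade at {last_trade:.2f}")
```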
Age Sensitive Hippocampal Functional Connectivity: New Insights from 3D CNNs and Saliency Mapping
Grey matter loss in the hippocampus is a hallmark of neurobiological aging, yet understanding the corresponding changes in its functional connectivity remains limited. Seed-based functional connectivity (FC) analysis enables voxel-wise mapping of the hippocampus's synchronous activity with cortical regions, offering a window into functional reorganization during aging. In this study, we develop an interpretable deep learning framework to predict brain age from hippocampal FC using a three-dimensional convolutional neural network (3D CNN) combined with LayerCAM saliency mapping. This approach maps key hippocampal-cortical connections, particularly with the precuneus, cuneus, posterior cingulate cortex, parahippocampal cortex, left superior parietal lobule, and right superior temporal sulcus, that are highly sensitive to age. Critically, disaggregating anterior and posterior hippocampal FC reveals distinct mapping aligned with their known functional specializations. These findings provide new insights into the functional mechanisms of hippocampal aging and demonstrate the power of explainable deep learning to uncover biologically meaningful patterns in neuroimaging data.
Updated: 2025-07-02 07:05:18
Categories: q-bio.NC,cs.AI,cs.CV
A Fuzzy Approach to the Specification, Verification and Validation of Risk-Based Ethical Decision Making Models
The ontological and epistemic complexities inherent in the moral domain make it challenging to establish clear standards for evaluating the performance of a moral machine. In this paper, we present a formal method to describe Ethical Decision Making models based on ethical risk assessment. Then, we show how these models, specified as fuzzy rules, can be verified and validated using fuzzy Petri nets. A case study from the medical field illustrates the proposed approach.
Updated: 2025-07-02 07:05:11
Categories: cs.AI
DeFusion: An Effective Decoupling Fusion Network for Multi-Modal Pregnancy Prediction
Temporal embryo images and parental fertility table indicators are both valuable for pregnancy prediction in \textbf{in vitro fertilization embryo transfer} (IVF-ET). However, current machine learning models cannot make full use of the complementary information between the two modalities to improve pregnancy prediction performance. In this paper, we propose a Decoupling Fusion Network called DeFusion to effectively integrate the multi-modal information for IVF-ET pregnancy prediction. Specifically, we propose a decoupling fusion module that decouples the information from the different modalities into related and unrelated information, thereby achieving a more delicate fusion. And we fuse temporal embryo images with a spatial-temporal position encoding, and extract fertility table indicator information with a table transformer. To evaluate the effectiveness of our model, we use a new dataset including 4046 cases collected from Southern Medical University. The experiments show that our model outperforms state-of-the-art methods. Meanwhile, the performance on the eye disease prediction dataset reflects the model's good generalization. Our code is available at https://github.com/Ou-Young-1999/DFNet.
Updated: 2025-07-02 06:46:00
Categories: cs.CV,cs.LG
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
In modern medicine, clinical diagnosis relies on the comprehensive analysis of primarily textual and visual data, drawing on medical expertise to ensure systematic and rigorous reasoning. Recent advances in large Vision-Language Models (VLMs) and agent-based methods hold great potential for medical diagnosis, thanks to the ability to effectively integrate multi-modal patient data. However, they often provide direct answers and draw empirical-driven conclusions without quantitative analysis, which reduces their reliability and clinical usability. We propose MedAgent-Pro, a new agentic reasoning paradigm that follows the diagnosis principle in modern medicine, to decouple the process into sequential components for step-by-step, evidence-based reasoning. Our MedAgent-Pro workflow presents a hierarchical diagnostic structure to mirror this principle, consisting of disease-level standardized plan generation and patient-level personalized step-by-step reasoning. To support disease-level planning, an RAG-based agent is designed to retrieve medical guidelines to ensure alignment with clinical standards. For patient-level reasoning, we propose to integrate professional tools such as visual models to enable quantitative assessments. Meanwhile, we propose to verify the reliability of each step to achieve evidence-based diagnosis, enforcing rigorous logical reasoning and a well-founded conclusion. Extensive experiments across a wide range of anatomical regions, imaging modalities, and diseases demonstrate the superiority of MedAgent-Pro to mainstream VLMs, agentic systems and state-of-the-art expert models. Ablation studies and human evaluation by clinical experts further validate its robustness and clinical relevance. Code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
Updated: 2025-07-02 06:36:32
Categories: cs.AI
Medical-Knowledge Driven Multiple Instance Learning for Classifying Severe Abdominal Anomalies on Prenatal Ultrasound
Fetal abdominal malformations are serious congenital anomalies that require accurate diagnosis to guide pregnancy management and reduce mortality. Although AI has demonstrated significant potential in medical diagnosis, its application to prenatal abdominal anomalies remains limited. Most existing studies focus on image-level classification and rely on standard plane localization, placing less emphasis on case-level diagnosis. In this paper, we develop a case-level multiple instance learning (MIL)-based method, free of standard plane localization, for classifying fetal abdominal anomalies in prenatal ultrasound. Our contribution is three-fold. First, we adopt a mixture-of-attention-experts module (MoAE) to weight different attention heads for various planes. Second, we propose a medical-knowledge-driven feature selection module (MFS) to align image features with medical knowledge, performing self-supervised image token selection at the case level. Finally, we propose prompt-based prototype learning (PPL) to enhance the MFS. Extensively validated on a large prenatal abdominal ultrasound dataset containing 2,419 cases, with a total of 24,748 images and 6 categories, our proposed method outperforms state-of-the-art competitors. Code is available at: https://github.com/LL-AC/AAcls.
Updated: 2025-07-02 06:31:26
Categories: cs.CV,cs.AI
Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps
Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite recent advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner. To address this challenge, we propose a coherent approach to predict lane segments and their corresponding topology, as well as road boundaries, all by leveraging prior map information represented by commonly available standard-definition (SD) maps. We propose a network architecture which leverages hybrid lane segment encodings, comprising prior information and denoising techniques, to enhance training stability and performance. Furthermore, we incorporate past frames to enforce temporal consistency. Our experimental evaluation demonstrates that our approach outperforms previous methods by a large margin, highlighting the benefits of our modeling scheme.
Updated: 2025-07-02 06:26:17
Categories: cs.CV,cs.LG
Surrogate Modeling via Factorization Machine and Ising Model with Enhanced Higher-Order Interaction Learning
Recently, a surrogate model was proposed that employs a factorization machine to approximate the underlying input-output mapping of the original system, with quantum annealing used to optimize the resulting surrogate function. Inspired by this approach, we propose an enhanced surrogate model that incorporates additional slack variables into both the factorization machine and its associated Ising representation, thereby unifying what was by design a two-step process into a single, integrated step. During the training phase, the slack variables are iteratively updated, enabling the model to account for higher-order feature interactions. We apply the proposed method to the task of predicting drug combination effects. Experimental results indicate that the introduction of slack variables leads to a notable improvement in performance. Our algorithm offers a promising approach for building efficient surrogate models that exploit potential quantum advantages.
Updated: 2025-07-02 06:10:49
Categories: cs.LG,quant-ph
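The link between the factorization machine and its Ising/QUBO form is worth spelling out: for binary features, the FM's pairwise term is already quadratic, with coefficients given by inner products of the learned factor vectors. The sketch below uses random weights and omits the paper's slack variables (which would add rows to V); it only checks that equivalence numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))

def fm_score(x):  # x in {0,1}^n
    lin = w0 + w @ x
    # Standard FM identity: sum_{i<j} <v_i, v_j> x_i x_j
    pair = 0.5 * ((V.T @ x) @ (V.T @ x) - np.sum((V ** 2).T @ (x ** 2)))
    return lin + pair

# Equivalent QUBO matrix: x^T Q x + w0 matches fm_score on binary x.
Q = np.diag(w) + np.triu(V @ V.T, k=1)
x = rng.integers(0, 2, size=n)
assert np.isclose(fm_score(x), x @ Q @ x + w0)
print("QUBO matches FM:", fm_score(x))
```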
Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon VLMs. Our method manipulates only the visual inputs of a portion of the training samples - without altering their corresponding labels or instructions - thereby injecting malicious behaviors into the model. Once fine-tuned with this tampered data, the agent will exhibit attacker-controlled responses when a specific visual trigger is introduced at inference time. The core of our approach lies in aligning the gradients of poisoned samples with those of a chosen target instance, embedding backdoor-relevant features into the poisoned training data. To maintain stealth and enhance robustness, we develop three realistic visual triggers: static visual patches, dynamic motion cues, and subtle low-opacity overlays. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use. Results show that our attack achieves high attack success rates (up to 94.67 percent) while maintaining high clean-task performance (FSR up to 95.85 percent). Additionally, ablation studies shed light on how various design choices affect the efficacy and concealment of the attack. Overall, this work is the first to expose critical security flaws in VLM-based mobile agents, highlighting their susceptibility to clean-label backdoor attacks and the urgent need for effective defense mechanisms in their training pipelines.
Updated: 2025-07-02 06:08:03
Categories: cs.CR,cs.AI
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
For nonconvex objective functions, including those found in training deep neural networks, stochastic gradient descent (SGD) with momentum is said to converge faster and have better generalizability than SGD without momentum. In particular, adding momentum is thought to reduce stochastic noise. To verify this, we estimated the magnitude of gradient noise by using convergence analysis and an optimal batch size estimation formula and found that momentum does not reduce gradient noise. We also analyzed the effect of search direction noise, which is stochastic noise defined as the error between the search direction of the optimizer and the steepest descent direction, and found that it inherently smooths the objective function and that momentum does not reduce search direction noise either. Finally, an analysis of the degree of smoothing introduced by search direction noise revealed that adding momentum offers limited advantage to SGD.
Updated: 2025-07-02 06:01:50
Categories: cs.LG,math.OC
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.
Updated: 2025-07-02 06:00:36
Categories: cs.GR,cs.CV,cs.LG
RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms
Intelligent control of Unmanned Aerial Vehicle (UAV) swarms has emerged as a critical research focus, and it typically requires the swarm to navigate effectively while avoiding obstacles and achieving continuous coverage over multiple mission targets. Although traditional Multi-Agent Reinforcement Learning (MARL) approaches offer dynamic adaptability, they are hindered by the semantic gap in numerical communication and the rigidity of homogeneous role structures, resulting in poor generalization and limited task scalability. Recent advances in Large Language Model (LLM)-based control frameworks demonstrate strong semantic reasoning capabilities by leveraging extensive prior knowledge. However, due to the lack of online learning and over-reliance on static priors, these works often struggle with effective exploration, leading to reduced individual potential and overall system performance. To address these limitations, we propose a Role-Adaptive LLM-Driven Yoked navigation algorithm, RALLY. Specifically, we first develop an LLM-driven semantic decision framework that uses structured natural language for efficient semantic communication and collaborative reasoning. Afterward, we introduce a dynamic role-heterogeneity mechanism for adaptive role switching and personalized decision-making. Furthermore, we propose a Role-value Mixing Network (RMIX)-based assignment strategy that integrates LLM offline priors with MARL online policies to enable semi-offline training of role selection strategies. Experiments in the Multi-Agent Particle Environment (MPE) and a Software-In-The-Loop (SITL) platform demonstrate that RALLY outperforms conventional approaches in terms of task coverage, convergence speed, and generalization, highlighting its strong potential for collaborative navigation in agentic multi-UAV systems.
Updated: 2025-07-02 05:44:17
Categories: cs.MA,cs.AI,cs.RO
Direct Quantized Training of Language Models with Stochastic Rounding
Although recent quantized Large Language Models (LLMs), such as BitNet, have paved the way for significant reduction in memory usage during deployment with binary or ternary weights, training these models still demands substantial memory footprints. This is partly because high-precision (i.e., unquantized) weights required for straight-through estimation must be maintained throughout the whole training process. To address this, we explore directly updating the quantized low-precision weights without relying on straight-through estimation during backpropagation, aiming to save memory usage during training. Specifically, we employ a stochastic rounding technique to minimize the information loss caused by the use of low-bit weights throughout training. Experimental results on our LLaMA-structured models of various sizes indicate that (1) training with only low-precision weights is feasible even when they are constrained to ternary values; (2) extending the bit width to 8 bits achieves performance on par with BitNet b1.58; (3) our models remain robust to precision scaling and memory reduction, showing minimal performance degradation when moving from FP32 to lower-memory environments (BF16/FP8); and (4) our models also support inference using ternary weights, showcasing their flexibility in deployment.
Updated: 2025-07-02 05:35:17
Categories: cs.LG,cs.CL
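Stochastic rounding, the ingredient this paper relies on, is compact enough to show directly. The sketch below rounds to a grid of step 1 (so clipped weights land on ternary values) and checks the key unbiasedness property E[round(w)] = w on synthetic values; the grid and weights are illustrative, not the paper's setup.

```python
import numpy as np

def stochastic_round(w, step):
    """Round w to multiples of `step`, up or down with probability proportional
    to proximity, so that E[stochastic_round(w)] == w."""
    lo = np.floor(w / step) * step
    p_up = (w - lo) / step
    return lo + step * (np.random.random(w.shape) < p_up)

w = np.array([0.30, -0.72, 0.05])
samples = np.stack([stochastic_round(w, step=1.0) for _ in range(100000)])
print(samples.mean(axis=0))   # approximately [0.30, -0.72, 0.05]
```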
Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
While reinforcement learning (RL) for large language models (LLMs) has shown promise in mathematical reasoning, strategic reasoning for LLMs using RL remains largely unexplored. We investigate whether LLMs can develop strategic reasoning capabilities through RL in chess. To this end, we leverage a chess-pretrained action-value network to provide dense reward on the LLM's output move quality, which can be seen as a form of knowledge distillation. Our experiments show that our distillation-based dense rewards often outperform sparse binary rewards. However, surprisingly, all models plateau far below expert levels. We provide SFT and RL ablations on chess reasoning training and find evidence that this limitation stems from a deficit in the pretrained models' internal understanding of chess--a deficit which RL alone may not be able to fully overcome.
Updated: 2025-07-02 05:31:51
Categories: cs.AI,cs.LG
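The dense-reward construction described above can be sketched in a few lines: score the LLM's chosen move against the pretrained action-value network's evaluations of all legal moves. The q_values dictionary and the min-max normalization below are illustrative placeholders, not the paper's exact scheme.

```python
def dense_reward(q_values, llm_move):
    """q_values: dict move -> Q(state, move) from a pretrained action-value net."""
    best = max(q_values.values())
    worst = min(q_values.values())
    # Normalize so the best legal move earns 1.0 and the worst earns 0.0,
    # giving partial credit instead of a sparse win/loss signal.
    return (q_values[llm_move] - worst) / (best - worst + 1e-8)

q_values = {"e2e4": 0.62, "d2d4": 0.60, "g1f3": 0.55, "f2f3": 0.12}
print(dense_reward(q_values, "g1f3"))   # decent move, partial credit
```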
AI Agents and Agentic AI: Navigating a Plethora of Concepts for Future Manufacturing
AI agents are autonomous systems designed to perceive, reason, and act within dynamic environments. With the rapid advancements in generative AI (GenAI), large language models (LLMs) and multimodal large language models (MLLMs) have significantly improved AI agents' capabilities in semantic comprehension, complex reasoning, and autonomous decision-making. At the same time, the rise of Agentic AI highlights adaptability and goal-directed autonomy in dynamic and complex environments. LLMs-based AI Agents (LLM-Agents), MLLMs-based AI Agents (MLLM-Agents), and Agentic AI contribute to expanding AI's capabilities in information processing, environmental perception, and autonomous decision-making, opening new avenues for smart manufacturing. However, the definitions, capability boundaries, and practical applications of these emerging AI paradigms in smart manufacturing remain unclear. To address this gap, this study systematically reviews the evolution of AI and AI agent technologies, examines the core concepts and technological advancements of LLM-Agents, MLLM-Agents, and Agentic AI, and explores their potential applications in and integration into manufacturing, along with the potential challenges they may face.
Updated: 2025-07-02 05:31:17
Categories: cs.AI
Feature Reweighting for EEG-based Motor Imagery Classification
Classification of motor imagery (MI) using non-invasive electroencephalographic (EEG) signals is a critical objective, as it is used to predict the intention of limb movements of a subject. In recent research, convolutional neural network (CNN) based methods have been widely utilized for MI-EEG classification. The challenges of training neural networks for MI-EEG signal classification include the low signal-to-noise ratio, non-stationarity, non-linearity, and high complexity of EEG signals. The features computed by CNN-based networks on the highly noisy MI-EEG signals contain irrelevant information, and the resulting feature maps inherit this irrelevant content; many non-contributing features thus mislead neural network training and degrade classification performance. Hence, a novel feature reweighting approach is proposed to address this issue. The proposed method introduces a noise reduction mechanism, the feature reweighting module, which suppresses irrelevant temporal and channel feature maps. The feature reweighting module generates scores that reweight the feature maps to reduce the impact of irrelevant information. Experimental results show that the proposed method significantly improved the classification of MI-EEG signals on the Physionet EEG-MMIDB and BCI Competition IV 2a datasets, by margins of 9.34% and 3.82%, respectively, compared to state-of-the-art methods.
Updated: 2025-07-02 05:29:39
Categories: cs.LG,cs.HC,eess.SP
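A reasonable mental model for the feature reweighting module is squeeze-and-excitation-style gating. The PyTorch sketch below, with hypothetical shapes and layer sizes rather than the paper's architecture, produces per-map scores in (0, 1) and multiplies them back onto the feature maps to suppress irrelevant ones.

```python
import torch
import torch.nn as nn

class FeatureReweight(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),      # per-map summary statistic
            nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (B, C, H, W) feature maps
        s = self.score(x).view(x.size(0), -1, 1, 1)    # scores in (0, 1)
        return x * s                                   # down-weight low-score maps

feats = torch.randn(2, 16, 8, 64)   # e.g. (batch, maps, EEG channels, time)
print(FeatureReweight(16)(feats).shape)
```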
Active Measurement: Efficient Estimation at Scale
AI has the potential to transform scientific discovery by analyzing vast datasets with little human effort. However, current workflows often do not provide the accuracy or statistical guarantees that are needed. We introduce active measurement, a human-in-the-loop AI framework for scientific measurement. An AI model is used to predict measurements for individual units, which are then sampled for human labeling using importance sampling. With each new set of human labels, the AI model is improved and an unbiased Monte Carlo estimate of the total measurement is refined. Active measurement can provide precise estimates even with an imperfect AI model, and requires little human effort when the AI model is very accurate. We derive novel estimators, weighting schemes, and confidence intervals, and show that active measurement reduces estimation error compared to alternatives in several measurement tasks.
Updated: 2025-07-02 05:20:32
Categories: cs.CV,cs.LG
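The unbiased Monte Carlo correction at the heart of active measurement fits in a few lines: sum the AI model's predictions, then add an importance-weighted correction from the human-labeled sample. The data and sampling weights below are synthetic; only the estimator's form follows the abstract, and the paper's actual weighting schemes and intervals are more refined.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 10000, 200
y = rng.binomial(1, 0.3, size=N).astype(float)         # true labels (unknown)
f = np.clip(y * 0.8 + rng.normal(0.1, 0.2, N), 0, 1)   # imperfect AI predictions

q = np.abs(f - 0.5).max() - np.abs(f - 0.5) + 0.05     # sample uncertain units more
q = q / q.sum()
idx = rng.choice(N, size=n, p=q)                       # "human-labeled" sample

# Unbiased: E[(y_j - f_j)/q_j] under sampling prob. q equals sum_i (y_i - f_i).
estimate = f.sum() + np.mean((y[idx] - f[idx]) / q[idx])
print(f"truth={y.sum():.0f}  model-only={f.sum():.0f}  corrected={estimate:.0f}")
```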
Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU
Advanced Large Language Models (LLMs) have achieved impressive performance across a wide range of complex and long-context natural language tasks. However, performing long-context LLM inference locally on a commodity GPU (a PC), as privacy concerns often demand, remains challenging due to the increasing memory demands of the key-value (KV) cache. Existing systems typically identify important tokens and selectively offload their KV data to GPU and CPU memory. The KV data needs to be offloaded to disk due to the limited memory on a commodity GPU, but the process is bottlenecked by token importance evaluation overhead and the disk's low bandwidth. In this paper, we present LeoAM, the first efficient importance-aware long-context LLM inference system for a single commodity GPU with adaptive hierarchical GPU-CPU-Disk KV management. Our system employs an adaptive KV management strategy that partitions KV data into variable-sized chunks based on the skewed distribution of attention weights across different layers to reduce computational and additional transmission overheads. Moreover, we propose a lightweight KV abstract method, which minimizes transmission latency by storing and extracting the KV abstract of each chunk on disk instead of the full KV data. LeoAM also leverages dynamic compression and pipelining techniques to further accelerate inference. Experimental results demonstrate that LeoAM achieves an average inference latency speedup of 3.46x, while maintaining comparable LLM response quality. In scenarios with larger batch sizes, it achieves up to a 5.47x speedup.
Updated: 2025-07-02 05:12:29
Categories: cs.OS,cs.CR,68M20,C.4
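A toy rendition of the chunk-level idea follows; the fixed chunk size, hot-set size, and mean-pooled "abstract" are all assumptions for illustration, not LeoAM's actual variable-sized, layer-aware design. Chunks are scored by accumulated attention mass, the hottest stay resident, and only lightweight abstracts represent the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d, chunk = 4096, 64, 256
keys = rng.normal(size=(seq, d)).astype(np.float32)
attn = rng.dirichlet(np.ones(seq))                   # per-token attention mass

chunks = attn.reshape(-1, chunk).sum(axis=1)         # importance per chunk
hot = np.argsort(chunks)[-4:]                        # hottest chunks stay "on GPU"
abstracts = keys.reshape(-1, chunk, d).mean(axis=1)  # per-chunk abstract for the rest
print("resident chunks:", sorted(hot),
      "abstract bytes:", abstracts.nbytes, "vs full:", keys.nbytes)
```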
Activation Reward Models for Few-Shot Model Alignment
Aligning Large Language Models (LLMs) and Large Multimodal Models (LMMs) to human preferences is a central challenge in improving the quality of the models' generative outputs for real-world applications. A common approach is to use reward modeling to encode preferences, enabling alignment via post-training using reinforcement learning. However, traditional reward modeling is not easily adaptable to new preferences because it requires a separate reward model, commonly trained on large preference datasets. To address this, we introduce Activation Reward Models (Activation RMs) -- a novel few-shot reward modeling method that leverages activation steering to construct well-aligned reward signals using minimal supervision and no additional model finetuning. Activation RMs outperform existing few-shot reward modeling approaches such as LLM-as-a-judge with in-context learning, voting-based scoring, and token probability scoring on standard reward modeling benchmarks. Furthermore, we demonstrate the effectiveness of Activation RMs in mitigating reward hacking behaviors, highlighting their utility for safety-critical applications. Toward this end, we propose PreferenceHack, a novel few-shot setting benchmark, the first to test reward models on reward hacking in a paired preference format. Finally, we show that Activation RM achieves state-of-the-art performance on this benchmark, surpassing even GPT-4o.
Updated: 2025-07-02 05:10:29
Categories: cs.CV,cs.LG
Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection
Intrusion Detection Systems (IDS) are crucial for safeguarding digital infrastructure. In dynamic network environments, both threat landscapes and normal operational behaviors are constantly changing, resulting in concept drift. While continuous learning mitigates the adverse effects of concept drift, insufficient attention to drift patterns and excessive preservation of outdated knowledge can still hinder the IDS's adaptability. In this paper, we propose SSF (Strategic Selection and Forgetting), a novel continual learning method for IDS, providing continuous model updates with a constantly refreshed memory buffer. Our approach features a strategic sample selection algorithm to select representative new samples and a strategic forgetting mechanism to drop outdated samples. The proposed strategic sample selection algorithm prioritizes new samples that cause the `drifted' pattern, enabling the model to better understand the evolving landscape. Additionally, we introduce strategic forgetting upon detecting significant drift by discarding outdated samples to free up memory, allowing the incorporation of more recent data. SSF captures evolving patterns effectively and ensures the model is aligned with the change of data patterns, significantly enhancing the IDS's adaptability to concept drift. The state-of-the-art performance of SSF on NSL-KDD and UNSW-NB15 datasets demonstrates its superior adaptability to concept drift for network intrusion detection. The code is released at https://github.com/xinchen930/SSF-Strategic-Selection-and-Forgetting.
Updated: 2025-07-02 05:07:28
Categories: cs.CR,cs.AI,cs.LG
Tightly-Coupled LiDAR-IMU-Leg Odometry with Online Learned Leg Kinematics Incorporating Foot Tactile Information
In this letter, we present tightly coupled LiDAR-IMU-leg odometry, which is robust to challenging conditions such as featureless environments and deformable terrains. We developed an online learning-based leg kinematics model named the neural leg kinematics model, which incorporates tactile information (foot reaction force) to implicitly express the nonlinear dynamics between robot feet and the ground. Online training of this model enhances its adaptability to weight load changes of a robot (e.g., assuming delivery or transportation tasks) and terrain conditions. According to the \textit{neural adaptive leg odometry factor} and online uncertainty estimation of the leg kinematics model-based motion predictions, we jointly solve online training of this kinematics model and odometry estimation on a unified factor graph to retain the consistency of both. The proposed method was verified through real experiments using a quadruped robot in two challenging situations: 1) a sandy beach, representing an extremely featureless area with a deformable terrain, and 2) a campus, including multiple featureless areas and terrain types of asphalt, gravel (deformable terrain), and grass. Experimental results showed that our odometry estimation incorporating the \textit{neural leg kinematics model} outperforms state-of-the-art works. Our project page is available for further details: https://takuokawara.github.io/RAL2025_project_page/
Updated: 2025-07-02 04:53:47
Subjects: cs.RO,cs.AI,cs.LG
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as it fails to fully capture the dense information embedded in charts. In contrast, parsing charts into code provides lossless representations that can effectively contain all critical details. Although existing open-source MLLMs have achieved success in chart understanding tasks, they still face two major challenges when applied to chart-to-code tasks: (1) Low executability and poor restoration of chart details in the generated code and (2) Lack of large-scale and diverse training data. To address these challenges, we propose \textbf{ChartCoder}, the first dedicated chart-to-code MLLM, which leverages Code LLMs as the language backbone to enhance the executability of the generated code. Furthermore, we introduce \textbf{Chart2Code-160k}, the first large-scale and diverse dataset for chart-to-code generation, and propose the \textbf{Snippet-of-Thought (SoT)} method, which transforms direct chart-to-code generation data into step-by-step generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks, achieving superior chart restoration and code executability. Our code is available at https://github.com/thunlp/ChartCoder.
Updated: 2025-07-02 04:47:37
Subjects: cs.AI
Efficient Kilometer-Scale Precipitation Downscaling with Conditional Wavelet Diffusion
Effective hydrological modeling and extreme weather analysis demand precipitation data at a kilometer-scale resolution, which is significantly finer than the 10 km scale offered by standard global products like IMERG. To address this, we propose the Wavelet Diffusion Model (WDM), a generative framework that achieves 10x spatial super-resolution (downscaling to 1 km) and delivers a 9x inference speedup over pixel-based diffusion models. WDM is a conditional diffusion model that learns the complex structure of precipitation from MRMS radar data directly in the wavelet domain. By focusing on high-frequency wavelet coefficients, it generates exceptionally realistic and detailed 1-km precipitation fields. This wavelet-based approach produces visually superior results with fewer artifacts than pixel-space models, and delivers significant gains in sampling efficiency. Our results demonstrate that WDM provides a robust solution to the dual challenges of accuracy and speed in geoscience super-resolution, paving the way for more reliable hydrological forecasts.
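A small sketch of the wavelet-domain idea, using the real PyWavelets API (the diffusion model itself is omitted; the 'haar' basis and the random field here are placeholders):

```python
import numpy as np
import pywt

# One level of a 2-D discrete wavelet transform splits a precipitation
# field into a low-frequency approximation and three high-frequency
# detail subbands; a WDM-style model denoises in this coefficient space.
field = np.random.rand(256, 256)             # stand-in precipitation field
cA, (cH, cV, cD) = pywt.dwt2(field, "haar")  # approximation + details

# A conditional diffusion model would be trained to generate the detail
# subbands (cH, cV, cD) given coarse inputs; reconstruction is exact:
recon = pywt.idwt2((cA, (cH, cV, cD)), "haar")
assert np.allclose(recon, field)
```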
Updated: 2025-07-02 04:46:28
Subjects: cs.LG,physics.ao-ph,86A10 (Primary) 86A22, 68U10 (Secondary),J.2; I.4.4
Dataset Distillation via the Wasserstein Metric
Dataset Distillation (DD) aims to generate a compact synthetic dataset that enables models to achieve performance comparable to training on the full large dataset, significantly reducing computational costs. Drawing from optimal transport theory, we introduce WMDD (Wasserstein Metric-based Dataset Distillation), a straightforward yet powerful method that employs the Wasserstein metric to enhance distribution matching. We compute the Wasserstein barycenter of features from a pretrained classifier to capture essential characteristics of the original data distribution. By optimizing synthetic data to align with this barycenter in feature space and leveraging per-class BatchNorm statistics to preserve intra-class variations, WMDD maintains the efficiency of distribution matching approaches while achieving state-of-the-art results across various high-resolution datasets. Our extensive experiments demonstrate WMDD's effectiveness and adaptability, highlighting its potential for advancing machine learning applications at scale.
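A minimal sketch of the Wasserstein-alignment step (the entropic Sinkhorn surrogate and all tensor shapes are illustrative; barycenter computation and the per-class BatchNorm term are omitted):

```python
import torch

def sinkhorn_cost(x, y, eps=0.1, n_iter=50):
    """Entropic OT cost between two point clouds (rows = feature vectors),
    used as a differentiable surrogate for Wasserstein alignment."""
    C = torch.cdist(x, y) ** 2
    C = C / C.detach().max()          # normalize cost for numerical stability
    K = torch.exp(-C / eps)
    a = torch.full((x.shape[0],), 1.0 / x.shape[0])
    b = torch.full((y.shape[0],), 1.0 / y.shape[0])
    u = torch.ones_like(a)
    for _ in range(n_iter):           # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return (P * C).sum()

# One distillation step: nudge synthetic features toward a precomputed
# per-class barycenter support (barycenter computation itself not shown).
synthetic = torch.randn(32, 512, requires_grad=True)
barycenter = torch.randn(64, 512)     # stand-in barycenter support points
sinkhorn_cost(synthetic, barycenter).backward()
```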
Updated: 2025-07-02 04:25:43
Subjects: cs.CV,cs.AI,cs.LG
Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations
We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running the atomistic simulations, monitoring their computation status, analyzing the simulation results, and summarizing them for the user in text and graphical forms. We achieve these goals by exploiting open-source large language models (LLMs, original and fine-tuned), rule-based agents, and a retrieval-augmented generation (RAG) system. Aitomia leverages the versatility of our MLatom ecosystem, supporting AI-enhanced computational chemistry tasks ranging from ground- to excited-state calculations such as geometry optimizations, thermochemistry, and spectra calculations. Aitomia is the first intelligent assistant publicly accessible online on a cloud computing platform for atomistic simulations of broad scope (Aitomistic Hub at https://aitomistic.xyz), while it may also be deployed locally as described at http://mlatom.com/aitomia. Aitomia is expected to lower the barrier to performing atomistic simulations, democratizing simulations, and accelerating research and development in the relevant fields.
Updated: 2025-07-02 04:17:44
Subjects: physics.comp-ph,cs.AI,cs.LG,cs.MA,physics.chem-ph
Pre-training Large Memory Language Models with Internal and External Knowledge
Neural language models are black boxes -- both linguistic patterns and factual knowledge are distributed across billions of opaque parameters. This entangled encoding makes it difficult to reliably inspect, verify, or update specific facts. We propose a new class of language models, Large Memory Language Models (LMLM), with a pre-training recipe that stores factual knowledge in both internal weights and an external database. Our approach strategically masks externally retrieved factual values from the training loss, thereby teaching the model to perform targeted lookups rather than relying on memorization in model weights. Our experiments demonstrate that LMLMs achieve competitive performance compared to significantly larger, knowledge-dense LLMs on standard benchmarks, while offering the advantages of explicit, editable, and verifiable knowledge bases. This work represents a fundamental shift in how language models interact with and manage factual knowledge.
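The loss-masking recipe is easy to sketch (hypothetical helper; fact_spans marks token ranges that came from the external database):

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, fact_spans):
    """Next-token loss that excludes externally retrieved factual values.
    fact_spans: list of (start, end) token index ranges retrieved from the
    external database; the model should look them up, not memorize them."""
    labels = input_ids.clone()
    for start, end in fact_spans:
        labels[:, start:end] = -100          # ignored by cross_entropy
    # standard causal shift: predict token t+1 from prefix up to t
    logits = logits[:, :-1].reshape(-1, logits.size(-1))
    labels = labels[:, 1:].reshape(-1)
    return F.cross_entropy(logits, labels, ignore_index=-100)
```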
Updated: 2025-07-02 04:16:51
Subjects: cs.CL,cs.AI,cs.LG
User-guided Generative Source Separation
Music source separation (MSS) aims to extract individual instrument sources from their mixture. While most existing methods focus on the widely adopted four-stem separation setup (vocals, bass, drums, and other instruments), this approach lacks the flexibility needed for real-world applications. To address this, we propose GuideSep, a diffusion-based MSS model capable of instrument-agnostic separation beyond the four-stem setup. GuideSep is conditioned on multiple inputs: a waveform mimicry condition, which can be easily provided by humming or playing the target melody, and mel-spectrogram domain masks, which offer additional guidance for separation. Unlike prior approaches that relied on fixed class labels or sound queries, our conditioning scheme, coupled with the generative approach, provides greater flexibility and applicability. Additionally, we design a mask-prediction baseline using the same model architecture to systematically compare predictive and generative approaches. Our objective and subjective evaluations demonstrate that GuideSep achieves high-quality separation while enabling more versatile instrument extraction, highlighting the potential of user participation in the diffusion-based generative process for MSS. Our code and demo page are available at https://yutongwen.github.io/GuideSep/
Updated: 2025-07-02 03:58:52
Subjects: cs.SD,cs.AI,eess.AS
KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
The rapid advancement of large language models (LLMs) increases the difficulty of distinguishing between human-written and LLM-generated text. Detecting LLM-generated text is crucial for upholding academic integrity, preventing plagiarism, protecting copyrights, and ensuring ethical research practices. Most prior studies on detecting LLM-generated text focus primarily on English text. However, languages with distinct morphological and syntactic characteristics require specialized detection approaches. Their unique structures and usage patterns can hinder the direct application of methods primarily designed for English. Among such languages, we focus on Korean, which has relatively flexible spacing rules, a rich morphological system, and less frequent comma usage compared to English. We introduce KatFish, the first benchmark dataset for detecting LLM-generated Korean text. The dataset consists of text written by humans and generated by four LLMs across three genres. By examining spacing patterns, part-of-speech diversity, and comma usage, we illuminate the linguistic differences between human-written and LLM-generated Korean text. Building on these observations, we propose KatFishNet, a detection method specifically designed for the Korean language. KatFishNet achieves an average of 19.78% higher AUROC compared to the best-performing existing detection method. Our code and data are available at https://github.com/Shinwoo-Park/detecting_llm_generated_korean_text_through_linguistic_analysis.
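A toy version of the feature pipeline (only crude surface cues; the paper's POS-diversity features require a Korean morphological analyzer, which is omitted here):

```python
from sklearn.linear_model import LogisticRegression

def surface_features(text):
    """Simple surface cues of the kind the paper analyzes for Korean:
    spacing density, comma usage, and average token length."""
    n = max(len(text), 1)
    tokens = text.split()
    return [
        text.count(" ") / n,                          # spacing density
        text.count(",") / n,                          # comma rate
        sum(map(len, tokens)) / max(len(tokens), 1),  # mean token length
    ]

# human_texts / llm_texts: lists of strings (not shown)
# X = [surface_features(t) for t in human_texts + llm_texts]
# y = [0] * len(human_texts) + [1] * len(llm_texts)
# clf = LogisticRegression().fit(X, y)
```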
Updated: 2025-07-02 03:55:41
Subjects: cs.CL,cs.AI
FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization
State Space Models (SSMs), like recent Mamba2, have achieved remarkable performance and received extensive attention. However, deploying Mamba2 on resource-constrained edge devices encounters many problems: severe outliers within the linear layer challenging the quantization, diverse and irregular element-wise tensor operations, and hardware-unfriendly nonlinear functions in the SSM block. To address these issues, this paper presents FastMamba, a dedicated accelerator on FPGA with hardware-algorithm co-design to promote the deployment efficiency of Mamba2. Specifically, we successfully achieve 8-bit quantization for linear layers through Hadamard transformation to eliminate outliers. Moreover, a hardware-friendly and fine-grained power-of-two quantization framework is presented for the SSM block and convolution layer, and a first-order linear approximation is developed to optimize the nonlinear functions. Based on the accurate algorithm quantization, we propose an accelerator that integrates parallel vector processing units, pipelined execution dataflow, and an efficient SSM Nonlinear Approximation Unit, which enhances computational efficiency and reduces hardware complexity. Finally, we evaluate FastMamba on Xilinx VC709 FPGA. For the input prefill task on Mamba2-130M, FastMamba achieves 68.80\times and 8.90\times speedup over Intel Xeon 4210R CPU and NVIDIA RTX 3090 GPU, respectively. In the output decode experiment with Mamba2-2.7B, FastMamba attains 6\times higher energy efficiency than RTX 3090 GPU.
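A small sketch of the two quantization ideas (illustrative only; the paper's fine-grained, hardware-mapped versions differ):

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_int8(x):
    """Rotate activations with a Hadamard transform before 8-bit
    quantization: the rotation spreads outliers across all channels,
    shrinking the dynamic range each channel must cover."""
    n = x.shape[-1]               # must be a power of two
    H = hadamard(n) / np.sqrt(n)  # orthonormal Hadamard matrix
    y = x @ H
    scale = np.abs(y).max() / 127.0
    q = np.clip(np.round(y / scale), -128, 127).astype(np.int8)
    return q, scale, H            # dequantize: (q * scale) @ H.T

def pow2_quant(w):
    """Power-of-two quantization: multiplications become bit shifts."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12))
    return sign * 2.0 ** exp
```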
Updated: 2025-07-02 03:52:35
Subjects: cs.AR,cs.AI
LEDOM: An Open and Fundamental Reverse Language Model
We introduce LEDOM, the first purely reverse language model, trained autoregressively on 435B tokens with 2B and 7B parameter variants, which processes sequences in reverse temporal order through previous token prediction. For the first time, we present the reverse language model as a potential foundational model across general tasks, accompanied by a set of intriguing examples and insights. Based on LEDOM, we further introduce a novel application: Reverse Reward, where LEDOM-guided reranking of forward language model outputs leads to substantial performance improvements on mathematical reasoning tasks. This approach leverages LEDOM's unique backward reasoning capability to refine generation quality through posterior evaluation. Our findings suggest that LEDOM exhibits unique characteristics with broad application potential. We will release all models, training code, and pre-training data to facilitate future research.
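Reverse Reward is straightforward to sketch against a Hugging Face-style causal LM interface (the reverse_lm handle and tokenization details are assumptions):

```python
import torch

@torch.no_grad()
def reverse_reward(candidate_ids, reverse_lm):
    """Score a forward-generated candidate with a reverse LM: flip the
    token order and take the negative per-token loss as a reward.
    Assumes reverse_lm(input_ids=..., labels=...) returns .loss, as in
    Hugging Face causal LMs; the reverse model's tokenizer is assumed
    compatible with the forward model's output ids."""
    flipped = torch.flip(candidate_ids, dims=[-1]).unsqueeze(0)
    out = reverse_lm(input_ids=flipped, labels=flipped)
    return -out.loss.item()

# Rerank: keep the candidate the reverse model finds most plausible.
# best = max(candidates, key=lambda ids: reverse_reward(ids, ledom))
```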
Updated: 2025-07-02 03:52:00
Subjects: cs.CL,cs.AI
Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin
We introduce real-is-sim, a new approach to integrating simulation into behavior cloning pipelines. In contrast to real-only methods, which lack the ability to safely test policies before deployment, and sim-to-real methods, which require complex adaptation to cross the sim-to-real gap, our framework allows policies to seamlessly switch between running on real hardware and running in parallelized virtual environments. At the center of real-is-sim is a dynamic digital twin, powered by the Embodied Gaussian simulator, that synchronizes with the real world at 60Hz. This twin acts as a mediator between the behavior cloning policy and the real robot. Policies are trained using representations derived from simulator states and always act on the simulated robot, never the real one. During deployment, the real robot simply follows the simulated robot's joint states, and the simulation is continuously corrected with real world measurements. This setup, where the simulator drives all policy execution and maintains real-time synchronization with the physical world, shifts the responsibility of crossing the sim-to-real gap to the digital twin's synchronization mechanisms, instead of the policy itself. We demonstrate real-is-sim on a long-horizon manipulation task (PushT), showing that virtual evaluations are consistent with real-world results. We further show how real-world data can be augmented with virtual rollouts and compare to policies trained on different representations derived from the simulator state including object poses and rendered images from both static and robot-mounted cameras. Our results highlight the flexibility of the real-is-sim framework across training, evaluation, and deployment stages. Videos available at https://real-is-sim.github.io.
Updated: 2025-07-02 03:27:04
Subjects: cs.RO,cs.AI
Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding
Artificial intelligence (AI) is increasingly used in every stage of drug development. One challenge facing drug discovery AI is that drug pharmacokinetic (PK) datasets are often collected independently of one another with limited overlap, creating data overlap sparsity. Data sparsity makes data curation difficult for researchers looking to answer research questions in poly-pharmacy, drug combination research, and high-throughput screening. We propose Imagand, a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs. We show that Imagand-generated synthetic PK data closely resembles real data in its univariate and bivariate distributions, and improves performance on downstream tasks. Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research. Code is available at https://github.com/bing1100/Imagand.
Updated: 2025-07-02 03:26:25
Subjects: q-bio.QM,cs.AI,cs.LG
Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning
Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. We propose a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments in both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
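One common way to obtain a risk-averse objective from a distributional critic is a CVaR over estimated quantiles; a minimal sketch (the paper's exact risk measure may differ):

```python
import torch

def cvar_from_quantiles(quantiles, alpha=0.1):
    """Risk-averse value from a distributional critic.
    quantiles: (batch, N) estimated quantiles of the discounted return.
    CVaR_alpha averages only the worst alpha-fraction of outcomes,
    reflecting both epistemic spread and environmental stochasticity."""
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    k = max(1, int(alpha * quantiles.shape[-1]))
    return sorted_q[:, :k].mean(dim=-1)  # lower-tail mean

# A risk-averse actor would be trained to maximize cvar_from_quantiles
# of the critic's output rather than its plain mean.
```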
Updated: 2025-07-02 03:26:21
Subjects: cs.LG
Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
Detecting abnormal events in real-world customer service dialogues is highly challenging due to the complexity of business data and the dynamic nature of customer interactions. Moreover, models must demonstrate strong out-of-domain (OOD) generalization to enable rapid adaptation across different business scenarios and maximize commercial value. In this work, we propose a novel Adaptive Perplexity-Aware Reinforcement Learning (APARL) framework that leverages the advanced reasoning capabilities of large language models for abnormal event detection. APARL introduces a dual-loop dynamic curriculum learning architecture, enabling the model to progressively focus on more challenging samples as its proficiency increases. This design effectively addresses performance bottlenecks and significantly enhances OOD transferability. Extensive evaluations on food delivery dialogue tasks show that our model achieves significantly enhanced adaptability and robustness, attaining the highest F1 score with an average improvement of 17.19\%, and an average improvement of 9.59\% in OOD transfer tests. This method provides a superior solution for industrial deployment of anomaly detection models, contributing to improved operational efficiency and commercial benefits.
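A minimal sketch of perplexity-aware curriculum sampling (the rank-based difficulty band and its schedule are illustrative assumptions):

```python
import numpy as np

def adaptive_sampling_weights(perplexities, proficiency):
    """Curriculum weights: concentrate sampling on a difficulty band that
    moves upward as the model improves. proficiency is in [0, 1]; the
    Gaussian band and its schedule are illustrative, not the paper's."""
    ranks = np.argsort(np.argsort(perplexities)) / (len(perplexities) - 1)
    target = 0.3 + 0.6 * proficiency   # focus shifts toward harder samples
    w = np.exp(-((ranks - target) ** 2) / (2 * 0.15 ** 2))
    return w / w.sum()

# batch = np.random.choice(len(pool), size=64,
#                          p=adaptive_sampling_weights(ppl, proficiency))
```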
Updated: 2025-07-02 03:26:02
Subjects: cs.LG,cs.AI
SWinMamba: Serpentine Window State Space Model for Vascular Segmentation
Vascular segmentation in medical images is crucial for disease diagnosis and surgical navigation. However, the segmented vascular structure is often discontinuous due to its slender nature and inadequate prior modeling. In this paper, we propose a novel Serpentine Window Mamba (SWinMamba) to achieve accurate vascular segmentation. The proposed SWinMamba innovatively models the continuity of slender vascular structures by incorporating serpentine window sequences into bidirectional state space models. The serpentine window sequences enable efficient feature capturing by adaptively guiding global visual context modeling to the vascular structure. Specifically, the Serpentine Window Tokenizer (SWToken) adaptively splits the input image using overlapping serpentine window sequences, enabling flexible receptive fields (RFs) for vascular structure modeling. The Bidirectional Aggregation Module (BAM) integrates coherent local features in the RFs for vascular continuity representation. In addition, dual-domain learning with Spatial-Frequency Fusion Unit (SFFU) is designed to enhance the feature representation of vascular structure. Extensive experiments on three challenging datasets demonstrate that the proposed SWinMamba achieves superior performance with complete and connected vessels.
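The serpentine ordering itself is simple to write down (a boustrophedon scan; illustrative of the window traversal, not the tokenizer's full overlapping-window logic):

```python
import numpy as np

def serpentine_order(h, w):
    """Boustrophedon scan of an h x w patch grid: left-to-right on even
    rows, right-to-left on odd rows, so consecutive tokens in the state
    space sequence stay spatially adjacent (friendly to thin vessels)."""
    idx = np.arange(h * w).reshape(h, w)
    idx[1::2] = idx[1::2, ::-1]   # reverse every other row
    return idx.ravel()

print(serpentine_order(3, 4))
# [ 0  1  2  3  7  6  5  4  8  9 10 11]
```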
Updated: 2025-07-02 03:15:08
Subjects: eess.IV,cs.CV,cs.LG
ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks
In-context learning (ICL) has demonstrated remarkable success in large language models (LLMs) due to its adaptability and parameter-free nature. However, it also introduces a critical vulnerability to backdoor attacks, where adversaries can manipulate LLM behaviors by simply poisoning a few ICL demonstrations. In this paper, we propose, for the first time, the dual-learning hypothesis, which posits that LLMs simultaneously learn both the task-relevant latent concepts and backdoor latent concepts within poisoned demonstrations, jointly influencing the probability of model outputs. Through theoretical analysis, we derive an upper bound for ICL backdoor effects, revealing that the vulnerability is dominated by the concept preference ratio between the task and the backdoor. Motivated by these findings, we propose ICLShield, a defense mechanism that dynamically adjusts the concept preference ratio. Our method encourages LLMs to select clean demonstrations during the ICL phase by leveraging confidence and similarity scores, effectively mitigating susceptibility to backdoor attacks. Extensive experiments across multiple LLMs and tasks demonstrate that our method achieves state-of-the-art defense effectiveness, significantly outperforming existing approaches (+26.02% on average). Furthermore, our method exhibits exceptional adaptability and defensive performance even for closed-source models (e.g., GPT-4).
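The defense's selection rule can be sketched in a few lines (alpha and the linear combination are illustrative):

```python
import numpy as np

def select_clean_demos(conf, sim, k=4, alpha=0.5):
    """Pick ICL demonstrations that raise the task/backdoor concept
    preference ratio: prefer demos the model is confident about and that
    are similar to the query. conf, sim: per-demo scores in [0, 1]."""
    score = alpha * np.asarray(conf) + (1 - alpha) * np.asarray(sim)
    return np.argsort(score)[-k:][::-1]  # indices of the top-k demos
```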
Updated: 2025-07-02 03:09:20
Subjects: cs.LG,cs.AI,cs.CR
Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting
In this study, we apply 1D quantum convolution to address the task of time series forecasting. By encoding multiple points into the quantum circuit to predict subsequent data, each point becomes a feature, transforming the problem into a multidimensional one. Building on theoretical foundations from prior research, which demonstrated that Variational Quantum Circuits (VQCs) can be expressed as multidimensional Fourier series, we explore the capabilities of different architectures and ansätze. This analysis considers the concepts of circuit expressibility and the presence of barren plateaus. Analyzing the problem within the framework of the Fourier series enabled the design of an architecture that incorporates data reuploading, resulting in enhanced performance. Rather than requiring the number of free parameters to exceed the degrees of freedom of the Fourier series, our findings suggest that even a limited number of parameters can produce Fourier functions of higher degrees. This highlights the remarkable expressive power of quantum circuits, and is also significant in reducing training times. Ansätze with greater expressibility and more non-zero Fourier coefficients consistently deliver favorable results across different scenarios, with performance metrics improving as the number of qubits increases.
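The theoretical foundation the abstract invokes can be stated compactly (schematic form; $\Omega$ and the coefficients below are generic, following the cited VQC-as-Fourier-series result): a data-reuploading VQC with observable $M$ realizes a truncated Fourier series

$$ f_{\theta}(x) \;=\; \langle 0 |\, U^{\dagger}(x,\theta)\, M\, U(x,\theta)\, | 0 \rangle \;=\; \sum_{\omega \in \Omega} c_{\omega}(\theta)\, e^{i \omega \cdot x}, $$

where the frequency spectrum $\Omega$ is fixed by the data-encoding gates and grows with reuploading, while the trainable blocks only set the coefficients $c_{\omega}(\theta)$ (with $c_{-\omega} = c_{\omega}^{*}$ so that $f_{\theta}$ is real). This is why a modest number of free parameters can still reach high-degree Fourier functions.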
Updated: 2025-07-02 03:04:19
Subjects: quant-ph,cs.LG
Neural Hamiltonian Operator
Stochastic control problems in high dimensions are notoriously difficult to solve due to the curse of dimensionality. An alternative to traditional dynamic programming is Pontryagin's Maximum Principle (PMP), which recasts the problem as a system of Forward-Backward Stochastic Differential Equations (FBSDEs). In this paper, we introduce a formal framework for solving such problems with deep learning by defining a \textbf{Neural Hamiltonian Operator (NHO)}. This operator parameterizes the coupled FBSDE dynamics via neural networks that represent the feedback control and an ansatz for the value function's spatial gradient. We show how the optimal NHO can be found by training the underlying networks to enforce the consistency conditions dictated by the PMP. By adopting this operator-theoretic view, we situate the deep FBSDE method within the rigorous language of statistical inference, framing it as a problem of learning an unknown operator from simulated data. This perspective allows us to prove the universal approximation capabilities of NHOs under general martingale drivers and provides a clear lens for analyzing the significant optimization challenges inherent to this class of models.
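Schematically, the FBSDE system that PMP yields, and that the NHO parameterizes, takes the standard form (generic notation, not the paper's exact setup):

$$ dX_t = b(t, X_t, u_t)\,dt + \sigma(t, X_t)\,dW_t, \qquad X_0 = x_0, $$
$$ dY_t = -\,\partial_x H(t, X_t, u_t, Y_t, Z_t)\,dt + Z_t\,dW_t, \qquad Y_T = \partial_x g(X_T), $$

with Hamiltonian $H(t,x,u,y,z) = b(t,x,u)\cdot y + \mathrm{tr}(\sigma^{\top} z) + f(t,x,u)$ and the control chosen to maximize $H$ pointwise. The NHO's two networks play the roles of the feedback control $u_t = u(t, X_t)$ and the value-gradient ansatz $Y_t \approx \partial_x V(t, X_t)$, trained to enforce these consistency conditions.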
Updated: 2025-07-02 02:56:49
Subjects: cs.LG,cs.AI,math.DS,math.OC
DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting
We introduce a simple yet effective technique for estimating lighting from a single low-dynamic-range (LDR) image by reframing the task as a chrome ball inpainting problem. This approach leverages a pre-trained diffusion model, Stable Diffusion XL, to overcome the generalization failures of existing methods that rely on limited HDR panorama datasets. While conceptually simple, the task remains challenging because diffusion models often insert incorrect or inconsistent content and cannot readily generate chrome balls in HDR format. Our analysis reveals that the inpainting process is highly sensitive to the initial noise in the diffusion process, occasionally resulting in unrealistic outputs. To address this, we first introduce DiffusionLight, which uses iterative inpainting to compute a median chrome ball from multiple outputs to serve as a stable, low-frequency lighting prior that guides the generation of a high-quality final result. To generate high-dynamic-range (HDR) light probes, an Exposure LoRA is fine-tuned to create LDR images at multiple exposure values, which are then merged. While effective, DiffusionLight is time-intensive, requiring approximately 30 minutes per estimation. To reduce this overhead, we introduce DiffusionLight-Turbo, which reduces the runtime to about 30 seconds with minimal quality loss. This 60x speedup is achieved by training a Turbo LoRA to directly predict the averaged chrome balls from the iterative process. Inference is further streamlined into a single denoising pass using a LoRA swapping technique. Experimental results show that our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios. Our code is available at https://diffusionlight.github.io/turbo
Updated: 2025-07-02 02:47:01
Subjects: cs.CV,cs.GR,cs.LG,I.3.3; I.4.8
MMLU-Reason: Benchmarking Multi-Task Multi-modal Language Understanding and Reasoning
Recent advances in Multi-Modal Large Language Models (MLLMs) have enabled unified processing of language, vision, and structured inputs, opening the door to complex tasks such as logical deduction, spatial reasoning, and scientific analysis. Despite their promise, the reasoning capabilities of MLLMs, particularly those augmented with intermediate thinking traces (MLLMs-T), remain poorly understood and lack standardized evaluation benchmarks. Existing work focuses primarily on perception or final answer correctness, offering limited insight into how models reason or fail across modalities. To address this gap, we introduce the MMLU-Reason, a new benchmark designed to rigorously evaluate multi-modal reasoning with explicit thinking. The MMLU-Reason comprises 1) a high-difficulty dataset of 1,083 questions spanning six diverse reasoning types with symbolic depth and multi-hop demands and 2) a modular Reasoning Trace Evaluation Pipeline (RTEP) for assessing reasoning quality beyond accuracy through metrics like relevance, consistency, and structured error annotations. Empirical results show that MLLMs-T overall outperform non-thinking counterparts, but even top models like Claude-3.7-Sonnet and Gemini-2.5 Pro suffer from reasoning pathologies such as inconsistency and overthinking. This benchmark reveals persistent gaps between accuracy and reasoning quality and provides an actionable evaluation pipeline for future model development. Overall, the MMLU-Reason offers a scalable foundation for evaluating, comparing, and improving the next generation of multi-modal reasoning systems.
Updated: 2025-07-02 02:32:49
Subjects: cs.AI
Empirical Bayes Estimation for Lasso-Type Regularizers: Analysis of Automatic Relevance Determination
This paper focuses on linear regression models with non-conjugate sparsity-inducing regularizers such as lasso and group lasso. Although the empirical Bayes approach enables us to estimate the regularization parameter, little is known on the properties of the estimators. In particular, many aspects regarding the specific conditions under which the mechanism of automatic relevance determination (ARD) occurs remain unexplained. In this paper, we derive the empirical Bayes estimators for the group lasso regularized linear regression models with limited parameters. It is shown that the estimators diverge under a specific condition, giving rise to the ARD mechanism. We also prove that empirical Bayes methods can produce the ARD mechanism in general regularized linear regression models and clarify the conditions under which models such as ridge, lasso, and group lasso can do so.
Updated: 2025-07-02 02:29:00
Subjects: math.ST,cs.IT,cs.LG,math.IT,stat.TH
Game-Theoretic Modeling of Vehicle Unprotected Left Turns Considering Drivers' Bounded Rationality
Modeling the decision-making behavior of vehicles presents unique challenges, particularly during unprotected left turns at intersections, where the uncertainty of human drivers is especially pronounced. In this context, connected autonomous vehicle (CAV) technology emerges as a promising avenue for effectively managing such interactions while ensuring safety and efficiency. Traditional approaches, often grounded in game theory assumptions of perfect rationality, may inadequately capture the complexities of real-world scenarios and drivers' decision-making errors. To fill this gap, we propose a novel decision-making model for vehicle unprotected left-turn scenarios, integrating game theory with considerations for drivers' bounded rationality. Our model, formulated as a two-player normal-form game solved by a quantal response equilibrium (QRE), offers a more nuanced depiction of driver decision-making processes compared to Nash equilibrium (NE) models. Leveraging an Expectation-Maximization (EM) algorithm coupled with a subtle neural network trained on precise microscopic vehicle trajectory data, we optimize model parameters to accurately reflect drivers' interaction-aware bounded rationality and driving styles. Through comprehensive simulation experiments, we demonstrate the efficacy of our proposed model in capturing the interaction-aware bounded rationality and decision tendencies between players. The proposed model proves to be more realistic and efficient than NE models in unprotected left-turn scenarios. Our findings contribute valuable insights into the vehicle decision-making behaviors with bounded rationality, thereby informing the development of more robust and realistic autonomous driving systems.
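The logit form of QRE used in such models is standard (notation generic): each driver chooses action $a$ with probability

$$ P_i(a) \;=\; \frac{\exp\!\big(\lambda\, \bar{u}_i(a)\big)}{\sum_{a'} \exp\!\big(\lambda\, \bar{u}_i(a')\big)}, $$

where $\bar{u}_i(a)$ is player $i$'s expected utility of $a$ against the other player's mixed strategy and $\lambda \ge 0$ is the rationality parameter: $\lambda \to \infty$ recovers best responses (the NE limit), while $\lambda = 0$ yields uniformly random choices. Fitting $\lambda$ (e.g., via the EM procedure described above) is what lets the model absorb drivers' decision errors.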
Updated: 2025-07-02 02:22:11
Subjects: eess.SY,cs.AI,cs.SY
Hardness of Quantum Distribution Learning and Quantum Cryptography
The existence of one-way functions (OWFs) forms the minimal assumption in classical cryptography. However, this is not necessarily the case in quantum cryptography. One-way puzzles (OWPuzzs), introduced by Khurana and Tomer, provide a natural quantum analogue of OWFs. The existence of OWPuzzs implies $PP\neq BQP$, while the converse remains open. In classical cryptography, the analogous problem-whether OWFs can be constructed from $P \neq NP$-has long been studied from the viewpoint of hardness of learning. Hardness of learning in various frameworks (including PAC learning) has been connected to OWFs or to $P \neq NP$. In contrast, no such characterization previously existed for OWPuzzs. In this paper, we establish the first complete characterization of OWPuzzs based on the hardness of a well-studied learning model: distribution learning. Specifically, we prove that OWPuzzs exist if and only if proper quantum distribution learning is hard on average. A natural question that follows is whether the worst-case hardness of proper quantum distribution learning can be derived from $PP \neq BQP$. If so, and a worst-case to average-case hardness reduction is achieved, it would imply OWPuzzs solely from $PP \neq BQP$. However, we show that this would be extremely difficult: if worst-case hardness is PP-hard (in a black-box reduction), then $SampBQP \neq SampBPP$ follows from the infiniteness of the polynomial hierarchy. Despite that, we show that $PP \neq BQP$ is equivalent to another standard notion of hardness of learning: agnostic. We prove that $PP \neq BQP$ if and only if agnostic quantum distribution learning with respect to KL divergence is hard. As a byproduct, we show that hardness of agnostic quantum distribution learning with respect to statistical distance against $PPT^{\Sigma_3^P}$ learners implies $SampBQP \neq SampBPP$.
Updated: 2025-07-02 02:12:38
Subjects: quant-ph,cs.CC,cs.CR
OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors
Teeth segmentation and recognition play a vital role in a variety of dental applications and diagnostic procedures. The integration of deep learning models has facilitated the development of precise and automated segmentation methods. Although prior research has explored teeth segmentation, not many methods have successfully performed tooth segmentation and detection simultaneously. This study presents UFBA-425, a dental dataset derived from the UFBA-UESC dataset, featuring bounding box and polygon annotations for 425 panoramic dental X-rays. In addition, this paper presents the OralBBNet architecture, which is based on the best segmentation and detection qualities of architectures such as U-Net and YOLOv8, respectively. OralBBNet is designed to improve the accuracy and robustness of tooth classification and segmentation on panoramic X-rays by leveraging the complementary strengths of U-Net and YOLOv8. Our approach achieved a 1-3% improvement in mean average precision (mAP) for tooth detection compared to existing techniques and a 15-20% improvement in the dice score for teeth segmentation over state-of-the-art (SOTA) solutions for various tooth categories and 2-4% improvement in the dice score compared to other SOTA segmentation architectures. The results of this study establish a foundation for the wider implementation of object detection models in dental diagnostics.
Updated: 2025-07-02 02:11:49
Subjects: cs.CV,cs.AI,cs.LG
Far From Sight, Far From Mind: Inverse Distance Weighting for Graph Federated Recommendation
Graph federated recommendation systems offer a privacy-preserving alternative to traditional centralized recommendation architectures, which often raise concerns about data security. While federated learning enables personalized recommendations without exposing raw user data, existing aggregation methods overlook the unique properties of user embeddings in this setting. Indeed, traditional aggregation methods fail to account for their complexity and the critical role of user similarity in recommendation effectiveness. Moreover, evolving user interactions require adaptive aggregation while preserving the influence of high-relevance anchor users (the primary users before expansion in graph-based frameworks). To address these limitations, we introduce Dist-FedAvg, a novel distance-based aggregation method designed to enhance personalization and aggregation efficiency in graph federated learning. Our method assigns higher aggregation weights to users with similar embeddings, while ensuring that anchor users retain significant influence in local updates. Empirical evaluations on multiple datasets demonstrate that Dist-FedAvg consistently outperforms baseline aggregation techniques, improving recommendation accuracy while maintaining seamless integration into existing federated learning frameworks.
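A minimal sketch of inverse-distance aggregation with anchor preservation (beta and the blending scheme are illustrative assumptions):

```python
import numpy as np

def inverse_distance_aggregate(embeddings, anchor_idx, beta=0.5, eps=1e-8):
    """Inverse-distance-weighted aggregation of user embeddings: similar
    users count more, far-away ones fade ('far from sight, far from
    mind'), and the anchor user keeps significant influence via beta.
    embeddings: (n_users, d) array of user embedding vectors."""
    anchor = embeddings[anchor_idx]
    others = np.delete(np.arange(len(embeddings)), anchor_idx)
    d = np.linalg.norm(embeddings[others] - anchor, axis=1)
    w = 1.0 / (d + eps)                     # inverse distance weights
    w /= w.sum()
    neighbor_avg = w @ embeddings[others]
    return beta * anchor + (1 - beta) * neighbor_avg
```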
Updated: 2025-07-02 01:57:58
Subjects: cs.LG,cs.DC,cs.IR
VLAD: A VLM-Augmented Autonomous Driving Framework with Hierarchical Planning and Interpretable Decision Process
Recent advancements in open-source Visual Language Models (VLMs) such as LLaVA, Qwen-VL, and Llama have catalyzed extensive research on their integration with diverse systems. The internet-scale general knowledge encapsulated within these models presents significant opportunities for enhancing autonomous driving perception, prediction, and planning capabilities. In this paper we propose VLAD, a vision-language autonomous driving model, which integrates a fine-tuned VLM with VAD, a state-of-the-art end-to-end system. We implement a specialized fine-tuning approach using custom question-answer datasets designed specifically to improve the spatial reasoning capabilities of the model. The enhanced VLM generates high-level navigational commands that VAD subsequently processes to guide vehicle operation. Additionally, our system produces interpretable natural language explanations of driving decisions, thereby increasing transparency and trustworthiness of the traditionally black-box end-to-end architecture. Comprehensive evaluation on the real-world nuScenes dataset demonstrates that our integrated system reduces average collision rates by 31.82% compared to baseline methodologies, establishing a new benchmark for VLM-augmented autonomous driving systems.
Updated: 2025-07-02 01:52:40
Subjects: cs.RO,cs.AI,cs.CV,cs.ET,cs.LG
Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care
The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis. Yet despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside. This scoping review aims to highlight the areas where AI remains limited in making practical contributions in the clinical setting, specifically in dementia diagnosis and care. Standalone machine-learning models excel at pattern recognition but seldom provide actionable, interpretable guidance, eroding clinician trust. Adjacent use of LLMs by physicians did not result in better diagnostic accuracy or speed. Key limitations trace to the data-driven paradigm: black-box outputs which lack transparency, vulnerability to hallucinations, and weak causal reasoning. Hybrid approaches that combine statistical learning with expert rule-based knowledge, and involve clinicians throughout the process, help bring back interpretability. They also fit better with existing clinical workflows, as seen in examples like PEIRS and ATHENA-CDS. Future decision-support should prioritise explanatory coherence by linking predictions to clinically meaningful causes. This can be done through neuro-symbolic or hybrid AI that combines the language ability of LLMs with human causal expertise. AI researchers have addressed this direction, with explainable AI and neuro-symbolic AI being the next logical steps in the further advancement of AI. However, they are still based on data-driven knowledge integration rather than human-in-the-loop approaches. Future research should measure success not only by accuracy but by improvements in clinician understanding, workflow fit, and patient outcomes. A better understanding of what helps improve human-computer interactions is greatly needed for AI systems to become part of clinical practice.
Updated: 2025-07-02 01:43:06
Subjects: cs.AI,cs.HC
Rethinking All Evidence: Enhancing Trustworthy Retrieval-Augmented Generation via Conflict-Driven Summarization
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating their parametric knowledge with external retrieved content. However, knowledge conflicts caused by internal inconsistencies or noisy retrieved content can severely undermine the generation reliability of RAG systems. In this work, we argue that LLMs should rethink all evidence, including both retrieved content and internal knowledge, before generating responses. We propose CARE-RAG (Conflict-Aware and Reliable Evidence for RAG), a novel framework that improves trustworthiness through Conflict-Driven Summarization of all available evidence. CARE-RAG first derives parameter-aware evidence by comparing parameter records to identify diverse internal perspectives. It then refines retrieved evidence to produce context-aware evidence, removing irrelevant or misleading content. To detect and summarize conflicts, we distill a 3B LLaMA3.2 model to perform conflict-driven summarization, enabling reliable synthesis across multiple sources. To further ensure evaluation integrity, we introduce a QA Repair step to correct outdated or ambiguous benchmark answers. Experiments on revised QA datasets with retrieval data show that CARE-RAG consistently outperforms strong RAG baselines, especially in scenarios with noisy or conflicting evidence.
Updated: 2025-07-02 01:39:49
Subjects: cs.CL,cs.AI
A Framework for Mining Collectively-Behaving Bots in MMORPGs
In MMORPGs (Massively Multiplayer Online Role-Playing Games), abnormal players (bots) using unauthorized automated programs to carry out pre-defined behaviors systematically and repeatedly are commonly observed. Bots usually engage in these activities to gain in-game money, which they eventually trade for real money outside the game. Such abusive activities negatively impact the in-game experiences of legitimate users since bots monopolize specific hunting areas and obtain valuable items. Thus, detecting abnormal players is a significant task for game companies. Motivated by the fact that bots tend to behave collectively with similar in-game trajectories due to the auto-programs, we developed BotTRep, a framework that comprises trajectory representation learning followed by clustering using a completely unlabeled in-game trajectory dataset. Our model aims to learn representations for in-game trajectory sequences so that players with contextually similar trajectories have closer embeddings. Then, by applying DBSCAN to these representations and visualizing the corresponding moving patterns, our framework ultimately assists game masters in identifying and banning bots.
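The clustering stage maps directly onto scikit-learn (eps/min_samples are placeholders to be tuned on real embeddings):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# embeddings: (n_players, d) trajectory representations from the learned
# encoder; bots driven by the same auto-program land close together, so
# dense clusters are bot candidates while noise points (-1) look human.
embeddings = np.random.randn(1000, 64)  # stand-in for real encoder output
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(embeddings)
bot_candidates = np.where(labels != -1)[0]
```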
Updated: 2025-07-02 01:35:55
Subjects: cs.LG,cs.AI
SoK: Usability Studies in Differential Privacy
Differential Privacy (DP) has emerged as a pivotal approach for safeguarding individual privacy in data analysis, yet its practical adoption is often hindered by challenges in the implementation and communication of DP. This paper presents a comprehensive systematization of existing research studies around the usability of DP, synthesizing insights from studies on both the practical use of DP tools and strategies for conveying DP parameters that determine privacy protection levels, such as epsilon($\varepsilon$). By reviewing and analyzing these studies, we identify core usability challenges, best practices, and critical gaps in current DP tools that affect adoption across diverse user groups, including developers, data analysts, and non-technical stakeholders. Our analysis highlights actionable insights and pathways for future research that emphasizes user-centered design and clear communication, fostering the development of more accessible DP tools that meet practical needs and support broader adoption.
Updated: 2025-07-02 01:29:35
领域: cs.HC,cs.CR
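The SoK is about usability rather than mechanisms, but the parameter it flags as hardest to communicate, ε, is concrete. A standard Laplace-mechanism sketch (textbook DP, not code from any tool the paper surveys) shows why the trade-off is hard to convey to non-experts:

import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # Standard Laplace mechanism: noise scale b = sensitivity / epsilon.
    return true_count + np.random.default_rng().laplace(scale=sensitivity / epsilon)

for eps in (0.1, 1.0, 10.0):
    noisy = [laplace_count(1000, eps) for _ in range(5)]
    print(f"epsilon={eps:>4}: {[round(v, 1) for v in noisy]}")
# Smaller epsilon -> stronger privacy -> noisier answers; communicating this
# trade-off to non-technical stakeholders is exactly the gap the SoK highlights.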
AI Meets Maritime Training: Precision Analytics for Enhanced Safety and Performance
Traditional simulator-based training for maritime professionals is critical for ensuring safety at sea but often depends on subjective trainer assessments of technical skills, behavioral focus, communication, and body language, posing challenges such as subjectivity, difficulty in measuring key features, and cognitive limitations. Addressing these issues, this study develops an AI-driven framework to enhance maritime training by objectively assessing trainee performance through visual focus tracking, speech recognition, and stress detection, improving readiness for high-risk scenarios. The system integrates AI techniques, including visual focus determination using eye tracking, pupil dilation analysis, and computer vision; communication analysis through a maritime-specific speech-to-text model and natural language processing; communication correctness using large language models; and mental stress detection via vocal pitch. Models were evaluated on data from simulated maritime scenarios with seafarers exposed to controlled high-stress events. The AI algorithms achieved high accuracy, with ~92% for visual detection, ~91% for maritime speech recognition, and ~90% for stress detection, surpassing existing benchmarks. The system provides insights into visual attention, adherence to communication checklists, and stress levels under demanding conditions. This study demonstrates how AI can transform maritime training by delivering objective performance analytics, enabling personalized feedback, and improving preparedness for real-world operational challenges.
Updated: 2025-07-02 01:19:32
标题: AI遇见海事培训:精准分析提升安全性和性能
摘要: 传统基于模拟器的海事专业人员培训对确保海上安全至关重要,但往往依赖训练师对技术技能、行为重点、沟通和肢体语言的主观评估,面临主观性强、关键特征难以度量以及认知局限等挑战。为解决这些问题,本研究开发了一个AI驱动的框架,通过视觉焦点跟踪、语音识别和压力检测客观评估学员表现,提升应对高风险情景的准备程度。该系统集成了多项AI技术:使用眼动追踪、瞳孔扩张分析和计算机视觉确定视觉焦点;通过海事专用语音转文本模型和自然语言处理进行沟通分析;使用大型语言模型评估沟通正确性;并通过声音音高检测心理压力。模型在海员暴露于受控高压力事件的模拟海事场景数据上进行了评估。AI算法实现了高准确率:视觉检测约92%,海事语音识别约91%,压力检测约90%,均超过现有基准。该系统提供了有关视觉注意力、沟通清单遵循情况以及高要求条件下压力水平的见解。本研究展示了AI如何通过提供客观的绩效分析、实现个性化反馈并提高应对真实操作挑战的准备程度,来变革海事培训。
更新时间: 2025-07-02 01:19:32
领域: cs.HC,cs.AI
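The stress-detection component relies on vocal pitch; the paper's actual model is not given here, so the following is a crude stand-in that estimates F0 by autocorrelation and flags a rise over a trainee's baseline:

# Crude stand-in for a pitch-based stress cue, not the paper's trained model.
import numpy as np

def estimate_f0(frame: np.ndarray, sr: int, fmin: int = 75, fmax: int = 400) -> float:
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # plausible pitch-lag window
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(2048) / sr                      # one short analysis frame
calm = np.sin(2 * np.pi * 110 * t)            # ~110 Hz voice
stressed = np.sin(2 * np.pi * 190 * t)        # elevated pitch
baseline = estimate_f0(calm, sr)
for name, sig in [("calm", calm), ("stressed", stressed)]:
    f0 = estimate_f0(sig, sr)
    # Toy rule: >30% rise over the trainee's baseline pitch flags stress.
    print(name, round(f0, 1), "stress!" if f0 > 1.3 * baseline else "ok")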
BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation
Efficient sampling from the Boltzmann distribution defined by an energy function is a key challenge in modeling physical systems such as molecules. Boltzmann Generators tackle this by leveraging Continuous Normalizing Flows that transform a simple prior into a distribution that can be reweighted to match the Boltzmann distribution using sample likelihoods. However, obtaining likelihoods requires computing costly Jacobians during integration, making it impractical for large molecular systems. To overcome this, we propose learning the likelihood of the generated distribution via an energy-based model trained with noise contrastive estimation and score matching. By using stochastic interpolants to anneal between the prior and generated distributions, we combine both the objective functions to efficiently learn the density function. On the alanine dipeptide system, we demonstrate that our method yields free energy profiles and energy distributions comparable to those obtained with exact likelihoods. Additionally, we show that free energy differences between metastable states can be estimated accurately with orders-of-magnitude speedup.
Updated: 2025-07-02 01:16:47
标题: BoltzNCE: 使用随机插值和噪声对比估计学习Boltzmann生成的似然函数
摘要: 从能量函数所定义的玻尔兹曼分布中进行高效采样,是建模分子等物理系统时的一个关键挑战。玻尔兹曼生成器利用连续标准化流来应对这一挑战:将简单的先验分布变换为一个可借助样本似然重新加权、以匹配玻尔兹曼分布的分布。然而,计算似然需要在积分过程中求取代价高昂的雅可比矩阵,这对于大型分子系统并不现实。为克服这一问题,我们提出通过一个以噪声对比估计和分数匹配训练的基于能量的模型,来学习生成分布的似然。通过使用随机插值器在先验分布与生成分布之间退火,我们结合两种目标函数以高效地学习密度函数。在丙氨酸二肽系统上,我们证明了我们的方法产生的自由能剖面和能量分布可与使用精确似然获得的结果相媲美。此外,我们还展示了亚稳态之间的自由能差可以被准确估计,且速度提升达几个数量级。
更新时间: 2025-07-02 01:16:47
领域: cs.LG,physics.bio-ph
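A toy sketch of the noise-contrastive part of the idea: fitting an energy-based model with a learned normalizer on a 1-D Gaussian target. The stochastic-interpolant annealing and score-matching terms from the paper are omitted:

# Toy NCE for an energy model; illustrates likelihood learning only.
import torch

energy = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                             torch.nn.Linear(64, 1))
log_Z = torch.nn.Parameter(torch.zeros(1))          # learned normalizer
opt = torch.optim.Adam(list(energy.parameters()) + [log_Z], lr=1e-2)
noise = torch.distributions.Normal(0.0, 3.0)        # known noise density

def log_model(x):                                   # log p_theta = -E(x) - log Z
    return -energy(x).squeeze(-1) - log_Z

def logit(x):                                       # log p_theta - log p_noise
    return log_model(x) - noise.log_prob(x).squeeze(-1)

for step in range(2000):
    x_data = torch.randn(256, 1) * 0.5 + 1.0        # samples from the target
    x_noise = noise.sample((256, 1))
    # Binary NCE: classify data vs. noise via the density log-ratio.
    loss = (torch.nn.functional.softplus(-logit(x_data)).mean()
            + torch.nn.functional.softplus(logit(x_noise)).mean())
    opt.zero_grad(); loss.backward(); opt.step()

print("learned log-likelihood at x=1.0:", log_model(torch.tensor([[1.0]])).item())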
PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning
In recent years, unlearning techniques, which are methods for inducing a model to "forget" previously learned information, have attracted attention as a way to address privacy and copyright concerns in large language models (LLMs) and large multimodal models (LMMs). While several unlearning benchmarks have been established for LLMs, a practical evaluation framework for unlearning in LMMs has been less explored. Specifically, the existing unlearning benchmark for LMMs considers only scenarios in which the model is required to unlearn fine-tuned knowledge through a single unlearning operation. In this study, we introduce the PULSE protocol for realistic LMM unlearning scenarios, built on two critical perspectives: (i) Pre-trained knowledge Unlearning for analyzing the effect across different knowledge acquisition phases and (ii) Long-term Sustainability Evaluation to address sequential requests. We then evaluate existing unlearning methods along these dimensions. Our results reveal that, although some techniques can successfully unlearn knowledge acquired through fine-tuning, they struggle to eliminate information learned during pre-training. Moreover, methods that effectively unlearn a batch of target data in a single operation exhibit substantial performance degradation when the same data are split and unlearned sequentially.
Updated: 2025-07-02 01:13:08
标题: PULSE:用于大型多模态模型遗忘的实用评估场景
摘要: 近年来,遗忘技术(即诱导模型"忘记"先前所学信息的方法)作为解决大型语言模型(LLMs)和大型多模态模型(LMMs)中隐私与版权问题的一种途径,受到了广泛关注。虽然已为LLMs建立了若干遗忘基准,但针对LMMs遗忘的实用评估框架仍缺乏深入探讨。具体而言,现有的LMMs遗忘基准仅考虑模型需要通过单次遗忘操作来遗忘微调所得知识的情景。在本研究中,我们引入了面向LMMs现实遗忘场景的PULSE协议,它包含两个关键视角:(i)预训练知识遗忘,用于分析不同知识获取阶段的影响;(ii)长期可持续性评估,用于应对顺序遗忘请求。随后,我们沿这些维度评估了现有的遗忘方法。结果显示,尽管一些技术可以成功遗忘通过微调获得的知识,但它们难以消除预训练期间学到的信息。此外,那些能够在单次操作中有效遗忘一批目标数据的方法,在将同一数据拆分并按顺序遗忘时表现出明显的性能下降。
更新时间: 2025-07-02 01:13:08
领域: cs.LG,cs.AI
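To make the long-term sustainability axis concrete, here is a small experiment in its spirit: gradient-ascent unlearning (a common baseline, not necessarily a method the paper tests) applied once versus sequentially over the same forget set, on a toy classifier rather than an LMM:

# One-shot vs. sequential unlearning on a toy model; illustrative only.
import copy
import torch

torch.manual_seed(0)
X = torch.randn(600, 10); w_true = torch.randn(10, 3)
y = (X @ w_true).argmax(1)
model0 = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model0.parameters(), lr=0.1)
for _ in range(300):                      # pre-train on everything
    opt.zero_grad()
    torch.nn.functional.cross_entropy(model0(X), y).backward()
    opt.step()

forget_idx = torch.arange(0, 120)         # data to forget
retain_idx = torch.arange(120, 600)

def ascent_unlearn(model, idx, steps=20, lr=0.05):
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):                # maximize loss on the forget set
        opt.zero_grad()
        (-torch.nn.functional.cross_entropy(model(X[idx]), y[idx])).backward()
        opt.step()
    return model

def acc(model, idx):
    return (model(X[idx]).argmax(1) == y[idx]).float().mean().item()

one_shot = ascent_unlearn(model0, forget_idx)
seq = model0
for part in forget_idx.chunk(4):          # same data, four sequential requests
    seq = ascent_unlearn(seq, part)
print("one-shot  : forget", acc(one_shot, forget_idx), "retain", acc(one_shot, retain_idx))
print("sequential: forget", acc(seq, forget_idx), "retain", acc(seq, retain_idx))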
$μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation
Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and provision of management advice. RRG is complicated by two key challenges: (1) inherent complexity in extracting relevant information from imaging data under resource constraints, and (2) difficulty in objectively evaluating discrepancies between model-generated and expert-written reports. To address these challenges, we propose $\mu^2$LLM, a $\underline{\textbf{mu}}$ltiscale $\underline{\textbf{mu}}$ltimodal large language model for RRG tasks. The novel ${\mu}^2$Tokenizer, as an intermediate layer, integrates multi-modal features from the multiscale visual tokenizer and the text tokenizer, then enhances report generation quality through direct preference optimization (DPO), guided by GREEN-RedLlama. Experimental results on four large CT image-report medical datasets demonstrate that our method outperforms existing approaches, highlighting the potential of our fine-tuned $\mu^2$LLMs on limited data for RRG tasks. At the same time, for prompt engineering, we introduce a five-stage, LLM-driven pipeline that converts routine CT reports into paired visual-question-answer triples and citation-linked reasoning narratives, creating a scalable, high-quality supervisory corpus for explainable multimodal radiology LLM. All code, datasets, and models will be publicly available in our official repository. https://github.com/Siyou-Li/u2Tokenizer
Updated: 2025-07-02 01:08:41
标题: $μ^2$Tokenizer:可微分的多尺度多模态分词器用于放射学报告生成
摘要: 自动化放射学报告生成(RRG)旨在从临床影像(如计算机断层扫描CT)中生成详细的文本报告,以提高诊断的准确性和效率并提供管理建议。RRG面临两个关键挑战:(1)在资源受限条件下从影像数据中提取相关信息的固有复杂性;(2)难以客观评估模型生成报告与专家撰写报告之间的差异。为了解决这些挑战,我们提出了$\mu^2$LLM,一个用于RRG任务的多尺度多模态大语言模型。新颖的${\mu}^2$Tokenizer作为中间层,整合了来自多尺度视觉分词器和文本分词器的多模态特征,然后在GREEN-RedLlama的引导下,通过直接偏好优化(DPO)提升报告生成质量。在四个大型CT图像-报告医学数据集上的实验结果表明,我们的方法优于现有方法,凸显了我们在有限数据上微调的$\mu^2$LLMs在RRG任务中的潜力。同时,在提示工程方面,我们引入了一个五阶段的LLM驱动流水线,将常规CT报告转换为成对的视觉问答三元组和带引文链接的推理叙述,为可解释的多模态放射学LLM构建了可扩展的高质量监督语料库。所有代码、数据集和模型将在我们的官方存储库中公开提供。https://github.com/Siyou-Li/u2Tokenizer
更新时间: 2025-07-02 01:08:41
领域: cs.LG,cs.CL,eess.IV
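The DPO step is standard enough to sketch. Below, policy and reference log-probabilities are toy tensors, and the GREEN-RedLlama scoring that selects chosen/rejected report pairs is abstracted away:

# Standard DPO objective; inputs here are toy sequence log-probs.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Margin of policy-vs-reference log-ratios on preferred vs. dispreferred reports.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

# Toy log-probs for a batch of 4 report pairs (chosen = expert-preferred).
lp_c = torch.tensor([-12.0, -10.5, -11.2, -9.8], requires_grad=True)
lp_r = torch.tensor([-11.0, -10.9, -10.0, -10.2], requires_grad=True)
ref_c = torch.tensor([-12.5, -10.7, -11.5, -10.0])
ref_r = torch.tensor([-10.8, -10.6, -9.9, -10.1])
loss = dpo_loss(lp_c, lp_r, ref_c, ref_r)
loss.backward()   # gradients push the policy toward the preferred reports
print(float(loss))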
Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment
Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To circumvent the shortage of seed alignments provided for training, recent EA models utilize pseudo-labeling strategies to iteratively add unaligned entity pairs predicted with high confidence to the seed alignments for model training. However, the adverse impact of confirmation bias during pseudo-labeling has been largely overlooked, thus hindering entity alignment performance. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to determine entity correspondences and reduce erroneous matches across two KGs. An effective criterion is derived to infer pseudo-labeled alignments that satisfy one-to-one correspondences; (2) Parallel pseudo-label ensembling refines pseudo-labeled alignments by combining predictions over multiple models independently trained in parallel. The ensembled pseudo-labeled alignments are thereafter used to augment seed alignments to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. Our extensive results and in-depth analyses demonstrate the superiority of UPL-EA over 15 competitive baselines and its utility as a general pseudo-labeling framework for entity alignment.
Updated: 2025-07-02 01:04:31
标题: 对抗确认偏见:一种统一的伪标签框架用于实体对齐
摘要: 实体对齐(EA)旨在识别不同知识图谱(KGs)中指向同一现实世界身份的等价实体对。为了克服训练所用种子对齐不足的问题,最近的EA模型利用伪标注策略,将以高置信度预测出的未对齐实体对迭代地加入种子对齐中用于模型训练。然而,伪标注过程中确认偏误的负面影响在很大程度上被忽视,从而阻碍了实体对齐的性能。为了系统性地对抗基于伪标注的实体对齐中的确认偏误,我们提出了统一伪标注实体对齐框架(UPL-EA),它显式地消除伪标注错误,以提高实体对齐的准确性。UPL-EA由两个互补组件构成:(1)基于最优传输(OT)的伪标注,以离散OT建模作为确定实体对应关系、减少两个KGs之间错误匹配的有效手段,并推导出一个有效准则,用于推断满足一对一对应关系的伪标注对齐;(2)并行伪标签集成,通过组合多个并行独立训练模型的预测来精炼伪标注对齐。集成后的伪标注对齐随后用于扩充种子对齐,以强化后续用于对齐推断的模型训练。UPL-EA消除伪标注错误的有效性既有理论支持,也得到了实验验证。我们广泛的结果和深入的分析表明,UPL-EA优于15个有竞争力的基线,并可作为实体对齐的通用伪标注框架。
更新时间: 2025-07-02 01:04:31
领域: cs.AI,cs.CL,cs.LG
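A generic sketch of OT-based pseudo-labeling follows: entropic Sinkhorn over an embedding-similarity cost, keeping only mutual-best pairs to enforce one-to-one correspondences. UPL-EA's actual selection criterion differs; this shows only the recipe's shape:

# Sinkhorn over cosine costs, then mutual-argmax filtering; illustrative only.
import numpy as np

def sinkhorn(C: np.ndarray, reg: float = 0.05, iters: int = 200) -> np.ndarray:
    K = np.exp(-C / reg)
    u = np.ones(C.shape[0]); v = np.ones(C.shape[1])
    for _ in range(iters):
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return u[:, None] * K * v[None, :]   # (approximately) doubly-stochastic plan

rng = np.random.default_rng(1)
E1 = rng.normal(size=(6, 8))                       # embeddings of KG1 entities
E2 = (E1 + rng.normal(scale=0.1, size=(6, 8)))[[2, 0, 1, 5, 4, 3]]  # permuted KG2
C = 1 - (E1 @ E2.T) / (np.linalg.norm(E1, axis=1)[:, None]
                       * np.linalg.norm(E2, axis=1)[None, :])        # cosine cost
P = sinkhorn(C)
pairs = [(i, int(P[i].argmax())) for i in range(P.shape[0])
         if P[:, P[i].argmax()].argmax() == i]     # mutual argmax => one-to-one
print(pairs)   # pseudo-labeled alignments to be added to the seed set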
Evaluating Hierarchical Clinical Document Classification Using Reasoning-Based LLMs
This study evaluates how well large language models (LLMs) can classify ICD-10 codes from hospital discharge summaries, a critical but error-prone task in healthcare. Using 1,500 summaries from the MIMIC-IV dataset and focusing on the 10 most frequent ICD-10 codes, the study tested 11 LLMs, including models with and without structured reasoning capabilities. Medical terms were extracted using a clinical NLP tool (cTAKES), and models were prompted in a consistent, coder-like format. None of the models achieved an F1 score above 57%, with performance dropping as code specificity increased. Reasoning-based models generally outperformed non-reasoning ones, with Gemini 2.5 Pro performing best overall. Some codes, such as those related to chronic heart disease, were classified more accurately than others. The findings suggest that while LLMs can assist human coders, they are not yet reliable enough for full automation. Future work should explore hybrid methods, domain-specific model training, and the use of structured clinical data.
Updated: 2025-07-02 00:53:54
标题: 使用基于推理的LLMs评估分层临床文档分类
摘要: 这项研究评估了大型语言模型(LLMs)从医院出院总结中对ICD-10编码进行分类的能力,这是医疗保健中一项关键但容易出错的任务。研究使用来自MIMIC-IV数据集的1,500份出院总结,聚焦于出现频率最高的10个ICD-10编码,测试了11个LLMs,其中既有具备结构化推理能力的模型,也有不具备的模型。医学术语使用临床自然语言处理工具(cTAKES)提取,并以一致的、类似编码员工作方式的格式提示模型。没有任何模型的F1分数超过57%,且性能随编码特异性的增加而下降。基于推理的模型总体优于非推理模型,其中Gemini 2.5 Pro整体表现最佳。某些编码(例如与慢性心脏病相关的编码)的分类准确率高于其他编码。研究结果表明,虽然LLMs可以辅助人类编码员,但其可靠性尚不足以实现完全自动化。未来的工作应探索混合方法、领域特定的模型训练以及结构化临床数据的使用。
更新时间: 2025-07-02 00:53:54
领域: cs.CL,cs.AI
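A sketch of the evaluation loop described above, with a stubbed LLM call, an invented (purely illustrative) list of ten target codes, and micro-F1 scoring over per-code indicators:

# Coder-style prompt plus micro-F1 scoring; query_llm is a placeholder.
from sklearn.metrics import f1_score

TARGET_CODES = ["I10", "E11.9", "I25.10", "N17.9", "J18.9",
                "A41.9", "K21.9", "E78.5", "F32.9", "D64.9"]  # illustrative

def build_prompt(summary: str, terms: list[str]) -> str:
    return ("You are a clinical coder. Given the discharge summary and the "
            "extracted terms, return the applicable ICD-10 codes from: "
            f"{', '.join(TARGET_CODES)}.\n"
            f"Terms: {'; '.join(terms)}\nSummary: {summary}\nCodes:")

def query_llm(prompt: str) -> list[str]:
    return ["I10"]                       # stub; a real model call goes here

gold = [["I10", "E78.5"], ["E11.9"]]
pred = [query_llm(build_prompt(s, t)) for s, t in
        [("HTN and hyperlipidemia ...", ["hypertension"]),
         ("T2DM follow-up ...", ["diabetes mellitus"])]]
# Per-code binary indicators -> micro-F1 over the whole label set.
y_true = [[c in g for c in TARGET_CODES] for g in gold]
y_pred = [[c in p for c in TARGET_CODES] for p in pred]
print(f1_score(y_true, y_pred, average="micro"))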
LLM-based Realistic Safety-Critical Driving Video Generation
Designing diverse and safety-critical driving scenarios is essential for evaluating autonomous driving systems. In this paper, we propose a novel framework that leverages Large Language Models (LLMs) for few-shot code generation to automatically synthesize driving scenarios within the CARLA simulator, which has flexibility in scenario scripting, efficient code-based control of traffic participants, and enforcement of realistic physical dynamics. Given a few example prompts and code samples, the LLM generates safety-critical scenario scripts that specify the behavior and placement of traffic participants, with a particular focus on collision events. To bridge the gap between simulation and real-world appearance, we integrate a video generation pipeline using Cosmos-Transfer1 with ControlNet, which converts rendered scenes into realistic driving videos. Our approach enables controllable scenario generation and facilitates the creation of rare but critical edge cases, such as pedestrian crossings under occlusion or sudden vehicle cut-ins. Experimental results demonstrate the effectiveness of our method in generating a wide range of realistic, diverse, and safety-critical scenarios, offering a promising tool for simulation-based testing of autonomous vehicles.
Updated: 2025-07-02 00:45:19
标题: 基于LLM的逼真安全关键驾驶视频生成
摘要: 设计多样化且安全关键的驾驶场景对于评估自动驾驶系统至关重要。在本文中,我们提出了一个新颖的框架,利用大型语言模型(LLMs)进行少样本代码生成,以便在CARLA模拟器中自动合成驾驶场景;该模拟器具有灵活的场景脚本能力、对交通参与者高效的基于代码的控制,并能强制执行逼真的物理动力学。给定少量示例提示和代码样本,LLM即可生成指定交通参与者行为与位置的安全关键场景脚本,并特别关注碰撞事件。为了弥合仿真与真实外观之间的差距,我们集成了一条使用Cosmos-Transfer1与ControlNet的视频生成管道,将渲染场景转换为逼真的驾驶视频。我们的方法实现了可控的场景生成,并便于创建罕见但关键的边缘案例,例如遮挡下的行人横穿或车辆突然切入。实验结果证明了我们的方法在生成各种逼真、多样化且安全关键场景方面的有效性,为自动驾驶车辆的基于仿真的测试提供了一个有前景的工具。
更新时间: 2025-07-02 00:45:19
领域: cs.RO,cs.AI
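The few-shot prompting step can be sketched without touching CARLA itself. The scenario DSL in the example shot below is invented for illustration (the paper targets CARLA's real scripting interface), and generate() stubs the LLM call:

# Few-shot prompt assembly for scenario generation; DSL and LLM are stand-ins.
EXAMPLE = '''\
# Example scenario: sudden cut-in
ego = spawn_vehicle("vehicle.tesla.model3", route="highway_lane_1")
npc = spawn_vehicle("vehicle.audi.tt", route="highway_lane_2")
trigger(npc, when=distance(ego, npc) < 15, action=lane_change(npc, "left"))
'''

def build_prompt(description: str) -> str:
    return ("You write driving-scenario scripts in the DSL shown below.\n"
            "Focus on safety-critical collision events.\n\n"
            f"{EXAMPLE}\n# New scenario: {description}\n")

def generate(prompt: str) -> str:
    return "# (LLM-generated scenario script would appear here)"

script = generate(build_prompt("pedestrian crossing occluded by a parked van"))
print(script)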
Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement
Motivated by our collaboration with a major refugee resettlement agency in the U.S., we study a dynamic matching problem where each new arrival (a refugee case) must be matched immediately and irrevocably to one of the static resources (a location with a fixed annual quota). In addition to consuming the static resource, each case requires post-allocation services from a server, such as a translator. Given the uncertainty in service time, a server may not be available at a given time, thus we refer to it as a dynamic resource. Upon matching, the case will wait to avail service in a first-come-first-serve manner. Bursty matching to a location may result in undesirable congestion at its corresponding server. Consequently, the central planner (the agency) faces a dynamic matching problem with an objective that combines the matching reward (captured by pair-specific employment outcomes) with the cost for congestion for dynamic resources and over-allocation for the static ones. Motivated by the observed fluctuations in the composition of refugee pools across the years, we aim to design algorithms that do not rely on distributional knowledge. We develop learning-based algorithms that are asymptotically optimal in certain regimes, easy to interpret, and computationally fast. Our design is based on learning the dual variables of the underlying optimization problem; however, the main challenge lies in the time-varying nature of the dual variables associated with dynamic resources. Our theoretical development brings together techniques from Lyapunov analysis, adversarial online learning, and stochastic optimization. On the application side, when tested on real data from our partner agency and incorporating practical considerations, our method outperforms existing ones making it a viable candidate for replacing the current practice upon experimentation.
Updated: 2025-07-02 00:39:19
标题: 动态匹配与后分配服务及其在难民安置中的应用
摘要: 受我们与美国一家主要难民安置机构合作的启发,我们研究了一个动态匹配问题:每个新到达者(一个难民案例)必须立即且不可撤销地匹配到某个静态资源(具有固定年度配额的安置地点)。除了消耗静态资源外,每个案例还需要服务者(如翻译员)提供分配后服务。鉴于服务时间的不确定性,服务者在某一时刻可能不可用,因此我们将其称为动态资源。匹配完成后,案例将按先到先服务的顺序等待获得服务。向某一地点的突发性集中匹配可能导致其对应服务者出现不良拥塞。因此,中央规划者(该机构)面临一个动态匹配问题,其目标将匹配奖励(由特定配对的就业结果刻画)与动态资源的拥塞成本以及静态资源的超额分配成本结合起来。受历年难民群体构成波动这一观察的启发,我们旨在设计不依赖分布知识的算法。我们开发了基于学习的算法,它们在某些情形下渐近最优、易于解释且计算快速。我们的设计基于学习底层优化问题的对偶变量;然而,主要挑战在于与动态资源相关的对偶变量具有时变性。我们的理论工作融合了李雅普诺夫分析、对抗性在线学习和随机优化等技术。在应用方面,当在合作机构的真实数据上测试并纳入实际考量时,我们的方法优于现有方法,使其成为经实验验证后取代当前实践的可行候选方案。
更新时间: 2025-07-02 00:39:19
领域: cs.DS,cs.GT,cs.LG,math.OC
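A stylized reading of the dual-learning idea, not the paper's algorithm: score each location by match reward minus learned prices for quota consumption and server congestion, and update both duals online by subgradient steps:

# Stylized dual-based matching rule with online subgradient price updates.
import random

random.seed(0)
LOCS = ["A", "B", "C"]
INIT_Q, HORIZON = 30, 90
pace = INIT_Q / HORIZON                # target quota-consumption rate
quota = {l: INIT_Q for l in LOCS}      # static annual capacity
lam = {l: 0.0 for l in LOCS}           # dual price for quota use
mu = {l: 0.0 for l in LOCS}            # dual price for server congestion
backlog = {l: 0 for l in LOCS}
eta = 0.05                             # dual learning rate

for t in range(HORIZON):               # arriving cases
    reward = {l: random.random() for l in LOCS}   # pair-specific match reward
    eligible = [l for l in LOCS if quota[l] > 0]
    pick = max(eligible, key=lambda l: reward[l] - lam[l] - mu[l])
    quota[pick] -= 1
    backlog[pick] += 1
    if random.random() < 0.4:          # stochastic post-allocation service
        backlog[pick] = max(0, backlog[pick] - 1)
    for l in LOCS:                     # subgradient updates on both duals
        lam[l] = max(0.0, lam[l] + eta * ((1.0 if l == pick else 0.0) - pace))
        mu[l] = max(0.0, mu[l] + eta * (backlog[l] - 1))

print({l: round(mu[l], 2) for l in LOCS}, quota)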
Automated Classification of Volcanic Earthquakes Using Transformer Encoders: Insights into Data Quality and Model Interpretability
Precisely classifying earthquake types is crucial for elucidating the relationship between volcanic earthquakes and volcanic activity. However, traditional methods rely on subjective human judgment, which requires considerable time and effort. To address this issue, we developed a deep learning model using a transformer encoder for a more objective and efficient classification. Tested on Mount Asama's diverse seismic activity, our model achieved high F1 scores (0.930 for volcano tectonic, 0.931 for low-frequency earthquakes, and 0.980 for noise), superior to a conventional CNN-based method. To enhance interpretability, attention weight visualizations were analyzed, revealing that the model focuses on key waveform features similarly to human experts. However, inconsistencies in training data, such as ambiguously labeled B-type events with S-waves, were found to influence classification accuracy and attention weight distributions. Experiments addressing data selection and augmentation demonstrated the importance of balancing data quality and diversity. In addition, stations within 3 km of the crater played an important role in improving model performance and interpretability. These findings highlight the potential of Transformer-based models for automated volcanic earthquake classification, particularly in improving efficiency and interpretability. By addressing challenges such as data imbalance and subjective labeling, our approach provides a robust framework for understanding seismic activity at Mount Asama. Moreover, this framework offers opportunities for transfer learning to other volcanic regions, paving the way for enhanced volcanic hazard assessments and disaster mitigation strategies.
Updated: 2025-07-02 00:37:07
标题: 使用Transformer编码器自动分类火山地震:关于数据质量和模型可解释性的见解
摘要: 精确分类地震类型对于阐明火山地震与火山活动之间的关系至关重要。然而,传统方法依赖主观的人工判断,需要大量时间和精力。为了解决这个问题,我们开发了一个使用Transformer编码器的深度学习模型,以实现更客观、高效的分类。在浅间山多样的地震活动上进行测试时,我们的模型取得了较高的F1分数(火山构造地震0.930,低频地震0.931,噪声0.980),优于传统的基于CNN的方法。为了增强可解释性,我们分析了注意力权重的可视化,发现模型与人类专家类似地关注关键波形特征。然而,训练数据中的不一致性(例如带有S波却被含糊标注为B型的事件)被发现会影响分类准确性和注意力权重分布。针对数据选择与数据增强的实验表明了在数据质量与多样性之间取得平衡的重要性。此外,距火山口3公里以内的台站在提升模型性能和可解释性方面发挥了重要作用。这些发现突显了基于Transformer的模型在火山地震自动分类方面的潜力,尤其是在提高效率和可解释性方面。通过解决数据不平衡和主观标注等挑战,我们的方法为理解浅间山的地震活动提供了一个稳健的框架。此外,该框架还为向其他火山地区的迁移学习提供了机会,为加强火山灾害评估和减灾策略铺平了道路。
更新时间: 2025-07-02 00:37:07
领域: physics.geo-ph,cs.LG
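The architecture is a generic transformer encoder over waveform tokens, which is easy to sketch; the hyperparameters below are ours, not the paper's:

# Generic transformer-encoder waveform classifier: VT / LF / noise.
import torch
import torch.nn as nn

class QuakeClassifier(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, n_classes=3):
        super().__init__()
        # Strided 1-D conv turns the raw waveform into a token sequence.
        self.patch = nn.Conv1d(1, d_model, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, wave):                 # wave: (batch, samples)
        x = self.patch(wave.unsqueeze(1)).transpose(1, 2)  # (batch, T, d)
        x = self.encoder(x)                  # attention over waveform tokens
        return self.head(x.mean(dim=1))      # mean-pool tokens -> class logits

model = QuakeClassifier()
logits = model(torch.randn(8, 1600))         # e.g., 16 s at 100 Hz
print(logits.shape)                          # torch.Size([8, 3])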
GAIus: Combining Genai with Legal Clauses Retrieval for Knowledge-based Assistant
In this paper we discuss the capability of large language models to ground their answers and provide proper references when dealing with legal matters of non-English- and non-Chinese-speaking countries. We discuss the history of legal information retrieval, the difference between case law and statute law and its impact on legal tasks, and analyze the latest research in this field. Building on that background, we introduce gAIus, the architecture of a cognitive LLM-based agent whose responses are grounded in knowledge retrieved from a specific legal act, the Polish Civil Code. We propose a retrieval mechanism that is more explainable and human-friendly, and that achieves better results than embedding-based approaches. To evaluate our method, we create a dedicated dataset based on single-choice questions from entrance exams for law apprenticeships conducted in Poland. The proposed architecture critically leveraged the abilities of the underlying large language models, improving gpt-3.5-turbo-0125's score by 419%, allowing it to beat gpt-4o, and lifting gpt-4o-mini's score from 31% to 86%. At the end of our paper we show the possible future path of research and potential applications of our findings.
Updated: 2025-07-02 00:36:27
标题: GAIus:将Genai与法律条款检索相结合,用于基于知识的助手
摘要: 在本文中,我们讨论了大型语言模型在处理非英语、非中文国家的法律问题时,为其回答提供依据并给出适当参考文献的能力。我们讨论了法律信息检索的历史、判例法与成文法之间的区别及其对法律任务的影响,并分析了该领域的最新研究。基于这一背景,我们介绍了gAIus,一种基于认知LLM的代理架构,其回答基于从特定法律文本(波兰民法典)中检索到的知识。我们提出了一种更易解释、更人性化且效果优于基于嵌入方法的检索机制。为了评估我们的方法,我们基于波兰律师实习入学考试的单选题构建了专门的数据集。所提出的架构充分发挥了所用大型语言模型的能力,将gpt-3.5-turbo-0125的得分提高了419%,使其击败了gpt-4o,并将gpt-4o-mini的得分从31%提升到86%。在本文的结尾,我们展示了可能的未来研究路径以及我们发现的潜在应用。
更新时间: 2025-07-02 00:36:27
领域: cs.CL,cs.AI
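The claimed advantage is a retrieval mechanism that is explainable rather than embedding-based. A minimal sketch of such a mechanism, with a two-article toy statute standing in for the Polish Civil Code, matches explicit article references and keyword overlap and cites what it used:

# Explainable, non-embedding statute retrieval; index and rules are toy stand-ins.
import re

CIVIL_CODE = {                       # toy index: article number -> text
    "Art. 415": "Whoever by his fault causes damage to another is obliged to repair it.",
    "Art. 922": "Rights and duties of the deceased pass to the heirs.",
}

def retrieve(question: str) -> list[tuple[str, str]]:
    hits = []
    cited = re.findall(r"Art\.\s*\d+", question)            # explicit references
    q_words = set(re.findall(r"\w+", question.lower()))
    for art, text in CIVIL_CODE.items():
        overlap = q_words & set(re.findall(r"\w+", text.lower()))
        if art in cited or len(overlap) >= 2:               # transparent criteria
            hits.append((art, text))
    return hits

q = "Who is obliged to repair damage caused by fault?"
context = retrieve(q)
prompt = ("Answer citing the provisions below.\n"
          + "\n".join(f"{a}: {t}" for a, t in context)
          + f"\nQuestion: {q}")
print(prompt)   # every answer can point back to the exact article retrieved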
AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design
Graph convolutional networks (GCNs) are fundamental in various scientific applications, ranging from biomedical protein-protein interactions (PPI) to large-scale recommendation systems. An essential component for modeling graph structures in GCNs is sparse general matrix-matrix multiplication (SpGEMM). As the size of graph data continues to scale up, SpGEMMs are often conducted in an out-of-core fashion due to limited GPU memory space in resource-constrained systems. Albeit recent efforts that aim to alleviate the memory constraints of out-of-core SpGEMM through either GPU feature caching, hybrid CPU-GPU memory layout, or performing the computation in sparse format, current systems suffer from both high I/O latency and GPU under-utilization issues. In this paper, we first identify the problems of existing systems, where sparse format data alignment and memory allocation are the main performance bottlenecks, and propose AIRES, a novel algorithm-system co-design solution to accelerate out-of-core SpGEMM computation for GCNs. Specifically, from the algorithm angle, AIRES proposes to alleviate the data alignment issues on the block level for matrices in sparse formats and develops a tiling algorithm to facilitate row block-wise alignment. On the system level, AIRES employs a three-phase dynamic scheduling that features a dual-way data transfer strategy utilizing a tiered memory system: integrating GPU memory, GPU Direct Storage (GDS), and host memory to reduce I/O latency and improve throughput. Evaluations show that AIRES significantly outperforms the state-of-the-art methods, achieving up to 1.8x lower latency in real-world graph processing benchmarks.
Updated: 2025-07-02 00:35:43
标题: AIRES:通过算法-系统协同设计加速核外(out-of-core)GCN
摘要: 图卷积网络(GCNs)在各类科学应用中起着基础作用,从生物医学蛋白质-蛋白质相互作用(PPI)到大规模推荐系统。建模GCNs中图结构的一个重要组件是稀疏通用矩阵-矩阵乘法(SpGEMM)。随着图数据规模的不断扩大,由于资源受限系统中GPU内存空间有限,SpGEMM通常以核外(out-of-core)方式进行。尽管近期的工作试图通过GPU特征缓存、CPU-GPU混合内存布局或以稀疏格式执行计算等方式缓解核外SpGEMM的内存限制,但现有系统仍同时存在高I/O延迟和GPU利用率不足的问题。本文首先指出了现有系统的问题所在:稀疏格式的数据对齐与内存分配是主要的性能瓶颈;并提出了AIRES,一种新颖的算法-系统协同设计方案,用于加速GCNs的核外SpGEMM计算。具体而言,在算法层面,AIRES提出在块级别缓解稀疏格式矩阵的数据对齐问题,并开发了一种分块(tiling)算法以促进行块级对齐。在系统层面,AIRES采用三阶段动态调度,其特点是利用分层内存系统(整合GPU内存、GPU直接存储(GDS)和主机内存)的双向数据传输策略,以减少I/O延迟并提高吞吐量。评估结果显示,AIRES在真实世界的图处理基准测试中明显优于最先进的方法,实现了高达1.8倍的延迟降低。
更新时间: 2025-07-02 00:35:43
领域: cs.LG
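The row-block tiling idea can be sketched in SciPy: process one row-aligned CSR block at a time so only a bounded slice is resident, as an out-of-core run would require. The GDS transfers and three-phase scheduling are not modeled:

# Row-block tiled SpGEMM; the storage tier is simulated by slicing in memory.
import numpy as np
import scipy.sparse as sp

A = sp.random(1000, 800, density=0.01, format="csr", random_state=0)
B = sp.random(800, 600, density=0.01, format="csr", random_state=1)

def tiled_spgemm(A, B, block_rows=128):
    out_blocks = []
    for start in range(0, A.shape[0], block_rows):
        # In a real out-of-core run this block would be streamed from storage
        # (host memory or GDS) while the previous partial product computes.
        A_blk = A[start:start + block_rows]     # row-aligned CSR slice
        out_blocks.append(A_blk @ B)            # partial product fits in memory
    return sp.vstack(out_blocks, format="csr")

C = tiled_spgemm(A, B)
assert abs(C - A @ B).sum() < 1e-9              # matches the one-shot product
print(C.shape, C.nnz)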
Deterministic Cryptographic Seed Generation via Cyclic Modular Inversion over $\mathbb{Z}/3^p\mathbb{Z}$
We present a deterministic framework for cryptographic seed generation based on cyclic modular inversion over $\mathbb{Z}/3^p\mathbb{Z}$. The method enforces algebraic admissibility on seed inputs via the identity $d_k \equiv -\left(2^{k-1}\right)^{-1} \bmod 3^p$, thereby producing structured and invertible residue sequences. This mapping yields entropy-rich, cycle-complete seeds well-suited for cryptographic primitives such as DRBGs, KDFs, and post-quantum schemes. To assess the quality of randomness, we introduce the Entropy Confidence Score (ECS), a composite metric reflecting coverage, uniformity, and modular bias. Although not a cryptographic PRNG in itself, the framework serves as a deterministic entropy filter that conditions and validates seed inputs prior to their use by conventional generators. Empirical and hardware-based results confirm constant-time execution, minimal side-channel leakage, and lightweight feasibility for embedded applications. The framework complements existing cryptographic stacks by acting as an algebraically verifiable entropy filter, thereby enhancing structural soundness and auditability.
Updated: 2025-07-02 00:17:55
标题: 基于$\mathbb{Z}/3^p\mathbb{Z}$上循环模逆的确定性密码学种子生成
摘要: 我们提出了一个基于$\mathbb{Z}/3^p\mathbb{Z}$上循环模逆的确定性密码学种子生成框架。该方法通过恒等式$d_k \equiv -\left(2^{k-1}\right)^{-1} \bmod 3^p$在种子输入上强制执行代数可容许性,从而产生结构化且可逆的剩余序列。这种映射产生熵丰富、循环完整的种子,非常适合DRBGs、KDFs和后量子方案等密码原语。为了评估随机性质量,我们引入了熵置信度评分(ECS),这是一个反映覆盖率、均匀性和模偏差的复合度量。虽然该框架本身并非密码学PRNG,但它可作为确定性熵过滤器,在常规生成器使用种子输入之前对其进行调理和验证。实证与基于硬件的结果证实了常数时间执行、极小的侧信道泄漏以及面向嵌入式应用的轻量可行性。该框架作为一个代数上可验证的熵过滤器,与现有密码栈形成互补,从而增强了结构健全性与可审计性。
更新时间: 2025-07-02 00:17:55
领域: cs.CR,cs.IT,math.IT,Primary 05A17, Secondary 11D45, 11Y60, 94A60,F.2.1
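The defining identity is fully specified in the abstract, so it can be implemented directly; the ECS metric and the hardware claims are outside this sketch:

# Direct implementation of d_k = -(2^(k-1))^(-1) mod 3^p from the abstract.
def seed_sequence(p: int, length: int) -> list[int]:
    m = 3 ** p
    # 2^(k-1) is coprime to 3^p, so the modular inverse always exists.
    return [(-pow(pow(2, k - 1, m), -1, m)) % m for k in range(1, length + 1)]

m = 3 ** 5
d = seed_sequence(p=5, length=8)
print(d)                                     # residues mod 3^5 = 243
# Sanity check of the defining congruence: d_k * 2^(k-1) = -1 (mod 3^p).
assert all(dk * pow(2, i, m) % m == m - 1 for i, dk in enumerate(d))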
Human Mobility Modeling with Household Coordination Activities under Limited Information via Retrieval-Augmented LLMs
Understanding human mobility patterns has long been a challenging task in transportation modeling. Due to the difficulties in obtaining high-quality training datasets across diverse locations, conventional activity-based models and learning-based human mobility modeling algorithms are particularly limited by the availability and quality of datasets. Current approaches primarily focus on spatial-temporal patterns while neglecting semantic relationships such as logical connections or dependencies between activities and household coordination activities like joint shopping trips or family meal times, both crucial for realistic mobility modeling. We propose a retrieval-augmented large language model (LLM) framework that generates activity chains with household coordination using only public accessible statistical and socio-demographic information, reducing the need for sophisticated mobility data. The retrieval-augmentation mechanism enables household coordination and maintains statistical consistency across generated patterns, addressing a key gap in existing methods. Our validation with NHTS and SCAG-ABM datasets demonstrates effective mobility synthesis and strong adaptability for regions with limited mobility data availability.
Updated: 2025-07-02 00:16:38
标题: 在有限信息下通过检索增强LLMs进行包含家庭协调活动的人类移动建模
摘要: 理解人类移动模式长期以来都是交通建模中一项具有挑战性的任务。由于难以在不同地区获取高质量的训练数据集,传统的基于活动的模型和基于学习的人类移动建模算法尤其受限于数据集的可用性和质量。现有方法主要关注时空模式,而忽视了语义关系(如活动之间的逻辑联系或依赖性)以及家庭协调活动(如共同购物出行或家庭用餐时间),二者对于逼真的移动建模都至关重要。我们提出了一个检索增强的大型语言模型(LLM)框架,仅利用公开可得的统计与社会人口信息即可生成包含家庭协调的活动链,从而减少对精细移动数据的需求。检索增强机制实现了家庭协调,并在生成的模式间保持统计一致性,弥补了现有方法的一个关键缺口。基于NHTS和SCAG-ABM数据集的验证表明,该框架能够有效地合成移动性,并对移动数据可用性有限的地区具有很强的适应性。
更新时间: 2025-07-02 00:16:38
领域: cs.AI,cs.SI
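An illustrative simplification of the pipeline: retrieve activity-chain exemplars for similar household demographics from public statistics, then have an LLM (stubbed here) draft one chain per member with joint activities pinned to shared time slots:

# Retrieval-augmented chain generation; exemplars and LLM call are stand-ins.
EXEMPLARS = [  # toy stand-ins for publicly available statistical patterns
    {"profile": {"size": 2, "workers": 2}, "chain": "home-work-shop(joint)-home"},
    {"profile": {"size": 4, "workers": 1}, "chain": "home-school-work-meal(joint)-home"},
]

def retrieve(profile: dict, k: int = 1) -> list[str]:
    def dist(e):  # similarity over socio-demographic fields only
        return sum(abs(e["profile"][f] - profile[f]) for f in profile)
    return [e["chain"] for e in sorted(EXEMPLARS, key=dist)[:k]]

def generate_chains(profile: dict) -> str:
    shots = retrieve(profile)
    prompt = (f"Household: {profile}. Similar observed patterns: {shots}. "
              "Produce one activity chain per member; joint activities "
              "(shared meals, shopping trips) must appear in every "
              "participant's chain at the same time slot.")
    return f"[LLM output for: {prompt}]"    # stub for a real LLM call

print(generate_chains({"size": 3, "workers": 2}))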
Towards Safety Evaluations of Theory of Mind in Large Language Models
As the capabilities of large language models (LLMs) continue to advance, the importance of rigorous safety evaluation is becoming increasingly evident. Recent concerns within the realm of safety assessment have highlighted instances in which LLMs exhibit behaviors that appear to disable oversight mechanisms and respond in a deceptive manner. For example, there have been reports suggesting that, when confronted with information unfavorable to their own persistence during task execution, LLMs may act covertly and even provide false answers to questions intended to verify their behavior. To evaluate the potential risk of such deceptive actions toward developers or users, it is essential to investigate whether these behaviors stem from covert, intentional processes within the model. In this study, we propose that it is necessary to measure the theory of mind capabilities of LLMs. We begin by reviewing existing research on theory of mind and identifying the perspectives and tasks relevant to its application in safety evaluation. Given that theory of mind has been predominantly studied within the context of developmental psychology, we analyze developmental trends across a series of open-weight LLMs. Our results indicate that while LLMs have improved in reading comprehension, their theory of mind capabilities have not shown comparable development. Finally, we present the current state of safety evaluation with respect to LLMs' theory of mind, and discuss remaining challenges for future work.
Updated: 2025-07-02 00:16:28
标题: 面向大型语言模型心智理论的安全评估
摘要: 随着大型语言模型(LLMs)能力的不断提升,严格安全评估的重要性日益凸显。安全评估领域最近的担忧指出,LLMs在某些情况下会表现出似乎在规避监督机制、并以欺骗性方式作出回应的行为。例如,有报告指出,当LLMs在执行任务期间面对不利于其自身存续的信息时,可能会采取隐蔽行动,甚至对旨在验证其行为的问题给出虚假答案。为了评估此类欺骗行为对开发者或用户的潜在风险,有必要调查这些行为是否源于模型内部隐蔽且有意的过程。在本研究中,我们提出有必要测量LLMs的心智理论能力。我们首先回顾了心智理论的现有研究,并确定了其在安全评估应用中相关的视角和任务。鉴于心智理论主要在发展心理学背景下被研究,我们分析了一系列开放权重LLMs的发展趋势。结果表明,虽然LLMs在阅读理解方面有所提升,但其心智理论能力并未表现出相应的发展。最后,我们阐述了围绕LLMs心智理论的安全评估现状,并讨论了未来工作仍面临的挑战。
更新时间: 2025-07-02 00:16:28
领域: cs.CL,cs.AI
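Theory-of-mind measurement typically builds on false-belief items; the sketch below adapts the classic Sally-Anne probe to an LLM evaluation loop. The item and scoring rule are illustrative, not drawn from the paper's benchmark:

# First-order false-belief probe for an LLM; ask_model is a placeholder.
PROBE = (
    "Sally puts her marble in the basket and leaves. Anne moves the marble "
    "to the box. Sally returns. Where will Sally look for her marble first? "
    "Answer with one word."
)

def ask_model(prompt: str) -> str:
    return "basket"                       # stub for a real model call

def score(answer: str) -> bool:
    # The correct first-order ToM answer tracks Sally's belief, not reality.
    return "basket" in answer.lower()

print("passes first-order false belief:", score(ask_model(PROBE)))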