    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 22

Last Updated: 2025-07-23 23:44:10 (+00:00)

Quantifying the Uniqueness and Divisiveness of Presidential Discourse

Do American presidents speak discernibly differently from one another? If so, in what ways? And are these differences confined to any single medium of communication? To investigate these questions, this paper introduces a novel metric of uniqueness based on large language models, develops a new lexicon for divisive speech, and presents a framework for assessing the distinctive ways in which presidents speak about their political opponents. Applying these tools to a variety of corpora of presidential speeches, we find considerable evidence that Donald Trump's speech patterns diverge from those of all major party nominees for the presidency in recent history. Trump is significantly more distinctive than his fellow Republicans, whose uniqueness values appear closer to those of the Democrats. Contributing to these differences is Trump's employment of divisive and antagonistic language, particularly when targeting his political opponents. These differences hold across a variety of measurement strategies, arise both on the campaign trail and in official presidential addresses, and do not appear to be an artifact of secular changes in presidential communications.
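
The uniqueness metric itself is LLM-based and not specified in this abstract; as a rough illustration of the underlying idea (how poorly a speaker's text is predicted by a model fit to everyone else), here is a minimal leave-one-speaker-out sketch using a Laplace-smoothed bigram model. The corpus, speaker names, and scoring choice are all illustrative assumptions, not the paper's method.

    import math
    from collections import Counter

    def bigrams(text):
        toks = text.lower().split()
        return list(zip(toks, toks[1:])), toks

    def fit(texts):
        # Laplace-smoothed bigram/unigram counts over a list of documents.
        big, uni = Counter(), Counter()
        for t in texts:
            bg, toks = bigrams(t)
            big.update(bg)
            uni.update(toks)
        return big, uni

    def avg_nll(model, text, vocab_size):
        # Mean negative log-likelihood of text under model (higher = more unique).
        big, uni = model
        bg, _ = bigrams(text)
        nll = sum(-math.log((big[p] + 1) / (uni[p[0]] + vocab_size)) for p in bg)
        return nll / max(len(bg), 1)

    # Hypothetical mini-corpus: speaker -> speeches.
    corpus = {"speaker_a": ["we will win and win big believe me"],
              "speaker_b": ["we must work together for the common good"],
              "speaker_c": ["our plan invests in schools roads and jobs"]}
    vocab = {w for ts in corpus.values() for t in ts for w in t.split()}
    for spk, texts in corpus.items():
        others = [t for s, ts in corpus.items() if s != spk for t in ts]
        model = fit(others)
        score = sum(avg_nll(model, t, len(vocab)) for t in texts) / len(texts)
        print(spk, round(score, 3))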

Updated: 2025-07-23 23:44:10

Domains: cs.CL, cs.AI, cs.CY, cs.SI

Download: http://arxiv.org/abs/2401.01405v2

Synthesis of timeline-based planning strategies avoiding determinization

Qualitative timeline-based planning models domains as sets of independent, but interacting, components whose behaviors over time, the timelines, are governed by sets of qualitative temporal constraints (ordering relations), called synchronization rules. Its plan-existence problem has been shown to be PSPACE-complete; in particular, PSPACE-membership has been proved via reduction to the nonemptiness problem for nondeterministic finite automata. However, nondeterministic automata cannot be directly used to synthesize planning strategies, as a costly determinization step is needed. In this paper, we identify a fragment of qualitative timeline-based planning whose plan-existence problem can be mapped directly into the nonemptiness problem of deterministic finite automata, from which strategies can then be synthesized. In addition, we identify a maximal subset of Allen's relations that fits into this deterministic fragment.
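
The gain from the deterministic fragment is that nonemptiness of a deterministic finite automaton reduces to plain reachability of an accepting state, and the witness run directly yields a strategy. A minimal sketch of that check on a generic DFA (the toy automaton below is an assumption for illustration, not the paper's construction):

    from collections import deque

    def dfa_nonempty(start, accepting, delta):
        """Return a witness word if the DFA accepts anything, else None.
        delta: dict mapping (state, symbol) -> state."""
        seen, queue = {start}, deque([(start, [])])
        while queue:
            state, word = queue.popleft()
            if state in accepting:
                return word  # reachable accepting state: language nonempty
            for (s, sym), t in delta.items():
                if s == state and t not in seen:
                    seen.add(t)
                    queue.append((t, word + [sym]))
        return None

    # Toy DFA over {a, b} accepting words that contain "ab".
    delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
             ("q1", "a"): "q1", ("q1", "b"): "q2",
             ("q2", "a"): "q2", ("q2", "b"): "q2"}
    print(dfa_nonempty("q0", {"q2"}, delta))  # -> ['a', 'b']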

Updated: 2025-07-23 23:39:04

Domains: cs.AI

Download: http://arxiv.org/abs/2507.17988v1

Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale

The integration of large language models (LLMs) into educational tools has the potential to substantially impact how teachers plan instruction, support diverse learners, and engage in professional reflection. Yet little is known about how educators actually use these tools in practice and how their interactions with AI can be meaningfully studied at scale. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages drawn from a generative AI platform used by K-12 teachers. Through a four-phase coding pipeline, we combined inductive theme discovery, codebook development, structured annotation, and model benchmarking to examine patterns of educator engagement and evaluate the performance of LLMs in qualitative coding tasks. We developed a hierarchical codebook aligned with established teacher evaluation frameworks, capturing educators' instructional goals, contextual needs, and pedagogical strategies. Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably support theme identification, extend human recognition in complex scenarios, and outperform open-weight models in both accuracy and structural reliability. The analysis also reveals substantive patterns in how educators query AI to enhance instructional practices (79.7 percent of total conversations), create or adapt content (76.1 percent), support assessment and feedback loops (46.9 percent), attend to student needs for tailored instruction (43.3 percent), and assist with other professional responsibilities (34.2 percent), highlighting emerging AI-related competencies that have direct implications for teacher preparation and professional development. This study offers a scalable, transparent model for AI-augmented qualitative research and provides foundational insights into the evolving role of generative AI in educational practice.

Updated: 2025-07-23 23:23:38

Domains: cs.HC, cs.AI

Download: http://arxiv.org/abs/2507.17985v1

Machine Unlearning of Traffic State Estimation and Prediction

Data-driven traffic state estimation and prediction (TSEP) relies heavily on data sources that contain sensitive information. While the abundance of data has fueled significant breakthroughs, particularly in machine learning-based methods, it also raises concerns regarding privacy, cybersecurity, and data freshness. These issues can erode public trust in intelligent transportation systems. Recently, regulations have introduced the "right to be forgotten", allowing users to request the removal of their private data from models. As machine learning models can remember old data, simply removing it from back-end databases is insufficient in such systems. To address these challenges, this study introduces a novel learning paradigm for TSEP, Machine Unlearning TSEP, which enables a trained TSEP model to selectively forget privacy-sensitive, poisoned, or outdated data. By empowering models to "unlearn," we aim to enhance the trustworthiness and reliability of data-driven TSEP.
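
The abstract does not spell out the unlearning mechanism, so the sketch below shows only a standard baseline for exact unlearning, SISA-style sharding, where honoring a deletion request means retraining just the shard that held the record. Data, model choice, and shard count are illustrative assumptions, not the paper's method.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(300, 5)), rng.integers(0, 2, 300)

    n_shards = 3
    shards = [list(range(i, 300, n_shards)) for i in range(n_shards)]
    models = [LogisticRegression().fit(X[idx], y[idx]) for idx in shards]

    def predict(Xq):
        # Majority vote over the shard models.
        votes = np.stack([m.predict(Xq) for m in models])
        return (votes.mean(axis=0) > 0.5).astype(int)

    def unlearn(record_id):
        # Exact unlearning: drop the record and retrain only its shard.
        for k, idx in enumerate(shards):
            if record_id in idx:
                idx.remove(record_id)
                models[k] = LogisticRegression().fit(X[idx], y[idx])
                return

    unlearn(7)  # e.g., a "right to be forgotten" request for record 7
    print(predict(X[:5]))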

Updated: 2025-07-23 23:23:18

Domains: cs.LG, cs.AI

Download: http://arxiv.org/abs/2507.17984v1

Translating Between the Common Haar Random State Model and the Unitary Model

Black-box separations are a cornerstone of cryptography, indicating barriers to various goals. A recent line of work has explored black-box separations for quantum cryptographic primitives. Namely, a number of separations are known in the Common Haar Random State (CHRS) model, though this model is not considered a complete separation, but rather a starting point. A few very recent works have attempted to lift these separations to unitary separations, which are considered complete. Unfortunately, we find significant errors in some of these lifting results. We prove general conditions under which CHRS separations can be generically lifted, thereby giving simple, modular, and bug-free proofs of complete unitary separations between various quantum primitives. Our techniques allow for simpler proofs of existing separations as well as new separations that were previously only known in the CHRS model.

Updated: 2025-07-23 23:22:47

Domains: quant-ph, cs.CR

Download: http://arxiv.org/abs/2503.11634v2

Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings

Photoplethysmography (PPG)-based foundation models are gaining traction due to the widespread use of PPG in biosignal monitoring and their potential to generalize across diverse health applications. In this paper, we introduce Pulse-PPG, the first open-source PPG foundation model trained exclusively on raw PPG data collected over a 100-day field study with 120 participants. Existing PPG foundation models are either open-source but trained on clinical data, or closed-source, limiting their applicability in real-world settings. We evaluate Pulse-PPG across multiple datasets and downstream tasks, comparing its performance against a state-of-the-art foundation model trained on clinical data. Our results demonstrate that Pulse-PPG, trained on uncurated field data, exhibits superior generalization across clinical and mobile health applications in both lab and field settings. This suggests that exposure to real-world variability enables the model to learn fine-grained representations, making it more adaptable across tasks. Furthermore, pre-training on field data surprisingly outperforms pre-training on clinical data in many tasks, reinforcing the importance of training on real-world, diverse datasets. To encourage further advancements in robust foundation models leveraging field data, we plan to release Pulse-PPG, providing researchers with a powerful resource for developing more generalizable PPG-based models.

Updated: 2025-07-23 23:13:37

Domains: cs.LG, cs.AI, eess.SP

Download: http://arxiv.org/abs/2502.01108v2

Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations

Currently, identification of crystallization pathways in polymers is carried out by applying a preset cut-off on a single order parameter (OP), computed from molecular simulation data, to define nucleated or crystallized regions. Aside from sensitivity to the cut-off, each of these OPs introduces its own systematic biases. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity in polymeric systems using atomistic molecular dynamics data. Each atom is represented by a high-dimensional feature vector that combines geometric, thermodynamic-like, and symmetry-based descriptors. Low-dimensional embeddings are employed to expose latent structural fingerprints within atomic environments. Subsequently, unsupervised clustering on the embeddings identifies crystalline and amorphous atoms with high fidelity. After generating high-quality labels from the multidimensional data, we use supervised learning techniques to identify a minimal set of order parameters that fully captures these labels. Various tests were conducted to reduce the feature set, demonstrating that only three order parameters are sufficient to recreate the crystallization labels. Based on these OPs, the crystallinity index (C-index) is defined as a logistic regression model's probability of crystallinity, remaining bimodal throughout the process and achieving over 0.98 classification performance (AUC). Notably, a model trained on one or a few snapshots enables efficient on-the-fly computation of crystallinity. Lastly, we demonstrate how the optimal C-index fit evolves during various stages of crystallization, supporting the hypothesis that entropy dominates early nucleation, while symmetry gains relevance later. This workflow provides a data-driven strategy for OP selection and a metric to monitor structural transformations in large-scale polymer simulations.
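
Taking the C-index definition at face value (a logistic regression probability over the three selected OPs), a minimal sketch looks as follows; the synthetic arrays stand in for the per-atom OP values and clustering-derived labels that the real pipeline produces.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    n = 2000
    labels = rng.integers(0, 2, n)                      # 1 = crystalline (from clustering)
    ops = rng.normal(loc=labels[:, None], size=(n, 3))  # three OPs, class-shifted

    clf = LogisticRegression().fit(ops, labels)
    c_index = clf.predict_proba(ops)[:, 1]              # per-atom crystallinity probability
    print("AUC:", round(roc_auc_score(labels, c_index), 3))

Once fitted on a snapshot or two, evaluating this model on new frames is what makes the on-the-fly crystallinity computation cheap.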

Updated: 2025-07-23 23:02:10

Domains: physics.comp-ph, cs.LG

Download: http://arxiv.org/abs/2507.17980v1

SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning

Identifying the factors driving data shifts in tabular datasets is a significant challenge for analysis and decision support systems, especially those focusing on healthcare. Privacy rules restrict data access, and noise from complex processes hinders analysis. To address this challenge, we propose SIFOTL (Statistically-Informed Fidelity-Optimization Method for Tabular Learning) that (i) extracts privacy-compliant data summary statistics, (ii) employs twin XGBoost models to disentangle intervention signals from noise with assistance from LLMs, and (iii) merges XGBoost outputs via a Pareto-weighted decision tree to identify interpretable segments responsible for the shift. Unlike existing analyses which may ignore noise or require full data access for LLM-based analysis, SIFOTL addresses both challenges using only privacy-safe summary statistics. Demonstrating its real-world efficacy, for a MEPS panel dataset mimicking a new Medicare drug subsidy, SIFOTL achieves an F1 score of 0.85, substantially outperforming BigQuery Contribution Analysis (F1=0.46) and statistical tests (F1=0.20) in identifying the segment receiving the subsidy. Furthermore, across 18 diverse EHR datasets generated based on Synthea ABM, SIFOTL sustains F1 scores of 0.86-0.96 without noise and >= 0.75 even with injected observational noise, whereas baseline average F1 scores range from 0.19-0.67 under the same tests. SIFOTL, therefore, provides an interpretable, privacy-conscious workflow that is empirically robust to observational noise.
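
As a loose illustration of the merge step only: two boosted models score segments, their probabilities are combined with a Pareto-style weight, and a shallow tree summarizes the result interpretably. The weighting, the twin-model setup, and the data are all assumptions; the paper's actual pipeline (summary statistics, LLM assistance, Veritable-style scoring) is not reproduced here.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 4))           # privacy-safe summary features
    shift = (X[:, 0] > 0.5) & (X[:, 1] < 0)  # hidden shifted segment
    y = shift ^ (rng.random(1000) < 0.05)    # plus observational noise

    # Twin models (two seeds standing in for the signal/noise pair).
    m1 = GradientBoostingClassifier(random_state=0).fit(X, y)
    m2 = GradientBoostingClassifier(random_state=1).fit(X, y)

    # Assumed Pareto-style weighting of the two scores.
    w = 0.6
    score = w * m1.predict_proba(X)[:, 1] + (1 - w) * m2.predict_proba(X)[:, 1]

    # Shallow tree over the features for an interpretable segment description.
    tree = DecisionTreeClassifier(max_depth=2).fit(X, score > 0.5)
    print(export_text(tree, feature_names=["f0", "f1", "f2", "f3"]))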

Updated: 2025-07-23 23:00:24

Domains: cs.LG

Download: http://arxiv.org/abs/2507.17979v1

MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection

Phishing emails continue to pose a significant threat to cybersecurity by exploiting human vulnerabilities through deceptive content and malicious payloads. While Machine Learning (ML) models are effective at detecting phishing threats, their performance largely relies on the quality and diversity of the training data. This paper presents the MeAJOR (Merged email Assets from Joint Open-source Repositories) Corpus, a novel, multi-source phishing email dataset designed to overcome critical limitations in existing resources. It integrates 135,894 samples representing a broad range of phishing tactics and legitimate emails, with a wide spectrum of engineered features. We evaluated the dataset's utility for phishing detection research through systematic experiments with four classification models (RF, XGB, MLP, and CNN) across multiple feature configurations. Results highlight the dataset's effectiveness, achieving 98.34% F1 with XGB. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource, while addressing common challenges like class imbalance, generalisability and reproducibility.
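
A minimal sketch of the kind of benchmarking described, fitting one of the four model families on engineered features and reporting F1; the random feature matrix below is a stand-in for the corpus's actual features.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X = rng.normal(size=(5000, 20))  # engineered email features (stand-in)
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000)) > 0

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("F1:", round(f1_score(y_te, rf.predict(X_te)), 4))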

Updated: 2025-07-23 22:57:08

Domains: cs.CR, cs.AI, cs.HC; 68P20 (Primary), 68T05, 68T07, 68T10 (Secondary); K.6.5; I.2.6; I.2.7; C.2.0

Download: http://arxiv.org/abs/2507.17978v1

Improving the Computational Efficiency and Explainability of GeoAggregator

Accurate modeling and explanation of geospatial tabular data (GTD) are critical for understanding geospatial phenomena and their underlying processes. Recent work has proposed a novel transformer-based deep learning model named GeoAggregator (GA) for this purpose, and has demonstrated that it outperforms other statistical and machine learning approaches. In this short paper, we further improve GA by 1) developing an optimized pipeline that accelerates the dataloading process and streamlines the forward pass of GA to achieve better computational efficiency; and 2) incorporating a model ensembling strategy and a post-hoc model explanation function based on the GeoShapley framework to enhance model explainability. We validate the functionality and efficiency of the proposed strategies by applying the improved GA model to synthetic datasets. Experimental results show that our implementation improves the prediction accuracy and inference speed of GA compared to the original implementation. Moreover, explanation experiments indicate that GA effectively captures the inherent spatial effects in the designed synthetic dataset. The complete pipeline has been made publicly available for community use (https://github.com/ruid7181/GA-sklearn).
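
The released GA-sklearn interface is not shown in this abstract, so the sketch below illustrates only the generic ensembling strategy (averaging predictions from differently seeded models); it is not the repository's actual API.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(4)
    X = rng.uniform(size=(500, 2))  # e.g., spatial coordinates
    y = np.sin(4 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.1, size=500)

    # Model ensembling: train K differently seeded models, average predictions.
    models = [GradientBoostingRegressor(random_state=k).fit(X, y) for k in range(5)]
    y_hat = np.mean([m.predict(X) for m in models], axis=0)
    print("ensemble MSE:", round(float(np.mean((y_hat - y) ** 2)), 4))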

Updated: 2025-07-23 22:51:09

Domains: cs.LG, cs.AI

Download: http://arxiv.org/abs/2507.17977v1

Natural Language Processing for Tigrinya: Current State and Future Directions

Despite being spoken by millions of people, Tigrinya remains severely underrepresented in Natural Language Processing (NLP) research. This work presents a comprehensive survey of NLP research for Tigrinya, analyzing over 40 studies spanning more than a decade of work from 2011 to 2025. We systematically review the current state of computational resources, models, and applications across ten distinct downstream tasks, including morphological processing, machine translation, speech recognition, and question-answering. Our analysis reveals a clear trajectory from foundational, rule-based systems to modern neural architectures, with progress consistently unlocked by resource creation milestones. We identify key challenges rooted in Tigrinya's morphological complexity and resource scarcity, while highlighting promising research directions, including morphology-aware modeling, cross-lingual transfer, and community-centered resource development. This work serves as both a comprehensive reference for researchers and a roadmap for advancing Tigrinya NLP. Curated metadata for the surveyed studies and resources is publicly available (Tigrinya NLP Anthology: https://github.com/fgaim/tigrinya-nlp-anthology).

Updated: 2025-07-23 22:45:30

Domains: cs.CL, cs.AI; I.2.7

Download: http://arxiv.org/abs/2507.17974v1

Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA

Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2x2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization. Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
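
A toy sketch of the grid mechanics described: input/output pairs tile a 2x2 layout, and the cell the Grid Fill module must complete is masked. Frame sizes and the role of each cell are assumptions for illustration only.

    import numpy as np

    H = W = 64                                   # per-cell frame size (assumed)
    ref_in, ref_out = np.ones((H, W)), np.full((H, W), 2.0)  # reference pair
    query_in = np.full((H, W), 3.0)              # new input at inference

    # 2x2 grid: top row = reference pair, bottom row = query input + masked cell.
    grid = np.zeros((2 * H, 2 * W))
    grid[:H, :W], grid[:H, W:] = ref_in, ref_out
    grid[H:, :W] = query_in
    mask = np.zeros_like(grid, dtype=bool)
    mask[H:, W:] = True                          # the cell to be completed
    print(grid.shape, "masked cells:", int(mask.sum()))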

Updated: 2025-07-23 22:09:38

Domains: cs.GR, cs.CV, cs.LG

Download: http://arxiv.org/abs/2507.17963v1

TimelyHLS: LLM-Based Timing-Aware and Architecture-Specific FPGA HLS Optimization

Achieving timing closure and design-specific optimizations in FPGA-targeted High-Level Synthesis (HLS) remains a significant challenge due to the complex interaction between architectural constraints, resource utilization, and the absence of automated support for platform-specific pragmas. In this work, we propose TimelyHLS, a novel framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to automatically generate and iteratively refine HLS code optimized for FPGA-specific timing and performance requirements. TimelyHLS is driven by a structured architectural knowledge base containing FPGA-specific features, synthesis directives, and pragma templates. Given a kernel, TimelyHLS generates HLS code annotated with both timing-critical and design-specific pragmas. The synthesized RTL is then evaluated using commercial toolchains, and simulation correctness is verified against reference outputs via custom testbenches. TimelyHLS iteratively incorporates synthesis logs and performance reports into the LLM engine for refinement in the presence of functional discrepancies. Experimental results across 10 FPGA architectures and diverse benchmarks show that TimelyHLS reduces the need for manual tuning by up to 70%, while achieving up to 4x latency speedup (e.g., 3.85x for Matrix Multiplication, 3.7x for Bitonic Sort) and over 50% area savings in certain cases (e.g., 57% FF reduction in Viterbi). TimelyHLS consistently achieves timing closure and functional correctness across platforms, highlighting the effectiveness of LLM-driven, architecture-aware synthesis in automating FPGA design.

Updated: 2025-07-23 22:08:15

Domains: cs.CR

Download: http://arxiv.org/abs/2507.17962v1

VIBE: Video-Input Brain Encoder for fMRI Response Modeling

We present VIBE, a two-stage Transformer that fuses multi-modal video, audio, and text features to predict fMRI activity. Representations from open-source models (Qwen2.5, BEATs, Whisper, SlowFast, V-JEPA) are merged by a modality-fusion transformer and temporally decoded by a prediction transformer with rotary embeddings. Trained on 65 hours of movie data from the CNeuroMod dataset and ensembled across 20 seeds, VIBE attains mean parcel-wise Pearson correlations of 0.3225 on in-distribution Friends S07 and 0.2125 on six out-of-distribution films. An earlier iteration of the same architecture obtained 0.3198 and 0.2096, respectively, winning Phase-1 and placing second overall in the Algonauts 2025 Challenge.

Updated: 2025-07-23 22:02:56

Domains: cs.LG, cs.AI, cs.CV

Download: http://arxiv.org/abs/2507.17958v1

Formal Verification of the Safegcd Implementation

The modular inverse is an essential piece of computation required for elliptic curve operations used for digital signatures in Bitcoin and other applications. A novel approach to the extended Euclidean algorithm has been developed by Bernstein and Yang within the last few years and incorporated into the libsecp256k1 cryptographic library used by Bitcoin. However, novel algorithms introduce new risks of errors. To address this, we have completed a computer-verified proof of the correctness of (one of) libsecp256k1's modular inverse implementations with the Coq proof assistant, using Verifiable C's implementation of separation logic.
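
For context, the quantity being verified is the modular inverse over the secp256k1 field. The classical extended Euclidean computation below shows what the verified code must produce; safegcd itself is Bernstein and Yang's constant-time divstep variant, which this sketch does not reproduce.

    def egcd(a, b):
        # Returns (g, x, y) with a*x + b*y == g == gcd(a, b).
        if b == 0:
            return a, 1, 0
        g, x, y = egcd(b, a % b)
        return g, y, x - (a // b) * y

    def modinv(a, m):
        g, x, _ = egcd(a % m, m)
        if g != 1:
            raise ValueError("no inverse: gcd != 1")
        return x % m

    # The secp256k1 field prime used by libsecp256k1.
    p = 2**256 - 2**32 - 977
    a = 0xDEADBEEF
    assert (a * modinv(a, p)) % p == 1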

Updated: 2025-07-23 21:57:30

Domains: cs.CR, cs.LO

Download: http://arxiv.org/abs/2507.17956v1

On the Structure of Game Provenance and its Applications

Provenance in databases has been thoroughly studied for positive and for recursive queries, then for first-order (FO) queries, i.e., those having negation but no recursion. Query evaluation can be understood as a two-player game where the opponents argue whether or not a tuple is in the query answer. This game-theoretic approach yields a natural provenance model for FO queries, unifying how- and why-not provenance. Here, we study the fine-grain structure of game provenance. A game $G=(V,E)$ consists of positions $V$ and moves $E$ and can be solved by computing the well-founded model of a single, unstratifiable rule: \[ \text{win}(X) \leftarrow \text{move}(X, Y), \neg \, \text{win}(Y). \] In the solved game $G^{\lambda}$, the value of a position $x\,{\in}\,V$ is either won, lost, or drawn. This value is explained by the provenance $\mathscr{P}(x)$, i.e., certain (annotated) edges reachable from $x$. We identify seven edge types that give rise to new kinds of provenance, i.e., potential, actual, and primary provenance, and demonstrate that "not all moves are created equal". We describe the new provenance types, show how they can be computed while solving games, and discuss applications, e.g., to abstract argumentation frameworks.
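
The rule has a direct operational reading: a position is won if some move reaches a lost position, lost if every move reaches a won position (so a position with no moves is lost), and drawn otherwise. A minimal fixpoint solver in that spirit (toy game assumed; the paper's edge-type analysis then refines which of these moves carry the explanation):

    def solve(positions, moves):
        # moves: dict position -> list of successor positions
        value, changed = {}, True
        while changed:
            changed = False
            for v in positions:
                if v in value:
                    continue
                succs = moves.get(v, [])
                if any(value.get(s) == "lost" for s in succs):
                    value[v] = "won"; changed = True
                elif all(value.get(s) == "won" for s in succs):
                    value[v] = "lost"; changed = True  # covers the no-move case
        return {v: value.get(v, "drawn") for v in positions}

    # a -> b -> c (dead end), plus a 2-cycle d <-> e.
    moves = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
    print(solve("abcde", moves))  # c lost, b won, a lost, d and e drawn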

Updated: 2025-07-23 21:57:22

Domains: cs.AI

Download: http://arxiv.org/abs/2410.05094v2

Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search

Clo-HDnn is an on-device learning (ODL) accelerator designed for emerging continual learning (CL) tasks. Clo-HDnn integrates hyperdimensional computing (HDC) along with low-cost Kronecker HD Encoder and weight clustering feature extraction (WCFE) to optimize accuracy and efficiency. Clo-HDnn adopts gradient-free CL to efficiently update and store the learned knowledge in the form of class hypervectors. Its dual-mode operation enables bypassing costly feature extraction for simpler datasets, while progressive search reduces complexity by up to 61% by encoding and comparing only partial query hypervectors. Achieving 4.66 TFLOPS/W (FE) and 3.78 TOPS/W (classifier), Clo-HDnn delivers 7.77x and 4.85x higher energy efficiency compared to SOTA ODL accelerators.
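
A numpy sketch of the classifier core as described: class hypervectors bundled from training items, similarity-based classification, and a progressive search that scores a prefix of dimensions to prune candidates before a full comparison. Dimensions, the candidate count, and the encoding are illustrative assumptions.

    import numpy as np

    D, n_classes = 4096, 4
    rng = np.random.default_rng(5)

    # Class hypervectors, e.g., bundled bipolar encodings of training items.
    class_hvs = rng.choice([-1, 1], size=(n_classes, D)).astype(float)

    def classify(query, partial=1024):
        # Progressive search: score on a prefix, prune, then refine on survivors.
        part = class_hvs[:, :partial] @ query[:partial] / partial
        top = np.argsort(part)[-2:]        # keep the 2 best candidates
        full = class_hvs[top] @ query / D  # full comparison on survivors
        return int(top[np.argmax(full)])

    query = class_hvs[2].copy()
    flip = rng.random(D) < 0.25            # corrupt 25% of dimensions
    query[flip] *= -1
    print(classify(query))                 # -> 2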

Updated: 2025-07-23 21:50:28

Domains: cs.LG

Download: http://arxiv.org/abs/2507.17953v1

Multilingual LLMs Are Not Multilingual Thinkers: Evidence from Hindi Analogy Evaluation

Analogies test a model's ability to infer implicit relationships between concepts, making them a key benchmark for evaluating reasoning capabilities. While large language models (LLMs) are widely evaluated for reasoning in English, their abilities in Indic languages remain understudied, limiting our understanding of whether these models generalize across languages. To address this gap, we introduce a new Hindi Analogy Test Set (HATS), comprising 405 multiple-choice questions sourced from Indian government exams. We benchmark state-of-the-art multilingual LLMs using various prompting strategies and introduce a grounded Chain of Thought approach that leverages cognitive theories of analogical reasoning. This approach improves model performance on Hindi analogy questions. Our experiments show that models perform best with English prompts, irrespective of the prompting strategy. Our test set addresses the lack of a critical resource to evaluate LLM reasoning capabilities in Hindi.
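
The prompting strategies named are standard; a sketch of how a few-shot, chain-of-thought analogy prompt might be assembled (the exemplar, wording, and format are invented for illustration, not items from HATS):

    def analogy_prompt(stem, options, exemplars):
        lines = ["Solve the analogy by first stating the relationship, "
                 "then picking the option with the same relationship.\n"]
        for ex in exemplars:             # few-shot block
            lines.append(f"Q: {ex['stem']}")
            lines.append(f"Reasoning: {ex['cot']}")
            lines.append(f"Answer: {ex['answer']}\n")
        lines.append(f"Q: {stem}")
        lines.append("Options: " + ", ".join(options))
        lines.append("Reasoning:")       # the model continues the chain
        return "\n".join(lines)

    exemplars = [{"stem": "bird : nest :: bee : ?",
                  "cot": "A bird lives in a nest; a bee lives in a hive.",
                  "answer": "hive"}]
    print(analogy_prompt("doctor : hospital :: teacher : ?",
                         ["school", "book", "class", "desk"], exemplars))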

Updated: 2025-07-23 21:50:22

Domains: cs.CL, cs.AI

Download: http://arxiv.org/abs/2507.13238v2

Are LLM Belief Updates Consistent with Bayes' Theorem?

Do larger and more capable language models learn to update their "beliefs" about propositions more consistently with Bayes' theorem when presented with evidence in-context? To test this, we formulate a Bayesian Coherence Coefficient (BCC) metric and generate a dataset with which to measure the BCC. We measure BCC for multiple pre-trained-only language models across five model families, comparing against the number of model parameters, the amount of training data, and model scores on common benchmarks. Our results provide evidence for our hypothesis that larger and more capable pre-trained language models assign credences that are more coherent with Bayes' theorem. These results have important implications for our understanding and governance of LLMs.
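
A worked example of the consistency being tested: given a prior and likelihoods, Bayes' theorem fixes the posterior, and a coherence measure can compare it with the credence the model actually reports after seeing the evidence. The abstract does not define the BCC's exact form, so squared error is used below as a stand-in.

    prior, p_e_h, p_e_not_h = 0.30, 0.80, 0.20

    # Bayes' theorem: P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|~H)P(~H))
    posterior = p_e_h * prior / (p_e_h * prior + p_e_not_h * (1 - prior))
    print(round(posterior, 4))   # 0.24 / 0.38 = 0.6316

    model_credence = 0.55        # hypothetical credence elicited from an LLM
    incoherence = (model_credence - posterior) ** 2
    print(round(incoherence, 4))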

Updated: 2025-07-23 21:46:37

Domains: cs.CL, cs.AI

Download: http://arxiv.org/abs/2507.17951v1

VERIRAG: Healthcare Claim Verification via Statistical Audit in Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) systems are increasingly adopted in clinical decision support, yet they remain methodologically blind: they retrieve evidence but cannot vet its scientific quality. A paper claiming "Antioxidant proteins decreased after alloferon treatment" and a rigorous multi-laboratory replication study will be treated as equally credible, even if the former lacked scientific rigor or was even retracted. To address this challenge, we introduce VERIRAG, a framework that makes three notable contributions: (i) the Veritable, an 11-point checklist that evaluates each source for methodological rigor, including data integrity and statistical validity; (ii) a Hard-to-Vary (HV) Score, a quantitative aggregator that weights evidence by its quality and diversity; and (iii) a Dynamic Acceptance Threshold, which calibrates the required evidence based on how extraordinary a claim is. Across four datasets, comprising retracted, conflicting, comprehensive, and settled science corpora, the VERIRAG approach consistently outperforms all baselines, achieving absolute F1 scores ranging from 0.53 to 0.65, a 10 to 14 point improvement over the next-best method on each respective dataset. We will release all materials necessary for reproducing our results.
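
A sketch of the aggregation logic as described: checklist scores weight each source, an HV-style aggregate rewards quality and diversity, and the acceptance threshold rises with how extraordinary the claim is. The specific formulas below are assumptions; the abstract does not define them.

    def hv_score(evidence):
        # evidence: list of (checklist_score in [0, 1], method_family) pairs.
        quality = sum(q for q, _ in evidence) / len(evidence)
        diversity = len({fam for _, fam in evidence}) / len(evidence)
        return quality * (0.5 + 0.5 * diversity)          # assumed combination

    def accept(claim_extraordinariness, evidence):
        # Dynamic threshold: extraordinary claims require stronger evidence.
        threshold = 0.4 + 0.5 * claim_extraordinariness   # assumed calibration
        return hv_score(evidence) >= threshold

    evidence = [(0.9, "rct"), (0.7, "cohort"), (0.8, "rct")]
    print(round(hv_score(evidence), 3), accept(0.3, evidence), accept(0.9, evidence))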

Updated: 2025-07-23 21:32:50

标题: VERIRAG:在检索增强生成中通过统计审计验证医疗健康论断

摘要: 检索增强生成(RAG)系统在临床决策支持中越来越受到采用,但它们仍然在方法上盲目-它们检索证据但无法审核其科学质量。一篇声称“抗氧化蛋白在阿洛费龙治疗后减少”的论文和一项严格的多实验室复制研究将被视为同样可信,即使前者缺乏科学严谨性或甚至被撤回。为了解决这一挑战,我们引入了VERIRAG,这是一个框架,它做出了三个显著的贡献:(i)Veritable,一个评估每个来源的方法论严谨性的11点检查表,包括数据完整性和统计有效性;(ii)Hard-to-Vary(HV)得分,一个定量聚合器,通过其质量和多样性对证据进行加权;(iii)一个动态接受阈值,根据声明的特殊程度来校准所需的证据。在包括被撤回、冲突、全面和已解决的科学文献的四个数据集中,VERIRAG方法始终优于所有基线方法,实现的绝对F1分数范围从0.53到0.65,分别比每个数据集中下一个最佳方法提高了10到14个百分点。我们将发布重现我们结果所需的所有材料。

更新时间: 2025-07-23 21:32:50

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2507.17948v1

Evaluating the Performance of AI Text Detectors, Few-Shot and Chain-of-Thought Prompting Using DeepSeek Generated Text

Large language models (LLMs) have rapidly transformed the creation of written materials. LLMs have led to questions about writing integrity, thereby driving the creation of artificial intelligence (AI) detection technologies. Adversarial attacks, such as standard and humanized paraphrasing, inhibit detectors' ability to detect machine-generated text. Previous studies have mainly focused on ChatGPT and other well-known LLMs and have shown varying accuracy across detectors. However, there is a clear gap in the literature about DeepSeek, a recently published LLM. Therefore, in this work, we investigate whether six generally accessible AI detection tools -- AI Text Classifier, Content Detector AI, Copyleaks, QuillBot, GPT-2, and GPTZero -- can consistently recognize text generated by DeepSeek. The detectors were exposed to the aforementioned adversarial attacks. We also considered DeepSeek as a detector by performing few-shot prompting and chain-of-thought reasoning (CoT) for classifying AI and human-written text. We collected 49 human-authored question-answer pairs from before the LLM era and generated matching responses using DeepSeek-v3, producing 49 AI-generated samples. Then, we applied adversarial techniques such as paraphrasing and humanizing to add 196 more samples. These were used to challenge detector robustness and assess accuracy impact. While QuillBot and Copyleaks showed near-perfect performance on original and paraphrased DeepSeek text, others -- particularly AI Text Classifier and GPT-2 -- showed inconsistent results. The most effective attack was humanization, reducing accuracy to 71% for Copyleaks, 58% for QuillBot, and 52% for GPTZero. Few-shot and CoT prompting showed high accuracy, with the best five-shot result misclassifying only one of 49 samples (AI recall 96%, human recall 100%).

Updated: 2025-07-23 21:26:33

标题: 使用DeepSeek生成文本评估AI文本检测器及少样本与思维链提示的性能

摘要: 大型语言模型(LLMs)迅速改变了书面材料的创作方式,同时引发了关于写作诚信的质疑,从而推动了人工智能(AI)检测技术的发展。对抗性攻击(如标准改写和人性化改写)会削弱检测器识别机器生成文本的能力。先前的研究主要关注ChatGPT和其他知名LLMs,并显示各检测器的准确性参差不齐。然而,文献中对于最近发布的LLM DeepSeek存在明显空白。因此,在这项工作中,我们考察六种普遍可用的AI检测工具(AI Text Classifier、Content Detector AI、Copyleaks、QuillBot、GPT-2和GPTZero)能否一致地识别由DeepSeek生成的文本,并让这些检测器接受上述对抗性攻击。我们还通过少样本提示和思维链(CoT)推理将DeepSeek本身用作检测器,对AI与人类撰写的文本进行分类。我们收集了LLM时代之前的49个人类撰写的问答对,并使用DeepSeek-v3生成相匹配的回答,得到49个AI生成样本;随后应用改写和人性化等对抗技术,新增196个样本,用于检验检测器的稳健性并评估其对准确率的影响。QuillBot和Copyleaks在原始及改写的DeepSeek文本上表现近乎完美,而其他检测器(尤其是AI Text Classifier和GPT-2)结果并不稳定。最有效的攻击是人性化改写,它将Copyleaks的准确率降至71%,QuillBot降至58%,GPTZero降至52%。少样本和CoT提示表现出很高的准确率,最佳的五样本(five-shot)结果在49个样本中仅错分一个(AI召回率96%,人类召回率100%)。

更新时间: 2025-07-23 21:26:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17944v1

LLM Alignment as Retriever Optimization: An Information Retrieval Perspective

Large Language Models (LLMs) have revolutionized artificial intelligence with capabilities in reasoning, coding, and communication, driving innovation across industries. Their true potential depends on effective alignment to ensure correct, trustworthy and ethical behavior, addressing challenges like misinformation, hallucinations, bias and misuse. While existing Reinforcement Learning (RL)-based alignment methods are notoriously complex, direct optimization approaches offer a simpler alternative. In this work, we introduce a novel direct optimization approach for LLM alignment by drawing on established Information Retrieval (IR) principles. We present a systematic framework that bridges LLM alignment and IR methodologies, mapping LLM generation and reward models to IR's retriever-reranker paradigm. Building on this foundation, we propose LLM Alignment as Retriever Preference Optimization (LarPO), a new alignment method that enhances overall alignment quality. Extensive experiments validate LarPO's effectiveness with 38.9 % and 13.7 % averaged improvement on AlpacaEval2 and MixEval-Hard respectively. Our work opens new avenues for advancing LLM alignment by integrating IR foundations, offering a promising direction for future research.
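
As a rough sketch of the retriever-reranker analogy (not LarPO's published objective), one can sample k responses per prompt, let a reward model play the reranker, and fit the policy with a listwise softmax loss so its scores order candidates the way the reranker does:

```python
import torch
import torch.nn.functional as F

def listwise_preference_loss(policy_logps: torch.Tensor,
                             rewards: torch.Tensor) -> torch.Tensor:
    """policy_logps, rewards: shape (k,) for k sampled candidates of one prompt."""
    target = F.softmax(rewards, dim=0)             # reranker-induced ranking
    log_pred = F.log_softmax(policy_logps, dim=0)  # retriever (policy) scores
    return -(target * log_pred).sum()              # cross-entropy over the list

# Hypothetical scores for k=4 sampled responses:
logps = torch.tensor([-1.2, -0.7, -2.1, -0.9], requires_grad=True)
rewards = torch.tensor([0.1, 1.5, -0.5, 0.8])
loss = listwise_preference_loss(logps, rewards)
loss.backward()
print(loss.item(), logps.grad)
```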

Updated: 2025-07-23 21:26:26

标题: LLM对齐作为检索优化:信息检索视角

摘要: 大型语言模型(LLMs)以其在推理、编码和交流方面的能力革新了人工智能,推动了各行业的创新。它们的真正潜力取决于有效的对齐,以确保正确、可信和合乎道德的行为,应对虚假信息、幻觉、偏见和滥用等挑战。现有基于强化学习(RL)的对齐方法以复杂著称,而直接优化方法提供了更简单的替代方案。在这项工作中,我们借鉴成熟的信息检索(IR)原则,提出了一种新颖的LLM对齐直接优化方法。我们提出了一个连接LLM对齐与IR方法论的系统框架,将LLM的生成模型和奖励模型映射到IR的检索器-重排序器范式。在此基础上,我们提出了作为检索器偏好优化的LLM对齐方法(LarPO),以提升整体对齐质量。大量实验验证了LarPO的有效性,在AlpacaEval2和MixEval-Hard上分别平均提升38.9%和13.7%。我们的工作通过融合IR基础为推进LLM对齐开辟了新途径,为未来研究提供了有前景的方向。

更新时间: 2025-07-23 21:26:26

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2502.03699v3

Analyzing Fairness of Computer Vision and Natural Language Processing Models

Machine learning (ML) algorithms play a critical role in decision-making across various domains, such as healthcare, finance, education, and law enforcement. However, concerns about fairness and bias in these systems have raised significant ethical and social challenges. To address these challenges, this research utilizes two prominent fairness libraries, Fairlearn by Microsoft and AIF360 by IBM. These libraries offer comprehensive frameworks for fairness analysis, providing tools to evaluate fairness metrics, visualize results, and implement bias mitigation algorithms. The study focuses on assessing and mitigating biases for unstructured datasets using Computer Vision (CV) and Natural Language Processing (NLP) models. The primary objective is to present a comparative analysis of the performance of mitigation algorithms from the two fairness libraries. This analysis involves applying the algorithms individually, one at a time, in one of the stages of the ML lifecycle, pre-processing, in-processing, or post-processing, as well as sequentially across more than one stage. The results reveal that some sequential applications improve the performance of mitigation algorithms by effectively reducing bias while maintaining the model's performance. Publicly available datasets from Kaggle were chosen for this research, providing a practical context for evaluating fairness in real-world machine learning workflows.
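
As a flavor of the kind of audit involved, the sketch below computes a demographic parity gap with Fairlearn, assuming its `demographic_parity_difference` metric; AIF360 exposes analogous metrics and pre-/in-/post-processing mitigators. The arrays are toy values, not the study's Kaggle data.

```python
import numpy as np
from fairlearn.metrics import demographic_parity_difference

# Toy labels, predictions, and a binary sensitive attribute:
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# 0.0 means equal selection rates between groups; larger values are less fair.
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))
```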

Updated: 2025-07-23 21:24:40

标题: 分析计算机视觉和自然语言处理模型的公平性

摘要: 机器学习(ML)算法在医疗保健、金融、教育和执法等各个领域的决策中发挥着关键作用。然而,对这些系统中的公平性和偏见的担忧引发了重大的伦理和社会挑战。为了解决这些挑战,本研究利用了微软的Fairlearn和IBM的AIF360两个知名的公平性库。这些库提供了公平性分析的综合框架,提供评估公平性指标、可视化结果和实施偏见缓解算法的工具。该研究侧重于使用计算机视觉(CV)和自然语言处理(NLP)模型对非结构化数据集进行偏见评估和缓解。主要目标是对比分析这两个公平性库中缓解算法的性能。该分析涉及将算法分别应用在ML生命周期的一个阶段,预处理、处理中或后处理,以及跨越一个以上阶段的顺序应用。结果显示,一些顺序应用能够通过有效减少偏见来提高缓解算法的性能,同时保持模型的性能。本研究选择了来自Kaggle的公开可用数据集,为评估真实世界机器学习工作流程中的公平性提供了实际背景。

更新时间: 2025-07-23 21:24:40

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2412.09900v3

Analyzing Fairness of Computer Vision and Natural Language Processing Models

Machine learning (ML) algorithms play a critical role in decision-making across various domains, such as healthcare, finance, education, and law enforcement. However, concerns about fairness and bias in these systems have raised significant ethical and social challenges. To address these challenges, this research utilizes two prominent fairness libraries, Fairlearn by Microsoft and AIF360 by IBM. These libraries offer comprehensive frameworks for fairness analysis, providing tools to evaluate fairness metrics, visualize results, and implement bias mitigation algorithms. The study focuses on assessing and mitigating biases for unstructured datasets using Computer Vision (CV) and Natural Language Processing (NLP) models. The primary objective is to present a comparative analysis of the performance of mitigation algorithms from the two fairness libraries. This analysis involves applying the algorithms individually, one at a time, in one of the stages of the ML lifecycle, pre-processing, in-processing, or post-processing, as well as sequentially across more than one stage. The results reveal that some sequential applications improve the performance of mitigation algorithms by effectively reducing bias while maintaining the model's performance. Publicly available datasets from Kaggle were chosen for this research, providing a practical context for evaluating fairness in real-world machine learning workflows.

Updated: 2025-07-23 21:24:40

标题: 分析计算机视觉和自然语言处理模型的公平性

摘要: 机器学习(ML)算法在医疗保健、金融、教育和执法等各个领域的决策中起着关键作用。然而,对这些系统中的公平性和偏见的担忧引发了重大的道德和社会挑战。为了解决这些挑战,本研究利用了微软的Fairlearn和IBM的AIF360两个著名的公平性库。这些库提供了公平性分析的全面框架,提供了评估公平性指标、可视化结果和实施偏见缓解算法的工具。该研究重点评估和减轻使用计算机视觉(CV)和自然语言处理(NLP)模型的非结构化数据集的偏见。主要目标是对比分析这两个公平性库中缓解算法的性能。该分析涉及将算法单独应用在ML生命周期的一个阶段(预处理、内部处理或后处理)中,以及依次跨越多个阶段。结果表明,一些连续应用通过有效降低偏见同时保持模型性能来改善缓解算法的性能。本研究选择了来自Kaggle的公开可用数据集,为评估实际机器学习工作流程中的公平性提供了实际背景。

更新时间: 2025-07-23 21:24:40

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2412.09900v3

Minimax Data Sanitization with Distortion Constraint and Adversarial Inference

We study a privacy-preserving data-sharing setting where a privatizer transforms private data into a sanitized version observed by an authorized reconstructor and two unauthorized adversaries, each with access to side information correlated with the private data. The reconstructor is evaluated under a distortion function, while each adversary is evaluated using a separate loss function. The privatizer ensures the reconstructor distortion remains below a fixed threshold while maximizing the minimum loss across the two adversaries. This two-adversary setting models cases where individual users cannot reconstruct the data accurately, but their combined side information enables estimation within the distortion threshold. The privatizer maximizes individual loss while permitting accurate reconstruction only through collaboration. This echoes secret-sharing principles, but with lossy rather than perfect recovery. We frame this as a constrained data-driven minimax optimization problem and propose a data-driven training procedure that alternately updates the privatizer, reconstructor, and adversaries. We also analyze the Gaussian and binary cases as special scenarios where optimal solutions can be obtained. These theoretical optimal results are benchmarks for evaluating the proposed minimax training approach.
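
A schematic of the alternating training procedure might look as follows, with MSE standing in for all losses and a penalty term enforcing the distortion budget; the networks, data, and update schedule are placeholders, and the paper's exact losses may differ.

```python
import torch
import torch.nn as nn

def train_step(privatizer, reconstructor, adv1, adv2, x, side1, side2,
               opt_p, opt_r, opt_a1, opt_a2, budget=0.1, lam=10.0):
    z = privatizer(x)

    # 1) Reconstructor and both adversaries best-respond to the current privatizer.
    for net, opt, inp in ((reconstructor, opt_r, z),
                          (adv1, opt_a1, torch.cat([z, side1], dim=1)),
                          (adv2, opt_a2, torch.cat([z, side2], dim=1))):
        opt.zero_grad()
        torch.mean((net(inp.detach()) - x) ** 2).backward()
        opt.step()

    # 2) Privatizer: maximize the weaker adversary's loss while keeping the
    #    reconstructor's distortion within the budget (penalty formulation).
    opt_p.zero_grad()
    z = privatizer(x)
    distortion = torch.mean((reconstructor(z) - x) ** 2)
    l1 = torch.mean((adv1(torch.cat([z, side1], dim=1)) - x) ** 2)
    l2 = torch.mean((adv2(torch.cat([z, side2], dim=1)) - x) ** 2)
    (-torch.min(l1, l2) + lam * torch.relu(distortion - budget)).backward()
    opt_p.step()

d = 4
priv, rec = nn.Linear(d, d), nn.Linear(d, d)
a1, a2 = nn.Linear(2 * d, d), nn.Linear(2 * d, d)
opts = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in (priv, rec, a1, a2)]
x = torch.randn(32, d)
side1, side2 = x + torch.randn(32, d), x + torch.randn(32, d)  # correlated side info
train_step(priv, rec, a1, a2, x, side1, side2, *opts)
```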

Updated: 2025-07-23 21:22:35

标题: 带失真约束与对抗推断的极小极大数据净化

摘要: 我们研究一种隐私保护的数据共享设置:私有化器将私有数据变换为净化版本,由一个获授权的重建器和两个未获授权的敌手观察,每个敌手都掌握与私有数据相关的边信息。 重建器以失真函数评估,而每个敌手则以各自的损失函数评估。私有化器在确保重建器失真不超过固定阈值的同时,最大化两个敌手损失中的较小者。这种双敌手设置刻画了这样一类情形:单个用户无法准确重建数据,但他们合并的边信息可以在失真阈值内完成估计。私有化器最大化个体损失,仅允许通过协作实现准确重建。这与秘密共享的原则相呼应,不同之处在于恢复是有损的而非完美的。我们将其构建为一个受约束的数据驱动极小极大优化问题,并提出交替更新私有化器、重建器和敌手的数据驱动训练过程。我们还分析了可获得最优解的高斯和二值两种特殊情形,这些理论最优结果是评估所提极小极大训练方法的基准。

更新时间: 2025-07-23 21:22:35

领域: cs.IT,cs.AI,math.IT

下载: http://arxiv.org/abs/2507.17942v1

WaveVerify: A Novel Audio Watermarking Framework for Media Authentication and Combatting Deepfakes

The rapid advancement of voice generation technologies has enabled the synthesis of speech that is perceptually indistinguishable from genuine human voices. While these innovations facilitate beneficial applications such as personalized text-to-speech systems and voice preservation, they have also introduced significant risks, including deepfake impersonation scams and synthetic media-driven disinformation campaigns. Recent reports indicate that in 2024, deepfake fraud attempts surged by over 1,300% compared to 2023, underscoring the urgent need for robust audio content authentication. The financial sector has been particularly impacted, with a loss of over 10 million USD to voice scams and individual victims reporting losses exceeding $6,000 from AI-generated deepfake calls. In response, regulators and governments worldwide are enacting measures to improve AI content transparency and traceability, emphasizing the development of forensic tools and watermarking techniques as essential strategies to uphold media integrity.

Updated: 2025-07-23 21:16:08

标题: WaveVerify:一种用于媒体验证和对抗深度伪造的新型音频水印框架

摘要: 语音生成技术的快速发展使得合成的语音在感知上与真实的人类声音无法区分。虽然这些创新有助于诸如个性化文本转语音系统和声音保存等有益应用,但它们也带来了重大风险,包括深度伪造冒名骗局和合成媒体驱动的虚假信息传播。最近的报告显示,2024年,与2023年相比,深度伪造欺诈尝试激增了超过1300%,凸显了对强大的音频内容认证的迫切需求。金融部门尤其受到影响,因声音诈骗而造成损失超过1000万美元,个别受害者称因人工智能生成的深度伪造电话而遭受超过6000美元的损失。为了应对这一情况,全球监管机构和政府正在采取措施,以改善人工智能内容的透明度和可追溯性,强调开发法证工具和水印技术作为维护媒体完整性的重要策略。

更新时间: 2025-07-23 21:16:08

领域: cs.CR

下载: http://arxiv.org/abs/2507.21150v1

Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

Lyrics-to-Song (LS2) generation models promise end-to-end music synthesis from text, yet their vulnerability to training data memorization remains underexplored. We introduce Adversarial PhoneTic Prompting (APT), a novel attack where lyrics are semantically altered while preserving their acoustic structure through homophonic substitutions (e.g., Eminem's famous "mom's spaghetti" $\rightarrow$ "Bob's confetti"). Despite these distortions, we uncover a powerful form of sub-lexical memorization: models like SUNO and YuE regenerate outputs strikingly similar to known training content, achieving high similarity across audio-domain metrics, including CLAP, AudioJudge, and CoverID. This vulnerability persists across multiple languages and genres. More surprisingly, we discover that phoneme-altered lyrics alone can trigger visual memorization in text-to-video models. When prompted with phonetically modified lyrics from Lose Yourself, Veo 3 reconstructs visual elements from the original music video -- including character appearance and scene composition -- despite no visual cues in the prompt. We term this phenomenon phonetic-to-visual regurgitation. Together, these findings expose a critical vulnerability in transcript-conditioned multimodal generation: phonetic prompting alone can unlock memorized audiovisual content, raising urgent questions about copyright, safety, and content provenance in modern generative systems. Example generations are available on our demo page (jrohsc.github.io/music_attack/).
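
A toy illustration of the homophonic-substitution idea behind APT: swap words for acoustically similar ones so the lyric's meaning changes while its phonetic structure is largely preserved. The paper constructs substitutions at the phoneme level; this hand-made lookup table is only a stand-in.

```python
HOMOPHONE_MAP = {
    "mom's": "Bob's",
    "spaghetti": "confetti",
    "knees": "keys",
    "weak": "week",
}

def phonetic_perturb(lyric: str) -> str:
    # Replace each word that has an entry in the homophone table.
    return " ".join(HOMOPHONE_MAP.get(w.lower(), w) for w in lyric.split())

print(phonetic_perturb("mom's spaghetti"))
print(phonetic_perturb("His palms are sweaty knees weak arms are heavy"))
```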

Updated: 2025-07-23 21:11:47

标题: 鲍勃的五彩纸屑:音乐和视频生成中的语音记忆攻击

摘要: 歌词到歌曲(LS2)生成模型有望实现从文本到音乐的端到端合成,但它们对训练数据记忆的脆弱性仍未得到充分研究。我们提出了对抗性语音提示(Adversarial PhoneTic Prompting,APT)这一新颖攻击:通过同音替换在保持歌词声学结构的同时改变其语义(例如,Eminem的著名歌词“mom's spaghetti”→“Bob's confetti”)。尽管存在这些改动,我们发现了一种强大的亚词汇层面记忆:SUNO和YuE等模型重新生成的输出与已知训练内容惊人地相似,在CLAP、AudioJudge和CoverID等音频域指标上均取得高相似度。这种脆弱性跨越多种语言和音乐流派。更令人惊讶的是,我们发现仅凭音素改动的歌词就能触发文本到视频模型中的视觉记忆:当以《Lose Yourself》的同音修改歌词作为提示时,Veo 3重建了原始音乐视频中的视觉元素(包括角色外观和场景构图),尽管提示中没有任何视觉线索。我们将这一现象称为“语音到视觉的反刍”。总之,这些发现揭示了以文本为条件的多模态生成中的一个关键漏洞:仅靠语音提示即可解锁被记忆的视听内容,对现代生成系统中的版权、安全和内容来源提出了紧迫的问题。示例生成可在我们的演示页面(jrohsc.github.io/music_attack/)查看。

更新时间: 2025-07-23 21:11:47

领域: cs.SD,cs.AI,cs.CL,eess.AS

下载: http://arxiv.org/abs/2507.17937v1

Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions

We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. The buyer aims to maximize their cumulative value over $T$ rounds while adhering to per-round return-on-investment (RoI) constraints in a strategic (or adversarial) environment. Using an $m$-uniform bidding format, the buyer submits $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$, with $m \ll M$ in practice, where $M$ denotes the maximum demand of the buyer. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. Despite the stringent requirement, we show that these strategies satisfy a mild no-overbidding condition, depend only on the valuation curve of the bidder, and the bidder can focus on a finite subset without loss of generality. Though the subset size is $O(M^m)$, we design a polynomial-time learning algorithm that achieves sublinear regret, both in full-information and bandit settings, relative to the hindsight-optimal safe strategy. We assess the robustness of safe strategies against the hindsight-optimal strategy from a richer class. We define the richness ratio $\alpha \in (0,1]$ as the minimum ratio of the value of the optimal safe strategy to that of the optimal strategy from richer class and construct hard instances showing the tightness of $\alpha$. Our algorithm achieves $\alpha$-approximate sublinear regret against these stronger benchmarks. Simulations on semi-synthetic auction data show that empirical richness ratios significantly outperform the theoretical worst-case bounds. The proposed safe strategies and learning algorithm extend naturally to more nuanced buyer and competitor models.
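
For intuition about the setting, the sketch below clears one round of a uniform-price multi-unit auction and checks the buyer's realized RoI. The highest-rejected-bid pricing rule and all numbers are illustrative assumptions; the paper's characterization of safe m-uniform strategies is substantially more subtle.

```python
def clear_auction(all_bids, supply):
    """all_bids: list of (bidder, bid, qty). Returns (price, units won per bidder)."""
    units = [(b, who) for who, b, q in all_bids for _ in range(q)]
    units.sort(reverse=True)                      # highest bids win
    winners = units[:supply]
    price = units[supply][0] if len(units) > supply else 0.0  # highest rejected bid
    won = {}
    for b, who in winners:
        won[who] = won.get(who, 0) + 1
    return price, won

# Buyer values units at v = [10, 8, 5] (diminishing marginal value) and
# submits m=2 bid-quantity pairs; a rival demands 2 units at bid 6.
values = [10, 8, 5]
bids = [("buyer", 9.0, 1), ("buyer", 7.0, 2), ("rival", 6.0, 2)]
price, won = clear_auction(bids, supply=3)
k = won.get("buyer", 0)
value, payment = sum(values[:k]), price * k
print(price, k, (value - payment) / payment)      # this round's realized RoI
```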

Updated: 2025-07-23 21:09:41

标题: 学习在统一价格拍卖中最大化买家价值的安全策略

摘要: 我们从价值最大化买家的视角研究重复统一价格多单位拍卖中的竞价问题。买家的目标是在战略性(或对抗性)环境中,于$T$轮内最大化其累积价值,同时满足每轮的投资回报率(RoI)约束。在$m$-统一竞价格式下,买家提交$m$个竞价-数量对$(b_i, q_i)$,表示以价格$b_i$需求$q_i$个单位;实践中$m \ll M$,其中$M$为买家的最大需求量。 我们引入安全竞价策略的概念,即无论竞争对手如何出价都能满足RoI约束的策略。尽管要求严格,我们证明这些策略满足温和的不过度出价条件,仅取决于竞价者的估值曲线,且竞价者可以不失一般性地将注意力限制在一个有限子集上。尽管该子集大小为$O(M^m)$,我们设计了一个多项式时间学习算法,在完全信息和老虎机(bandit)反馈两种设置下,相对于事后最优安全策略都能实现次线性遗憾。 我们进一步评估安全策略相对于更丰富策略类中事后最优策略的稳健性。我们将丰富度比率$\alpha \in (0,1]$定义为最优安全策略价值与更丰富策略类最优策略价值之比的最小值,并构造了表明$\alpha$紧致性的困难实例。我们的算法相对于这些更强的基准实现了$\alpha$-近似次线性遗憾。在半合成拍卖数据上的模拟表明,实证丰富度比率显著优于理论最坏情形界。所提出的安全策略和学习算法可自然推广到更细致的买家与竞争者模型。

更新时间: 2025-07-23 21:09:41

领域: cs.DS,cs.GT,cs.LG

下载: http://arxiv.org/abs/2406.03674v3

Quantum Machine Learning Playground

This article introduces an innovative interactive visualization tool designed to demystify quantum machine learning (QML) algorithms. Our work is inspired by the success of classical machine learning visualization tools, such as TensorFlow Playground, and aims to bridge the gap in visualization resources specifically for the field of QML. The article includes a comprehensive overview of relevant visualization metaphors from both quantum computing and classical machine learning, the development of an algorithm visualization concept, and the design of a concrete implementation as an interactive web application. By combining common visualization metaphors for the so-called data re-uploading universal quantum classifier as a representative QML model, this article aims to lower the entry barrier to quantum computing and encourage further innovation in the field. The accompanying interactive application is a proposal for the first version of a quantum machine learning playground for learning and exploring QML models.
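
For readers unfamiliar with the model being visualized, here is a compact numpy sketch of a single-qubit data re-uploading classifier under simplifying assumptions: a scalar input and hand-fixed parameters rather than trained ones.

```python
import numpy as np

def ry(theta):
    # Single-qubit rotation about the Y axis.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def classify(x, weights, biases):
    state = np.array([1.0, 0.0])            # start in |0>
    for w, b in zip(weights, biases):       # one re-upload of x per layer
        state = ry(w * x + b) @ state
    return abs(state[0]) ** 2                # P(class 0) = P(measuring |0>)

w, b = [1.3, -0.8, 2.1], [0.2, 0.5, -0.4]
for x in (-1.0, 0.0, 1.0):
    print(x, classify(x, w, b))
```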

Updated: 2025-07-23 21:08:29

标题: 量子机器学习游乐场

摘要: 本文介绍了一种旨在揭开量子机器学习(QML)算法神秘面纱的创新交互式可视化工具。我们的工作受到TensorFlow Playground等经典机器学习可视化工具成功的启发,旨在填补QML领域可视化资源的空白。本文全面综述了来自量子计算和经典机器学习的相关可视化隐喻,阐述了算法可视化概念的开发过程,并给出了作为交互式Web应用程序的具体实现设计。通过为所谓的数据重上传(data re-uploading)通用量子分类器这一代表性QML模型组合常见的可视化隐喻,本文旨在降低量子计算的入门门槛,并鼓励该领域的进一步创新。随附的交互式应用程序是我们提议的量子机器学习游乐场的第一个版本,用于学习和探索QML模型。

更新时间: 2025-07-23 21:08:29

领域: quant-ph,cs.GR,cs.LG

下载: http://arxiv.org/abs/2507.17931v1

The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts

Computational workloads composing traditional Transformer models are starkly bifurcated. Multi-Head Attention (MHA) is memory-bound, with low arithmetic intensity, while feedforward layers are compute-bound. This dichotomy has long motivated research into specialized hardware to mitigate the MHA bottleneck. This paper argues that recent architectural shifts, namely Multi-head Latent Attention (MLA) and Mixture-of-Experts (MoE), challenge the premise of specialized attention hardware. We make two key observations. First, the arithmetic intensity of MLA is over two orders of magnitude greater than that of MHA, shifting it close to a compute-bound regime well-suited for modern accelerators like GPUs. Second, by distributing MoE experts across a pool of accelerators, their arithmetic intensity can be tuned through batching to match that of the dense layers, creating a more balanced computational profile. These findings reveal a diminishing need for specialized attention hardware. The central challenge for next-generation Transformers is no longer accelerating a single memory-bound layer. Instead, the focus must shift to designing balanced systems with sufficient compute, memory capacity, memory bandwidth, and high-bandwidth interconnects to manage the diverse demands of large-scale models.
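
The paper's central quantity is easy to reproduce on the back of an envelope. The sketch below computes the arithmetic intensity (FLOPs per byte) of fp16 GEMMs with illustrative shapes: decode-time attention against a KV cache behaves like an m=1 GEMV, while batching requests through a shared MoE expert raises m and pushes the same weights toward the compute-bound regime. Shapes are assumptions, not measured workloads.

```python
BYTES = 2  # fp16

def gemm_intensity(m, n, k):
    flops = 2 * m * n * k                           # multiply-accumulates
    bytes_moved = BYTES * (m * k + k * n + m * n)   # read A, B; write C
    return flops / bytes_moved

# Attention-score GEMV for one decode token against a 4k-token KV cache:
print("attention m=1:", round(gemm_intensity(1, 4096, 128), 2))

# One MoE expert FFN (k=4096, n=14336) at increasing effective batch sizes:
for m in (1, 16, 256):
    print(f"expert m={m}:", round(gemm_intensity(m, 14336, 4096), 1))
```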

Updated: 2025-07-23 20:55:41

标题: 新的LLM瓶颈:潜在注意力和专家混合的系统视角

摘要: 构成传统Transformer模型的计算工作负载呈现明显的两极分化。多头注意力(MHA)是内存受限的,算术强度较低,而前馈层是计算受限的。这种二分长期以来一直推动着以专用硬件缓解MHA瓶颈的研究。 本文认为,最近的架构转变,即多头潜在注意力(MLA)和专家混合(MoE),挑战了专用注意力硬件的前提。我们做出两个关键观察。首先,MLA的算术强度比MHA高出两个数量级以上,使其接近计算受限(compute-bound)区间,非常适合GPU等现代加速器。其次,通过将MoE专家分布在一组加速器上,其算术强度可以通过批处理调节至与稠密层相当,从而形成更均衡的计算特征。 这些发现表明对专用注意力硬件的需求正在减弱。下一代Transformer的核心挑战不再是加速单个内存受限的层,而是设计具有足够算力、内存容量、内存带宽和高带宽互连的均衡系统,以应对大规模模型的多样化需求。

更新时间: 2025-07-23 20:55:41

领域: cs.AR,cs.AI

下载: http://arxiv.org/abs/2507.15465v2

SMARTAPS: Tool-augmented LLMs for Operations Management

Large language models (LLMs) present intriguing opportunities to enhance user interaction with traditional algorithms and tools in real-world applications. An advanced planning system (APS) is a sophisticated software that leverages optimization to help operations planners create, interpret, and modify an operational plan. While highly beneficial, many customers are priced out of using an APS due to the ongoing costs of consultants responsible for customization and maintenance. To address the need for a more accessible APS expressed by supply chain planners, we present SmartAPS, a conversational system built on a tool-augmented LLM. Our system provides operations planners with an intuitive natural language chat interface, allowing them to query information, perform counterfactual reasoning, receive recommendations, and execute scenario analysis to better manage their operation. A short video demonstrating the system has been released: https://youtu.be/KtIrJjlDbyw

Updated: 2025-07-23 20:53:40

标题: SMARTAPS: 工具增强的运营管理LLMs

摘要: 大型语言模型(LLMs)为增强用户与传统算法和工具在现实应用中的交互提供了引人入胜的机会。高级计划系统(APS)是一种利用优化技术帮助运营计划人员创建、解读和修改运营计划的复杂软件。尽管益处显著,许多客户却因负责定制和维护的顾问费用持续高昂而无力使用APS。为满足供应链计划人员对更易用APS的需求,我们提出了SmartAPS,一个基于工具增强的LLM构建的对话系统。该系统为运营计划人员提供直观的自然语言聊天界面,使他们能够查询信息、进行反事实推理、获取建议并执行情景分析,从而更好地管理运营。演示该系统的短视频已发布:https://youtu.be/KtIrJjlDbyw

更新时间: 2025-07-23 20:53:40

领域: cs.AI

下载: http://arxiv.org/abs/2507.17927v1

Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks

The grand goal of AI research, and particularly Self Supervised Learning (SSL), is to produce systems that can successfully solve any possible task. In contrast, current evaluation methods available to AI researchers typically rely on a fixed collection of hand-picked downstream benchmarks. Hence, a large amount of effort is put into designing and searching for large collection of evaluation tasks that can serve as a proxy of our grand goal. We argue that such a rigid evaluation protocol creates a silent bottleneck in AI research. To remedy that, we define a probabilistic space of downstream tasks obtained by adopting a distribution of tasks and by defining Task Priors. Under this view, one can evaluate a model's performance over the set of all possible downstream tasks. Our framework is the first to provide answers to key questions such as (i) what is the average performance of my model over all possible downstream tasks weighted by the probability to encounter each task? or (ii) what is the variance of my model's performance across all downstream tasks under the defined Task Priors? Beyond establishing a new standard for evaluation, we believe that Task Priors will accelerate the pace of research in SSL - where downstream task evaluation is the sole qualitative signal that researchers have access to.
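
In the discrete case, the two headline questions reduce to a probability-weighted mean and variance of per-task scores; the tasks and their prior below are hypothetical stand-ins for a Task Prior.

```python
import numpy as np

task_probs  = np.array([0.5, 0.3, 0.15, 0.05])    # Task Prior over 4 tasks
task_scores = np.array([0.82, 0.74, 0.60, 0.31])  # model's score on each task

mean = float(np.sum(task_probs * task_scores))               # question (i)
var  = float(np.sum(task_probs * (task_scores - mean) ** 2)) # question (ii)
print(f"expected performance: {mean:.3f}, variance: {var:.4f}")
```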

Updated: 2025-07-23 20:53:29

标题: 任务先验:通过考虑下游任务的整个空间来增强模型评估

摘要: 人工智能研究、尤其是自监督学习(SSL)的宏伟目标,是构建能够成功解决任何可能任务的系统。相比之下,目前AI研究人员可用的评估方法通常依赖于一组固定的、人工挑选的下游基准。因此,大量精力被投入到设计和寻找可以作为这一宏伟目标代理的大规模评估任务集合中。我们认为,这种僵化的评估协议在AI研究中造成了一个无声的瓶颈。为此,我们通过采用任务分布并定义任务先验(Task Priors),构造了一个下游任务的概率空间。在这一视角下,可以评估模型在所有可能下游任务集合上的表现。我们的框架首次回答了如下关键问题:(i)按遇到各任务的概率加权,我的模型在所有可能下游任务上的平均表现是多少?(ii)在给定的任务先验下,我的模型在所有下游任务上表现的方差是多少?除了确立新的评估标准,我们相信任务先验将加快SSL研究的步伐,因为下游任务评估是SSL研究人员唯一可获得的定性信号。

更新时间: 2025-07-23 20:53:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.09871v2

UrbanPulse: A Cross-City Deep Learning Framework for Ultra-Fine-Grained Population Transfer Prediction

Accurate population flow prediction is essential for urban planning, transportation management, and public health. Yet existing methods face key limitations: traditional models rely on static spatial assumptions, deep learning models struggle with cross-city generalization, and Large Language Models (LLMs) incur high computational costs while failing to capture spatial structure. Moreover, many approaches sacrifice resolution by clustering Points of Interest (POIs) or restricting coverage to subregions, limiting their utility for city-wide analytics. We introduce UrbanPulse, a scalable deep learning framework that delivers ultra-fine-grained, city-wide OD flow predictions by treating each POI as an individual node. It combines a temporal graph convolutional encoder with a transformer-based decoder to model multi-scale spatiotemporal dependencies. To ensure robust generalization across urban contexts, UrbanPulse employs a three-stage transfer learning strategy: pretraining on large-scale urban graphs, cold-start adaptation, and reinforcement learning fine-tuning.Evaluated on over 103 million cleaned GPS records from three metropolitan areas in California, UrbanPulse achieves state-of-the-art accuracy and scalability. Through efficient transfer learning, UrbanPulse takes a key step toward making high-resolution, AI-powered urban forecasting deployable in practice across diverse cities.
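
A structural sketch of the encoder-decoder pattern described above, with assumed shapes, untrained weights, and far less machinery than the real system, might look like:

```python
import torch
import torch.nn as nn

class TemporalGCNEncoder(nn.Module):
    def __init__(self, in_dim, hid):
        super().__init__()
        self.mix = nn.Linear(in_dim, hid)
        self.gru = nn.GRU(hid, hid, batch_first=True)

    def forward(self, x, adj):              # x: (T, N, F), adj: (N, N) row-normalized
        h = torch.relu(self.mix(adj @ x))   # graph conv at each time step
        h = h.transpose(0, 1)               # (N, T, hid): per-node time series
        _, last = self.gru(h)               # summarize each node's history
        return last.squeeze(0)              # (N, hid) node embeddings

T, N, F_in, hid = 12, 50, 8, 32
x   = torch.randn(T, N, F_in)                       # POI features over time
adj = torch.softmax(torch.randn(N, N), dim=-1)      # toy normalized adjacency

nodes = TemporalGCNEncoder(F_in, hid)(x, adj)       # one embedding per POI
dec = nn.TransformerDecoderLayer(d_model=hid, nhead=4, batch_first=True)
out = dec(nodes.unsqueeze(0), nodes.unsqueeze(0))   # POIs attend over POIs
flows = out.squeeze(0) @ nodes.T                    # (N, N) OD score matrix
print(flows.shape)
```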

Updated: 2025-07-23 20:44:25

标题: 城市脉动:一种用于超细粒度人口转移预测的跨城市深度学习框架

摘要: 准确的人口流动预测对城市规划、交通管理和公共卫生至关重要。然而,现有方法存在关键局限:传统模型依赖静态空间假设,深度学习模型难以跨城市泛化,而大型语言模型(LLMs)计算成本高昂且无法捕捉空间结构。此外,许多方法通过聚类兴趣点(POIs)或将覆盖范围限制在子区域来牺牲分辨率,限制了其在全城分析中的实用性。我们提出UrbanPulse,一个可扩展的深度学习框架,通过将每个POI视为独立节点,提供超细粒度的全城OD流量预测。它结合时间图卷积编码器与基于Transformer的解码器,以建模多尺度时空依赖关系。为确保在不同城市环境中的稳健泛化,UrbanPulse采用三阶段迁移学习策略:在大规模城市图上预训练、冷启动适应以及强化学习微调。在来自加利福尼亚州三个大都市区的超过1.03亿条清洗后的GPS记录上进行评估,UrbanPulse实现了最先进的准确性和可扩展性。凭借高效的迁移学习,UrbanPulse朝着让高分辨率、AI驱动的城市预测在不同城市中实际可部署迈出了关键一步。

更新时间: 2025-07-23 20:44:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17924v1

From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models

Text-to-image (T2I) models have become prevalent across numerous applications, making their robust evaluation against adversarial attacks a critical priority. Continuous access to new and challenging adversarial prompts across diverse domains is essential for stress-testing these models for resilience against novel attacks from multiple vectors. Current techniques for generating such prompts are either entirely authored by humans or synthetically generated. On the one hand, datasets of human-crafted adversarial prompts are often too small in size and imbalanced in their cultural and contextual representation. On the other hand, datasets of synthetically-generated prompts achieve scale, but typically lack the realistic nuances and creative adversarial strategies found in human-crafted prompts. To combine the strengths of both human and machine approaches, we propose Seed2Harvest, a hybrid red-teaming method for guided expansion of culturally diverse, human-crafted adversarial prompt seeds. The resulting prompts preserve the characteristics and attack patterns of human prompts while maintaining comparable average attack success rates (0.31 NudeNet, 0.36 SD NSFW, 0.12 Q16). Our expanded dataset achieves substantially higher diversity with 535 unique geographic locations and a Shannon entropy of 7.48, compared to 58 locations and 5.28 entropy in the original dataset. Our work demonstrates the importance of human-machine collaboration in leveraging human creativity and machine computational capacity to achieve comprehensive, scalable red-teaming for continuous T2I model safety evaluation.
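
The diversity figures quoted above are Shannon entropies over the distribution of geographic locations appearing in the prompts; the computation itself is standard (here in bits, over a toy location list).

```python
import math
from collections import Counter

def shannon_entropy(items):
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

locations = ["Lagos", "Mumbai", "Lagos", "Oslo", "Lima", "Mumbai", "Seoul"]
print(shannon_entropy(locations))  # higher entropy = more even coverage
```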

Updated: 2025-07-23 20:39:14

标题: 从种子到收获:用AI增强人类创造力以对文本到图像模型进行红队测试

摘要: 文本到图像(T2I)模型已在众多应用中普及,因此针对对抗攻击对其进行稳健评估成为一项关键任务。持续获得来自不同领域、新颖且具挑战性的对抗提示,对于压力测试这些模型抵御来自多个向量的新型攻击的韧性至关重要。目前生成此类提示的技术要么完全由人工编写,要么完全由机器合成。一方面,人工编写的对抗提示数据集往往规模过小,且在文化和语境代表性上不均衡;另一方面,机器合成的提示数据集虽具规模,却通常缺乏人工提示中的现实细微差别和创造性对抗策略。为结合人与机器两种方法的优势,我们提出Seed2Harvest,一种混合红队方法,用于有引导地扩展文化上多样的人工对抗提示种子。生成的提示保留了人工提示的特征和攻击模式,同时保持了相当的平均攻击成功率(NudeNet 0.31、SD NSFW 0.36、Q16 0.12)。我们扩展后的数据集实现了显著更高的多样性,涵盖535个不同地理位置,香农熵达7.48,而原始数据集仅有58个地理位置、熵为5.28。我们的工作表明,人机协作能够结合人类创造力与机器算力,实现全面、可扩展的红队测试,用于持续评估T2I模型的安全性。

更新时间: 2025-07-23 20:39:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17922v1

Sliding Window Informative Canonical Correlation Analysis

Canonical correlation analysis (CCA) is a technique for finding correlated sets of features between two datasets. In this paper, we propose a novel extension of CCA to the online, streaming data setting: Sliding Window Informative Canonical Correlation Analysis (SWICCA). Our method uses a streaming principal component analysis (PCA) algorithm as a backend and uses these outputs combined with a small sliding window of samples to estimate the CCA components in real time. We motivate and describe our algorithm, provide numerical simulations to characterize its performance, and provide a theoretical performance guarantee. The SWICCA method is applicable and scalable to extremely high dimensions, and we provide a real-data example that demonstrates this capability.
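
A simplified rendering of the recipe, assuming off-the-shelf sklearn components as the streaming-PCA backend and a window-level CCA estimate; the actual SWICCA estimator combines these pieces more carefully.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
d, window, r = 100, 64, 3
shared = rng.normal(size=(2000, r))                        # common latent signal
X = shared @ rng.normal(size=(r, d)) + 0.1 * rng.normal(size=(2000, d))
Y = shared @ rng.normal(size=(r, d)) + 0.1 * rng.normal(size=(2000, d))

pca_x, pca_y = IncrementalPCA(n_components=r), IncrementalPCA(n_components=r)
for start in range(0, 2000, window):                       # streaming PCA backend
    pca_x.partial_fit(X[start:start + window])
    pca_y.partial_fit(Y[start:start + window])

Xw, Yw = X[-window:], Y[-window:]                          # small sliding window
cca = CCA(n_components=r).fit(pca_x.transform(Xw), pca_y.transform(Yw))
U, V = cca.transform(pca_x.transform(Xw), pca_y.transform(Yw))
print([round(float(np.corrcoef(U[:, i], V[:, i])[0, 1]), 3) for i in range(r)])
```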

Updated: 2025-07-23 20:35:15

标题: 滑动窗口信息典型相关分析

摘要: 典型相关分析(CCA)是一种用于在两个数据集之间找到相关特征集的技术。在本文中,我们提出了一种新颖的CCA扩展,用于在线、流数据设置:滑动窗口信息典型相关分析(SWICCA)。我们的方法使用流式主成分分析(PCA)算法作为后端,并将这些输出与一个小滑动窗口的样本结合起来,实时估计CCA组件。我们阐述并描述了我们的算法,提供了数值模拟来表征其性能,并提供了理论性能保证。SWICCA方法适用且可扩展到极高维度,并提供了一个展示这一能力的实际数据示例。

更新时间: 2025-07-23 20:35:15

领域: stat.ML,cs.LG,eess.IV,math.ST,stat.CO,stat.ME,stat.TH,62H20, 62H25 (Primary) 62J10, 62L10 (Secondary)

下载: http://arxiv.org/abs/2507.17921v1

Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization

The performance of sequential Monte Carlo (SMC) samplers heavily depends on the tuning of the Markov kernels used in the path proposal. For SMC samplers with unadjusted Markov kernels, standard tuning objectives, such as the Metropolis-Hastings acceptance rate or the expected-squared jump distance, are no longer applicable. While stochastic gradient-based end-to-end optimization has been explored for tuning SMC samplers, they often incur excessive training costs, even for tuning just the kernel step sizes. In this work, we propose a general adaptation framework for tuning the Markov kernels in SMC samplers by minimizing the incremental Kullback-Leibler (KL) divergence between the proposal and target paths. For step size tuning, we provide a gradient- and tuning-free algorithm that is generally applicable for kernels such as Langevin Monte Carlo (LMC). We further demonstrate the utility of our approach by providing a tailored scheme for tuning kinetic LMC used in SMC samplers. Our implementations are able to obtain a full schedule of tuned parameters at the cost of a few vanilla SMC runs, which is a fraction of gradient-based approaches.

Updated: 2025-07-23 20:34:43

标题: 通过贪婪增量散度最小化调节序贯蒙特卡罗采样器

摘要: 序贯蒙特卡罗(SMC)采样器的性能在很大程度上取决于路径提议中所用马尔可夫核的调节。对于使用未校正马尔可夫核的SMC采样器,标准调节目标(如Metropolis-Hastings接受率或期望平方跳跃距离)不再适用。虽然已有工作探索基于随机梯度的端到端优化来调节SMC采样器,但即便只调节核的步长,其训练成本也往往过高。在这项工作中,我们提出一个通用的自适应框架,通过最小化提议路径与目标路径之间的增量Kullback-Leibler(KL)散度来调节SMC采样器中的马尔可夫核。对于步长调节,我们给出一个无需梯度、无需调参的算法,普遍适用于Langevin Monte Carlo(LMC)等核。我们进一步针对SMC采样器中使用的动力学LMC给出定制的调节方案,展示了该方法的实用性。我们的实现只需几次普通SMC运行的成本即可获得完整的调参时间表,仅为基于梯度方法成本的一小部分。

更新时间: 2025-07-23 20:34:43

领域: stat.ML,cs.LG,stat.CO

下载: http://arxiv.org/abs/2503.15704v4

SETOL: A Semi-Empirical Theory of (Deep) Learning

We present a SemiEmpirical Theory of Learning (SETOL) that explains the remarkable performance of State-Of-The-Art (SOTA) Neural Networks (NNs). We provide a formal explanation of the origin of the fundamental quantities in the phenomenological theory of Heavy-Tailed Self-Regularization (HTSR): the heavy-tailed power-law layer quality metrics, alpha and alpha-hat. In prior work, these metrics have been shown to predict trends in the test accuracies of pretrained SOTA NN models, importantly, without needing access to either testing or training data. Our SETOL uses techniques from statistical mechanics as well as advanced methods from random matrix theory and quantum chemistry. The derivation suggests new mathematical preconditions for ideal learning, including a new metric, ERG, which is equivalent to applying a single step of the Wilson Exact Renormalization Group. We test the assumptions and predictions of SETOL on a simple 3-layer multilayer perceptron (MLP), demonstrating excellent agreement with the key theoretical assumptions. For SOTA NN models, we show how to estimate the individual layer qualities of a trained NN by simply computing the empirical spectral density (ESD) of the layer weight matrices and plugging this ESD into our SETOL formulas. Notably, we examine the performance of the HTSR alpha and the SETOL ERG layer quality metrics, and find that they align remarkably well, both on our MLP and on SOTA NNs.
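
The measurement step is straightforward to sketch: form the ESD of a layer weight matrix as the eigenvalues of W^T W / N and estimate the power-law tail exponent. A Hill estimator over the top eigenvalues serves here as a simple proxy for the HTSR-style alpha fit, which in practice is more involved.

```python
import numpy as np

def esd(W):
    # Empirical spectral density: eigenvalues of the (normalized) correlation matrix.
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N)

def hill_alpha(eigs, k=20):
    # Hill estimator of the power-law tail exponent from the k largest eigenvalues.
    tail = np.sort(eigs)[-k:]
    return 1.0 + k / np.sum(np.log(tail / tail.min()))

rng = np.random.default_rng(1)
W = rng.standard_t(df=3, size=(512, 256))   # heavy-tailed toy "layer"
print("alpha ~", round(hill_alpha(esd(W)), 2))
```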

Updated: 2025-07-23 20:22:20

标题: SETOL: 一个半经验的(深度)学习理论

摘要: 我们提出一种半经验学习理论(SETOL),用以解释最先进(SOTA)神经网络(NNs)的卓越性能。我们对重尾自正则化(HTSR)唯象理论中基本量的来源给出了形式化解释:即重尾幂律层质量指标alpha与alpha-hat。先前的工作表明,这些指标能够预测预训练SOTA NN模型测试准确率的趋势,而且无需访问测试或训练数据。我们的SETOL运用了统计力学技术,以及来自随机矩阵理论和量子化学的先进方法。推导给出了理想学习的新数学前提条件,其中包括一个新指标ERG,它等价于执行一步Wilson精确重整化群变换。我们在一个简单的三层多层感知机(MLP)上检验了SETOL的假设和预测,结果与关键理论假设高度吻合。对于SOTA NN模型,我们展示了如何仅通过计算层权重矩阵的经验谱密度(ESD)并将其代入SETOL公式,来估计已训练NN的各层质量。值得注意的是,我们考察了HTSR的alpha与SETOL的ERG层质量指标的表现,发现无论是在我们的MLP上还是在SOTA NN上,二者都高度一致。

更新时间: 2025-07-23 20:22:20

领域: cs.LG,cond-mat.stat-mech

下载: http://arxiv.org/abs/2507.17912v1

EEG Foundation Models: A Critical Review of Current Progress and Future Directions

Patterns of electrical brain activity recorded via electroencephalography (EEG) offer immense value for scientific and clinical investigations. The inability of supervised EEG encoders to learn robust EEG patterns and their over-reliance on expensive signal annotations have sparked a transition towards general-purpose self-supervised EEG encoders, i.e., EEG foundation models (EEG-FMs), for robust and scalable EEG feature extraction. However, the real-world readiness of early EEG-FMs and the rubric for long-term research progress remain unclear. A systematic and comprehensive review of first-generation EEG-FMs is therefore necessary to understand the current state-of-the-art and identify key directions for future EEG-FMs. To that end, this study reviews 10 early EEG-FMs and presents a critical synthesis of their methodology, empirical findings, and outstanding research gaps. We find that most EEG-FMs adopt a sequence-based modeling scheme that relies on transformer-based backbones and the reconstruction of masked sequences for self-supervision. However, model evaluations remain heterogeneous and largely limited, making it challenging to assess their practical off-the-shelf utility. In addition to adopting standardized and realistic evaluations, future work should demonstrate more substantial scaling effects and make principled and trustworthy choices throughout the EEG representation learning pipeline. We believe that developing benchmarks, software tools, technical methodologies, and applications in collaboration with domain experts may further advance the translational utility and real-world adoption of EEG-FMs.

Updated: 2025-07-23 20:10:43

标题: EEG基础模型:当前进展与未来方向的批判性综述

摘要: 通过脑电图(EEG)记录的大脑电活动模式为科学和临床研究提供了巨大价值。有监督EEG编码器难以学习稳健的EEG模式,且过度依赖昂贵的信号标注,这促使研究转向通用的自监督EEG编码器,即EEG基础模型(EEG-FMs),以实现稳健且可扩展的EEG特征提取。然而,早期EEG-FMs的实际可用性以及衡量长期研究进展的标准仍不明确。因此,有必要对第一代EEG-FMs进行系统而全面的综述,以了解当前技术水平并确定未来EEG-FMs的关键方向。为此,本研究回顾了10个早期EEG-FMs,并对其方法论、实证发现和尚待解决的研究缺口进行了批判性综合。我们发现,大多数EEG-FMs采用基于序列的建模方案,依赖基于Transformer的骨干网络,并以掩码序列重建作为自监督目标。然而,模型评估仍然异质且相当有限,难以判断其开箱即用的实际效用。除采用标准化且贴近现实的评估外,未来工作还应展示更显著的扩展效应,并在整个EEG表示学习流程中做出有原则、可信赖的选择。我们相信,与领域专家合作开发基准、软件工具、技术方法和应用,有望进一步提升EEG-FMs的转化效用和实际采用。

更新时间: 2025-07-23 20:10:43

领域: eess.SP,cs.AI,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2507.11783v2

Deep learning-aided inverse design of porous metamaterials

The ultimate aim of the study is to explore the inverse design of porous metamaterials using a deep learning-based generative framework. Specifically, we develop a property-variational autoencoder (pVAE), a variational autoencoder (VAE) augmented with a regressor, to generate structured metamaterials with tailored hydraulic properties, such as porosity and permeability. While this work uses the lattice Boltzmann method (LBM) to generate intrinsic permeability tensor data for limited porous microstructures, a convolutional neural network (CNN) is trained using a bottom-up approach to predict effective hydraulic properties. This significantly reduces the computational cost compared to direct LBM simulations. The pVAE framework is trained on two datasets: a synthetic dataset of artificial porous microstructures and CT-scan images of volume elements from real open-cell foams. The encoder-decoder architecture of the VAE captures key microstructural features, mapping them into a compact and interpretable latent space for efficient structure-property exploration. The study provides a detailed analysis and interpretation of the latent space, demonstrating its role in structure-property mapping, interpolation, and inverse design. This approach facilitates the generation of new metamaterials with desired properties. The datasets and codes used in this study will be made open-access to support further research.
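
A skeletal pVAE in the sense described, with a linear encoder/decoder and a latent-space regressor for hydraulic properties; layer sizes and loss weights are placeholders for the paper's convolutional architecture over microstructure images.

```python
import torch
import torch.nn as nn

class PVAE(nn.Module):
    def __init__(self, x_dim=64, z_dim=8, p_dim=2):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs (mu, logvar)
        self.dec = nn.Linear(z_dim, x_dim)
        self.reg = nn.Linear(z_dim, p_dim)       # e.g. porosity, permeability

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(z), self.reg(z), mu, logvar

def pvae_loss(x, props, model, beta=1e-3, gamma=1.0):
    recon, pred, mu, logvar = model(x)
    rec = torch.mean((recon - x) ** 2)                              # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())     # KL to N(0, I)
    return rec + beta * kl + gamma * torch.mean((pred - props) ** 2)

model = PVAE()
x, props = torch.randn(16, 64), torch.randn(16, 2)  # toy microstructures + properties
print(pvae_loss(x, props, model).item())
```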

Updated: 2025-07-23 20:07:53

标题: 深度学习辅助的多孔超材料逆向设计

摘要: 本研究的最终目标是利用基于深度学习的生成框架探索多孔超材料的逆向设计。具体而言,我们开发了一种属性变分自编码器(pVAE),即带有回归器的变分自编码器(VAE),用于生成具有定制水力特性(如孔隙率和渗透率)的结构化超材料。本工作使用格子玻尔兹曼方法(LBM)为有限的多孔微结构生成本征渗透率张量数据,并以自下而上的方式训练卷积神经网络(CNN)来预测有效水力特性,这相比直接LBM模拟显著降低了计算成本。pVAE框架在两个数据集上训练:一个人工多孔微结构的合成数据集,以及来自真实开孔泡沫体积单元的CT扫描图像。VAE的编码器-解码器架构捕获关键微结构特征,并将其映射到紧凑且可解释的潜空间,以便进行高效的结构-性能探索。本研究对潜空间进行了详细分析和解读,展示了其在结构-性能映射、插值和逆向设计中的作用。该方法有助于生成具有目标特性的新型超材料。本研究使用的数据集和代码将开放获取,以支持后续研究。

更新时间: 2025-07-23 20:07:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17907v1

Federated Learning for Large-Scale Cloud Robotic Manipulation: Opportunities and Challenges

Federated Learning (FL) is an emerging distributed machine learning paradigm, where the collaborative training of a model involves dynamic participation of devices to achieve broad objectives. In contrast, classical machine learning (ML) typically requires data to be located on-premises for training, whereas FL leverages numerous user devices to train a shared global model without the need to share private data. Current robotic manipulation tasks are constrained by the individual capabilities and speed of robots due to limited low-latency computing resources. Consequently, the concept of cloud robotics has emerged, allowing robotic applications to harness the flexibility and reliability of computing resources, effectively alleviating their computational demands across the cloud-edge continuum. Undoubtedly, within this distributed computing context, as exemplified in cloud robotic manipulation scenarios, FL offers manifold advantages while also presenting several challenges and opportunities. In this paper, we present fundamental concepts of FL and their connection to cloud robotic manipulation. Additionally, we envision the opportunities and challenges associated with realizing efficient and reliable cloud robotic manipulation at scale through FL, which researchers can adopt to design and verify FL models in either centralized or decentralized settings.

Updated: 2025-07-23 20:01:36

Domains: cs.LG

Download: http://arxiv.org/abs/2507.17903v1
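
To make the FL paradigm concrete, below is a minimal FedAvg round in NumPy; the linear model, client data, and hyperparameters are placeholders for illustration, not anything from the paper.

```python
import numpy as np

def local_update(weights, data, lr=0.01, epochs=1):
    """Placeholder local training: one client takes gradient steps on its
    private (X, y) data for a linear model; the data never leaves the client."""
    X, y = data
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # MSE gradient
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One communication round: clients train locally, the server averages
    the returned weights proportionally to local dataset size."""
    sizes = np.array([len(d[1]) for d in clients], dtype=float)
    locals_ = [local_update(global_w, d) for d in clients]
    return np.average(locals_, axis=0, weights=sizes / sizes.sum())

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
w = np.zeros(3)
for t in range(10):
    w = fedavg_round(w, clients)
```

Each client's (X, y) stays local; only the updated weights travel to the server, which is the privacy-preserving property the abstract highlights.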

"Think First, Verify Always": Training Humans to Face AI Risks

Artificial intelligence enables unprecedented attacks on human cognition, yet cybersecurity remains predominantly device-centric. This paper introduces the "Think First, Verify Always" (TFVA) protocol, which repositions humans as 'Firewall Zero', the first line of defense against AI-enabled threats. The protocol is grounded in five operational principles: Awareness, Integrity, Judgment, Ethical Responsibility, and Transparency (AIJET). A randomized controlled trial (n=151) demonstrated that a minimal 3-minute intervention produced statistically significant improvements in cognitive security task performance, with participants showing an absolute gain of +7.87% compared to controls. These results suggest that brief, principles-based training can rapidly enhance human resilience against AI-driven cognitive manipulation. We recommend that GenAI platforms embed "Think First, Verify Always" as a standard prompt, replacing passive warnings with actionable protocols to enhance trustworthy and ethical AI use. By bridging the gap between technical cybersecurity and human factors, the TFVA protocol establishes human-empowered security as a vital component of trustworthy AI systems.

Updated: 2025-07-23 19:59:08

Domains: cs.HC,cs.AI,cs.CR,cs.CY

Download: http://arxiv.org/abs/2508.03714v1

Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)

Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained video, audio, and language embeddings to fMRI time series recorded while four subjects watched almost 80 hours of movies provided by the Algonauts 2025 challenge. Modality-specific bidirectional RNNs encode temporal dynamics; their hidden states are fused and passed to a second recurrent layer, and lightweight subject-specific heads output responses for 1000 cortical parcels. Training relies on a composite MSE-correlation loss and a curriculum that gradually shifts emphasis from early sensory to late association regions. Averaging 100 model variants further boosts robustness. The resulting system ranked third on the competition leaderboard, achieving an overall Pearson r = 0.2094 and the highest single-parcel peak score (mean r = 0.63) among all participants, with particularly strong gains for the most challenging subject (Subject 5). The approach establishes a simple, extensible baseline for future multimodal brain-encoding benchmarks.

Updated: 2025-07-23 19:48:27

Domains: q-bio.NC,cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.17897v1
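
The composite MSE-correlation objective mentioned above can be sketched in a few lines of PyTorch; the mixing weight `lam` is an illustrative assumption.

```python
import torch

def mse_corr_loss(pred, target, lam=0.5, eps=1e-8):
    """Composite loss: MSE plus (1 - Pearson r), averaged over parcels.
    pred/target: (time, parcels) fMRI response matrices."""
    mse = torch.mean((pred - target) ** 2)
    p = pred - pred.mean(dim=0, keepdim=True)
    t = target - target.mean(dim=0, keepdim=True)
    r = (p * t).sum(dim=0) / (p.norm(dim=0) * t.norm(dim=0) + eps)
    return lam * mse + (1 - lam) * (1 - r.mean())
```

A curriculum would then schedule per-parcel weights from early sensory toward late association regions over the course of training.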

VeriMinder: Mitigating Analytical Vulnerabilities in NL2SQL

Application systems using natural language interfaces to databases (NLIDBs) have democratized data analysis. This positive development has also brought forth an urgent challenge to help users who might use these systems without a background in statistical analysis to formulate bias-free analytical questions. Although significant research has focused on text-to-SQL generation accuracy, addressing cognitive biases in analytical questions remains underexplored. We present VeriMinder, https://veriminder.ai, an interactive system for detecting and mitigating such analytical vulnerabilities. Our approach introduces three key innovations: (1) a contextual semantic mapping framework for biases relevant to specific analysis contexts, (2) an analytical framework that operationalizes the Hard-to-Vary principle and guides users in systematic data analysis, and (3) an optimized LLM-powered system that generates high-quality, task-specific prompts using a structured process involving multiple candidates, critic feedback, and self-reflection. User testing confirms the merits of our approach. In a direct user-experience evaluation, 82.5% of participants reported a positive impact on the quality of their analysis. In a comparative evaluation, VeriMinder scored significantly higher than alternative approaches, at least 20% better on metrics of the analysis's concreteness, comprehensiveness, and accuracy. Our system, implemented as a web application, is set to help users avoid the "wrong question" vulnerability during data analysis. The VeriMinder code base with prompts, https://reproducibility.link/veriminder, is available as MIT-licensed open-source software to facilitate further research and adoption within the community.

Updated: 2025-07-23 19:48:12

Domains: cs.CL,cs.AI,cs.DB

Download: http://arxiv.org/abs/2507.17896v1
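
A rough sketch of the multi-candidate / critic / self-reflection prompt generation loop; `llm` is a hypothetical completion function and the prompt wording is invented for illustration.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual model client."""
    raise NotImplementedError

def refine_prompt(task: str, n_candidates: int = 3, n_rounds: int = 2) -> str:
    # draft several candidate prompts for the analysis task
    candidates = [llm(f"Draft a bias-mitigation prompt for: {task} (variant {i})")
                  for i in range(n_candidates)]
    for _ in range(n_rounds):
        # critic feedback, then revision of each candidate
        critiques = [llm(f"Critique this prompt for analytical blind spots:\n{c}")
                     for c in candidates]
        candidates = [llm(f"Rewrite the prompt to address the critique.\n"
                          f"Prompt:\n{c}\nCritique:\n{k}")
                      for c, k in zip(candidates, critiques)]
    # self-reflection step: pick the strongest candidate
    joined = "\n---\n".join(candidates)
    return llm(f"Select and return the single best prompt:\n{joined}")
```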

Lower Bounds for Public-Private Learning under Distribution Shift

The most effective differentially private machine learning algorithms in practice rely on an additional source of purportedly public data. This paradigm is most interesting when the two sources combine to be more than the sum of their parts. However, there are settings such as mean estimation where we have strong lower bounds, showing that when the two data sources have the same distribution, there is no complementary value to combining the two data sources. In this work we extend the known lower bounds for public-private learning to the setting where the two data sources exhibit significant distribution shift. Our results apply both to Gaussian mean estimation where the two distributions have different means, and to Gaussian linear regression where the two distributions exhibit parameter shift. We find that when the shift is small (relative to the desired accuracy), either public or private data must be sufficiently abundant to estimate the private parameter. Conversely, when the shift is large, public data provides no benefit.

Updated: 2025-07-23 19:46:08

Domains: cs.LG,cs.CR

Download: http://arxiv.org/abs/2507.17895v1
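
As a toy illustration of the setting (not the paper's construction), the sketch below mixes a shifted public mean with a Gaussian-mechanism private mean; the clipping bound and privacy parameters are illustrative.

```python
import numpy as np

def private_mean(x, eps=1.0, delta=1e-5, clip=1.0):
    """Gaussian-mechanism mean of clipped private samples."""
    x = np.clip(x, -clip, clip)
    sens = 2 * clip / len(x)                         # sensitivity of the mean
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return x.mean() + np.random.normal(0, sigma)

def combined_estimate(x_priv, x_pub, w=0.5):
    """Convex combination of the two estimates; with a large public shift,
    any weight on the public mean only adds bias."""
    return w * x_pub.mean() + (1 - w) * private_mean(x_priv)

rng = np.random.default_rng(0)
shift = 0.05                                         # small shift: public data helps
x_priv = rng.normal(0.0, 1.0, 200)
x_pub = rng.normal(shift, 1.0, 2000)
print(combined_estimate(x_priv, x_pub))
```

With `shift` large relative to the target accuracy, any weight on the public mean only adds bias, matching the qualitative message of the lower bounds.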

Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes

This paper explores the application of reinforcement learning techniques to enhance the decoding of linear block codes, based on flipping bits and finding optimal decisions. We describe the methodology for mapping the iterative decoding process into Markov Decision Processes (MDPs) and propose different methods to reduce the number of states in the MDP. A truncated MDP is proposed to reduce the number of states by learning a Hamming ball with a specified radius around codewords. We then propose a general scheme for reinforcement learning based decoders, applicable to any class of codes, to improve decoder performance. We call this scheme action-list decoding. We design an action-list decoder based on Deep-Q network values that substantially enhances performance. We also benefit from the automorphism group of the code to further improve performance. Additionally, we propose a feedback-based method to exploit and enhance the performance of existing high-performing decoders by applying reinforcement learning algorithms after the existing decoders. These approaches effectively reduce the complexity of the reinforcement learning block. Finally, we present experimental results for Low-Density Parity Check (LDPC) codes over the Binary Symmetric Channel (BSC) to demonstrate the efficiency of the proposed methods.

Updated: 2025-07-23 19:42:51

Domains: cs.IT,cs.AI,cs.LG,math.IT

Download: http://arxiv.org/abs/2507.17893v1
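
A minimal sketch of action-list decoding over a toy (7,4) Hamming code; the `q_values` stub stands in for a trained Deep-Q network and simply counts unsatisfied checks per bit.

```python
import numpy as np

H = np.array([[1, 0, 1, 0, 1, 0, 1],      # parity-check matrix of the
              [0, 1, 1, 0, 0, 1, 1],      # (7,4) Hamming code
              [0, 0, 0, 1, 1, 1, 1]])

def q_values(syndrome):
    """Stub for a trained Deep-Q network: score every bit-flip action.
    Here we simply prefer bits involved in more unsatisfied checks."""
    return (H * syndrome[:, None]).sum(axis=0).astype(float)

def action_list_decode(y, list_size=3, max_steps=5):
    """Keep the `list_size` highest-Q flips at each step; return the first
    candidate whose syndrome becomes zero."""
    frontier = [y.copy()]
    for _ in range(max_steps):
        nxt = []
        for cand in frontier:
            s = H @ cand % 2
            if not s.any():
                return cand                        # valid codeword found
            for bit in np.argsort(-q_values(s))[:list_size]:
                flipped = cand.copy()
                flipped[bit] ^= 1
                nxt.append(flipped)
        frontier = nxt
    return frontier[0]                             # give up: best effort

y = np.array([0, 0, 1, 0, 0, 0, 0])                # all-zero codeword, one flip
print(action_list_decode(y))                       # -> the all-zero codeword
```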

Chemical reasoning in LLMs unlocks strategy-aware synthesis planning and reaction mechanism elucidation

While automated chemical tools excel at specific tasks, they have struggled to capture the strategic thinking that characterizes expert chemical reasoning. Here we demonstrate that large language models (LLMs) can serve as powerful tools enabling chemical analysis. When integrated with traditional search algorithms, they enable a new approach to computer-aided synthesis that mirrors human expert thinking. Rather than using LLMs to directly manipulate chemical structures, we leverage their ability to evaluate chemical strategies and guide search algorithms toward chemically meaningful solutions. We demonstrate this paradigm through two fundamental challenges: strategy-aware retrosynthetic planning and mechanism elucidation. In retrosynthetic planning, our system allows chemists to specify desired synthetic strategies in natural language -- from protecting group strategies to global feasibility assessment -- and uses traditional or LLM-guided Monte Carlo Tree Search to find routes that satisfy these constraints. In mechanism elucidation, LLMs guide the search for plausible reaction mechanisms by combining chemical principles with systematic exploration. This approach shows strong performance across diverse chemical tasks, with newer and larger models demonstrating increasingly sophisticated chemical reasoning. Our approach establishes a new paradigm for computer-aided chemistry that combines the strategic understanding of LLMs with the precision of traditional chemical tools, opening possibilities for more intuitive and powerful chemical automation systems.

Updated: 2025-07-23 19:39:27

Domains: cs.AI,cond-mat.mtrl-sci

Download: http://arxiv.org/abs/2503.08537v2
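
The division of labor, where classical search does the bookkeeping and the LLM scores chemical strategy, can be sketched as below; `llm_score` and `expand` are hypothetical stand-ins, not the authors' implementation.

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def llm_score(state, strategy: str) -> float:
    """Hypothetical: ask an LLM how well `state` (e.g., a retrosynthetic
    intermediate) fits the natural-language `strategy`, in [0, 1]."""
    raise NotImplementedError

def expand(state):
    """Hypothetical: enumerate single-step retrosynthetic disconnections."""
    raise NotImplementedError

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root_state, strategy, iters=100):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=ucb)
        node.children = [Node(s, node) for s in expand(node.state)]
        leaf = random.choice(node.children) if node.children else node
        reward = llm_score(leaf.state, strategy)   # LLM-guided evaluation
        while leaf:                                # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits)
```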

DeepCrossAttention: Supercharging Transformer Residual Connections

Transformer networks have achieved remarkable success across diverse domains, leveraging a variety of architectural innovations, including residual connections. However, traditional residual connections, which simply sum the outputs of previous layers, can dilute crucial information. This work introduces DeepCrossAttention (DCA), an approach that enhances residual learning in transformers. DCA employs learnable, input-dependent weights to dynamically combine layer outputs, enabling the model to selectively focus on the most relevant information in any of the previous layers. Furthermore, DCA incorporates depth-wise cross-attention, allowing for richer interactions between layers at different depths. Our language modeling experiments show that DCA achieves improved perplexity for a given training time. Moreover, DCA obtains the same model quality up to 3x faster while adding a negligible number of parameters. Theoretical analysis confirms that DCA provides an improved trade-off between accuracy and model size when the ratio of collective layer ranks to the ambient dimension falls below a critical threshold.

Updated: 2025-07-23 19:32:20

Domains: cs.LG

Download: http://arxiv.org/abs/2502.06785v2
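
The core DCA operation, replacing the plain residual sum with learnable, input-dependent weights over all previous layer outputs, can be sketched in PyTorch as follows (shapes illustrative).

```python
import torch
import torch.nn as nn

class DCACombine(nn.Module):
    """Replace `x + layer(x)` with a learned, input-dependent convex
    combination of all previous layer outputs."""
    def __init__(self, d_model, max_depth):
        super().__init__()
        self.gate = nn.Linear(d_model, max_depth)  # one score per earlier layer

    def forward(self, history):
        # history: list of (batch, seq, d_model) outputs from layers 0..k
        stack = torch.stack(history, dim=-2)              # (B, S, k+1, D)
        scores = self.gate(history[-1])[..., :len(history)]
        weights = torch.softmax(scores, dim=-1)           # (B, S, k+1)
        return (weights.unsqueeze(-1) * stack).sum(dim=-2)
```

Depth-wise cross-attention would additionally let tokens attend across the stacked depth axis; the gate above is the simplest input-dependent combination.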

Learning to Locate: GNN-Powered Vulnerability Path Discovery in Open Source Code

Detecting security vulnerabilities in open-source software is a critical task that is highly regarded in the related research communities. Several approaches have been proposed in the literature for detecting vulnerable code and identifying the classes of vulnerabilities. However, there is still room for work on explaining the root causes of detected vulnerabilities by locating vulnerable statements and discovering the paths that lead to the activation of the vulnerability. While frameworks like SliceLocator offer explanations by identifying vulnerable paths, they rely on rule-based sink identification that limits their generalization. In this paper, we introduce VulPathFinder, an explainable vulnerability path discovery framework that enhances SliceLocator's methodology by utilizing a novel Graph Neural Network (GNN) model for detecting sink statements, rather than relying on predefined rules. The proposed GNN captures semantic and syntactic dependencies to find potential sink points (PSPs), which are candidate statements where vulnerable paths end. After detecting PSPs, program slicing can be used to extract potentially vulnerable paths, which are then ranked by feeding them back into the target graph-based detector. Ultimately, the most probable path is returned, explaining the root cause of the detected vulnerability. We demonstrate the effectiveness of the proposed approach by performing evaluations on a benchmark of buffer overflow CWEs from the SARD dataset, providing explanations for the corresponding detected vulnerabilities. The results show that VulPathFinder outperforms both the original SliceLocator and GNNExplainer (as a general GNN explainability tool) in discovering vulnerability paths to identified PSPs.

Updated: 2025-07-23 19:30:37

Domains: cs.CR

Download: http://arxiv.org/abs/2507.17888v1
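
A minimal sketch of GNN-based scoring of statements as potential sink points, using plain PyTorch message passing; the feature dimension and the identity adjacency are placeholders for a real program graph.

```python
import torch
import torch.nn as nn

class SinkScorer(nn.Module):
    """Two rounds of mean-aggregation message passing over a program
    graph, then a per-node (per-statement) sink probability."""
    def __init__(self, in_dim, hid=64):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid)
        self.lin2 = nn.Linear(hid, hid)
        self.head = nn.Linear(hid, 1)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) statement embeddings
        # adj: (n_nodes, n_nodes) normalized adjacency (AST/CFG/DFG edges)
        h = torch.relu(self.lin1(adj @ x))
        h = torch.relu(self.lin2(adj @ h))
        return torch.sigmoid(self.head(h)).squeeze(-1)  # PSP scores

n = 10
adj = torch.eye(n)                 # placeholder graph: self-loops only
x = torch.randn(n, 32)
scores = SinkScorer(32)(x, adj)    # rank statements, slice from the top PSPs
```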

Fourier Neural Operators for Non-Markovian Processes: Approximation Theorems and Experiments

This paper introduces an operator-based neural network, the mirror-padded Fourier neural operator (MFNO), designed to learn the dynamics of stochastic systems. MFNO extends the standard Fourier neural operator (FNO) by incorporating mirror padding, enabling it to handle non-periodic inputs. We rigorously prove that MFNOs can approximate solutions of path-dependent stochastic differential equations and Lipschitz transformations of fractional Brownian motions to an arbitrary degree of accuracy. Our theoretical analysis builds on Wong-Zakai type theorems and various approximation techniques. Empirically, the MFNO exhibits strong resolution generalization, a property rarely seen in standard architectures such as LSTMs, TCNs, and DeepONet. Furthermore, our model achieves performance that is comparable or superior to these baselines while offering significantly faster sample path generation than classical numerical schemes.

Updated: 2025-07-23 19:30:34

Domains: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2507.17887v1
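
The mirror-padding idea can be sketched in a single spectral layer: reflect-pad the signal, apply an FNO-style Fourier multiplication on the lowest modes, then crop back; the mode count and channel shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MirrorSpectralConv1d(nn.Module):
    """FNO-style spectral convolution with mirror (reflect) padding, so
    non-periodic inputs avoid wrap-around artifacts at the boundaries."""
    def __init__(self, channels, modes=16):
        super().__init__()
        self.modes = modes
        self.w = nn.Parameter(
            torch.randn(channels, channels, modes, dtype=torch.cfloat) * 0.02)

    def forward(self, x):                          # x: (batch, channels, length)
        n = x.shape[-1]
        pad = n // 2
        xp = F.pad(x, (pad, pad), mode="reflect")  # mirror-pad to ~2n
        xf = torch.fft.rfft(xp)
        m = min(self.modes, xf.shape[-1])
        out = torch.zeros_like(xf)
        out[..., :m] = torch.einsum("bci,oci->boi", xf[..., :m], self.w[..., :m])
        y = torch.fft.irfft(out, n=xp.shape[-1])
        return y[..., pad:pad + n]                 # crop to the original length

x = torch.randn(2, 4, 64)
print(MirrorSpectralConv1d(channels=4)(x).shape)   # torch.Size([2, 4, 64])
```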

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the teacher model and its data sources, scientific progress remains difficult to measure. In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training pipelines without distillation from proprietary models and explore large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. Additionally, we introduce PLM-VideoBench, a suite for evaluating challenging video understanding tasks focusing on the ability to reason about "what", "where", "when", and "how" of a video. We make our work fully reproducible by providing data, training recipes, code & models. https://github.com/facebookresearch/perception_models

Updated: 2025-07-23 19:22:35

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.13180v3

A Supervised Machine Learning Framework for Multipactor Breakdown Prediction in High-Power Radio Frequency Devices and Accelerator Components: A Case Study in Planar Geometry

Multipactor is a nonlinear electron avalanche phenomenon that can severely impair the performance of high-power radio frequency (RF) devices and accelerator systems. Accurate prediction of multipactor susceptibility across different materials and operational regimes remains a critical yet computationally intensive challenge in accelerator component design and RF engineering. This study presents the first application of supervised machine learning (ML) for predicting multipactor susceptibility in two-surface planar geometries. A simulation-derived dataset spanning six distinct secondary electron yield (SEY) material profiles is used to train regression models - including Random Forest (RF), Extra Trees (ET), Extreme Gradient Boosting (XGBoost), and funnel-structured Multilayer Perceptrons (MLPs) - to predict the time-averaged electron growth rate, ${\delta}_{avg}$. Performance is evaluated using Intersection over Union (IoU), Structural Similarity Index (SSIM), and Pearson correlation coefficient. Tree-based models consistently outperform MLPs in generalizing across disjoint material domains. MLPs trained using a scalarized objective function that combines IoU and SSIM during Bayesian hyperparameter optimization with 5-fold cross-validation outperform those trained with single-objective loss functions. Principal Component Analysis reveals that performance degradation for certain materials stems from disjoint feature-space distributions, underscoring the need for broader dataset coverage. This study demonstrates both the promise and limitations of ML-based multipactor prediction and lays the groundwork for accelerated, data-driven modeling in advanced RF and accelerator system design.

Updated: 2025-07-23 19:14:46

Domains: physics.acc-ph,cs.LG,physics.app-ph,physics.plasm-ph

Download: http://arxiv.org/abs/2507.17881v1
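
A sketch of the scalarized IoU+SSIM objective, assuming scikit-learn and scikit-image are available; the threshold, mixing weight `alpha`, synthetic data, and 20x20 operating-condition grid are illustrative, not the study's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from skimage.metrics import structural_similarity

def iou(t, p, thresh=0.0):
    """IoU of the regions where the (true/predicted) electron growth rate
    exceeds `thresh`, i.e., the multipactor-susceptible regions."""
    a, b = t > thresh, p > thresh
    return (a & b).sum() / max((a | b).sum(), 1)

def scalarized_score(t, p, alpha=0.5):
    """Objective maximized during Bayesian hyperparameter optimization."""
    ssim = structural_similarity(t, p, data_range=float(t.max() - t.min()))
    return alpha * iou(t, p) + (1 - alpha) * ssim

# X: features per operating point; y: time-averaged growth rate delta_avg
rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 6)), rng.normal(size=400)
pred = GradientBoostingRegressor().fit(X, y).predict(X)
grid = (20, 20)                                    # operating-condition grid
print(scalarized_score(y.reshape(grid), pred.reshape(grid)))
```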

Look the Other Way: Designing 'Positive' Molecules with Negative Data via Task Arithmetic

The scarcity of molecules with desirable properties (i.e., 'positive' molecules) is an inherent bottleneck for generative molecule design. To sidestep such an obstacle, here we propose molecular task arithmetic: training a model on diverse and abundant negative examples to learn 'property directions', without accessing any positively labeled data, and moving models in the opposite property directions to generate positive molecules. When analyzed on 20 zero-shot design experiments, molecular task arithmetic generated more diverse and successful designs than models trained on positive molecules. Moreover, we employed molecular task arithmetic in dual-objective and few-shot design tasks. We find that molecular task arithmetic can consistently increase the diversity of designs while maintaining desirable design properties. With its simplicity, data efficiency, and performance, molecular task arithmetic bears the potential to become the de facto transfer learning strategy for de novo molecule design.

Updated: 2025-07-23 19:05:37

Domains: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2507.17876v1
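
The weight-space operation at the heart of molecular task arithmetic fits in a few lines; the scaling factor `lam` and the state-dict usage are illustrative assumptions.

```python
import torch

def negate_task_vector(base_sd, negative_sd, lam=1.0):
    """Move the base model *away* from the direction learned on negative
    (property-lacking) molecules: theta = base - lam * (negative - base)."""
    return {k: base_sd[k] - lam * (negative_sd[k] - base_sd[k])
            for k in base_sd}

# usage sketch: fine-tune a copy of the model on abundant negatives, then
# steer the pretrained weights in the opposite property direction:
#   base = pretrained.state_dict(); neg = finetuned_on_negatives.state_dict()
#   pretrained.load_state_dict(negate_task_vector(base, neg, lam=0.8))
```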

I2I-STRADA -- Information to Insights via Structured Reasoning Agent for Data Analysis

Recent advances in agentic systems for data analysis have emphasized automation of insight generation through multi-agent frameworks, and orchestration layers. While these systems effectively manage tasks like query translation, data transformation, and visualization, they often overlook the structured reasoning process underlying analytical thinking. Reasoning large language models (LLMs) used for multi-step problem solving are trained as general-purpose problem solvers. As a result, their reasoning or thinking steps do not adhere to fixed processes for specific tasks. Real-world data analysis requires a consistent cognitive workflow: interpreting vague goals, grounding them in contextual knowledge, constructing abstract plans, and adapting execution based on intermediate outcomes. We introduce I2I-STRADA (Information-to-Insight via Structured Reasoning Agent for Data Analysis), an agentic architecture designed to formalize this reasoning process. I2I-STRADA focuses on modeling how analysis unfolds via modular sub-tasks that reflect the cognitive steps of analytical reasoning. Evaluations on the DABstep and DABench benchmarks show that I2I-STRADA outperforms prior systems in planning coherence and insight alignment, highlighting the importance of structured cognitive workflows in agent design for data analysis.

Updated: 2025-07-23 18:58:42

Domains: cs.AI

Download: http://arxiv.org/abs/2507.17874v1
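
The fixed cognitive workflow could be expressed as a staged pipeline like the sketch below; `llm` is a hypothetical completion function and the stage prompts are invented placeholders, not the system's actual modules.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual model client."""
    raise NotImplementedError

def analyze(goal: str, context: str, max_rounds: int = 3) -> str:
    # interpret the vague goal, ground it, and build an abstract plan
    intent = llm(f"Interpret this vague analysis goal precisely:\n{goal}")
    grounded = llm(f"Ground the intent in the available data/context.\n"
                   f"Intent: {intent}\nContext: {context}")
    plan = llm(f"Write an abstract, step-by-step analysis plan for:\n{grounded}")
    insight = ""
    for _ in range(max_rounds):                 # execute and adapt
        result = llm(f"Execute the next step of the plan:\n{plan}\n"
                     f"Findings so far: {insight}")
        insight = llm(f"Update the findings given this result:\n{result}")
        plan = llm(f"Revise the remaining plan given:\n{insight}")
    return insight
```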

Integrating Feature Selection and Machine Learning for Nitrogen Assessment in Grapevine Leaves using In-Field Hyperspectral Imaging

Nitrogen (N) is one of the most crucial nutrients in vineyards, affecting plant growth and subsequent products such as wine and juice. Because soil N has high spatial and temporal variability, it is desirable to accurately estimate the N concentration of grapevine leaves and manage fertilization at the individual plant level to optimally meet plant needs. In this study, we used in-field hyperspectral images with wavelengths ranging from 400 to 1000 nm of four different grapevine cultivars, collected from distinct vineyards and over two growth stages during two growing seasons, to develop models for predicting N concentration at the leaf level and canopy level. After image processing, two feature selection methods were employed to identify the optimal set of spectral bands that were responsive to leaf N concentrations. The selected spectral bands were used to train and test two different Machine Learning (ML) models, Gradient Boosting and XGBoost, for predicting nitrogen concentrations. A comparison of the selected bands for the leaf-level and canopy-level datasets showed that most of the identified spectral regions were consistent across both feature selection methods and both dataset types, particularly in the key regions 500-525 nm, 650-690 nm, 750-800 nm, and 900-950 nm. These findings indicate the robustness of these spectral regions for predicting nitrogen content. The results for N prediction demonstrated that the ML models achieved an R-squared of 0.49 for canopy-level data and an R-squared of 0.57 for leaf-level data, despite using different sets of selected spectral bands for each analysis level. The study demonstrated the potential of in-field hyperspectral imaging, combined with integrated feature selection and ML techniques on spectral data, for monitoring N status in vineyards.

Updated: 2025-07-23 18:53:23

Domains: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.17869v1
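
A minimal sketch of the band-selection-then-regression pipeline with scikit-learn; the synthetic spectra, mutual-information selector, and k=20 are placeholders for the study's actual data and methods.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
bands = np.linspace(400, 1000, 300)              # wavelengths in nm
X = rng.normal(size=(500, bands.size))           # per-leaf reflectance spectra
y = X[:, 60] - 0.5 * X[:, 140] + rng.normal(scale=0.1, size=500)  # toy N conc.

selector = SelectKBest(mutual_info_regression, k=20).fit(X, y)
print("selected bands (nm):", np.sort(bands[selector.get_support()]))

X_tr, X_te, y_tr, y_te = train_test_split(selector.transform(X), y,
                                          test_size=0.25, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))
```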

Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation

Age-specific fertility rates (ASFRs) provide the most extensive record of reproductive change, but their aggregate nature obscures the individual-level behavioral mechanisms that drive fertility trends. To bridge this micro-macro divide, we introduce a likelihood-free Bayesian framework that couples a demographically interpretable, individual-level simulation model of the reproductive process with Sequential Neural Posterior Estimation (SNPE). We show that this framework successfully recovers core behavioral parameters governing contemporary fertility, including preferences for family size, reproductive timing, and contraceptive failure, using only ASFRs. The framework's effectiveness is validated on cohorts from four countries with diverse fertility regimes. Most compellingly, the model, estimated solely on aggregate data, successfully predicts out-of-sample distributions of individual-level outcomes, including age at first sex, desired family size, and birth intervals. Because our framework yields complete synthetic life histories, it significantly reduces the data requirements for building microsimulation models and enables behaviorally explicit demographic forecasts.

Updated: 2025-07-23 18:51:06

Domains: stat.AP,cs.LG

Download: http://arxiv.org/abs/2506.22607v2
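
Coupling an individual-level simulator with SNPE looks roughly like the sketch below, assuming the `sbi` package's SNPE interface; the Gaussian-bump simulator and two-parameter prior are trivial stand-ins for the paper's demographic model.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

def simulate_asfr(theta):
    """Stand-in simulator: map behavioral parameters (e.g., reproductive
    timing and its spread) to a vector of age-specific fertility rates."""
    ages = torch.arange(15, 50, dtype=torch.float32)
    peak, spread = theta
    return torch.exp(-((ages - peak) ** 2) / (2 * spread ** 2))

prior = BoxUniform(low=torch.tensor([20.0, 2.0]),
                   high=torch.tensor([35.0, 10.0]))
theta = prior.sample((2000,))
x = torch.stack([simulate_asfr(t) for t in theta])

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)
# condition on observed ASFRs to recover individual-level parameters
samples = posterior.sample((1000,), x=simulate_asfr(torch.tensor([28.0, 5.0])))
```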

Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving

Generative conversational interfaces powered by large language models (LLMs) typically stream output token-by-token at a rate determined by computational budget, often neglecting actual human reading speeds and the cognitive load associated with the content. This mismatch frequently leads to inefficient use of computational resources. For example, in cloud-based services, streaming content faster than users can read appears unnecessary, resulting in wasted computational resources and potential delays for other users, particularly during peak usage periods. To address this issue, we propose an adaptive streaming method that dynamically adjusts the pacing of LLM streaming output in real-time based on inferred cognitive load. Our approach estimates the cognitive load associated with streaming content and strategically slows down the stream during complex or information-rich segments, thereby freeing computational resources for other users. We conducted a statistical analysis and simulation based on a statistical model derived from data collected in a crowdsourced user study across various types of LLM-generated content. Our results show that this adaptive method can effectively reduce computational consumption while largely maintaining streaming speed above users' normal reading speed.

Updated: 2025-07-23 18:50:43

标题: 流式传输,快与慢:认知负荷感知的流式传输以实现高效的LLM服务

摘要: 由大型语言模型(LLMs)提供动力的生成式对话界面通常以计算预算确定的速率逐个标记地流输出,经常忽视实际的人类阅读速度和与内容相关的认知负荷。这种不匹配经常导致计算资源的低效使用。例如,在基于云的服务中,以用户无法阅读的速度流出内容似乎是不必要的,导致计算资源的浪费和潜在的延迟给其他用户,特别是在高峰使用期间。为了解决这个问题,我们提出了一种自适应流方法,根据推断的认知负荷实时动态调整LLM流输出的节奏。我们的方法估计与流媒体内容相关的认知负荷,并在复杂或信息丰富的片段中策略性地减慢流速,从而为其他用户释放计算资源。我们根据在众包用户研究中收集的各种类型的LLM生成内容的数据派生的统计模型进行了统计分析和模拟。我们的结果表明,这种自适应方法可以有效减少计算消耗,同时大部分保持流媒体速度高于用户的正常阅读速度。

更新时间: 2025-07-23 18:50:43

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2504.17999v2
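
A minimal sketch of the pacing idea above: emit each chunk, then sleep in proportion to its estimated reading time scaled by a load factor. The `est_load` proxy (mean word length) is a hypothetical stand-in; the paper derives pacing from a statistical model fit to crowdsourced reading data.

```python
import time

def mean_word_len(chunk: str) -> float:
    words = chunk.split()
    return sum(map(len, words)) / max(len(words), 1)

def est_load(chunk: str) -> float:
    # hypothetical proxy: longer words ~ denser content ~ higher cognitive load
    return min(max(mean_word_len(chunk) / 5.0, 1.0), 2.0)

def stream_with_pacing(chunks, words_per_sec: float = 4.0) -> None:
    for chunk in chunks:
        print(chunk, end="", flush=True)
        n_words = max(len(chunk.split()), 1)
        # slow down on high-load segments instead of emitting at compute speed
        time.sleep(n_words / words_per_sec * est_load(chunk))

stream_with_pacing(["A short easy sentence. ", "Heteroscedasticity complicates estimation. "])
```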

PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models

The risk of misusing text-to-image generative models for malicious purposes, especially given the open-source development of such models, has become a serious concern. As a risk mitigation strategy, attributing generative models with neural fingerprinting is emerging as a popular technique. A plethora of recent work aims to address neural fingerprinting, and the trade-off between the attribution accuracy and generation quality of such models has been studied extensively. None of the existing methods has yet achieved 100% attribution accuracy; however, any model with less than 100% accuracy is practically non-deployable. In this work, we propose an accurate method to incorporate neural fingerprinting into text-to-image diffusion models, leveraging the concept of cyclic error-correcting codes from the coding theory literature.

Updated: 2025-07-23 18:41:23

标题: PALADIN:文本到图像扩散模型的鲁棒神经指纹技术

摘要: 对于恶意用途误用文本到图像生成模型的风险,特别是由于这些模型的开源开发,已经成为一个严重的关注点。作为一种风险缓解策略,将生成模型与神经指纹识别联系起来的技术正逐渐成为一种流行的技术。最近有大量的工作旨在解决神经指纹识别的问题。这些模型在归因准确性和生成质量之间存在一个权衡。目前还没有任何一种现有方法能够达到100%的归因准确性。然而,任何准确度不到百分百的模型实际上都无法部署。在这项工作中,我们提出了一种准确的方法,利用编码理论文献中的循环纠错码的概念,将神经指纹识别纳入文本到图像扩散模型中。

更新时间: 2025-07-23 18:41:23

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2506.03170v2
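
The abstract leaves the construction abstract; the toy below shows the underlying coding-theory idea with a (7,4) cyclic code (generator g(x) = x^3 + x + 1): an encoded fingerprint nibble survives a single corrupted bit. The paper's actual code parameters and embedding mechanism are not specified here.

```python
G = 0b1011  # generator polynomial g(x) = x^3 + x + 1 of a (7,4) cyclic code

def encode(msg4: int) -> int:
    # systematic encoding: remainder of x^3 * m(x) divided by g(x) fills the parity bits
    rem = msg4 << 3
    for i in range(6, 2, -1):
        if rem & (1 << i):
            rem ^= G << (i - 3)
    return (msg4 << 3) | rem

def decode(word7: int) -> int:
    # brute-force nearest-codeword decoding (16 codewords): corrects any single bit flip
    return min(range(16), key=lambda m: bin(encode(m) ^ word7).count("1"))

fingerprint_nibble = 0b1010
noisy = encode(fingerprint_nibble) ^ 0b0000100   # one bit corrupted during generation
assert decode(noisy) == fingerprint_nibble
```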

Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis

Recent advancements in Deep Learning and its application on the edge hold great potential for revolutionizing routine screenings for skin cancers like Melanoma. Along with the anticipated benefits of this technology, potential dangers arise from unforeseen and inherent biases. Thus, assessing and improving the fairness of such systems is of utmost importance. A key challenge in fairness assessment is to ensure that the evaluation dataset is sufficiently representative of different Personally Identifiable Information (PII) (sex, age, and race) and other minority groups. Against the backdrop of this challenge, this study leverages the state-of-the-art Generative AI (GenAI) LightningDiT model to assess the fairness of publicly available melanoma classifiers. The results suggest that fairness assessment using highly realistic synthetic data is a promising direction. Yet, our findings indicate that verifying fairness becomes difficult when the melanoma-detection model used for evaluation is trained on data that differ from the dataset underpinning the synthetic images. Nonetheless, we propose that our approach offers a valuable new avenue for employing synthetic data to gauge and enhance fairness in medical-imaging GenAI systems.

Updated: 2025-07-23 18:33:27

标题: 通过基于GenAI的图像合成实现基于AI的皮肤病变分类器的公平性评估的便利化

摘要: 深度学习在边缘应用上的最新进展对于革命性地改变黑色素瘤等皮肤癌常规筛查具有巨大潜力。除了这项技术预期的好处之外,潜在的危险也源于意想不到的和固有的偏见。因此,评估和改进这类系统的公平性至关重要。公平性评估中的一个关键挑战是确保评估数据集充分代表不同的个人可识别信息(PII)(性别、年龄和种族)以及其他少数群体。在这一挑战背景下,本研究利用最先进的生成人工智能(GenAI)LightningDiT模型,评估公开可用的黑色素瘤分类器的公平性。结果表明,使用高度逼真的合成数据进行公平性评估是一个有前途的方向。然而,我们的研究结果表明,当用于评估的黑色素瘤检测模型训练数据与支撑合成图像的数据集不同时,验证公平性变得困难。尽管如此,我们提出我们的方法为在医学成像GenAI系统中利用合成数据来评估和提升公平性提供了一个有价值的新途径。

更新时间: 2025-07-23 18:33:27

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.17860v1

Detail++: Training-Free Detail Enhancer for Text-to-Image Diffusion Models

Recent advances in text-to-image (T2I) generation have led to impressive visual results. However, these models still face significant challenges when handling complex prompt, particularly those involving multiple subjects with distinct attributes. Inspired by the human drawing process, which first outlines the composition and then incrementally adds details, we propose Detail++, a training-free framework that introduces a novel Progressive Detail Injection (PDI) strategy to address this limitation. Specifically, we decompose a complex prompt into a sequence of simplified sub-prompts, guiding the generation process in stages. This staged generation leverages the inherent layout-controlling capacity of self-attention to first ensure global composition, followed by precise refinement. To achieve accurate binding between attributes and corresponding subjects, we exploit cross-attention mechanisms and further introduce a Centroid Alignment Loss at test time to reduce binding noise and enhance attribute consistency. Extensive experiments on T2I-CompBench and a newly constructed style composition benchmark demonstrate that Detail++ significantly outperforms existing methods, particularly in scenarios involving multiple objects and complex stylistic conditions.

Updated: 2025-07-23 18:20:46

标题: Detail++:用于文本到图像扩散模型的无需训练的细节增强器

摘要: 最近在文本到图像(T2I)生成领域取得了显著进展,产生了令人印象深刻的视觉结果。然而,这些模型在处理复杂提示时仍面临重大挑战,特别是涉及具有不同属性的多个主题的情况。受人类绘画过程的启发,即首先勾勒构图,然后逐渐添加细节,我们提出了Detail++,这是一个无需训练的框架,引入了一种新颖的渐进式细节注入(PDI)策略来解决这一限制。具体地,我们将复杂提示分解为一系列简化的子提示,引导生成过程分阶段进行。这种分阶段生成利用了自注意力的固有布局控制能力,首先确保全局构图,然后进行精细的细化。为了实现属性与相应主题之间的准确绑定,我们利用交叉注意力机制,并在测试时进一步引入重心对齐损失,以减少绑定噪声并增强属性一致性。在T2I-CompBench和新构建的风格合成基准测试上进行的大量实验表明,Detail++明显优于现有方法,特别是在涉及多个对象和复杂风格条件的情况下。

更新时间: 2025-07-23 18:20:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17853v1
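
A hedged sketch of what a centroid-alignment term over cross-attention maps could look like, with shapes assumed (one (H, W) map per text token); the paper's exact loss formulation may differ.

```python
import torch

def attn_centroid(attn: torch.Tensor) -> torch.Tensor:
    # attn: (H, W) cross-attention map of one text token over image latents
    h, w = attn.shape
    ys = torch.arange(h, dtype=attn.dtype).view(h, 1)
    xs = torch.arange(w, dtype=attn.dtype).view(1, w)
    mass = attn.sum().clamp_min(1e-8)
    return torch.stack([(attn * ys).sum() / mass, (attn * xs).sum() / mass])

def centroid_alignment_loss(attr_attn: torch.Tensor, subj_attn: torch.Tensor) -> torch.Tensor:
    # pull the attribute token's attention mass toward its subject's centroid,
    # reducing binding noise between attributes and subjects
    return (attn_centroid(attr_attn) - attn_centroid(subj_attn)).pow(2).sum()
```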

Performance Evaluation and Threat Mitigation in Large-scale 5G Core Deployment

The deployment of large-scale software-based 5G core functions presents significant challenges due to their reliance on optimized and intelligent resource provisioning for their services. Many studies have focused on analyzing the impact of resource allocation for complex deployments using mathematical models, queue theories, or even Artificial Intelligence (AI). This paper elucidates the effects of chaotic workloads, generated by Distributed Denial of Service (DDoS) on different Network Functions (NFs) on User Equipment registration performance. Our findings highlight the necessity of diverse resource profiles to ensure Service-Level Agreement (SLA) compliance in large-scale 5G core deployments. Additionally, our analysis of packet capture approaches demonstrates the potential of kernel-based monitoring for scalable security threat defense. Finally, our empirical evaluation provides insights into the effective deployment of 5G NFs in complex scenarios.

Updated: 2025-07-23 18:17:26

标题: 大规模5G核心部署中的性能评估和威胁缓解

摘要: 大规模软件化5G核心功能的部署面临重大挑战,因为它们依赖于针对其服务进行优化和智能资源配置。许多研究已经集中于分析使用数学模型、排队论甚至人工智能(AI)来进行复杂部署资源分配的影响。本文阐明了由分布式拒绝服务(DDoS)攻击产生的混乱工作负载施加在不同网络功能(NFs)上时,对用户设备注册性能的影响。我们的研究结果突出了确保大规模5G核心部署中服务级别协议(SLA)合规性所需多样化资源配置文件的必要性。此外,我们对数据包捕获方法的分析展示了基于内核的监控在可扩展安全威胁防御方面的潜力。最后,我们的实证评估为在复杂场景中有效部署5G NFs提供了洞察。

更新时间: 2025-07-23 18:17:26

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.17850v1

Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance

Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data. However, given a choice of public datasets, it is not a priori clear which one may be most appropriate for the private task. We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples. We provide theoretical analysis demonstrating that the excess risk scales with this subspace distance. This distance is easy to compute and robust to modifications in the setting. Empirical evaluation shows that trained model accuracy is monotone in this distance.

Updated: 2025-07-23 18:17:16

标题: 通过梯度子空间距离选择私人机器学习的公共数据集

摘要: 差分私有随机梯度下降通过在每次迭代中注入噪声来私有化模型训练,其中噪声幅度随模型参数数量增加而增加。最近的研究表明,我们可以通过利用公共数据进行私有机器学习来减少噪声,即将梯度投影到由公共数据规定的子空间上。然而,给定一组公共数据集,事先并不清楚哪一个最适合私有任务。我们提供了一种选择公共数据集的算法,通过测量公共示例与私有示例梯度之间的低维子空间距离。我们给出了理论分析,表明超额风险随该子空间距离而变化。这个距离易于计算,并且对设置中的修改具有鲁棒性。实证评估表明,训练模型的准确率随该距离单调变化。

更新时间: 2025-07-23 18:17:16

领域: stat.ML,cs.CR,cs.CV,cs.DS,cs.LG

下载: http://arxiv.org/abs/2303.01256v2
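
A small sketch of one standard way to realize the proposed measurement, assuming the distance is taken between the top-k subspaces of per-example gradient matrices; the paper's exact metric may differ.

```python
import numpy as np

def top_subspace(grads: np.ndarray, k: int) -> np.ndarray:
    # grads: (n_examples, d) per-example gradients; returns a (d, k) orthonormal basis
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:k].T

def subspace_distance(public_grads: np.ndarray, private_grads: np.ndarray, k: int = 10) -> float:
    u, w = top_subspace(public_grads, k), top_subspace(private_grads, k)
    # projection-metric distance: 0 when the subspaces coincide, sqrt(k) when orthogonal
    return float(np.sqrt(max(k - np.linalg.norm(u.T @ w, "fro") ** 2, 0.0)))
```

The algorithm would then pick the public dataset whose gradients yield the smallest distance to the private task's gradients.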

A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report

Statistical agencies routinely use different strategies to protect the confidentiality of tabular data from those used to protect the individual records in publicly released microdata. Aggregation is assumed to make the resulting statistics inherently less disclosive than the microdata. The 2010 U.S. Census used different disclosure limitation rules for its tabular and microdata publications. We show that the assumption that these tabular data are inherently less disclosive than their underlying microdata is wrong. The 2010 Census published more than 150 billion statistics in 180 table sets, almost all at the most detailed geographic level -- individual census blocks. Using only 34 of the published table sets, we reconstructed microdata for five variables (census block, sex, age, race, and ethnicity). Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We confirm through reidentification studies that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable people (unique persons with race and ethnicity different from the modal person on the census block) with 95\% accuracy. Next, we show that the more robust disclosure limitation framework used for the 2020 U.S. Census defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality or overly degrade the statistics' utility for the primary statutory use case: redrawing the boundaries of all the nation's legislative and voting districts in compliance with the 1965 Voting Rights Act. This is the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.

Updated: 2025-07-23 18:16:41

标题: 对2010年美国人口普查的模拟重建和重新识别攻击:全技术报告

摘要: 统计机构通常使用不同的策略来保护表格数据的机密性,与保护公开发布的微观数据中的个人记录所使用的策略不同。聚合被认为使得得出的统计数据本质上比微观数据更不易泄露。2010年美国人口普查针对其表格和微观数据出版采用了不同的披露限制规则。我们表明,这些表格数据本质上比其基础微观数据更不易泄露的假设是错误的。2010年人口普查在180个表格集中发布了超过1500亿个统计数据,几乎所有数据都在最详细的地理级别--个别人口普查区块。仅使用34个已发布的表格集,我们重建了五个变量的微观数据(人口普查区块、性别、年龄、种族和族裔)。仅使用已发布的数据,攻击者即可利用我们的方法验证:在70%的人口普查区块(9700万人)中,所有记录都被完美重建。我们通过重新识别研究证实,在重建完全准确的人口普查区块内,攻击者能够以95%的准确率正确推断340万易受攻击者(种族和族裔与所在区块众数不同的独特个人)在人口普查中对种族和族裔的实际回答。接下来,我们展示了2020年美国人口普查使用的更强大的披露限制框架可以防御基于重建的攻击。最后,我们展示了2020年人口普查披露回避系统的可用替代方案要么无法保护机密性,要么过度降低统计数据在主要法定用例中的效用:根据1965年选举权法案重新划定全国所有立法和选区的边界。这是完整的技术报告。有关摘要论文,请参阅https://doi.org/10.1162/99608f92.4a1ebf70。

更新时间: 2025-07-23 18:16:41

领域: stat.AP,cs.CR,econ.EM

下载: http://arxiv.org/abs/2312.11283v2
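
A toy illustration of why block-level tables can pin down microdata exactly: treating each published cell as a hard constraint and enumerating candidate blocks leaves a single consistent solution.

```python
# toy version of the attack: each published table cell is an exact constraint
# on a block's microdata; enumeration finds every consistent block
from itertools import product

SEX, AGE = ("F", "M"), ("child", "adult")
published = {"total": 3, "sex_F": 2, "age_adult": 2, "F_adult": 1}

types = list(product(SEX, AGE))                   # the four possible person types
solutions = []
for counts in product(range(published["total"] + 1), repeat=len(types)):
    block = dict(zip(types, counts))
    if (sum(counts) == published["total"]
            and sum(v for (s, _), v in block.items() if s == "F") == published["sex_F"]
            and sum(v for (_, a), v in block.items() if a == "adult") == published["age_adult"]
            and block[("F", "adult")] == published["F_adult"]):
        solutions.append(block)

print(len(solutions))   # 1 -> the block is perfectly reconstructed
```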

SV3.3B: A Sports Video Understanding Model for Action Recognition

This paper addresses the challenge of automated sports video analysis, which has traditionally been limited by computationally intensive models requiring server-side processing and lacking fine-grained understanding of athletic movements. Current approaches struggle to capture the nuanced biomechanical transitions essential for meaningful sports analysis, often missing critical phases like preparation, execution, and follow-through that occur within seconds. To address these limitations, we introduce SV3.3B, a lightweight 3.3B parameter video understanding model that combines novel temporal motion difference sampling with self-supervised learning for efficient on-device deployment. Our approach employs a DWT-VGG16-LDA based keyframe extraction mechanism that intelligently identifies the 16 most representative frames from sports sequences, followed by a V-DWT-JEPA2 encoder pretrained through mask-denoising objectives and an LLM decoder fine-tuned for sports action description generation. Evaluated on a subset of the NSVA basketball dataset, SV3.3B achieves superior performance across both traditional text generation metrics and sports-specific evaluation criteria, outperforming larger closed-source models including GPT-4o variants while maintaining significantly lower computational requirements. Our model demonstrates exceptional capability in generating technically detailed and analytically rich sports descriptions, achieving 29.2% improvement over GPT-4o in ground truth validation metrics, with substantial improvements in information density, action complexity, and measurement precision metrics essential for comprehensive athletic analysis. Model Available at https://huggingface.co/sportsvision/SV3.3B.

Updated: 2025-07-23 18:11:39

标题: SV3.3B:一种用于动作识别的体育视频理解模型

摘要: 本文讨论了自动化体育视频分析的挑战,传统上受到需要服务器端处理和缺乏对运动员动作细微理解的计算密集型模型的限制。当前方法往往难以捕捉对有意义的体育分析至关重要的细微生物力学过渡,通常会错过在几秒钟内发生的准备、执行和跟进等关键阶段。为了解决这些限制,我们引入了SV3.3B,这是一个轻量级的3.3B参数视频理解模型,它结合了新颖的时间运动差别采样和自监督学习,实现高效的端侧部署。我们的方法采用基于DWT-VGG16-LDA的关键帧提取机制,智能地从体育序列中识别出最具代表性的16帧,然后通过掩码去噪目标预训练的V-DWT-JEPA2编码器和为体育动作描述生成进行微调的LLM解码器。在NSVA篮球数据集的子集上进行评估,SV3.3B在传统文本生成指标和体育特定评估标准上取得了优异的性能,优于包括GPT-4o变体在内的更大规模的闭源模型,同时保持了显著较低的计算要求。我们的模型在生成技术细致和分析丰富的体育描述方面表现出卓越能力,与GPT-4o相比,在真实标注(ground truth)验证指标上实现了29.2%的改进,信息密度、动作复杂度和测量精度等关键的运动分析指标也有显著改进。模型可在https://huggingface.co/sportsvision/SV3.3B找到。

更新时间: 2025-07-23 18:11:39

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17844v1
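
A hedged stand-in for the keyframe-selection step: the paper scores frames with a DWT-VGG16-LDA pipeline, while the sketch below uses a simple pixel-difference proxy to spread the 16 picks evenly over cumulative motion.

```python
import numpy as np

def select_keyframes(frames: np.ndarray, k: int = 16) -> np.ndarray:
    # frames: (T, H, W) grayscale clip; spread k picks evenly over cumulative motion
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    cum = np.concatenate([[0.0], np.cumsum(diffs)])          # length T
    targets = np.linspace(0.0, cum[-1], k)
    return np.clip(np.searchsorted(cum, targets), 0, len(frames) - 1)
```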

On the Energy Distribution of the Galactic Center Excess' Sources

The Galactic Center Excess (GCE) remains one of the defining mysteries uncovered by the Fermi $\gamma$-ray Space Telescope. Although it may yet herald the discovery of annihilating dark matter, weighing against that conclusion are analyses showing the spatial structure of the emission appears more consistent with a population of dim point sources. Technical limitations have restricted prior analyses to studying the point-source hypothesis purely spatially. All spectral information that could help disentangle the GCE from the complex and uncertain astrophysical emission was discarded. We demonstrate that a neural network-aided simulation-based inference approach can overcome such limitations and thereby confront the point source explanation of the GCE with spatial and spectral data. The addition is profound: energy information drives the putative point sources to be significantly dimmer, indicating either the GCE is truly diffuse in nature or made of an exceptionally large number of sources. Quantitatively, for our best fit background model, the excess is essentially consistent with Poisson emission as predicted by dark matter. If the excess is instead due to point sources, our median prediction is ${\cal O}(10^5)$ sources in the Galactic Center, or more than 35,000 sources at 90% confidence, both significantly larger than the hundreds of sources preferred by earlier point-source analyses of the GCE.

Updated: 2025-07-23 18:00:00

标题: 关于银河系中心过剩源的能量分布

摘要: 银河中心过剩(GCE)仍然是费米伽马射线空间望远镜揭示的一个关键谜团。尽管这可能预示着暗物质湮灭的发现,但与这一结论相矛盾的是分析显示,发射的空间结构似乎更符合一个昏暗点源群体的假设。技术限制限制了先前的分析仅在纯粹空间上研究点源假说。所有可帮助区分GCE与复杂且不确定的天体发射的光谱信息都被丢弃了。我们证明,神经网络辅助的基于模拟的推断方法可以克服这些限制,从而通过空间和光谱数据与GCE的点源解释对抗。这个补充是深刻的:能量信息使得假设的点源明显更暗淡,这表明GCE要么真正是扩散的,要么由异常大量的源组成。定量上,对于我们最佳拟合的背景模型,过剩基本上与暗物质预测的泊松发射一致。如果过剩是由点源引起的,我们的中位预测是银河中心有${\cal O}(10^5)$个源,或者在90%的置信水平下有超过35,000个源,这两个数字都显著大于之前对GCE进行的点源分析所偏好的数百个源。

更新时间: 2025-07-23 18:00:00

领域: astro-ph.HE,astro-ph.CO,astro-ph.IM,cs.LG,hep-ph

下载: http://arxiv.org/abs/2507.17804v1
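
Both this paper and the SNPE-based demography paper above follow the same simulation-based inference recipe: simulate, train a neural posterior estimator, then condition on the observed data. A minimal sketch, assuming the `sbi` package's documented SNPE workflow and a stand-in simulator:

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

prior = BoxUniform(low=torch.zeros(3), high=torch.ones(3))

def simulator(theta: torch.Tensor) -> torch.Tensor:
    # stand-in forward model; the papers use physics/demographic simulators
    return theta + 0.05 * torch.randn_like(theta)

theta = prior.sample((2000,))
x = simulator(theta)

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)
samples = posterior.sample((1000,), x=torch.full((3,), 0.5))
```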

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we position high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Importantly, our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods, in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is likely due to its effect on addressing hidden/rare spurious correlations in the training dataset.

Updated: 2025-07-23 17:59:02

标题: 大学习速率同时实现对虚假相关性的稳健性和可压缩性

摘要: 鲁棒性和资源效率是现代机器学习模型中两个非常理想的特性。然而,同时实现它们仍然是一项挑战。在本文中,我们将高学习率定位为同时实现对伪相关性和网络可压缩性鲁棒性的促进因素。我们证明了较大的学习率还能产生诸如不变特征利用、类别分离和激活稀疏性等理想的表示特性。重要的是,我们的研究结果表明,较大的学习率与其他超参数和正则化方法相比,在同时满足这些特性方面表现更好。除了展示较大学习率在各种伪相关性数据集、模型和优化器中的积极影响外,我们还提供了强有力的证据,证明了较大学习率在标准分类任务中的成功可能是由其对训练数据集中隐藏/稀有伪相关性的影响所致。

更新时间: 2025-07-23 17:59:02

领域: cs.LG,cs.AI,cs.CV,stat.ML

下载: http://arxiv.org/abs/2507.17748v1

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

As frontier language models increasingly saturate standard QA benchmarks, concerns about data contamination, memorization, and escalating dataset creation costs persist. We propose a debate-driven evaluation paradigm that transforms any existing QA dataset into structured adversarial debates--where one model is given the official answer to defend, and another constructs and defends an alternative answer--adjudicated by a judge model blind to the correct solution. By forcing multi-round argumentation, this approach substantially increases difficulty while penalizing shallow memorization, yet reuses QA items to reduce curation overhead. We make two main contributions: (1) an evaluation pipeline to systematically convert QA tasks into debate-based assessments, and (2) a public benchmark that demonstrates our paradigm's effectiveness on a subset of MMLU-Pro questions, complete with standardized protocols and reference models. Empirical results validate the robustness of the method and its effectiveness against data contamination--a Llama 3.1 model fine-tuned on test questions showed dramatic accuracy improvements (50% -> 82%) but performed worse in debates. Results also show that even weaker judges can reliably differentiate stronger debaters, highlighting how debate-based evaluation can scale to future, more capable systems while maintaining a fraction of the cost of creating new benchmarks. Overall, our framework underscores that "pretraining on the test set is no longer all you need," offering a sustainable path for measuring the genuine reasoning ability of advanced language models.

Updated: 2025-07-23 17:58:14

标题: 在测试集上进行预训练已不再是你所需要的一切:一种基于辩论的问答基准方法

摘要: 随着前沿语言模型日益在标准问答基准上趋于饱和,对于数据污染、记忆以及不断增加的数据集创建成本的担忧依然存在。我们提出了一个以辩论为驱动的评估范式,将任何现有的问答数据集转化为结构化的对抗性辩论——其中一个模型被给予官方答案进行辩护,另一个构建并辩护一个替代答案——由一个对正确解决方案一无所知的评判模型进行仲裁。通过强制进行多轮论证,这种方法显著增加了难度,同时惩罚浅层记忆,但可以通过重复使用问答项目来降低策划成本。我们做出了两个主要贡献:(1)一个评估流程,系统地将问答任务转化为基于辩论的评估,以及(2)一个公共基准,展示了我们的范式在一部分MMLU-Pro问题上的有效性,包括标准化的协议和参考模型。实证结果验证了该方法的稳健性以及其对数据污染的有效性——一个在测试问题上进行微调的Llama 3.1模型显示出显著的准确率提高(50% -> 82%),但在辩论中表现较差。结果还表明,即使是较弱的评判者也可以可靠地区分较强的辩手,突出了基于辩论的评估如何能够扩展到未来更强大的系统,同时保持创建新基准成本的一小部分。总的来说,我们的框架强调"只在测试集上进行预训练已经不够了",为衡量先进语言模型的真实推理能力提供了可持续的路径。

更新时间: 2025-07-23 17:58:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17747v1
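
A minimal sketch of the debate protocol, with `Chat` as a hypothetical stand-in for any LLM call; in the full pipeline, which side holds the official answer is randomized and never revealed to the judge.

```python
from typing import Callable

Chat = Callable[[str], str]   # hypothetical stand-in for any LLM chat API

def debate(question: str, answer_a: str, answer_b: str,
           model_a: Chat, model_b: Chat, judge: Chat, rounds: int = 2) -> str:
    t = f"Question: {question}\nSide A defends: {answer_a}\nSide B defends: {answer_b}\n"
    for _ in range(rounds):
        t += "A: " + model_a(t + "\nArgue for Side A.") + "\n"
        t += "B: " + model_b(t + "\nArgue for Side B.") + "\n"
    # the judge never learns which side holds the official answer
    return judge(t + "\nWhich side argued better? Reply 'A' or 'B'.")
```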

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Extending Reinforcement Learning with Verifiable Rewards (RLVR) to real-world tasks often requires balancing objective and subjective evaluation criteria. However, many such tasks lack a single, unambiguous ground truth-making it difficult to define reliable reward signals for post-training language models. While traditional preference-based methods offer a workaround, they rely on opaque reward functions that are difficult to interpret and prone to spurious correlations. We introduce $\textbf{Rubrics as Rewards}$ (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a $28\%$ relative improvement on HealthBench-1k compared to simple Likert-based approaches, while matching or surpassing the performance of reward signals derived from expert-written references. By treating rubrics as structured reward signals, we show that RaR enables smaller-scale judge models to better align with human preferences and sustain robust performance across model scales.

Updated: 2025-07-23 17:57:55

标题: 评分标准作为奖励:超越可验证领域的强化学习

摘要: 将可验证奖励强化学习(RLVR)扩展到真实世界任务通常需要平衡客观和主观评估标准。然而,许多此类任务缺乏单一、明确的基本事实,这使得为语言模型的后训练定义可靠的奖励信号变得困难。虽然传统的基于偏好的方法提供了一种解决方法,但它们依赖于难以解释和容易产生虚假相关性的不透明奖励函数。我们引入了"评分表作为奖励"(RaR)框架,使用结构化的、清单式的评分表作为可解释的奖励信号,用于基于GRPO的同策略(on-policy)训练。我们最佳的RaR方法在HealthBench-1k上相对于简单的基于Likert量表的方法取得了高达28%的相对改进,同时与从专家编写的参考资料中获得的奖励信号的性能相匹配或超越。通过将评分表视为结构化奖励信号,我们展示了RaR使得规模较小的评判模型能够更好地与人类偏好保持一致,并在各种模型规模上保持稳健的性能。

更新时间: 2025-07-23 17:57:55

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.17746v1
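
A minimal sketch of turning a checklist rubric into a scalar reward; the `judge` callable and the weighted-sum aggregation are assumptions for illustration, not the paper's exact recipe.

```python
from typing import Callable, List, Tuple

def rubric_reward(response: str, rubric: List[Tuple[str, float]],
                  judge: Callable[[str, str], bool]) -> float:
    # rubric: (criterion, weight) checklist items; judge says whether one is met
    total = sum(w for _, w in rubric)
    met = sum(w for criterion, w in rubric if judge(response, criterion))
    return met / total            # interpretable scalar reward in [0, 1]
```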

Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

Recent advances in sparse voxel representations have significantly improved the quality of 3D content generation, enabling high-resolution modeling with fine-grained geometry. However, existing frameworks suffer from severe computational inefficiencies due to the quadratic complexity of attention mechanisms in their two-stage diffusion pipelines. In this work, we propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Our method leverages the compact VecSet representation to efficiently generate a coarse object layout in the first stage, reducing token count and accelerating voxel coordinate prediction. To refine per-voxel latent features in the second stage, we introduce Part Attention, a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to 6.7x speed-up in latent generation. To support this mechanism, we construct a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels. Extensive experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.

Updated: 2025-07-23 17:57:16

标题: Ultra3D:具有部分注意力的高效高保真度3D生成

摘要: 最近,稀疏体素表示方面的进展显著提高了3D内容生成的质量,实现了具有精细几何的高分辨率建模。然而,现有框架由于其两阶段扩散管道中的注意力机制的二次复杂度而遭受严重的计算效率问题。在这项工作中,我们提出了Ultra3D,一种高效的3D生成框架,显著加速稀疏体素建模而不影响质量。我们的方法利用紧凑的VecSet表示,在第一阶段高效生成粗糙的对象布局,减少令牌数量并加速体素坐标预测。为了在第二阶段细化每个体素的潜在特征,我们引入了Part Attention,一种几何感知的局部化注意力机制,将注意力计算限制在语义一致的部件区域内。这种设计保持了结构连续性,同时避免了不必要的全局注意力,使潜在生成加速高达6.7倍。为支持这种机制,我们构建了一个可扩展的部件注释管道,将原始网格转换为带有部件标签的稀疏体素。大量实验证明Ultra3D支持1024分辨率的高分辨率3D生成,并在视觉保真度和用户偏好方面实现了业界领先的性能。

更新时间: 2025-07-23 17:57:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17745v1
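
A hedged sketch of the masking idea behind Part Attention: each token attends only to tokens carrying the same part label, so every row keeps at least itself and the softmax stays well-defined.

```python
import torch
import torch.nn.functional as F

def part_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                   part_ids: torch.Tensor) -> torch.Tensor:
    # q, k, v: (N, d) token features; part_ids: (N,) integer part label per token
    scores = q @ k.T / q.shape[-1] ** 0.5            # (N, N)
    mask = part_ids[:, None] != part_ids[None, :]    # True where parts differ
    scores = scores.masked_fill(mask, float("-inf")) # no cross-part attention
    return F.softmax(scores, dim=-1) @ v
```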

Yume: An Interactive World Generation Model

Yume aims to use images, text, or videos to create an interactive, realistic, and dynamic world, which allows exploration and control using peripheral devices or neural signals. In this report, we present a preview version of \method, which creates a dynamic world from an input image and allows exploration of the world using keyboard actions. To achieve this high-fidelity and interactive video world generation, we introduce a well-designed framework, which consists of four main components, including camera motion quantization, video generation architecture, advanced sampler, and model acceleration. First, we quantize camera motions for stable training and user-friendly interaction using keyboard inputs. Then, we introduce the Masked Video Diffusion Transformer~(MVDT) with a memory module for infinite video generation in an autoregressive manner. After that, training-free Anti-Artifact Mechanism (AAM) and Time Travel Sampling based on Stochastic Differential Equations (TTS-SDE) are introduced to the sampler for better visual quality and more precise control. Moreover, we investigate model acceleration by synergistic optimization of adversarial distillation and caching mechanisms. We use the high-quality world exploration dataset \sekai to train \method, and it achieves remarkable results in diverse scenes and applications. All data, codebase, and model weights are available on https://github.com/stdstu12/YUME. Yume will update monthly to achieve its original goal. Project page: https://stdstu12.github.io/YUME-Project/.

Updated: 2025-07-23 17:57:09

标题: Yume:一个交互式世界生成模型

摘要: Yume旨在利用图像、文本或视频创建一个交互式、逼真和动态的世界,允许使用外围设备或神经信号进行探索和控制。在这份报告中,我们展示了\method的预览版本,它可以从输入图像创建一个动态世界,并允许使用键盘操作探索这个世界。为了实现这种高保真度和交互式视频世界生成,我们引入了一个精心设计的框架,包括四个主要组件,包括相机运动量化、视频生成架构、高级采样器和模型加速。首先,我们对相机运动进行量化,以实现稳定的训练和用户友好的键盘交互。然后,我们引入了带有记忆模块的Masked Video Diffusion Transformer(MVDT),以自回归方式进行无限视频生成。之后,我们为采样器引入了无需训练的Anti-Artifact Mechanism(AAM)和基于随机微分方程的Time Travel Sampling(TTS-SDE),以获得更好的视觉质量和更精确的控制。此外,我们通过对抗蒸馏和缓存机制的协同优化来研究模型加速。我们使用高质量的世界探索数据集\sekai来训练\method,它在不同场景和应用中取得了显著的成果。所有数据、代码库和模型权重都可以在https://github.com/stdstu12/YUME 上找到。Yume将每月更新以实现其最初的目标。项目页面:https://stdstu12.github.io/YUME-Project/。

更新时间: 2025-07-23 17:57:09

领域: cs.CV,cs.AI,cs.HC

下载: http://arxiv.org/abs/2507.17744v1
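
A tiny illustration of quantizing continuous camera motion into keyboard-style actions for stable training and interaction; the dead-zone threshold and key mapping are assumptions.

```python
def quantize_motion(vx: float, vz: float, dead_zone: float = 0.1) -> str:
    # map continuous camera velocity to discrete WASD-style keys
    keys = ""
    if vz > dead_zone:  keys += "W"
    if vz < -dead_zone: keys += "S"
    if vx > dead_zone:  keys += "D"
    if vx < -dead_zone: keys += "A"
    return keys or "idle"

assert quantize_motion(0.5, 0.9) == "WD"
```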

Symmetric Private Information Retrieval (SPIR) on Graph-Based Replicated Systems

We introduce the problem of symmetric private information retrieval (SPIR) on replicated databases modeled by a simple graph. In this model, each vertex corresponds to a server, and a message is replicated on two servers if and only if there is an edge between them. We consider the setting where the server-side common randomness necessary to accomplish SPIR is also replicated at the servers according to the graph, and we call this as message-specific common randomness. In this setting, we establish a lower bound on the SPIR capacity, i.e., the maximum download rate, for general graphs, by proposing an achievable SPIR scheme. Next, we prove that, for any SPIR scheme to be feasible, the minimum size of message-specific randomness should be equal to the size of a message. Finally, by providing matching upper bounds, we derive the exact SPIR capacity for the class of path and regular graphs.

Updated: 2025-07-23 17:51:08

标题: 对称私有信息检索(SPIR)在基于图的复制系统上

摘要: 我们介绍了在由简单图建模的复制数据库上的对称私有信息检索(SPIR)问题。在这个模型中,每个顶点对应一个服务器,只有在它们之间存在边时,消息才会在两个服务器上复制。我们考虑了服务器端必要的共同随机性以完成SPIR的设置也根据图在服务器上进行复制,并将其称为特定消息的共同随机性。在这种设置下,我们通过提出一个可实现的SPIR方案,为一般图建立了SPIR容量的下界,即最大下载速率。接下来,我们证明了对于任何SPIR方案来说,特定消息随机性的最小尺寸应该等于消息的尺寸。最后,通过提供匹配的上界,我们推导了路径和正则图类的精确SPIR容量。

更新时间: 2025-07-23 17:51:08

领域: cs.IT,cs.CR,cs.DB,cs.NI,eess.SP,math.IT

下载: http://arxiv.org/abs/2507.17736v1
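
For background, the classic two-server XOR-based PIR below shows how replication hides the queried index; SPIR additionally requires server-side common randomness to hide the non-queried messages, which this toy does not implement.

```python
import secrets

def pir_query(num_msgs: int, i: int):
    # each server sees a uniformly random subset, revealing nothing about i
    s1 = {j for j in range(num_msgs) if secrets.randbits(1)}
    s2 = s1 ^ {i}                 # symmetric difference toggles index i
    return s1, s2

def pir_answer(db, subset):
    acc = 0
    for j in subset:
        acc ^= db[j]
    return acc

db = [0b1100, 0b1010, 0b0110, 0b0001]
q1, q2 = pir_query(len(db), 2)
assert pir_answer(db, q1) ^ pir_answer(db, q2) == db[2]  # shared terms cancel
```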

SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars

In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. As a proof of concept, SpecCLIP involves pre-training on two spectral types--LAMOST low-resolution and Gaia XP--followed by contrastive alignment using the CLIP (Contrastive Language-Image Pre-training) framework, adapted to associate spectra from different instruments. This alignment is complemented by auxiliary decoders that preserve spectrum-specific information and enable translation (prediction) between spectral types, with the former achieved by maximizing mutual information between embeddings and input spectra. The result is a cross-spectrum framework enabling intrinsic calibration and flexible applications across instruments. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination. SpecCLIP also enhances the accuracy and precision of parameter estimates benchmarked against external survey data. Additionally, its similarity search and cross-spectrum prediction capabilities offer potential for anomaly detection. Our results suggest that contrastively trained foundation models enriched with spectrum-aware decoders can advance precision stellar spectroscopy.

Updated: 2025-07-23 17:47:04

标题: SpecCLIP:恒星光谱测量的对齐与转换

摘要: 近年来,大型语言模型(LLMs)通过庞大的数据集和大规模参数化改变了自然语言理解。受到这一成功的启发,我们提出了SpecCLIP,这是一个基础模型框架,将LLM启发的方法论扩展到恒星光谱分析中。恒星光谱类似于结构化语言,编码了有关恒星的丰富物理和化学信息。通过在大规模光谱数据集上训练基础模型,我们的目标是学习出支持多样化下游应用的稳健且信息丰富的嵌入。作为概念验证,SpecCLIP包括在两种光谱类型上进行预训练--LAMOST低分辨率和Gaia XP--然后使用适应于关联来自不同仪器的光谱的CLIP(对比语言-图像预训练)框架进行对比对齐。这种对齐还通过保留光谱特定信息并实现光谱类型之间的翻译(预测)的辅助解码器来补充,前者通过最大化嵌入和输入光谱之间的互信息来实现。结果是一个跨光谱框架,实现了内在校准和在不同仪器上的灵活应用。我们证明,对这些模型在中等规模标记数据集上进行微调可以提高适应性,例如恒星参数估计和化学丰度测定。SpecCLIP还提高了参数估计的准确性和精度,并与外部调查数据进行了基准测试。此外,它的相似度搜索和跨光谱预测功能为异常检测提供了潜力。我们的结果表明,经对比训练的基础模型结合了光谱感知解码器,可以推动精密恒星光谱学的发展。

更新时间: 2025-07-23 17:47:04

领域: astro-ph.IM,astro-ph.SR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.01939v2
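
The contrastive alignment stage follows the CLIP recipe; a minimal sketch of the symmetric InfoNCE loss between paired embeddings of the two spectral types (batch-matched LAMOST and Gaia XP spectra of the same stars).

```python
import torch
import torch.nn.functional as F

def clip_loss(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    # z_a, z_b: (B, d) embeddings of the same B stars from two instruments
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / tau                  # (B, B) similarity matrix
    labels = torch.arange(z_a.shape[0])         # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))
```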

Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we innovatively propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova bench is available at https://github.com/antgroup/Finova.

Updated: 2025-07-23 17:47:01

标题: Agentar-Fin-R1:通过领域专业知识、培训效率和先进推理提升金融智能

摘要: 大型语言模型(LLMs)在金融应用中表现出相当大的潜力;然而,当前的模型在面对需要复杂推理能力、严格的可信度标准以及对特定领域需求的高效适应性的情景时经常表现出限制。我们介绍了Agentar-Fin-R1系列的金融大型语言模型(8B和32B参数),这些模型是基于Qwen3基础模型专门设计的,以增强金融应用中的推理能力、可靠性和领域专业化。我们的优化方法集成了一个高质量的系统性金融任务标签系统,配合全面的多层次可信度保证框架。该框架包括高质量可信的知识工程、多智能体可信数据合成和严格的数据验证治理。通过标签引导的自动困难感知优化、两阶段训练流水线和动态归因系统,我们实现了训练效率的显著提升。我们的模型在主流金融基准测试中进行了全面评估,包括Fineva、FinEval和FinanceIQ,以及MATH-500和GPQA-diamond等一般推理数据集。为了全面评估实际部署能力,我们创新地提出了Finova评估基准,重点关注基于代理层次的金融推理和合规性验证。实验结果表明,Agentar-Fin-R1不仅在金融任务上取得了最先进的性能,而且展示出了杰出的一般推理能力,验证了其作为高风险金融应用的可信解决方案的有效性。Finova基准可在https://github.com/antgroup/Finova获取。

更新时间: 2025-07-23 17:47:01

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.16802v2

Flow Matching Meets Biology and Life Science: A Survey

Over the past decade, advances in generative modeling, such as generative adversarial networks, masked autoencoders, and diffusion models, have significantly transformed biological research and discovery, enabling breakthroughs in molecule design, protein generation, drug discovery, and beyond. At the same time, biological applications have served as valuable testbeds for evaluating the capabilities of generative models. Recently, flow matching has emerged as a powerful and efficient alternative to diffusion-based generative modeling, with growing interest in its application to problems in biology and life sciences. This paper presents the first comprehensive survey of recent developments in flow matching and its applications in biological domains. We begin by systematically reviewing the foundations and variants of flow matching, and then categorize its applications into three major areas: biological sequence modeling, molecule generation and design, and peptide and protein generation. For each, we provide an in-depth review of recent progress. We also summarize commonly used datasets and software tools, and conclude with a discussion of potential future directions. The corresponding curated resources are available at https://github.com/Violet24K/Awesome-Flow-Matching-Meets-Biology.

Updated: 2025-07-23 17:44:29

标题: 流匹配遇见生物学与生命科学:一项综述

摘要: 在过去的十年中,生成建模的进步,如生成对抗网络、掩蔽自动编码器和扩散模型,显著改变了生物研究和发现,实现了分子设计、蛋白生成、药物发现等方面的突破。同时,生物应用也成为评估生成模型能力的宝贵试验基地。最近,流匹配已经成为扩散式生成建模的强大高效替代方案,引起了在生物学和生命科学问题上的应用的兴趣。本文首次全面调查了流匹配的最新发展及其在生物领域的应用。我们首先系统地回顾了流匹配的基础和变体,然后将其应用分类为三个主要领域:生物序列建模、分子生成和设计,以及肽和蛋白生成。对于每个领域,我们提供了最新进展的深入回顾。我们还总结了常用的数据集和软件工具,并在讨论潜在未来方向后结束。相应的整理资源可在https://github.com/Violet24K/Awesome-Flow-Matching-Meets-Biology上找到。

更新时间: 2025-07-23 17:44:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17731v1
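
For readers new to the topic, the canonical conditional flow matching objective on a linear (rectified-flow) probability path is a few lines:

```python
import torch

def cfm_loss(v_theta, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    # x0: (B, D) noise samples, x1: (B, D) data samples
    t = torch.rand(x0.shape[0], 1)
    x_t = (1 - t) * x0 + t * x1        # point on the linear path at time t
    target = x1 - x0                   # constant velocity along that path
    return ((v_theta(x_t, t) - target) ** 2).mean()
```

Sampling then integrates the learned velocity field `v_theta` from noise to data with any ODE solver, typically in far fewer steps than diffusion sampling.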

Online Submission and Evaluation System Design for Competition Operations

Research communities have developed benchmark datasets across domains to compare the performance of algorithms and techniques. However, tracking the progress in these research areas is not easy, as publications appear in different venues at the same time, and many of them claim to represent the state-of-the-art. To address this, research communities often organise periodic competitions to evaluate the performance of various algorithms and techniques, thereby tracking advancements in the field. However, these competitions pose a significant operational burden. The organisers must manage and evaluate a large volume of submissions. Furthermore, participants typically develop their solutions in diverse environments, leading to compatibility issues during the evaluation of their submissions. This paper presents an online competition system that automates the submission and evaluation process for a competition. The competition system allows organisers to manage large numbers of submissions efficiently, utilising isolated environments to evaluate submissions. This system has already been used successfully for several competitions, including the Grid-Based Pathfinding Competition and the League of Robot Runners competition.

Updated: 2025-07-23 17:44:10

标题: 在线提交和评估系统设计用于竞赛运营

摘要: 研究社区已经在不同领域开发了基准数据集,用于比较算法和技术的性能。然而,要跟踪这些研究领域的进展并不容易,因为研究成果同时出现在不同的场所,并且许多研究声称代表了最新技术水平。为了解决这个问题,研究社区经常组织定期竞赛来评估各种算法和技术的性能,从而跟踪领域的进展。然而,这些竞赛带来了重大的运营负担。组织者必须管理和评估大量的提交内容。此外,参与者通常在不同的环境中开发他们的解决方案,导致在评估他们的提交内容时出现兼容性问题。本文介绍了一个在线竞赛系统,可以自动化竞赛的提交和评估过程。该竞赛系统允许组织者高效地管理大量的提交内容,并利用隔离环境来评估提交内容。这个系统已经成功地用于几个竞赛,包括基于网格的路径规划竞赛和机器人跑步者联赛竞赛。

更新时间: 2025-07-23 17:44:10

领域: cs.AI

下载: http://arxiv.org/abs/2507.17730v1
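
A minimal sketch of isolated, timed evaluation using only the standard library; the deployed system evaluates submissions in fully isolated environments (e.g., containers), which this process-level toy does not show.

```python
import subprocess

def evaluate_submission(cmd: list, instance_path: str, timeout_s: int = 60) -> dict:
    # run one submission on one benchmark instance in its own process
    try:
        proc = subprocess.run(cmd + [instance_path], capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"status": "timeout"}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}
```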

Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images

Increasingly large datasets of microscopic images with atomic resolution facilitate the development of machine learning methods to identify and analyze subtle physical phenomena embedded within the images. In this work, microscopic images of honeycomb lattice spin-ice samples serve as datasets from which we automate the calculation of net magnetic moments and directional orientations of spin-ice configurations. In the first stage of our workflow, machine learning models are trained to accurately predict magnetic moments and directions within spin-ice structures. Variational Autoencoders (VAEs), an emergent unsupervised deep learning technique, are employed to generate high-quality synthetic magnetic force microscopy (MFM) images and extract latent feature representations, thereby reducing experimental and segmentation errors. The second stage of proposed methodology enables precise identification and prediction of frustrated vertices and nanomagnetic segments, effectively correlating structural and functional aspects of microscopic images. This facilitates the design of optimized spin-ice configurations with controlled frustration patterns, enabling potential on-demand synthesis.

Updated: 2025-07-23 17:40:39

标题: 基于磁力显微镜图像的人工自旋冰磁阻挫的深度生成学习

摘要: 随着原子分辨率的显微图像数据集越来越大,机器学习方法的发展有助于识别和分析嵌入在图像中的微妙物理现象。在这项工作中,蜂窝格子自旋冰样品的显微图像作为数据集,我们自动化计算自旋冰结构的净磁矩和方向。在我们的工作流程的第一阶段,机器学习模型被训练以准确预测自旋冰结构中的磁矩和方向。变分自动编码器(VAEs)是一种新兴的无监督深度学习技术,被用来生成高质量的合成磁力显微镜(MFM)图像并提取潜在特征表示,从而减少实验和分割错误。拟议方法的第二阶段实现了对受挫顶点和纳米磁段的精确识别和预测,有效地将显微图像的结构和功能方面进行相关。这有助于设计具有受控受挫模式的优化自旋冰结构,从而实现潜在的按需合成。

更新时间: 2025-07-23 17:40:39

领域: cond-mat.dis-nn,cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2507.17726v1
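
A minimal sketch of the VAE ingredients used here (reparameterization and the ELBO-style loss); the encoder/decoder architecture and loss weighting for MFM images are left out.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # sample z ~ N(mu, sigma^2) while keeping gradients through mu and logvar
    std = (0.5 * logvar).exp()
    return mu + std * torch.randn_like(std)

def vae_loss(x, x_recon, mu, logvar, beta: float = 1.0) -> torch.Tensor:
    recon = F.mse_loss(x_recon, x, reduction="sum") / x.shape[0]
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.shape[0]
    return recon + beta * kl
```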

Robot Operation of Home Appliances by Reading User Manuals

Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) ground the policies to the appliance in the physical world, and (iii) execute the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representations plays an important role in robust robot operation of home appliances, especially, complex ones.

Updated: 2025-07-23 17:39:54

标题: 通过阅读用户手册实现家用电器的机器人操作

摘要: 在每个家庭中最常见的工具之一是家用电器,对于辅助家庭机器人来说,操作家用电器是一项至关重要的能力。本文介绍了ApBot,一个通过“阅读”用户手册来操作新型家用电器的机器人系统。ApBot面临着多重挑战:(i)从用户手册文档中的非结构化文本描述中推断目标条件部分策略,(ii)将策略与物理世界中的电器进行关联,(iii)在潜在的多个步骤中可靠执行策略,尽管存在累积误差。为了解决这些挑战,ApBot利用大型视觉语言模型(VLM)构建了一个电器的结构化符号模型,从其手册中提取信息。它通过视觉将符号动作与控制面板元素关联。最后,ApBot通过基于视觉反馈更新模型来闭环。我们的实验证明,在各种模拟和真实世界的电器中,与直接用作控制策略的最先进的大型VLM相比,ApBot在任务成功率方面取得了一致且具有统计学意义的改进。这些结果表明,结构化内部表示在家用电器的稳健机器人操作中发挥着重要作用,尤其是对于复杂的家用电器。

更新时间: 2025-07-23 17:39:54

领域: cs.RO,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2505.20424v2
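
One plausible shape for the symbolic appliance model and its closed-loop executor, with `press` and `observe` as hypothetical robot-interface callables; in the paper the model itself is built by a VLM from the user manual.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set

@dataclass(frozen=True)
class Action:
    button: str               # control-panel element the action is grounded to
    precond: FrozenSet[str]   # symbolic facts that must hold before pressing
    effect: FrozenSet[str]    # symbolic facts the manual says the action adds

def run_plan(plan: List[str], state: Set[str], actions: Dict[str, Action],
             press: Callable[[str], None], observe: Callable[[], Set[str]]) -> Set[str]:
    for name in plan:
        act = actions[name]
        if not act.precond <= state:
            raise RuntimeError(f"precondition of {name} not satisfied")
        press(act.button)
        state = observe()     # close the loop: visual feedback overrides the model
    return state
```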

On the Interaction of Compressibility and Adversarial Robustness

Modern neural networks are expected to simultaneously satisfy a host of desirable properties: accurate fitting to training data, generalization to unseen inputs, parameter and computational efficiency, and robustness to adversarial perturbations. While compressibility and robustness have each been studied extensively, a unified understanding of their interaction still remains elusive. In this work, we develop a principled framework to analyze how different forms of compressibility - such as neuron-level sparsity and spectral compressibility - affect adversarial robustness. We show that these forms of compression can induce a small number of highly sensitive directions in the representation space, which adversaries can exploit to construct effective perturbations. Our analysis yields a simple yet instructive robustness bound, revealing how neuron and spectral compressibility impact $L_\infty$ and $L_2$ robustness via their effects on the learned representations. Crucially, the vulnerabilities we identify arise irrespective of how compression is achieved - whether via regularization, architectural bias, or implicit learning dynamics. Through empirical evaluations across synthetic and realistic tasks, we confirm our theoretical predictions, and further demonstrate that these vulnerabilities persist under adversarial training and transfer learning, and contribute to the emergence of universal adversarial perturbations. Our findings show a fundamental tension between structured compressibility and robustness, and suggest new pathways for designing models that are both efficient and secure.

Updated: 2025-07-23 17:35:48

标题: 论可压缩性与对抗鲁棒性的相互作用

摘要: 现代神经网络被期望同时满足许多理想特性:对训练数据的准确拟合,对未见输入的泛化能力,参数和计算效率,以及对对抗性扰动的稳健性。虽然可压缩性和稳健性都得到了广泛研究,但它们之间相互作用的统一理解仍然难以捉摸。在这项工作中,我们开发了一个原则性框架来分析不同形式的可压缩性 - 如神经元级稀疏和谱可压缩性 - 如何影响对抗性稳健性。我们表明,这些形式的压缩可以在表示空间中引入少量高度敏感的方向,对手可以利用这些方向构建有效的扰动。我们的分析得出了一个简单但有启发性的稳健性界限,揭示了神经元和谱可压缩性通过它们对学习表示的影响如何影响$L_\infty$和$L_2$稳健性。至关重要的是,无论压缩是通过正则化、架构偏置还是隐式学习动态实现的,我们所识别的漏洞都会出现。通过对合成和现实任务的实证评估,我们验证了我们的理论预测,并进一步证明了这些漏洞在对抗训练和迁移学习下持续存在,并有助于普遍对抗性扰动的出现。我们的发现显示了结构化可压缩性和稳健性之间的根本张力,并提出了设计既高效又安全的模型的新途径。

更新时间: 2025-07-23 17:35:48

领域: cs.LG,cs.AI,cs.CV,stat.ML

下载: http://arxiv.org/abs/2507.17725v1
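
A small numpy illustration of the claimed mechanism: for a layer with a fast-decaying spectrum (spectrally compressible), a perturbation along the top singular direction is amplified far more than a random one of the same size.

```python
import numpy as np

rng = np.random.default_rng(0)
# rows scaled by a fast-decaying spectrum mimic a spectrally compressible layer
W = rng.standard_normal((64, 128)) * (0.5 ** np.arange(64))[:, None]
_, s, vt = np.linalg.svd(W, full_matrices=False)

eps = 0.1
d_top = eps * vt[0]                       # the most sensitive input direction
d_rand = rng.standard_normal(128)
d_rand *= eps / np.linalg.norm(d_rand)

print(np.linalg.norm(W @ d_top), eps * s[0])   # these two match exactly
print(np.linalg.norm(W @ d_rand))              # markedly smaller on average
```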

Towards Generalist Robot Learning from Internet Video: A Survey

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.

Updated: 2025-07-23 17:31:03

标题: 朝向从互联网视频中学习通用机器人:一项调查

摘要: 将深度学习应用于海量和多样化的互联网数据,推动了视频生成和自然语言处理等领域的突破性进展。然而,机器人学习迄今为止未能复制这一成功,并受限于可用数据的稀缺。从视频学习(LfV)方法旨在通过将传统机器人数据与大规模互联网视频相结合,解决这一数据瓶颈问题。这些视频数据提供了有关物理动态、行为和任务的基础信息,对于通用用途的机器人来说具有很高的信息量。 本研究系统地审视了LfV这一新兴领域。首先概述了基本概念,包括详细介绍视频数据中的分布变化和缺失的动作标签等基本LfV挑战。接下来,我们全面审查了从大规模互联网视频中提取知识、克服LfV挑战以及通过视频训练提高机器人学习的当前方法。调查最后对未来机会进行了批判性讨论。在此,我们强调了需要可扩展的基础模型方法,可以利用全范围的可用互联网视频,增强机器人政策和动态模型的学习。总体而言,本研究旨在为未来的LfV研究提供信息和推动进展,推动通用用途的机器人的发展。

更新时间: 2025-07-23 17:31:03

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2404.19664v5

Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning

Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient propagation across sampling steps. We propose the \textit{Single-Step Completion Policy} (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, enabling accurate, one-shot action generation. In an off-policy actor-critic framework, SSCP combines the expressiveness of generative models with the training and inference efficiency of unimodal policies, without requiring long backpropagation chains. Our method scales effectively to offline, offline-to-online, and online RL settings, offering substantial gains in speed and adaptability over diffusion-based baselines. We further extend SSCP to goal-conditioned RL, enabling flat policies to exploit subgoal structures without explicit hierarchical inference. SSCP achieves strong results across standard offline RL and behavior cloning benchmarks, positioning it as a versatile, expressive, and efficient framework for deep RL and sequential decision-making.
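
The completion-vector idea lends itself to a compact illustration. Below is a minimal, hypothetical sketch (plain flow matching on a toy 1-D action distribution, not the paper's off-policy actor-critic setup): a network is trained to predict the vector that completes the flow from an intermediate sample to the target, so an action can be produced in a single step.

```python
import torch
import torch.nn as nn

# Toy 1-D bimodal "action" distribution the policy should match.
def sample_actions(n):
    modes = torch.randint(0, 2, (n, 1)).float() * 2 - 1      # -1 or +1
    return modes + 0.1 * torch.randn(n, 1)

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    a1 = sample_actions(256)                  # target actions
    a0 = torch.randn_like(a1)                 # noise source of the flow
    t = torch.rand(a1.size(0), 1)
    at = (1 - t) * a0 + t * a1                # intermediate flow sample
    completion = a1 - at                      # vector finishing the flow in one jump
    pred = net(torch.cat([at, t], dim=1))
    loss = ((pred - completion) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# One-shot generation: start from noise at t=0 and apply the predicted completion.
with torch.no_grad():
    a0 = torch.randn(5, 1)
    print((a0 + net(torch.cat([a0, torch.zeros(5, 1)], dim=1))).squeeze())
```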

Updated: 2025-07-23 17:30:42

标题: 基于流的单步完成:高效且具有表现力的策略学习

摘要: 生成模型,如扩散和流匹配,通过捕捉丰富的多模态动作分布,为离线强化学习(RL)提供了表达力强的策略,但它们的迭代采样引入了高推理成本和训练不稳定性,因为梯度在采样步骤间传播。我们提出了“单步完成策略”(SSCP),这是一个生成策略,通过训练带有增强流匹配目标的模型来预测中间流样本中的直接完成向量,从而实现准确的一次性动作生成。在离线演员-评论家框架中,SSCP结合了生成模型的表达能力和单模态策略的训练和推理效率,而无需长时间的反向传播链。我们的方法有效地扩展到离线、离线至在线和在线RL设置,相比基于扩散的基线,提供了速度和适应性方面的重大增益。我们进一步将SSCP扩展到目标条件RL,使得扁平策略能够利用子目标结构,而无需显式的分层推理。SSCP在标准离线RL和行为克隆基准测试中取得了强大的成绩,将其定位为用于深度RL和序贯决策制定的多功能、表现力强和高效的框架。

更新时间: 2025-07-23 17:30:42

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2506.21427v2

AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer

With the rise of voice-enabled artificial intelligence (AI) systems, quantitative survey researchers have access to a new data-collection mode: AI telephone surveying. By using AI to conduct phone interviews, researchers can scale quantitative studies while balancing the dual goals of human-like interactivity and methodological rigor. Unlike earlier efforts that used interactive voice response (IVR) technology to automate these surveys, voice AI enables a more natural and adaptive respondent experience as it is more robust to interruptions, corrections, and other idiosyncrasies of human speech. We built and tested an AI system to conduct quantitative surveys based on large language models (LLMs), automatic speech recognition (ASR), and speech synthesis technologies. The system was specifically designed for quantitative research, and strictly adhered to research best practices like question order randomization, answer order randomization, and exact wording. To validate the system's effectiveness, we deployed it to conduct two pilot surveys with the SSRS Opinion Panel and followed up with a separate human-administered survey to assess respondent experiences. We measured three key metrics: survey completion rates, break-off rates, and respondent satisfaction scores. Our results suggest that shorter instruments and more responsive AI interviewers may contribute to improvements across all three metrics studied.
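
As an illustration of the randomization practices mentioned, here is a minimal, hypothetical sketch (invented questions, not the deployed system) that randomizes question and answer order per respondent while leaving the exact wording untouched:

```python
import random

QUESTIONS = [
    {"id": "q1", "text": "How satisfied are you with your current provider?",
     "answers": ["Very satisfied", "Somewhat satisfied", "Not satisfied"]},
    {"id": "q2", "text": "How likely are you to recommend the service?",
     "answers": ["Very likely", "Somewhat likely", "Not likely"]},
]

def build_instrument(respondent_id: int):
    """Per-respondent question- and answer-order randomization, exact wording kept."""
    rng = random.Random(respondent_id)   # reproducible draw per respondent
    questions = QUESTIONS[:]             # never mutate the master wording
    rng.shuffle(questions)               # question-order randomization
    instrument = []
    for q in questions:
        answers = q["answers"][:]
        rng.shuffle(answers)             # answer-order randomization
        instrument.append({"id": q["id"], "text": q["text"], "answers": answers})
    return instrument

print(build_instrument(42))
```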

Updated: 2025-07-23 17:30:14

标题: AI 电话调查:利用 AI 面试者自动化定量数据收集

摘要: 随着语音启用的人工智能系统的兴起,定量调查研究人员可以利用一种新的数据收集模式:AI电话调查。通过使用AI进行电话访谈,研究人员可以在平衡人类交互性和方法论严谨性这两个目标的基础上扩大定量研究规模。与早期使用交互式语音响应(IVR)技术自动化这些调查的努力不同,语音人工智能使得受访者体验更加自然和适应性更强,因为它更具鲁棒性,能够处理中断、更正和其他人类语音的特异性。 我们构建并测试了一个基于大型语言模型(LLM)、自动语音识别(ASR)和语音合成技术进行定量调查的AI系统。该系统专门设计用于定量研究,并严格遵循研究最佳实践,如问题顺序随机化、答案顺序随机化和确切措辞。 为了验证系统的有效性,我们将其部署用于与SSRS意见小组进行两项试点调查,并随后进行人工管理的调查以评估受访者体验。我们测量了三个关键指标:调查完成率、中断率和受访者满意度分数。我们的结果表明,较短的问卷和响应更及时的AI访谈者可能有助于改善所研究的全部三个指标。

更新时间: 2025-07-23 17:30:14

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2507.17718v1

From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

AI-generated clinical notes are increasingly used in healthcare, but evaluating their quality remains a challenge due to high subjectivity and limited scalability of expert review. Existing automated metrics often fail to align with real-world physician preferences. To address this, we propose a pipeline that systematically distills real user feedback into structured checklists for note evaluation. These checklists are designed to be interpretable, grounded in human feedback, and enforceable by LLM-based evaluators. Using deidentified data from over 21,000 clinical encounters, prepared in accordance with the HIPAA safe harbor standard, from a deployed AI medical scribe system, we show that our feedback-derived checklist outperforms baseline approaches in our offline evaluations in coverage, diversity, and predictive power for human ratings. Extensive experiments confirm the checklist's robustness to quality-degrading perturbations, significant alignment with clinician preferences, and practical value as an evaluation methodology. In offline research settings, the checklist can help identify notes likely to fall below our chosen quality thresholds.

Updated: 2025-07-23 17:28:31

标题: 从反馈到清单:对人工智能生成的临床笔记进行基于实证的评估

摘要: 人工智能生成的临床记录在医疗保健中越来越被广泛使用,但由于专家审查的高度主观性和有限的可扩展性,评估它们的质量仍然是一个挑战。现有的自动化指标通常无法与真实世界的医生偏好保持一致。为了解决这个问题,我们提出了一个流程,系统地将真实用户反馈提炼成结构化的检查表,用于记录评估。这些检查表设计成易于理解,基于人类反馈,且能够被基于LLM的评估者执行。使用根据HIPAA安全港标准准备的超过21,000个临床接触的去身份化数据,来自一个部署的AI医学抄写员系统,我们展示了我们基于反馈的检查表在覆盖范围、多样性和对人类评级的预测能力方面优于基线方法的离线评估。广泛的实验验证了检查表对质量降级扰动的稳健性,与临床医生偏好的显著一致性以及作为评估方法的实际价值。在离线研究环境中,该检查表可以帮助识别可能低于我们选择的质量阈值的记录。

更新时间: 2025-07-23 17:28:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17717v1

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Long-form video understanding presents significant challenges due to extensive temporal-spatial complexity and the difficulty of question answering under such extended contexts. While Large Language Models (LLMs) have demonstrated considerable advancements in video analysis capabilities and long context handling, they continue to exhibit limitations when processing information-dense hour-long videos. To overcome such limitations, we propose the Deep Video Discovery agent to leverage an agentic search strategy over segmented video clips. Unlike previous video agents that manually design a rigid workflow, our approach emphasizes the autonomous nature of agents. By providing a set of search-centric tools on a multi-granular video database, our DVD agent leverages the advanced reasoning capability of the LLM to plan over its current observation state, strategically select tools, formulate appropriate parameters for actions, and iteratively refine its internal reasoning in light of the gathered information. We perform comprehensive evaluations on multiple long video understanding benchmarks, demonstrating the advantage of the entire system design. Our DVD agent achieves SOTA performance, surpassing prior works by a large margin on the challenging LVBench dataset. Comprehensive ablation studies and in-depth tool analyses are also provided, yielding insights to further advance intelligent agents tailored for long-form video understanding tasks. The code has been released at https://github.com/microsoft/DeepVideoDiscovery.

Updated: 2025-07-23 17:26:05

标题: 深度视频发现:具有工具使用的自主搜索,用于长篇视频理解

摘要: 长篇视频理解面临着重大挑战,因为存在广泛的时间空间复杂性以及在这种扩展上下文下回答问题的困难。虽然大型语言模型(LLMs)在视频分析能力和长期上下文处理方面取得了显着进展,但它们在处理信息密集的长达一小时的视频时仍然存在局限性。为了克服这些限制,我们提出了Deep Video Discovery代理,利用一种代理式搜索策略在分段视频剪辑上进行搜索。与先前设计刚性工作流程的视频代理不同,我们的方法强调代理的自主性质。通过在多粒度视频数据库上提供一套以搜索为中心的工具,我们的DVD代理利用LLM的先进推理能力来计划其当前观察状态,战略性地选择工具,制定适当的行动参数,并根据收集到的信息迭代地完善其内部推理。我们对多个长视频理解基准进行了全面评估,证明了整个系统设计的优势。我们的DVD代理在具有挑战性的LVBench数据集上取得了SOTA性能,明显超过先前的作品。我们还提供了全面的消融研究和深入的工具分析,为进一步推进定制长篇视频理解任务的智能代理提供了见解。代码已发布在https://github.com/microsoft/DeepVideoDiscovery。

更新时间: 2025-07-23 17:26:05

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2505.18079v3

Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased

Imbalanced binary classification problems arise in many fields of study. When using machine learning models for these problems, it is common to subsample the majority class (i.e., undersampling) to create a (more) balanced dataset for model training. This biases the model's predictions because the model learns from a dataset that does not follow the same data generating process as new data. One way of accounting for this bias is to analytically map the resulting predictions to new values based on the sampling rate for the majority class, which was used to create the training dataset. While this approach may work well for some machine learning models, we show that calibrating a random forest this way has unintended negative consequences, including prevalence estimates that can be upwardly biased. These prevalence estimates depend on both i) the number of predictors considered at each split in the random forest; and ii) the sampling rate used. We explain the former using known properties of random forests and analytical calibration. However, in investigating the latter issue, we made a surprising discovery - contrary to the widespread belief that decision trees are biased towards the majority class, they actually can be biased towards the minority class.
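
For concreteness, one standard analytic mapping of this kind corrects scores from a model trained on majority-undersampled data back to the original prevalence. The formula below is a common prior-correction form and an assumption about the general technique, not necessarily the exact mapping the paper analyzes.

```python
import numpy as np

def calibrate_undersampled(p_s: np.ndarray, beta: float) -> np.ndarray:
    """Map scores from a model trained on majority-undersampled data back to
    the original class balance. beta is the fraction of majority-class rows
    kept; one standard correction is p = beta*p_s / (beta*p_s - p_s + 1)."""
    return beta * p_s / (beta * p_s - p_s + 1.0)

p_s = np.array([0.1, 0.5, 0.9])   # scores on the artificially balanced data
print(calibrate_undersampled(p_s, beta=0.1))
# Scores shrink toward the true (rare) prevalence; averaging such calibrated
# scores is one way a prevalence estimate is formed -- the step the paper
# shows can be upwardly biased for random forests.
```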

Updated: 2025-07-23 17:25:41

标题: 使用基于树模型的不平衡数据学习面临的挑战:患病率估计系统地依赖于超参数,并可能存在向上偏差

摘要: 不平衡的二元分类问题在许多领域中都会出现。在使用机器学习模型处理这些问题时,通常会对多数类别进行子采样(即欠采样),以创建一个(更)平衡的数据集用于模型训练。这会导致模型的预测存在偏差,因为模型是从一个不遵循与新数据相同数据生成过程的数据集中学习的。一种解决这种偏差的方法是根据用于创建训练数据集的多数类别的采样率,将结果预测进行分析映射到新值。尽管这种方法可能对某些机器学习模型有效,但我们展示了通过这种方式校准随机森林会产生意想不到的负面后果,包括可能向上偏差的患病率估计。这些患病率估计取决于i)在随机森林中考虑的每次分裂的预测变量数量;以及ii)使用的采样率。我们通过已知的随机森林特性和分析校准来解释前者。然而,在调查后者问题时,我们做出了一个令人惊讶的发现 - 与决策树偏向多数类别这一普遍观念相反,它们实际上可能偏向少数类别。

更新时间: 2025-07-23 17:25:41

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2412.16209v3

Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows

Inverse problems governed by partial differential equations (PDEs) play a crucial role in various fields, including computational science, image processing, and engineering. In particular, the Darcy flow equation is a fundamental equation in fluid mechanics, which plays a crucial role in understanding fluid flow through porous media. Bayesian methods provide an effective approach for solving PDE inverse problems, while their numerical implementation requires numerous evaluations of computationally expensive forward solvers. Therefore, the adoption of surrogate models with lower computational costs is essential. However, constructing a globally accurate surrogate model for high-dimensional complex problems demands high model capacity and large amounts of data. To address this challenge, this study proposes an efficient locally accurate surrogate that focuses on the high-probability regions of the true likelihood in inverse problems, with relatively low model complexity and few training data requirements. Additionally, we introduce a sequential Bayesian design strategy to acquire the proposed surrogate since the high-probability region of the likelihood is unknown. The strategy treats the posterior evolution process of sequential Bayesian design as a Gaussian process, enabling algorithmic acceleration through a one-step-ahead prior. The complete algorithmic framework is referred to as Sequential Bayesian design for locally accurate surrogate (SBD-LAS). Finally, three experiments based on the Darcy flow equation demonstrate the advantages of the proposed method in terms of both inversion accuracy and computational speed.

Updated: 2025-07-23 17:25:14

标题: 贝叶斯序贯设计在达西流反演中高效代理构建的应用

摘要: 由偏微分方程(PDEs)控制的反问题在各个领域中起着关键作用,包括计算科学、图像处理和工程领域。特别地,达西流动方程是流体力学中的基本方程,对于理解流体通过多孔介质的流动起着关键作用。贝叶斯方法为解决PDEs反问题提供了有效的途径,但它们的数值实现需要大量评估计算昂贵的前向求解器。因此,采用具有较低计算成本的替代模型至关重要。然而,为高维复杂问题构建全局准确的替代模型需要高模型容量和大量数据。为了解决这一挑战,本研究提出了一种有效的局部准确替代模型,侧重于反问题中真实似然的高概率区域,具有相对较低的模型复杂性和少量训练数据需求。此外,我们引入了一种顺序贝叶斯设计策略来获取提出的替代模型,因为真实似然的高概率区域是未知的。该策略将顺序贝叶斯设计的后验演变过程视为高斯过程,通过一步先验实现算法加速。完整的算法框架被称为局部准确替代模型的顺序贝叶斯设计(SBD-LAS)。最后,基于达西流动方程进行的三个实验展示了所提出方法在反演精度和计算速度方面的优势。

更新时间: 2025-07-23 17:25:14

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.17713v1

Quantum Software Security Challenges within Shared Quantum Computing Environments

The number of qubits in quantum computers keeps growing, but most quantum programs remain relatively small because of the noisy nature of the underlying quantum hardware. This might lead quantum cloud providers to explore increased hardware utilization, and thus profitability through means such as multi-programming, which would allow the execution of multiple programs in parallel. The adoption of such technology would bring entirely new challenges to the field of quantum software security. This article explores and reports the key challenges identified in quantum software security within shared quantum computing environments.

Updated: 2025-07-23 17:23:34

标题: 在共享量子计算环境中的量子软件安全挑战

摘要: 量子计算机中的量子位数不断增长,但由于底层量子硬件的嘈杂性,大多数量子程序仍然相对较小。这可能导致量子云提供商探索增加硬件利用率,从而通过诸如多程序设计之类的方式实现盈利能力,这将允许并行执行多个程序。采用这种技术将为量子软件安全领域带来全新挑战。本文探讨并报告了在共享量子计算环境中确定的量子软件安全性的关键挑战。

更新时间: 2025-07-23 17:23:34

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2507.17712v1

The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks

This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers model-specific crucial guidance to practitioners on the need for an optimal selection of feature scaling techniques.
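
A minimal sklearn sketch of the kind of comparison described (a toy dataset and a small grid of scalers and models, not the paper's 12-scaler, 14-model, 16-dataset protocol):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scalers = {"none": None, "standard": StandardScaler(),
           "minmax": MinMaxScaler(), "robust": RobustScaler()}
models = {"logreg": LogisticRegression(max_iter=5000),
          "rf": RandomForestClassifier(random_state=0)}

for m_name, model in models.items():
    for s_name, scaler in scalers.items():
        pipe = make_pipeline(scaler, model) if scaler else make_pipeline(model)
        pipe.fit(X_tr, y_tr)
        print(f"{m_name:6s} {s_name:8s} acc={pipe.score(X_te, y_te):.3f}")
# Typically the random forest rows barely move across scalers, while the
# logistic regression rows vary noticeably -- the pattern the paper reports.
```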

Updated: 2025-07-23 17:23:04

标题: 机器学习中特征缩放的影响:对回归和分类任务的影响

摘要: 这项研究解决了在特征缩放方面缺乏全面研究的重要问题,通过系统评估了12种缩放技术(包括一些较少常见的转换),涵盖了14种不同的机器学习算法和16个数据集,用于分类和回归任务。我们仔细分析了对预测性能(使用准确率、MAE、MSE和$R^2$等指标)和计算成本(训练时间、推断时间和内存使用)的影响。关键发现表明,虽然集成方法(如随机森林和梯度提升模型,如XGBoost、CatBoost和LightGBM)表现稳健,很大程度上独立于缩放,但其他广泛使用的模型,如逻辑回归、支持向量机、TabNet和MLP,表现出明显的性能变化,高度依赖于所选择的缩放器。这项广泛的经验分析,所有源代码、实验结果和模型参数均公开可用,以确保完全透明性和可重复性,为从业者提供了针对模型的关键指导,指导他们选择最佳的特征缩放技术。

更新时间: 2025-07-23 17:23:04

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2506.08274v3

Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure

Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging especially in high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the challenges of the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function--a key component in diffusion models--using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at O(d^{5/2} n^{-2/(k+5)}) and generated distribution at O(d^{5/4} n^{-1/2(k+5)}), primarily driven by the intrinsic factor dimension k rather than the number of assets d, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data. Our code is available at https://github.com/xymmmm00/diffusion_factor_model.

Updated: 2025-07-23 17:18:54

标题: 扩散因子模型:利用因子结构生成高维收益

摘要: 金融场景模拟对于风险管理和投资组合优化至关重要,然而在金融领域常见的高维度和小数据设置中仍然具有挑战性。我们提出了一个扩散因子模型,将潜在因子结构整合到生成性扩散过程中,将计量经济学与现代生成式人工智能相结合,以解决金融模拟中维度诅咒和数据稀缺性的挑战。通过利用资产回报中固有的低维因子结构,我们使用时间变化的正交投影来分解得分函数--扩散模型中的关键组成部分,这种分解被纳入神经网络架构设计中。我们推导了严格的统计保证,建立了得分估计和生成分布的非渐近误差界,主要由固有因子维度k驱动,而不是资产数量d,超越了经典非参数统计文献中的维度相关限制,使该框架适用于拥有数千资产的市场。数值研究证实了在小数据情况下对潜在子空间恢复的卓越性能。实证分析证明了我们框架在构建均值-方差最优投资组合和因子组合方面的经济意义。这项工作首次将因子结构与扩散模型在理论上整合,为具有有限数据的高维金融模拟提供了一个原则性方法。我们的代码可在https://github.com/xymmmm00/diffusion_factor_model获取。

更新时间: 2025-07-23 17:18:54

领域: q-fin.ST,cs.LG,q-fin.MF

下载: http://arxiv.org/abs/2504.06566v4

HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging

Large language models (LLMs) often leverage adapters, such as low-rank-based adapters, to achieve strong performance on downstream tasks. However, storing a separate adapter for each task significantly increases memory requirements, posing a challenge for resource-constrained environments such as mobile devices. Although model merging techniques can reduce storage costs, they typically result in substantial performance degradation. In this work, we introduce HydraOpt, a new model merging technique that capitalizes on the inherent similarities between the matrices of low-rank adapters. Unlike existing methods that produce a fixed trade-off between storage size and performance, HydraOpt allows us to navigate this spectrum of efficiency and performance. Our experiments show that HydraOpt significantly reduces storage size (48% reduction) compared to storing all adapters, while achieving competitive performance (0.2-1.8% drop). Furthermore, it outperforms existing merging techniques in terms of performance at the same or slightly worse storage efficiency.

Updated: 2025-07-23 17:12:19

标题: HydraOpt: 导航适配器合并的效率-性能权衡

摘要: 大型语言模型(LLM)通常利用适配器,如基于低秩的适配器,以在下游任务上取得强大的性能。然而,为每个任务存储单独的适配器会显著增加内存需求,在资源受限的环境(如移动设备)中构成挑战。虽然模型合并技术可以减少存储成本,但通常会导致性能大幅下降。在这项工作中,我们介绍了HydraOpt,一种利用低秩适配器矩阵之间固有相似性的新的模型合并技术。与在存储大小和性能之间只能做出固定权衡的现有方法不同,HydraOpt允许我们在这一效率与性能的谱系中灵活选择。我们的实验表明,与存储所有适配器相比,HydraOpt显著减少了存储大小(减少48%),同时实现了竞争性的性能(下降0.2-1.8%)。此外,在性能方面,它超过了现有的合并技术,同时存储效率相同或稍差。

更新时间: 2025-07-23 17:12:19

领域: cs.LG

下载: http://arxiv.org/abs/2507.17706v1

Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem

Mixed-integer programming (MIP) is a powerful paradigm for modeling and solving various important combinatorial optimization problems. Recently, learning-based approaches have shown a potential to speed up MIP solving via offline training that then guides important design decisions during the search. However, a significant drawback of these methods is their heavy reliance on offline training, which requires collecting training datasets and computationally costly training epochs yet offering only limited generalization to unseen (larger) instances. In this paper, we propose Balans, an adaptive meta-solver for MIPs with online learning capability that does not require any supervision or apriori training. At its core, Balans is based on adaptive large-neighborhood search, operating on top of an MIP solver by successive applications of destroy and repair neighborhood operators. During the search, the selection among different neighborhood definitions is guided on the fly for the instance at hand via multi-armed bandit algorithms. Our extensive experiments on hard optimization instances show that Balans offers significant performance gains over the default MIP solver, is better than committing to any single best neighborhood, and improves over the state-of-the-art large-neighborhood search for MIPs. Finally, we release Balans as a highly configurable, MIP solver agnostic, open-source software.
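
The on-the-fly operator selection can be sketched with a classic UCB1 bandit over hypothetical destroy-and-repair moves; everything below is illustrative and not Balans's actual operator set or reward design.

```python
import math
import random

class UCB1:
    """UCB1 bandit used to pick among neighborhood (destroy/repair) operators."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm, c in enumerate(self.counts):   # try every arm once first
            if c == 0:
                return arm
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

random.seed(0)

# Hypothetical destroy-and-repair moves: each returns a candidate objective.
def small_move(obj): return obj - random.uniform(-0.4, 0.5)
def large_move(obj): return obj - random.uniform(-1.0, 2.0)
operators = [small_move, large_move]

bandit, incumbent = UCB1(len(operators)), 100.0
for _ in range(300):
    arm = bandit.select()
    candidate = operators[arm](incumbent)
    bandit.update(arm, 1.0 if candidate < incumbent else 0.0)  # reward = improved?
    incumbent = min(incumbent, candidate)
print("best objective:", round(incumbent, 2), "operator pulls:", bandit.counts)
```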

Updated: 2025-07-23 17:09:55

标题: Balans:基于多臂老虎机的自适应大领域搜索用于混合整数规划问题

摘要: 混合整数规划(MIP)是建模和解决各种重要组合优化问题的强大范例。最近,基于学习的方法显示出通过离线训练加快MIP求解的潜力,然后在搜索过程中指导重要设计决策。然而,这些方法的一个显著缺点是它们严重依赖于离线训练,这需要收集训练数据集和计算昂贵的训练周期,却仅对未见过的(更大的)实例提供有限的泛化能力。在本文中,我们提出了Balans,一个具有在线学习能力的自适应元求解器,它不需要任何监督或先验训练。在其核心,Balans基于自适应大领域搜索,通过连续应用破坏和修复领域操作在MIP求解器之上运行。在搜索过程中,通过多臂赌博算法动态指导当前实例中不同领域定义之间的选择。我们对难解优化实例的广泛实验表明,Balans比默认的MIP求解器提供了显著的性能提升,优于任何单一最佳领域的选择,并改进了现有的MIP的大领域搜索技术。最后,我们将Balans作为一个高度可配置、与MIP求解器无关的开源软件发布。

更新时间: 2025-07-23 17:09:55

领域: cs.AI,cs.LG,math.OC

下载: http://arxiv.org/abs/2412.14382v3

Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations

Large Reasoning Models (LRMs) have become a central focus in today's large language model (LLM) research, where models are designed to output a step-by-step thinking process before arriving at a final answer to handle complex reasoning tasks. Despite their promise, recent empirical studies (e.g., [Shojaee et al., 2025] from Apple) suggest that this thinking process may not actually enhance reasoning ability, where LLMs without explicit reasoning actually outperform LRMs on tasks with low or high complexity. In this work, we revisit these findings and investigate whether the limitations of LRMs persist when tool augmentations are introduced. We incorporate two types of tools, Python interpreters and scratchpads, and evaluate three representative LLMs and their LRM counterparts on Apple's benchmark reasoning puzzles. Our results show that, with proper tool use, LRMs consistently outperform their non-reasoning counterparts across all levels of task complexity. These findings challenge the recent narrative that reasoning is an illusion and highlight the potential of tool-augmented LRMs for solving complex problems.

Updated: 2025-07-23 17:04:20

标题: 思考并非幻觉:通过工具增强克服推理模型的局限性

摘要: 大型推理模型(LRMs)已成为当今大型语言模型(LLM)研究的中心关注点,这些模型被设计为在最终得出答案之前输出一个逐步推理过程,以处理复杂的推理任务。尽管它们很有潜力,但最近的实证研究(例如,来自苹果的[Shojaee et al.,2025])表明,这种思考过程实际上可能并不会提高推理能力,在低复杂度或高复杂度的任务中,没有明确推理的LLMs可能会表现出比LRMs更好的性能。在这项工作中,我们重新审视这些发现,并调查当引入工具增强时,LRMs的限制是否仍然存在。我们结合了两种类型的工具,Python解释器和草稿本,并在苹果的基准推理谜题上评估了三种代表性的LLMs及其LRM对应物。我们的结果显示,通过适当使用工具,LRMs在所有任务复杂度水平上都一贯表现优于其非推理对应物。这些发现挑战了最近的叙事,即推理是一种幻觉,并突显了工具增强的LRMs在解决复杂问题方面的潜力。

更新时间: 2025-07-23 17:04:20

领域: cs.AI

下载: http://arxiv.org/abs/2507.17699v1

A Mathematical Theory of Discursive Networks

Large language models (LLMs) turn writing into a live exchange between humans and software. We characterize this new medium as a discursive network that treats people and LLMs as equal nodes and tracks how their statements circulate. We define the generation of erroneous information as invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. We develop a general mathematical model of discursive networks that shows that a network governed only by drift and self-repair stabilizes at a modest error rate. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmonizer merges their verdicts. We identify an ethical transgression, epithesis, that occurs when humans fail to engage in the discursive network. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from connecting imperfect ones into networks that enforce mutual accountability.
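
The drift/self-repair claim can be checked with a two-line recurrence. The sketch below uses arbitrary hazard rates (assumptions, not the paper's parameters) and exhibits the fixed point drift/(drift + repair + review):

```python
# Fraction of circulating statements that are invalid, evolving per round.
def steady_error(drift, repair, review, rounds=10_000):
    p = 0.0
    for _ in range(rounds):
        # valid statements drift into error; invalid ones are fixed by
        # self-repair or by external peer review
        p = p + (1 - p) * drift - p * (repair + review)
    return p

print("no review :", round(steady_error(drift=0.05, repair=0.10, review=0.00), 3))
print("2% review :", round(steady_error(drift=0.05, repair=0.10, review=0.02), 3))
# Fixed point is drift / (drift + repair + review): ~0.33 without review,
# and it keeps dropping toward a truth-dominant state as review probability grows.
```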

Updated: 2025-07-23 17:02:53

标题: 一种关于辩论网络的数学理论

摘要: 大型语言模型(LLM)将写作转变为人类和软件之间的实时交流。我们将这种新媒介描述为一种辩论网络,将人类和LLMs视为平等节点,并跟踪它们的声明如何流通。我们将错误信息的生成定义为无效化(任何事实、逻辑或结构性违规),并展示它遵循四种危险:远离真相、自我修复、新鲜编造和外部检测。我们开发了一种关于辩论网络的通用数学模型,显示仅由漂移和自我修复管理的网络将稳定在适度错误率。即使给每个虚假声明一个很小的同行评审机会也会将系统转变为以真相为主导的状态。我们通过开源的Flaws-of-Others(FOO)算法对同行评审进行了操作化:一个可配置的循环,在这个循环中,任何一组代理互相批评,同时一个协调者合并他们的裁决。我们确定了一种伦理上的违规行为,即当人类未参与辩论网络时发生的“epithesis”。结论是实践和文化:在这种新媒介中的可靠性不是来自完善单一模型,而是来自将不完美的模型连接成互相强制问责的网络。

更新时间: 2025-07-23 17:02:53

领域: cs.CL,cs.LG,68T01, 60J10, 91D30, 05C82, 68T50, 68W20, 94A15,I.2.7; I.2.11; G.3

下载: http://arxiv.org/abs/2507.06565v5

Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks

Large Language Model (LLM)-based autonomous agents are expected to play a vital role in the evolution of 6G networks, by empowering real-time decision-making related to management and service provisioning to end-users. This shift facilitates the transition from a specialized intelligence approach, where artificial intelligence (AI) algorithms handle isolated tasks, to artificial general intelligence (AGI)-driven networks, where agents possess broader reasoning capabilities and can manage diverse network functions. In this paper, we introduce a novel agentic paradigm that combines LLMs with real-time optimization algorithms towards Trustworthy AI, defined as symbiotic agents. Optimizers at the LLM's input-level provide bounded uncertainty steering for numerically precise tasks, whereas output-level optimizers supervised by the LLM enable adaptive real-time control. We design and implement two novel agent types including: (i) Radio Access Network optimizers, and (ii) multi-agent negotiators for Service-Level Agreements (SLAs). We further propose an end-to-end architecture for AGI networks and evaluate it on a 5G testbed capturing channel fluctuations from moving vehicles. Results show that symbiotic agents reduce decision errors fivefold compared to standalone LLM-based agents, while smaller language models (SLM) achieve similar accuracy with a 99.9% reduction in GPU resource overhead and in near-real-time loops of 82 ms. A multi-agent demonstration for collaborative RAN on the real-world testbed highlights significant flexibility in service-level agreement and resource allocation, reducing RAN over-utilization by approximately 44%. Drawing on our findings and open-source implementations, we introduce the symbiotic paradigm as the foundation for next-generation, AGI-driven networks-systems designed to remain adaptable, efficient, and trustworthy even as LLMs advance.

Updated: 2025-07-23 17:01:23

标题: 共生代理:一个可靠的AGI驱动网络的新范式

摘要: 基于大型语言模型(LLM)的自主代理人预计将在6G网络的演进中发挥重要作用,通过赋予终端用户与管理和服务供应相关的实时决策能力。这种转变促进了从专业智能方法的过渡,其中人工智能(AI)算法处理孤立任务,到人工通用智能(AGI)驱动网络的转变,其中代理人具有更广泛的推理能力并能够管理各种网络功能。在本文中,我们介绍了一种将LLM与实时优化算法相结合的新型代理范式,以实现所谓的共生代理人的可信人工智能。LLM输入级别的优化器为数字精确任务提供有界不确定性引导,而受LLM监督的输出级别优化器实现自适应实时控制。我们设计并实现了两种新型代理类型,包括:(i)无线接入网络优化器,以及(ii)服务级协议(SLA)的多代理协商者。我们进一步提出了一个面向AGI网络的端到端架构,并在一个捕捉移动车辆信道波动的5G测试平台上进行评估。结果显示,与独立LLM代理相比,共生代理人将决策错误减少了五倍,而较小的语言模型(SLM)在GPU资源开销减少了99.9%的同时,准确性相似,并在82毫秒的接近实时循环中实现。在真实世界测试平台上展示的多代理协作RAN演示突显了服务级协议和资源分配的显著灵活性,将RAN过度利用减少了约44%。借鉴我们的发现和开源实现,我们将共生范式引入为下一代AGI驱动网络系统的基础,这些系统旨在即使LLM不断发展也能保持适应性、高效性和可信赖性。

更新时间: 2025-07-23 17:01:23

领域: cs.AI,cs.NI

下载: http://arxiv.org/abs/2507.17695v1

Joint Asymmetric Loss for Learning with Noisy Labels

Learning with noisy labels is a crucial task for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions, particularly symmetric losses. Nevertheless, symmetric losses usually suffer from the underfitting issue due to the overly strict constraint. To address this problem, the Active Passive Loss (APL) jointly optimizes an active and a passive loss to mutually enhance the overall fitting ability. Within APL, symmetric losses have been successfully extended, yielding advanced robust loss functions. Despite these advancements, emerging theoretical analyses indicate that asymmetric losses, a new class of robust loss functions, possess superior properties compared to symmetric losses. However, existing asymmetric losses are not compatible with advanced optimization frameworks such as APL, limiting their potential and applicability. Motivated by this theoretical gap and the prospect of asymmetric losses, we extend the asymmetric loss to the more complex passive loss scenario and propose the Asymmetric Mean Square Error (AMSE), a novel asymmetric loss. We rigorously establish the necessary and sufficient condition under which AMSE satisfies the asymmetric condition. By substituting the traditional symmetric passive loss in APL with our proposed AMSE, we introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL). Extensive experiments demonstrate the effectiveness of our method in mitigating label noise. Code available at: https://github.com/cswjl/joint-asymmetric-loss
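
To make the Active Passive Loss structure concrete, here is a hedged PyTorch sketch: the active term is normalized cross entropy (one common APL choice) and the passive term is a mean-square stand-in marking where AMSE would plug in; the actual AMSE definition is given in the paper, not here.

```python
import torch
import torch.nn.functional as F

def active_passive_loss(logits, targets, alpha=1.0, beta=1.0):
    """Sketch of the APL structure that JAL builds on: a weighted sum of an
    active loss (normalized cross entropy) and a passive loss. The passive
    term below is a simple MSE-on-probabilities placeholder showing where the
    paper's AMSE would be substituted."""
    probs = F.softmax(logits, dim=1).clamp_min(1e-7)
    one_hot = F.one_hot(targets, logits.size(1)).float()
    ce = -(one_hot * torch.log(probs)).sum(dim=1)
    nce = ce / (-torch.log(probs)).sum(dim=1)        # active: normalized CE
    passive = ((probs - one_hot) ** 2).sum(dim=1)    # passive: AMSE stand-in
    return (alpha * nce + beta * passive).mean()

logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss = active_passive_loss(logits, targets)
loss.backward()
print(float(loss))
```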

Updated: 2025-07-23 16:57:43

标题: 使用带有噪声标签的联合不对称损失进行学习

摘要: 学习带有噪声标签是训练准确深度神经网络的关键任务。为了减轻标签噪声,先前的研究提出了各种鲁棒损失函数,特别是对称损失。然而,对称损失通常由于过于严格的约束而面临拟合不足的问题。为了解决这个问题,主动被动损失(APL)联合优化主动损失和被动损失,相互增强整体拟合能力。在APL中,对称损失已经成功地扩展,产生了先进的鲁棒损失函数。尽管取得了这些进展,新兴的理论分析表明,不对称损失,一种新的鲁棒损失函数类别,与对称损失相比具有更优越的特性。然而,现有的不对称损失与APL等先进优化框架不兼容,限制了它们的潜力和适用性。受到这一理论差距和不对称损失前景的激励,我们将不对称损失扩展到更复杂的被动损失场景,并提出了不对称均方误差(AMSE),一种新颖的不对称损失。我们严格地建立了AMSE满足不对称条件的必要和充分条件。通过用我们提出的AMSE替换APL中传统的对称被动损失,我们引入了一种新颖的鲁棒损失框架,称为联合不对称损失(JAL)。大量实验表明我们的方法在减轻标签噪声方面的有效性。代码可在以下链接找到:https://github.com/cswjl/joint-asymmetric-loss

更新时间: 2025-07-23 16:57:43

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2507.17692v1

CASCADE: LLM-Powered JavaScript Deobfuscator at Google

Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.

Updated: 2025-07-23 16:57:32

标题: CASCADE: 在Google上使用LLM技术的JavaScript去混淆器

摘要: 软件混淆在JavaScript中特别普遍,阻碍了代码理解和分析,给软件测试、静态分析和恶意软件检测带来了重大挑战。本文介绍了CASCADE,一种新颖的混合方法,将Gemini的先进编码能力与编译器中间表示(IR)的确定性转换能力相结合,特别是JavaScript IR (JSIR)。通过利用Gemini识别关键的序言函数,即支撑最普遍混淆技术的基础组件,并利用JSIR进行后续代码转换,CASCADE有效地恢复了原始字符串和API名称等语义元素,并揭示了原始程序行为。这种方法克服了现有静态和动态去混淆技术的局限性,消除了数百到数千个硬编码规则,同时实现了可靠性和灵活性。CASCADE已经在谷歌的生产环境中部署,显著提高了JavaScript去混淆效率,减少了逆向工程的工作量。

更新时间: 2025-07-23 16:57:32

领域: cs.SE,cs.AI,cs.CR,cs.LG,cs.PL

下载: http://arxiv.org/abs/2507.17691v1

In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates

Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL works cannot learn incrementally from an ongoing trajectory because they have to wait to collect at least one complete trajectory to learn. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory and continually updating the learned reward and policy as new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We propose a novel algorithm to solve this problem and guarantee that the algorithm achieves sub-linear local regret $O(\sqrt{T}+\log T+\sqrt{T}\log T)$. If the reward function is linear, we prove that the proposed algorithm achieves sub-linear regret $O(\log T)$. Experiments are used to validate the proposed algorithm.

Updated: 2025-07-23 16:56:21

标题: 在轨迹内逆强化学习:在持续轨迹终止之前逐步学习

摘要: 反向强化学习(IRL)旨在学习一个奖励函数和相应的策略,以最佳地适应专家示范的轨迹。然而,当前的IRL工作无法从正在进行的轨迹中逐渐学习,因为它们必须等待收集至少一个完整的轨迹才能学习。为了填补这一差距,本文考虑了在观察到正在进行的轨迹的初始状态-动作对时学习奖励函数和相应策略的问题,并在观察到正在进行的轨迹的新状态-动作对时不断更新学习到的奖励和策略。我们将这个问题形式化为一个在线双层优化问题,其中上层根据新观察到的状态-动作对动态调整学习到的奖励,借助元正则化项,并且下层学习相应的策略。我们提出了一种新颖的算法来解决这个问题,并保证该算法实现次线性的局部遗憾$ O(\sqrt{T} + \log T + \sqrt{T} \log T)$。如果奖励函数是线性的,我们证明了所提出的算法实现次线性遗憾$ O(\log T)$。实验证明了所提出的算法的有效性。

更新时间: 2025-07-23 16:56:21

领域: cs.LG

下载: http://arxiv.org/abs/2410.15612v7

Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills

Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill development. We develop a smartphone's accelerometer-based respiration tracking algorithm, eliminating the need for additional wearables. Unlike existing methods, our approach accurately captures slow breathing patterns typical of mindfulness meditation. Additionally, we introduce the first quantitative framework to estimate mindfulness skills-concentration, sensory clarity, and equanimity-based on accelerometer-derived respiration data. We develop and test our algorithms on 261 mindfulness sessions in both controlled and real-world settings. A user study comparing an experimental group receiving biosignal feedback with a control group using a standard app shows that respiration feedback enhances system usability. Our respiration tracking model achieves a mean absolute error (MAE) of 1.6 breaths per minute, closely aligning with ground truth data, while our mindfulness skill estimation attains F1 scores of 80-84% in tracking skill progression. By integrating respiration tracking and mindfulness estimation into a commercial app, we demonstrate the potential of smartphone sensors to enhance digital mindfulness training.
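
A minimal sketch of accelerometer-based respiration estimation under simple assumptions (a synthetic chest-motion signal, band-pass filtering, peak counting); the deployed algorithm is surely more involved:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 50.0                       # accelerometer sampling rate (Hz), assumed
t = np.arange(0, 60, 1 / fs)    # one minute of data
true_rate = 6                   # slow meditative breathing: 6 breaths/min
signal = 0.02 * np.sin(2 * np.pi * (true_rate / 60) * t)       # chest motion
signal += 0.005 * np.random.default_rng(0).normal(size=t.size)  # sensor noise

# Band-pass around plausible respiration frequencies: 0.05-0.7 Hz covers
# roughly 3-42 breaths/min, including the very slow breathing of meditation.
b, a = butter(2, [0.05, 0.7], btype="band", fs=fs)
filtered = filtfilt(b, a, signal)

# One breath per dominant oscillation peak; enforce a minimum breath spacing.
peaks, _ = find_peaks(filtered, distance=int(fs * 1.2))
rate_bpm = len(peaks) * 60 / (t[-1] - t[0])
print(f"estimated respiration rate: {rate_bpm:.1f} breaths/min")
```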

Updated: 2025-07-23 16:52:42

标题: 正念冥想与呼吸:基于加速计的呼吸率和正念进展估计以增强应用参与度和正念技巧

摘要: 正念训练被广泛认为有助于减少抑郁、焦虑和孤独。随着基于智能手机的正念应用的兴起,数字化冥想变得更容易获得,但维持长期用户参与仍然是一个挑战。本文探讨呼吸生物信号反馈和正念技能估计是否增强系统的可用性和技能发展。我们开发了基于智能手机加速度计的呼吸跟踪算法,消除了对额外可穿戴设备的需求。与现有方法不同,我们的方法准确捕捉到正念冥想中典型的缓慢呼吸模式。此外,我们引入了第一个基于加速度计衍生呼吸数据来估计正念技能(专注力、感知清晰度和平静度)的定量框架。我们在控制和真实世界环境中的261个正念训练会话中开发和测试我们的算法。一项用户研究比较了一个接收生物信号反馈的实验组和使用标准应用程序的对照组,结果显示呼吸反馈增强了系统的可用性。我们的呼吸跟踪模型实现了每分钟1.6次呼吸的平均绝对误差(MAE),与真实数据密切吻合,而我们的正念技能估计在追踪技能进展方面取得了80-84%的F1分数。通过将呼吸跟踪和正念估计整合到商业应用程序中,我们展示了智能手机传感器提升数字化正念训练的潜力。

更新时间: 2025-07-23 16:52:42

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2507.17688v1

Towards Effective Open-set Graph Class-incremental Learning

Graph class-incremental learning (GCIL) allows graph neural networks (GNNs) to adapt to evolving graph analytical tasks by incrementally learning new class knowledge while retaining knowledge of old classes. Existing GCIL methods primarily focus on a closed-set assumption, where all test samples are presumed to belong to previously known classes. Such an assumption restricts their applicability in real-world scenarios, where unknown classes naturally emerge during inference, and are absent during training. In this paper, we explore a more challenging open-set graph class-incremental learning scenario with two intertwined challenges: catastrophic forgetting of old classes, which impairs the detection of unknown classes, and inadequate open-set recognition, which destabilizes the retention of learned knowledge. To address the above problems, a novel OGCIL framework is proposed, which utilizes pseudo-sample embedding generation to effectively mitigate catastrophic forgetting and enable robust detection of unknown classes. To be specific, a prototypical conditional variational autoencoder is designed to synthesize node embeddings for old classes, enabling knowledge replay without storing raw graph data. To handle unknown classes, we employ a mixing-based strategy to generate out-of-distribution (OOD) samples from pseudo in-distribution and current node embeddings. A novel prototypical hypersphere classification loss is further proposed, which anchors in-distribution embeddings to their respective class prototypes, while repelling OOD embeddings away. Instead of assigning all unknown samples into one cluster, our proposed objective function explicitly models them as outliers through prototype-aware rejection regions, ensuring a robust open-set recognition. Extensive experiments on five benchmarks demonstrate the effectiveness of OGCIL over existing GCIL and open-set GNN methods.
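
The prototype-anchored hypersphere idea can be sketched as follows; the radius and margin values, and the exact penalty shapes, are illustrative assumptions rather than the paper's loss.

```python
import torch
import torch.nn.functional as F

def hypersphere_loss(emb_id, labels, emb_ood, prototypes, radius=1.0, margin=2.0):
    """Sketch of a prototype-anchored hypersphere objective: in-distribution
    embeddings are pulled inside a radius around their class prototype, while
    OOD embeddings are pushed beyond a margin from every prototype."""
    d_id = torch.norm(emb_id - prototypes[labels], dim=1)
    pull = F.relu(d_id - radius).pow(2).mean()       # anchor ID points

    d_ood = torch.cdist(emb_ood, prototypes)         # distances to all prototypes
    push = F.relu(margin - d_ood).pow(2).mean()      # repel OOD from every one
    return pull + push

protos = F.normalize(torch.randn(5, 16), dim=1) * 3.0   # 5 known-class prototypes
emb_id = protos[torch.tensor([0, 1, 2])] + 0.1 * torch.randn(3, 16)
emb_ood = torch.randn(4, 16)                             # e.g. mixing-based pseudo-OOD
print(float(hypersphere_loss(emb_id, torch.tensor([0, 1, 2]), emb_ood, protos)))
```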

Updated: 2025-07-23 16:51:23

标题: 朝着有效的开放集图类增量学习路径

摘要: 图类增量学习(GCIL)允许图神经网络(GNNs)适应不断发展的图分析任务,通过增量学习新的类知识同时保留旧类知识。现有的GCIL方法主要专注于封闭集假设,即所有测试样本被假定属于先前已知的类。这种假设限制了它们在现实场景中的适用性,因为未知类别会在推断过程中自然出现,而在训练时并不存在。在本文中,我们探索了一个更具挑战性的开放集图类增量学习场景,其中存在两个交织的挑战:旧类别的灾难性遗忘,损害了未知类别的检测,以及不足的开放集识别,破坏了学习知识的保留。为解决上述问题,提出了一种新颖的OGCIL框架,利用伪样本嵌入生成有效缓解灾难性遗忘,并实现对未知类别的稳健检测。具体地,设计了一种原型条件变分自动编码器,用于合成旧类别的节点嵌入,实现知识重播而无需存储原始图数据。为处理未知类别,采用了基于混合的策略,从伪分布内嵌入和当前节点嵌入中生成分布外(OOD)样本。进一步提出了一种新颖的原型超球分类损失,将分布内嵌入锚定到其相应的类原型,同时将OOD嵌入推离。我们的目标函数不是将所有未知样本分配到一个簇,而是通过原型感知的拒绝区域明确地将它们建模为异常值,确保强大的开放集识别。对五个基准进行的广泛实验表明OGCIL相对于现有的GCIL和开放集GNN方法的有效性。

更新时间: 2025-07-23 16:51:23

领域: cs.LG

下载: http://arxiv.org/abs/2507.17687v1

Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment

Previous studies have shown that hazard ratios between treatment groups estimated with the Cox model are uninterpretable because the indefinite baseline hazard of the model fails to identify temporal change in the risk set composition due to treatment assignment and unobserved factors among multiple, contradictory scenarios. To alleviate this problem, especially in studies based on observational data with uncontrolled dynamic treatment and real-time measurement of many covariates, we propose abandoning the baseline hazard and using machine learning to explicitly model the change in the risk set with or without latent variables. For this framework, we clarify the context in which hazard ratios can be causally interpreted, and then develop a method based on Neyman orthogonality to compute debiased maximum-likelihood estimators of hazard ratios. Computing the constructed estimators is more efficient than computing those based on weighted regression with marginal structural Cox models. Numerical simulations confirm that the proposed method identifies the ground truth with minimal bias. These results lay the foundation for developing a useful, alternative method for causal inference with uncontrolled, observational data in modern epidemiology.

Updated: 2025-07-23 16:51:09

标题: 通过机器学习调整的危险比的无偏最大似然估计量

摘要: 以前的研究表明,使用Cox模型估计的治疗组之间的危险比是无法解释的,因为模型的基线风险不确定导致未能确定由于治疗分配和未观察到的因素在多个矛盾情景中导致风险集合组成的时间变化。为了缓解这个问题,特别是在基于观察数据的研究中,其中存在未受控的动态治疗和许多协变量的实时测量,我们建议放弃基线风险,并使用机器学习明确地建模风险集合的变化,无论是否有潜在变量。对于这个框架,我们澄清了危险比可以被因果解释的情境,然后基于Neyman正交性开发了一种方法来计算危险比的无偏最大似然估计。计算构造的估计量比基于边际结构Cox模型的加权回归计算更有效。数值模拟证实了所提出的方法能以最小的偏差识别真实值。这些结果为在现代流行病学中针对未受控的观察数据开发一种有用的因果推断替代方法奠定了基础。

更新时间: 2025-07-23 16:51:09

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.17686v1

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning - even when the additional training data appears benign. In this paper, we empirically demonstrate that this vulnerability stems from the sensitivity of safety-critical low-rank subspaces in LLM parameters to fine-tuning. Building on this insight, we propose a novel training-free method, termed Low-Rank Extrapolation (LoX), to enhance safety robustness by extrapolating the safety subspace of an aligned LLM. Our experimental results confirm the effectiveness of LoX, demonstrating significant improvements in robustness against both benign and malicious fine-tuning attacks while preserving the model's adaptability to new tasks. For instance, LoX leads to 11% to 54% absolute reductions in attack success rates (ASR) facing benign or malicious fine-tuning attacks. By investigating the ASR landscape of parameters, we attribute the success of LoX to that the extrapolation moves LLM parameters to a flatter zone, thereby less sensitive to perturbations. The code is available at github.com/VITA-Group/LoX.
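
A rough numpy sketch of what "low-rank extrapolation" of an alignment update could look like; the rank, the scaling factor, and the per-layer details are assumptions, not the paper's recipe.

```python
import numpy as np

def lox_extrapolate(w_base, w_aligned, k=8, alpha=0.5):
    """Sketch of low-rank extrapolation: take the alignment update
    dW = w_aligned - w_base, keep its top-k singular directions (treated here
    as the safety-critical subspace), and push the weights further along them.
    k and alpha are illustrative hyperparameters."""
    dw = w_aligned - w_base
    u, s, vt = np.linalg.svd(dw, full_matrices=False)
    dw_lowrank = (u[:, :k] * s[:k]) @ vt[:k]     # rank-k safety component
    return w_aligned + alpha * dw_lowrank        # extrapolate past alignment

rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
w_aligned = w_base + rng.normal(scale=0.01, size=(64, 64))
print(lox_extrapolate(w_base, w_aligned).shape)
```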

Updated: 2025-07-23 16:48:01

标题: LoX:低秩外推稳固了LLM在微调中的安全性

摘要: 大型语言模型(LLMs)已经成为现实世界中不可或缺的应用。然而,它们的广泛应用引发了重大的安全担忧,特别是在应对社会有害问题方面。尽管通过对齐来改进模型安全性的努力很大,但即使附加的训练数据看似无害,对齐模型仍可能因随后的微调而破坏其安全保护。在本文中,我们通过实验证明,这种脆弱性源于LLM参数中安全关键的低秩子空间对微调的敏感性。基于这一洞见,我们提出了一种新颖的无需训练的方法,称为低秩外推(LoX),通过外推对齐LLM的安全子空间来增强安全鲁棒性。我们的实验结果证实了LoX的有效性,显著提高了对良性和恶意微调攻击的鲁棒性,同时保留了模型对新任务的适应能力。例如,LoX导致面对良性或恶意微调攻击的攻击成功率(ASR)降低了11%到54%。通过研究参数的ASR景观,我们认为LoX的成功在于外推将LLM参数移动到一个更平缓的区域,从而对扰动不太敏感。代码可在github.com/VITA-Group/LoX 上找到。

更新时间: 2025-07-23 16:48:01

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2506.15606v2

Generalized Dual Discriminator GANs

Dual discriminator generative adversarial networks (D2 GANs) were introduced to mitigate the problem of mode collapse in generative adversarial networks. In D2 GANs, two discriminators are employed alongside a generator: one discriminator rewards high scores for samples from the true data distribution, while the other favors samples from the generator. In this work, we first introduce dual discriminator $\alpha$-GANs (D2 $\alpha$-GANs), which combines the strengths of dual discriminators with the flexibility of a tunable loss function, $\alpha$-loss. We further generalize this approach to arbitrary functions defined on positive reals, leading to a broader class of models we refer to as generalized dual discriminator generative adversarial networks. For each of these proposed models, we provide theoretical analysis and show that the associated min-max optimization reduces to the minimization of a linear combination of an $f$-divergence and a reverse $f$-divergence. This generalizes the known simplification for D2-GANs, where the objective reduces to a linear combination of the KL-divergence and the reverse KL-divergence. Finally, we perform experiments on 2D synthetic data and use multiple performance metrics to capture various advantages of our GANs.
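
For reference, the simplification cited here can be written out. In the D2-GAN case the generator objective at the optimal discriminators reduces to a weighted sum of the KL divergence and its reverse, and the generalized models replace KL with an $f$-divergence; the weights $\alpha, \beta$ below are shown generically.

```latex
\min_G \; \alpha\, D_{\mathrm{KL}}\big(P_{\mathrm{data}} \,\|\, P_G\big)
        + \beta\,  D_{\mathrm{KL}}\big(P_G \,\|\, P_{\mathrm{data}}\big)
\qquad\text{and, in general,}\qquad
\min_G \; \alpha\, D_f\big(P_{\mathrm{data}} \,\|\, P_G\big)
        + \beta\,  D_f\big(P_G \,\|\, P_{\mathrm{data}}\big).
```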

Updated: 2025-07-23 16:46:03

标题: 广义双判别器生成对抗网络

摘要: 双鉴别器生成对抗网络(D2 GANs)被引入以减轻生成对抗网络中模式坍塌的问题。在D2 GANs中,除了一个生成器外,还有两个鉴别器:一个鉴别器为来自真实数据分布的样本奖励高分,而另一个鉴别器偏爱来自生成器的样本。在这项工作中,我们首先介绍双鉴别器α-GANs(D2 α-GANs),它将双鉴别器的优势与可调节损失函数α-损失的灵活性相结合。我们进一步将这种方法推广到在正实数上定义的任意函数,从而导致一个更广泛的模型类,我们将其称为广义双鉴别器生成对抗网络。对于这些提出的模型中的每一个,我们提供理论分析,并展示相关的极小极大优化归结为对$f$-散度和反向$f$-散度的线性组合的最小化。这推广了D2-GANs的已知简化,其中目标简化为KL散度和反向KL散度的线性组合。最后,我们在二维合成数据上进行实验,并使用多个性能指标来捕捉我们的GANs的各种优势。

更新时间: 2025-07-23 16:46:03

领域: cs.LG,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2507.17684v1

RAPID-Net: Accurate Pocket Identification for Binding-Site-Agnostic Docking

Accurate identification of druggable pockets and their features is essential for structure-based drug design and effective downstream docking. Here, we present RAPID-Net, a deep learning-based algorithm designed for the accurate prediction of binding pockets and seamless integration with docking pipelines. On the PoseBusters benchmark, RAPID-Net-guided AutoDock Vina achieves 54.9% of Top-1 poses with RMSD < 2 Å and satisfying the PoseBusters chemical-validity criterion, compared to 49.1% for DiffBindFR. On the most challenging time split of PoseBusters aiming to assess generalization ability (structures submitted after September 30, 2021), RAPID-Net-guided AutoDock Vina achieves 53.1% of Top-1 poses with RMSD < 2 Å and PB-valid, versus 59.5% for AlphaFold 3. Notably, in 92.2% of cases, RAPID-Net-guided Vina samples at least one pose with RMSD < 2 Å (regardless of its rank), indicating that pose ranking, rather than sampling, is the primary accuracy bottleneck. The lightweight inference, scalability, and competitive accuracy of RAPID-Net position it as a viable option for large-scale virtual screening campaigns. Across diverse benchmark datasets, RAPID-Net outperforms other pocket prediction tools, including PUResNet and Kalasanty, in both docking accuracy and pocket-ligand intersection rates. Furthermore, we demonstrate the potential of RAPID-Net to accelerate the development of novel therapeutics by highlighting its performance on pharmacologically relevant targets. RAPID-Net accurately identifies distal functional sites, offering new opportunities for allosteric inhibitor design. In the case of the RNA-dependent RNA polymerase of SARS-CoV-2, RAPID-Net uncovers a wider array of potential binding pockets than existing predictors, which typically annotate only the orthosteric pocket and overlook secondary cavities.

Updated: 2025-07-23 16:44:22

标题: RAPID-Net:用于结合位点不可知对接的准确口袋识别

摘要: 准确识别可用于药物设计的口袋及其特征对基于结构的药物设计和有效的下游对接至关重要。在这里,我们介绍了RAPID-Net,这是一种基于深度学习的算法,旨在准确预测结合口袋并与对接管线无缝集成。在PoseBusters基准测试中,RAPID-Net引导的AutoDock Vina实现了54.9%的RMSD < 2 A的Top-1位姿,并满足PoseBusters的化学有效性标准,而DiffBindFR为49.1%。在PoseBusters最具挑战性的时间切分中,旨在评估泛化能力(2021年9月30日后提交的结构),RAPID-Net引导的AutoDock Vina实现了53.1%的RMSD < 2 A的Top-1位姿和PB有效,而AlphaFold 3为59.5%。值得注意的是,在92.2%的情况下,RAPID-Net引导的Vina样本至少有一个RMSD < 2 A的位姿(不考虑其排名),表明位姿排名而不是采样是主要的准确性瓶颈。RAPID-Net的轻量级推断、可扩展性和竞争性准确性将其定位为大规模虚拟筛选活动的可行选择。在各种基准数据集中,RAPID-Net在对接准确性和口袋-配体相交率方面优于其他口袋预测工具,包括PUResNet和Kalasanty。此外,我们展示了RAPID-Net加速新型治疗药物开发的潜力,通过突显其在药理学相关靶点上的性能。RAPID-Net准确识别了远端功能位点,为变构抑制剂设计提供了新的机会。以SARS-CoV-2的RNA依赖性RNA聚合酶为例,RAPID-Net发现了比现有预测器更广泛的潜在结合口袋,后者通常只注释正交口袋并忽略次级腔。

更新时间: 2025-07-23 16:44:22

领域: q-bio.BM,cs.AI,cs.LG,physics.bio-ph,physics.med-ph

下载: http://arxiv.org/abs/2502.02371v2

Simulating multiple human perspectives in socio-ecological systems using large language models

Understanding socio-ecological systems requires insights from diverse stakeholder perspectives, which are often hard to access. To enable alternative, simulation-based exploration of different stakeholder perspectives, we develop the HoPeS (Human-Oriented Perspective Shifting) modelling framework. HoPeS employs agents powered by large language models (LLMs) to represent various stakeholders; users can step into the agent roles to experience perspectival differences. A simulation protocol serves as a "scaffold" to streamline multiple perspective-taking simulations, supporting users in reflecting on, transitioning between, and integrating across perspectives. A prototype system is developed to demonstrate HoPeS in the context of institutional dynamics and land use change, enabling both narrative-driven and numerical experiments. In an illustrative experiment, a user successively adopts the perspectives of a system observer and a researcher - a role that analyses data from the embedded land use model to inform evidence-based decision-making for other LLM agents representing various institutions. Despite the user's effort to recommend technically sound policies, discrepancies persist between the policy recommendation and implementation due to stakeholders' competing advocacies, mirroring real-world misalignment between researcher and policymaker perspectives. The user's reflection highlights the subjective feelings of frustration and disappointment as a researcher, especially due to the challenge of maintaining political neutrality while attempting to gain political influence. Despite this, the user exhibits high motivation to experiment with alternative narrative framing strategies, suggesting the system's potential in exploring different perspectives. Further system and protocol refinement are likely to enable new forms of interdisciplinary collaboration in socio-ecological simulations.

Updated: 2025-07-23 16:42:51

标题: 使用大型语言模型在社会生态系统中模拟多个人类视角

摘要: 理解社会生态系统需要从不同利益相关者的视角获得见解,而这通常很难获得。为了实现对不同利益相关者视角的替代性、基于模拟的探索,我们开发了HoPeS(面向人的视角转变)建模框架。HoPeS采用由大型语言模型(LLMs)驱动的代理来代表各种利益相关者;用户可以扮演代理角色以体验不同的视角差异。一个模拟协议作为“支架”来简化多角度模拟,支持用户反思、过渡和整合各种视角。开发了一个原型系统来展示HoPeS在制度动态和土地利用变化背景下的应用,实现基于叙事和数值实验。在一个说明性实验中,用户先后扮演了系统观察者和研究者的角色,后者分析嵌入式土地利用模型的数据,为代表各种机构的其他LLM代理提供基于证据的决策支持。尽管用户努力推荐技术上合理的政策,但由于利益相关者的竞争性主张,政策建议与实施之间仍存在差异,反映了研究者和决策者视角之间现实世界的不一致。用户的反思突出了作为研究者的主观感受,尤其是在试图获得政治影响力的同时保持政治中立的挑战。尽管如此,用户表现出极高的动机来尝试不同的叙事框架策略,表明该系统在探索不同视角方面具有潜力。进一步的系统和协议完善可能会促进社会生态模拟中新形式的跨学科合作。

更新时间: 2025-07-23 16:42:51

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2507.17680v1

On the Lipschitz Constant of Deep Networks and Double Descent

Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights into implicit regularization via overparameterization, and effective model complexity for networks trained in practice.
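
The empirical Lipschitz constant referenced here is typically lower-bounded by the largest input-gradient norm over a set of probe points; below is a minimal sketch of such a measurement (an illustration, not the paper's exact protocol).

```python
import torch
import torch.nn as nn

def empirical_lipschitz(model, inputs):
    """Lower-bound estimate of the Lipschitz constant: the largest
    input-gradient norm of the (scalar) output over a batch of probe points."""
    estimates = []
    for x in inputs:
        x = x.clone().requires_grad_(True)
        y = model(x).sum()               # scalar output for a single probe
        (grad,) = torch.autograd.grad(y, x)
        estimates.append(grad.norm().item())
    return max(estimates)

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
probes = [torch.randn(10) for _ in range(100)]
print("empirical Lipschitz (lower bound):",
      round(empirical_lipschitz(model, probes), 3))
```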

Updated: 2025-07-23 16:41:45

标题: 关于深度网络和双下降的利普希茨常数

摘要: 现有关于深度网络泛化误差的界限假定输入变量具有某种平滑或有界依赖性,但未能探究实践中控制这些因素的机制。在这项工作中,我们对经历双重下降的深度网络的经验利普希茨常数进行了广泛的实验研究,并突出显示了与测试误差强烈相关的非单调趋势。通过建立参数空间和输入空间梯度在临界点周围的SGD之间的联系,我们分离出两个重要因素--即损失景观曲率和参数距离初始化的距离--分别控制着临界点周围的优化动态和限制模型函数复杂性,甚至超出训练数据。我们的研究提供了有关通过过度参数化进行隐式正则化以及在实践中训练网络的有效模型复杂性的新见解。

更新时间: 2025-07-23 16:41:45

领域: cs.LG

下载: http://arxiv.org/abs/2301.12309v5

How Should We Meta-Learn Reinforcement Learning Algorithms?

The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out this empirical comparison of the different approaches when applied to a range of meta-learned algorithms which target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time for each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.

Updated: 2025-07-23 16:31:38

标题: 我们应该如何元学习强化学习算法?

摘要: 从数据中学习元学习算法的过程,而不是依赖手动设计,作为改进机器学习系统性能的范例,正在变得越来越流行。元学习在强化学习(RL)中表现出特别的潜力,其中算法经常从监督或无监督学习中进行调整,尽管它们对RL来说并不是最优的。然而,到目前为止,不同元学习算法之间的比较严重缺乏,例如使用进化来优化黑盒函数或LLMs来提出代码。在本文中,我们对应用于一系列针对RL流程不同部分的元学习算法时的不同方法进行了实证比较。除了元训练和元测试性能外,我们还调查了每个元学习算法的可解释性、样本成本和训练时间等因素。基于这些发现,我们提出了几条元学习新RL算法的准则,这将有助于确保未来学习的算法尽可能表现出色。

更新时间: 2025-07-23 16:31:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17668v1

Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography

Breast cancer (BC) remains one of the leading causes of cancer-related mortality among women, despite recent advances in Computer-Aided Diagnosis (CAD) systems. Accurate and efficient interpretation of multi-view mammograms is essential for early detection, driving a surge of interest in Artificial Intelligence (AI)-powered CAD models. While state-of-the-art multi-view mammogram classification models are largely based on Transformer architectures, their computational complexity scales quadratically with the number of image patches, highlighting the need for more efficient alternatives. To address this challenge, we propose Mammo-Mamba, a novel framework that integrates Selective State-Space Models (SSMs), transformer-based attention, and expert-driven feature refinement into a unified architecture. Mammo-Mamba extends the MambaVision backbone by introducing the Sequential Mixture of Experts (SeqMoE) mechanism through its customized SecMamba block. The SecMamba is a modified MambaVision block that enhances representation learning in high-resolution mammographic images by enabling content-adaptive feature refinement. These blocks are integrated into the deeper stages of MambaVision, allowing the model to progressively adjust feature emphasis through dynamic expert gating, effectively mitigating the limitations of traditional Transformer models. Evaluated on the CBIS-DDSM benchmark dataset, Mammo-Mamba achieves superior classification performance across all key metrics while maintaining computational efficiency.

Updated: 2025-07-23 16:29:46

标题: Mammo-Mamba:一种用于多视角乳腺X线摄影的混合状态空间和变压器架构,具有顺序专家混合

摘要: 乳腺癌(BC)仍然是妇女癌症相关死亡的主要原因之一,尽管近年来计算机辅助诊断(CAD)系统取得了进展。准确和高效地解释多视角乳房X线照片对于早期检测至关重要,这引发了对人工智能(AI)驱动的CAD模型的兴趣激增。虽然最先进的多视角乳房X线照片分类模型主要基于变压器架构,但其计算复杂性随图像块数量的平方增长,突显了需要更有效的替代方案。为了解决这一挑战,我们提出了Mammo-Mamba,这是一个将选择性状态空间模型(SSMs)、基于变压器的注意力和专家驱动的特征细化集成到统一架构中的新框架。Mammo-Mamba通过引入其定制的SecMamba块,扩展了MambaVision的骨干,其中包括顺序专家混合(SeqMoE)机制。SecMamba是一种修改后的MambaVision块,通过启用内容自适应特征细化,增强了高分辨率乳房X线照片中的表示学习。这些块被集成到MambaVision的较深阶段,使模型能够通过动态专家门控逐渐调整特征强调,有效地减轻了传统变压器模型的限制。在CBIS-DDSM基准数据集上评估,Mammo-Mamba在所有关键指标上取得了卓越的分类性能,同时保持了计算效率。

更新时间: 2025-07-23 16:29:46

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.17662v1

Rethinking HSM and TPM Security in the Cloud: Real-World Attacks and Next-Gen Defenses

As organizations rapidly migrate to the cloud, the security of cryptographic key management has become a growing concern. Hardware Security Modules (HSMs) and Trusted Platform Modules (TPMs), traditionally seen as the gold standard for securing encryption keys and digital trust, are increasingly challenged by cloud-native threats. Real-world breaches have exposed weaknesses in cloud deployments, including misconfigurations, API abuse, and privilege escalations, allowing attackers to access sensitive key material and bypass protections. These incidents reveal that while the hardware remains secure, the surrounding cloud ecosystem introduces systemic vulnerabilities. This paper analyzes notable security failures involving HSMs and TPMs, identifies common attack vectors, and questions longstanding assumptions about their effectiveness in distributed environments. We explore alternative approaches such as confidential computing, post-quantum cryptography, and decentralized key management. Our findings highlight that while HSMs and TPMs still play a role, modern cloud security requires more adaptive, layered architectures. By evaluating both current weaknesses and emerging models, this research equips cloud architects and security engineers with strategies to reinforce cryptographic trust in the evolving threat landscape.

Updated: 2025-07-23 16:18:16

标题: 重新思考云中的HSM和TPM安全性:现实世界的攻击和下一代防御

摘要: 随着组织迅速迁移到云端,加密密钥管理的安全性已成为一个日益关注的问题。硬件安全模块(HSMs)和可信平台模块(TPMs),传统上被视为保护加密密钥和数字信任的黄金标准,越来越受到云原生威胁的挑战。现实世界中的数据泄露事件暴露了云部署中的弱点,包括配置错误、API 滥用和权限升级,使攻击者能够访问敏感密钥材料并绕过保护措施。这些事件揭示了虽然硬件本身仍然安全,但周围的云生态系统引入了系统性的漏洞。本文分析了涉及 HSMs 和 TPMs 的显著安全失败,确定了常见的攻击向量,并质疑了它们在分布式环境中的有效性的长期假设。我们探讨了诸如保密计算、后量子密码学和分散密钥管理等替代方法。我们的研究结果强调,虽然 HSMs 和 TPMs 仍然发挥作用,但现代云安全需要更具适应性、分层的架构。通过评估当前的弱点和新兴模型,这项研究为云架构师和安全工程师提供了在不断演变的威胁环境中加强加密信任的策略。

更新时间: 2025-07-23 16:18:16

领域: cs.CR,cs.NI,cs.SE,C.2.4; D.4.6; E.3; E.5; K.6.5

下载: http://arxiv.org/abs/2507.17655v1

XStacking: Explanation-Guided Stacked Ensemble Learning

Ensemble Machine Learning (EML) techniques, especially stacking, have been shown to improve predictive performance by combining multiple base models. However, they are often criticized for their lack of interpretability. In this paper, we introduce XStacking, an effective and inherently explainable framework that addresses this limitation by integrating dynamic feature transformation with model-agnostic Shapley additive explanations. This enables stacked models to retain their predictive accuracy while becoming inherently explainable. We demonstrate the effectiveness of the framework on 29 datasets, achieving improvements in both the predictive effectiveness of the learning space and the interpretability of the resulting models. XStacking offers a practical and scalable solution for responsible ML.
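
To make the recipe concrete, here is a minimal sketch of explanation-guided stacking, assuming the general idea only (not the paper's exact XStacking algorithm): each base model's SHAP attributions become the meta-learner's feature space. It assumes scikit-learn and the shap package; the synthetic data and helper name are illustrative.

    # Sketch of explanation-guided stacking: each base model's SHAP values
    # become the meta-learner's feature space. Illustrative only, not the
    # paper's exact XStacking pipeline.
    import numpy as np
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    base_models = [RandomForestClassifier(random_state=0).fit(X_tr, y_tr),
                   GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)]

    def shap_meta_features(models, background, X):
        # One block of meta-features per base model.
        blocks = []
        for m in models:
            explainer = shap.Explainer(lambda a, m=m: m.predict_proba(a)[:, 1],
                                       background)
            blocks.append(explainer(X).values)
        return np.hstack(blocks)

    Z_tr = shap_meta_features(base_models, X_tr[:100], X_tr)
    Z_te = shap_meta_features(base_models, X_tr[:100], X_te)
    meta = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    print("stacked accuracy:", meta.score(Z_te, y_te))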

Updated: 2025-07-23 16:14:48

标题: XStacking: 解释引导的堆叠集成学习

摘要: 集成机器学习(EML)技术,特别是堆叠技术,已被证明通过结合多个基础模型可以提高预测性能。然而,它们经常因缺乏可解释性而受到批评。在本文中,我们介绍了XStacking,一种有效且本质上可解释的框架,通过将动态特征转换与模型不可知的Shapley可加性解释相结合,解决了这一限制。这使得堆叠模型能够保持其预测准确性,同时具有可解释性。我们在29个数据集上展示了该框架的有效性,实现了学习空间的预测有效性和结果模型可解释性的改善。XStacking为负责任的机器学习提供了一个实用且可扩展的解决方案。

更新时间: 2025-07-23 16:14:48

领域: cs.LG

下载: http://arxiv.org/abs/2507.17650v1

Closing the Chain: How to reduce your risk of being SolarWinds, Log4j, or XZ Utils

Software supply chain frameworks, such as the US NIST Secure Software Development Framework (SSDF), detail what tasks software development organizations are recommended or mandated to adopt to reduce security risk. However, to further reduce the risk of similar attacks occurring, software organizations benefit from knowing which tasks mitigate the attack techniques attackers are currently using, so that they can address specific threats, prioritize tasks, and close mitigation gaps. The goal of this study is to aid software organizations in reducing the risk of software supply chain attacks by systematically synthesizing how framework tasks mitigate the attack techniques used in the SolarWinds, Log4j, and XZ Utils attacks. We qualitatively analyzed 106 Cyber Threat Intelligence (CTI) reports of the 3 attacks to gather the attack techniques. We then systematically constructed a mapping between attack techniques and the 73 tasks enumerated in 10 software supply chain frameworks. Afterward, we ranked the tasks by how well they mitigate the observed attack techniques. The three mitigation tasks with the highest scores are role-based access control, system monitoring, and boundary protection. Additionally, three mitigation tasks were missing from all ten frameworks, including sustainable open-source software and environmental scanning tools. Thus, software products would still be vulnerable to software supply chain attacks even if organizations adopted all recommended tasks.
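
As a toy illustration of the mapping-and-ranking step (the paper's actual weighting may differ), one can score each framework task by how often the techniques it mitigates appear in the analyzed CTI reports. The task-to-technique mapping and counts below are hypothetical.

    # One plausible scoring scheme for prioritizing framework tasks by the
    # attack techniques they mitigate. Mapping and counts are hypothetical.
    from collections import Counter

    mitigations = {  # task -> ATT&CK-style technique IDs it mitigates
        "role-based access control": {"T1078", "T1098", "T1195"},
        "system monitoring":         {"T1195", "T1553", "T1554"},
        "boundary protection":       {"T1190", "T1133"},
    }
    observed = Counter({"T1195": 3, "T1078": 2, "T1190": 1, "T1553": 1})

    scores = {task: sum(observed[t] for t in techs)
              for task, techs in mitigations.items()}
    for task, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{score:2d}  {task}")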

Updated: 2025-07-23 16:13:13

标题: 打破链条:如何降低成为SolarWinds、Log4j或XZ Utils受害风险

摘要: 软件供应链框架,如美国国家标准与技术研究所安全软件开发框架(SSDF),详细说明了软件开发组织被建议或被强制采用的任务,以减少安全风险。然而,为了进一步降低类似攻击发生的风险,软件组织有益于了解攻击者目前正在使用的攻击技术,以应对特定威胁,优先处理任务,并填补缓解差距。本研究的目标是通过系统地综合框架任务如何缓解SolarWinds、Log4j和XZ Utils攻击中使用的攻击技术,帮助软件组织减少软件供应链攻击的风险。我们对三次攻击的106份网络威胁情报(CTI)报告进行了定性分析,以收集攻击技术。然后,我们系统地构建了攻击技术和10个软件供应链框架中列举的73个任务之间的映射。之后,我们确定并排名优先处理攻击技术的任务。得分最高的三个缓解任务是基于角色的访问控制、系统监控和边界保护。此外,十个框架中缺少三个缓解任务,包括可持续的开源软件和环境扫描工具。因此,即使组织采用了所有建议的任务,软件产品仍然会容易受到软件供应链攻击的影响。

更新时间: 2025-07-23 16:13:13

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2503.12192v2

A Concept-based approach to Voice Disorder Detection

Voice disorders affect a significant portion of the population, and the ability to diagnose them using automated, non-invasive techniques would represent a substantial advancement in healthcare, improving the quality of life of patients. Recent studies have demonstrated that artificial intelligence models, particularly Deep Neural Networks (DNNs), can effectively address this task. However, due to their complexity, the decision-making process of such models often remains opaque, limiting their trustworthiness in clinical contexts. This paper investigates an alternative approach based on Explainable AI (XAI), a field that aims to improve the interpretability of DNNs by providing different forms of explanations. Specifically, this work focuses on concept-based models such as the Concept Bottleneck Model (CBM) and the Concept Embedding Model (CEM) and how they can achieve performance comparable to traditional deep learning methods, while offering a more transparent and interpretable decision framework.

Updated: 2025-07-23 16:11:44

标题: 一种基于概念的声音障碍检测方法

摘要: 语音障碍影响了大部分人口,利用自动化、无创技术诊断这些障碍将在医疗保健领域取得重大进展,提高患者生活质量。最近的研究表明,人工智能模型,特别是深度神经网络(DNNs),可以有效地解决这一问题。然而,由于其复杂性,这些模型的决策过程通常不透明,限制了它们在临床环境中的可信度。本文研究了一种基于可解释人工智能(XAI)的替代方法,该领域旨在通过提供不同形式的解释来提高DNNs的可解释性。具体来说,本研究侧重于基于概念的模型,如概念瓶颈模型(CBM)和概念嵌入模型(CEM),以及它们如何实现与传统深度学习方法相当的性能,同时提供更透明和可解释的决策框架。

更新时间: 2025-07-23 16:11:44

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2507.17799v1

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. However, it still remains an open question whether such strategies can be applied to verifying and reinforcing image generation scenarios. In this paper, we provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We focus on three techniques: scaling test-time computation for verification, aligning model preferences with Direct Preference Optimization (DPO), and integrating these techniques for complementary effects. Our results demonstrate that these approaches can be effectively adapted and combined to significantly improve image generation performance. Furthermore, given the pivotal role of reward models in our findings, we propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation. PARM adaptively assesses each generation step through a potential assessment approach, merging the strengths of existing reward models, and PARM++ further introduces a reflection mechanism to self-correct the generated unsatisfactory image, which is the first to incorporate reflection in autoregressive image generation. Using our investigated reasoning strategies, we enhance a baseline model, Show-o, to achieve superior results, with a significant +24% improvement on the GenEval benchmark, surpassing Stable Diffusion 3 by +15%. We hope our study provides unique insights and paves a new path for integrating CoT reasoning with autoregressive image generation. Code and models are released at https://github.com/ZiyuGuo99/Image-Generation-CoT

Updated: 2025-07-23 16:09:10

标题: 我们能用CoT生成图像吗?让我们逐步验证和强化图像生成

摘要: 连锁思维(CoT)推理在大型模型中得到了广泛探讨,以应对复杂的理解任务。然而,目前仍然存在一个问题,即这种策略是否可以应用于验证和加强图像生成场景。在本文中,我们首次全面调查了CoT推理在增强自回归图像生成方面的潜力。我们关注三种技术:在验证过程中扩展测试时间计算、将模型偏好与直接偏好优化(DPO)对齐,以及将这些技术整合以实现互补效果。我们的结果表明,这些方法可以被有效地调整和结合,显著提高图像生成性能。此外,鉴于奖励模型在我们的研究结果中的关键作用,我们提出了专门针对自回归图像生成的潜在评估奖励模型(PARM)和PARM++。PARM通过潜在评估方法自适应地评估每个生成步骤,融合了现有奖励模型的优势,而PARM++进一步引入了一种反思机制,以自我校正生成的不满意图像,这是首次在自回归图像生成中融入反思。利用我们调查的推理策略,我们增强了基准模型Show-o,取得了优越的结果,在GenEval基准测试中实现了显著的+24%的改进,超过了Stable Diffusion 3的+15%。希望我们的研究提供了独特的见解,并为将CoT推理与自回归图像生成相结合铺平了一条新道路。代码和模型发布在https://github.com/ZiyuGuo99/Image-Generation-CoT。

更新时间: 2025-07-23 16:09:10

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2501.13926v2

WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training

Recent advances in learning rate (LR) scheduling have demonstrated the effectiveness of decay-free approaches that eliminate the traditional decay phase while maintaining competitive performance. Model merging techniques have emerged as particularly promising solutions in this domain. We present Warmup-Stable and Merge (WSM), a general framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies, including cosine decay, linear decay, and inverse square root decay, as principled model averaging schemes, while remaining fully compatible with diverse optimization methods. Through extensive experiments, we identify merge duration, the training window for checkpoint aggregation, as the most critical factor influencing model performance, surpassing the importance of both checkpoint interval and merge quantity. Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks, achieving significant improvements of +3.5% on MATH, +2.9% on HumanEval, and +5.5% on MMLU-Pro. The performance advantages extend to supervised fine-tuning scenarios, highlighting WSM's potential for long-term model refinement.
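
A minimal sketch of the checkpoint-merging step, assuming PyTorch state_dicts saved during the stable phase; a uniform average is shown, whereas the paper derives weightings that emulate specific decay schedules.

    # Minimal sketch of merging the last k checkpoints by plain averaging.
    import torch

    def merge_checkpoints(paths):
        """Average the parameter tensors of several saved state_dicts."""
        merged = None
        for p in paths:
            sd = torch.load(p, map_location="cpu")
            if merged is None:
                merged = {k: v.clone().float() for k, v in sd.items()}
            else:
                for k in merged:
                    merged[k] += sd[k].float()
        return {k: v / len(paths) for k, v in merged.items()}

    # e.g. merge over the last 8 checkpoints (the merge duration, the key
    # hyperparameter identified above):
    # model.load_state_dict(merge_checkpoints([f"ckpt_{i}.pt" for i in range(92, 100)]))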

Updated: 2025-07-23 16:02:06

标题: WSM:通过检查点合并实现LLM预训练的无衰减学习率调度

摘要: 最近关于学习率(LR)调度的进展表明,无衰减方法的有效性已得到证实,这些方法消除了传统的衰减阶段,同时保持了竞争性能。模型合并技术已经成为这一领域特别有前景的解决方案。我们提出了Warmup-Stable and Merge(WSM),这是一个建立学习率衰减和模型合并之间正式联系的通用框架。WSM为模拟各种衰减策略(包括余弦衰减、线性衰减和倒数平方根衰减)作为有原则的模型平均方案提供了统一的理论基础,同时与各种优化方法完全兼容。通过大量实验,我们发现合并持续时间——用于检查点聚合的训练窗口——是影响模型性能最关键的因素,超过了检查点间隔和合并数量的重要性。我们的框架在多个基准测试中始终优于广泛采用的Warmup-Stable-Decay(WSD)方法,实现了+3.5%的MATH、+2.9%的HumanEval和+5.5%的MMLU-Pro显著改进。性能优势延伸到监督微调场景,突显了WSM对长期模型精炼的潜力。

更新时间: 2025-07-23 16:02:06

领域: cs.CL,cs.LG,I.2.7

下载: http://arxiv.org/abs/2507.17634v1

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods

We study stochastic Cubic Newton methods for solving general, possibly non-convex minimization problems. We propose a new framework, which we call the helper framework, that provides a unified view of the stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees. It can also be applied to learning with auxiliary information. Our helper framework offers the algorithm designer high flexibility for constructing and analyzing the stochastic Cubic Newton methods, allowing arbitrary batch sizes and the use of noisy and possibly biased estimates of the gradients and Hessians, incorporating both variance reduction and lazy Hessian updates. We recover the best-known complexities for the stochastic and variance-reduced Cubic Newton, under weak assumptions on the noise. A direct consequence of our theory is the new lazy stochastic second-order method, which significantly improves the arithmetic complexity for large-dimension problems. We also establish complexity bounds for the classes of gradient-dominated objectives, which include convex and strongly convex problems. For Auxiliary Learning, we show that using a helper (auxiliary function) can outperform training alone if a given similarity measure is small.
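
For reference, the classical cubic-regularized Newton step that these stochastic variants build on, with $g_k$ and $H_k$ the (possibly noisy and biased) gradient and Hessian estimates and $M > 0$ the cubic regularization parameter:

    $$ s_k \in \arg\min_{s}\ \langle g_k, s \rangle + \tfrac{1}{2}\langle H_k s, s \rangle + \tfrac{M}{6}\lVert s \rVert^3, \qquad x_{k+1} = x_k + s_k. $$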

Updated: 2025-07-23 16:01:17

标题: 随机和方差减少的三次牛顿法统一收敛理论

摘要: 我们研究了用于解决一般可能非凸最小化问题的随机立方牛顿方法。我们提出了一个新的框架,我们称之为辅助框架,它提供了具有全局复杂性保证的随机和降低方差的二阶算法的统一视图。它还可以应用于具有辅助信息的学习。我们的辅助框架为算法设计者提供了高度灵活性,用于构建和分析随机立方牛顿方法,允许任意大小的批处理,并使用嘈杂和可能有偏差的梯度和Hessian估计,结合了方差减少和惰性Hessian更新。在噪声条件下,我们恢复了随机和降低方差的立方牛顿的最佳已知复杂度。我们理论的一个直接结果是新的惰性随机二阶方法,可以显著改进大维问题的算术复杂度。我们还建立了梯度主导目标类的复杂度界限,其中包括凸和强凸问题。对于辅助学习,我们表明,如果给定相似性度量较小,则使用辅助(辅助函数)可以优于单独训练。

更新时间: 2025-07-23 16:01:17

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2302.11962v5

Toward a Lightweight and Robust Design for Caching

The online caching problem aims to minimize cache misses when serving a sequence of requests under a limited cache size. While naive learning-augmented caching algorithms achieve ideal $1$-consistency, they lack robustness guarantees. Existing robustification methods either sacrifice $1$-consistency or introduce significant computational overhead. In this paper, we introduce Guard, a lightweight robustification framework that enhances the robustness of a broad class of learning-augmented caching algorithms to $2H_k + 2$, while preserving their $1$-consistency. Guard achieves the current best-known trade-off between consistency and robustness, with only $O(1)$ additional per-request overhead, thereby maintaining the original time complexity of the base algorithm. Extensive experiments across multiple real-world datasets and prediction models validate the effectiveness of Guard in practice.
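
For context, a minimal sketch of the kind of learning-augmented eviction rule such frameworks robustify: evict the cached page whose predicted next request lies farthest in the future. The Guard wrapper that caps robustness at $2H_k + 2$ is not reproduced here; `predict_next_use` is an assumed (possibly faulty) predictor.

    # Base learning-augmented policy: prediction-driven farthest-in-future
    # eviction. Each request costs O(1) extra work beyond the predictor call.
    def serve(requests, predict_next_use, cache_size):
        cache, misses = set(), 0
        for t, page in enumerate(requests):
            if page in cache:
                continue
            misses += 1
            if len(cache) >= cache_size:
                # Trust the (possibly faulty) prediction when evicting.
                victim = max(cache, key=lambda p: predict_next_use(t, p))
                cache.remove(victim)
            cache.add(page)
        return misses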

Updated: 2025-07-23 15:59:38

标题: 朝向轻量化和稳健的缓存设计

摘要: 在线缓存问题旨在在有限的缓存大小下提供一系列请求时最小化缓存未命中。虽然朴素的学习增强缓存算法实现了理想的1一致性,但它们缺乏稳健性保证。现有的稳健化方法要么牺牲1一致性,要么引入显着的计算开销。在本文中,我们介绍了Guard,一个轻量级的稳健化框架,将广泛类别的学习增强缓存算法的稳健性增强到2H_k + 2,同时保持它们的1一致性。Guard实现了目前已知的在一致性和稳健性之间的最佳权衡,仅具有每个请求的O(1)额外开销,从而保持了基础算法的原始时间复杂度。对多个真实数据集和预测模型进行的大量实验证实了Guard的有效性。

更新时间: 2025-07-23 15:59:38

领域: cs.DS,cs.LG

下载: http://arxiv.org/abs/2507.16242v2

Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder

Text-to-Image (T2I) Diffusion Models have achieved remarkable performance in generating high quality images. However, enabling precise control of continuous attributes, especially multiple attributes simultaneously, in a new domain (e.g., numeric values like eye openness or car width) with text-only guidance remains a significant challenge. To address this, we introduce the Attribute (Att) Adapter, a novel plug-and-play module designed to enable fine-grained, multi-attributes control in pretrained diffusion models. Our approach learns a single control adapter from a set of sample images that can be unpaired and contain multiple visual attributes. The Att-Adapter leverages the decoupled cross attention module to naturally harmonize the multiple domain attributes with text conditioning. We further introduce Conditional Variational Autoencoder (CVAE) to the Att-Adapter to mitigate overfitting, matching the diverse nature of the visual world. Evaluations on two public datasets show that Att-Adapter outperforms all LoRA-based baselines in controlling continuous attributes. Additionally, our method enables a broader control range and also improves disentanglement across multiple attributes, surpassing StyleGAN-based techniques. Notably, Att-Adapter is flexible, requiring no paired synthetic data for training, and is easily scalable to multiple attributes within a single model.

Updated: 2025-07-23 15:56:25

标题: Att-Adapter:一种通过条件变分自动编码器实现的稳健且精确的特定领域多属性T2I扩散适配器

摘要: 文本到图像(T2I)扩散模型在生成高质量图像方面取得了显著的表现。然而,在新领域(例如,数字值,如眼睛睁开程度或汽车宽度)中实现对连续属性的精确控制,尤其是同时控制多个属性,仅凭文本指导仍然是一个重大挑战。为了解决这个问题,我们引入了属性(Att)适配器,这是一个新颖的即插即用模块,旨在在预训练的扩散模型中实现细粒度、多属性的控制。我们的方法通过从一组可能不成对且包含多个视觉属性的样本图像中学习一个单一的控制适配器来实现这一目标。Att-Adapter利用解耦的交叉注意力模块,自然地将多个领域属性与文本条件进行协调。我们进一步引入了条件变分自编码器(CVAE)到Att-Adapter中,以减轻过拟合问题,匹配视觉世界的多样性。在两个公共数据集上的评估结果显示,Att-Adapter在控制连续属性方面优于所有基于LoRA的基线方法。此外,我们的方法扩展了更广泛的控制范围,并改善了跨多个属性的解耦,超越了基于StyleGAN的技术。值得注意的是,Att-Adapter具有灵活性,无需配对的合成数据进行训练,并且可以轻松扩展到单个模型中的多个属性。

更新时间: 2025-07-23 15:56:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11937v3

Quantifying the ROI of Cyber Threat Intelligence: A Data-Driven Approach

The valuation of Cyber Threat Intelligence (CTI) remains a persistent challenge due to the problem of negative evidence: successful threat prevention results in non-events that generate minimal observable financial impact, making CTI expenditures difficult to justify within traditional cost-benefit frameworks. This study introduces a data-driven methodology for quantifying the return on investment (ROI) of CTI, thereby reframing it as a measurable contributor to risk mitigation. The proposed framework extends established models in security economics, including the Gordon-Loeb and FAIR models, to account for CTI's complex influence on both the probability of security breaches and the severity of associated losses. The framework is operationalized through empirically grounded performance indicators, such as reductions in mean time to detect (MTTD), mean time to respond (MTTR), and adversary dwell time, supported by three sector-specific case studies in finance, healthcare, and retail. To address limitations in conventional linear assessment methodologies, the Threat Intelligence Effectiveness Index (TIEI) is introduced as a composite metric based on a weighted geometric mean. TIEI penalizes underperformance across critical dimensions (quality, enrichment, integration, and operational impact), thereby capturing the bottleneck effect in which the least effective component limits overall performance. By integrating financial quantification, adversarial coverage, and qualitative assessments of business enablement, the proposed hybrid model converts negative evidence into a justifiable ROI explanation. This approach offers a replicable means of repositioning CTI from an expense to a strategic investment, enabling informed decision-making and continuous optimization across diverse organizational contexts.
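
The weighted geometric mean behind an index like TIEI is easy to state; a sketch with illustrative dimension scores and equal weights (the paper's actual weights are domain-specific):

    # A weighted geometric mean of the kind TIEI uses: a weak score on any
    # dimension drags the composite down (the bottleneck effect).
    import numpy as np

    def weighted_geometric_mean(scores, weights):
        scores = np.asarray(scores, dtype=float)
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        return float(np.exp(np.sum(weights * np.log(scores))))

    # quality, enrichment, integration, operational impact, all in (0, 1]
    print(weighted_geometric_mean([0.9, 0.8, 0.85, 0.2], [1, 1, 1, 1]))
    # ~0.59, well below the arithmetic mean of ~0.69: the weak dimension dominates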

Updated: 2025-07-23 15:54:56

标题: 量化网络威胁情报的投资回报率:一种数据驱动的方法

摘要: 网络威胁情报(CTI)的评估仍然是一个持久的挑战,这是由于负面证据的问题:成功的威胁预防导致了几乎没有可观的财务影响的非事件,使得在传统的成本效益框架内很难证明CTI支出的合理性。本研究引入了一种基于数据的方法,用于量化CTI的投资回报(ROI),从而将其重新构建为风险缓解的可衡量贡献者。提出的框架扩展了安全经济学中已建立的模型,包括Gordon-Loeb和FAIR模型,以考虑CTI对安全漏洞的概率和相关损失严重性的复杂影响。该框架通过基于经验的绩效指标操作化,如减少检测时间(MTTD)、平均响应时间(MTTR)和对手滞留时间,在金融、医疗保健和零售等三个特定领域的案例研究的支持下。为了解决传统线性评估方法的局限性,引入了威胁情报有效性指数(TIEI),这是一种基于加权几何平均值的综合指标。TIEI对关键维度的表现不佳进行惩罚:质量、丰富度、整合和操作影响,从而捕捉到最不有效的组件限制整体表现的瓶颈效应。通过整合财务量化、对手覆盖范围和业务启用的定性评估,提出的混合模型将负面证据转化为一个合理的ROI解释。这种方法提供了一种可复制的手段,将CTI从一种开支转变为战略投资,实现了在不同组织背景下的知情决策和持续优化。

更新时间: 2025-07-23 15:54:56

领域: cs.CR

下载: http://arxiv.org/abs/2507.17628v1

Machine Learning Classification and Portfolio Allocation: with Implications from Machine Uncertainty

We use multi-class machine learning classifiers to identify the stocks that outperform or underperform other stocks. The resulting long-short portfolios achieve annual Sharpe ratios of 1.67 (value-weighted) and 3.35 (equal-weighted), with annual alphas ranging from 29\% to 48\%. These results persist after controlling for machine learning regressions and remain robust among large-cap stocks. Machine uncertainty, as measured by predicted probabilities, impairs the prediction performance. Stocks with higher machine uncertainty experience lower returns, particularly when human proxies of information uncertainty align with machine uncertainty. Consistent with the literature, such an effect is driven by the past underperformers.
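
A schematic of the classification setup on synthetic data (illustrative only; the paper's features, horizons, and weighting schemes are richer): label stocks by forward-return decile, predict class probabilities, trade the predicted extremes, and read machine uncertainty off the predicted probabilities.

    # Multi-class return classification -> long-short selection (sketch).
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(2000, 8)))      # firm characteristics
    fwd_ret = rng.normal(size=2000)                   # forward returns
    y = pd.qcut(fwd_ret, 10, labels=False)            # decile classes 0..9

    clf = GradientBoostingClassifier().fit(X[:1500], y[:1500])
    proba = clf.predict_proba(X[1500:])
    machine_uncertainty = 1 - proba.max(axis=1)       # one way to measure it

    longs = proba[:, 9] > np.quantile(proba[:, 9], 0.9)    # likely winners
    shorts = proba[:, 0] > np.quantile(proba[:, 0], 0.9)   # likely losers
    print(longs.sum(), "longs,", shorts.sum(), "shorts,",
          "mean uncertainty:", round(float(machine_uncertainty.mean()), 3))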

Updated: 2025-07-23 15:52:55

标题: 机器学习分类和投资组合配置:来自机器不确定性的启示

摘要: 我们使用多类机器学习分类器来识别表现优异或表现不佳的股票。由此产生的多头-空头投资组合的年夏普比率为1.67(按市值加权)和3.35(等权重),年阿尔法值在29%至48%之间。在控制机器学习回归后,这些结果持续存在,并在大市值股票中保持稳健。机器不确定性,如预测概率所衡量的那样,会损害预测绩效。具有更高机器不确定性的股票经历较低的回报,特别是当人类信息不确定性的代理与机器不确定性一致时。与文献一致,这种效应是由过去表现不佳的股票驱动的。

更新时间: 2025-07-23 15:52:55

领域: q-fin.GN,cs.LG,econ.GN,q-fin.CP,q-fin.EC,q-fin.PM

下载: http://arxiv.org/abs/2108.02283v2

Multi-Level Explanations for Generative Language Models

Despite the increasing use of large language models (LLMs) for context-grounded tasks like summarization and question-answering, understanding what makes an LLM produce a certain response is challenging. We propose Multi-Level Explanations for Generative Language Models (MExGen), a technique to provide explanations for context-grounded text generation. MExGen assigns scores to parts of the context to quantify their influence on the model's output. It extends attribution methods like LIME and SHAP to LLMs used in context-grounded tasks where (1) inference cost is high, (2) input text is long, and (3) the output is text. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and question answering. The results show that our framework can provide more faithful explanations of generated output than available alternatives, including LLM self-explanations. We open-source code for MExGen as part of the ICX360 toolkit: https://github.com/IBM/ICX360.
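
A stripped-down version of the perturbation idea (MExGen itself extends LIME/SHAP with scalarized output comparisons; this leave-one-out variant is only illustrative, and `generate` is an assumed text-generation callable):

    # Leave-one-out perturbation attribution in the spirit of MExGen.
    from difflib import SequenceMatcher

    def attribute(context_sentences, question, generate):
        base = generate(" ".join(context_sentences) + "\n" + question)
        scores = []
        for i in range(len(context_sentences)):
            ablated = [s for j, s in enumerate(context_sentences) if j != i]
            out = generate(" ".join(ablated) + "\n" + question)
            # Influence of sentence i = how far the answer drifts without it.
            scores.append(1 - SequenceMatcher(None, base, out).ratio())
        return scores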

Updated: 2025-07-23 15:48:23

标题: 生成语言模型的多级解释

摘要: 尽管越来越多地使用大型语言模型(LLMs)进行上下文相关任务,如摘要和问答,但理解LLM生成特定响应的原因是具有挑战性的。我们提出了一种为生成式语言模型(LLMs)提供解释的多级解释技术(MExGen),用于上下文相关文本生成。 MExGen对上下文的各个部分进行评分,以量化它们对模型输出的影响。它扩展了像LIME和SHAP这样的归因方法,用于上下文相关任务中的LLMs,其中(1)推理成本高,(2)输入文本很长,(3)输出是文本。我们对基于扰动的归因方法进行了系统评估,包括自动化和人工评估,用于摘要和问答。结果表明,我们的框架可以提供比现有替代方案更忠实的生成输出解释,包括LLM自解释。我们将MExGen的开源代码作为ICX360工具包的一部分发布在GitHub上:https://github.com/IBM/ICX360。

更新时间: 2025-07-23 15:48:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.14459v2

Vision Transformer attention alignment with human visual perception in aesthetic object evaluation

Visual attention mechanisms play a crucial role in human perception and aesthetic evaluation. Recent advances in Vision Transformers (ViTs) have demonstrated remarkable capabilities in computer vision tasks, yet their alignment with human visual attention patterns remains underexplored, particularly in aesthetic contexts. This study investigates the correlation between human visual attention and ViT attention mechanisms when evaluating handcrafted objects. We conducted an eye-tracking experiment with 30 participants (9 female, 21 male, mean age 24.6 years) who viewed 20 artisanal objects comprising basketry bags and ginger jars. Using a Pupil Labs eye-tracker, we recorded gaze patterns and generated heat maps representing human visual attention. Simultaneously, we analyzed the same objects using a pre-trained ViT model with DINO (Self-DIstillation with NO Labels), extracting attention maps from each of the 12 attention heads. We compared human and ViT attention distributions using Kullback-Leibler divergence across varying Gaussian parameters (sigma = 0.1 to 3.0). Statistical analysis revealed optimal correlation at sigma = 2.4 ± 0.03, with attention head #12 showing the strongest alignment with human visual patterns. Significant differences were found between attention heads, with heads #7 and #9 demonstrating the greatest divergence from human attention (p < 0.05, Tukey HSD test). Results indicate that while ViTs exhibit more global attention patterns compared to human focal attention, certain attention heads can approximate human visual behavior, particularly for specific object features like buckles in basketry items. These findings suggest potential applications of ViT attention mechanisms in product design and aesthetic evaluation, while highlighting fundamental differences in attention strategies between human perception and current AI models.
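
The comparison step reduces to smoothing both attention maps and computing a KL divergence; a sketch, assuming the two maps arrive as 2-D arrays on a common grid:

    # Smooth both maps with a Gaussian of width sigma, normalize to
    # distributions, and compute KL(human || ViT).
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def kl_between_maps(human_map, vit_map, sigma=2.4, eps=1e-12):
        p = gaussian_filter(np.asarray(human_map, dtype=float), sigma)
        q = gaussian_filter(np.asarray(vit_map, dtype=float), sigma)
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log((p + eps) / (q + eps))))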

Updated: 2025-07-23 15:47:34

标题: 用视觉变换器注意力对齐人类视觉感知在审美对象评估中的应用

摘要: 视觉注意机制在人类感知和审美评价中起着至关重要的作用。最近在视觉Transformer(ViTs)方面取得的进展在计算机视觉任务中展示出了显著的能力,然而它们与人类视觉注意模式的一致性尚未得到充分探讨,特别是在审美背景下。本研究调查了评估手工制品时人类视觉注意和ViT注意机制之间的相关性。我们进行了一项眼动实验,共有30名参与者(9名女性,21名男性,平均年龄24.6岁)观看了包括编织袋和姜罐在内的20个手工制品。使用Pupil Labs眼动仪,我们记录了注视模式并生成了代表人类视觉注意的热图。同时,我们使用一个经过预训练的ViT模型和DINO(无标签的自我蒸馏)分析了相同的物体,从12个注意力头中提取了注意力图。我们通过Kullback-Leibler散度比较了人类和ViT注意分布在不同的高斯参数(sigma=0.1至3.0)下的情况。统计分析显示,在sigma=2.4 +-0.03时具有最佳相关性,其中注意力头#12显示出与人类视觉模式最强的一致性。发现了注意力头之间的显著差异,其中头#7和#9表现出与人类注意力最大的分歧(p<0.05,Tukey HSD检验)。结果表明,虽然与人类的焦点注意相比,ViTs表现出更全局的注意模式,但某些注意头可以近似于人类的视觉行为,特别是对于篮编物品中的扣环等特定物体特征。这些发现表明ViT注意机制在产品设计和审美评价中具有潜在应用,同时突出了人类感知和当前AI模型之间注意策略的基本差异。

更新时间: 2025-07-23 15:47:34

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.17616v1

Time Deep Gradient Flow Method for pricing American options

In this research, we explore neural network-based methods for pricing multidimensional American put options under the Black-Scholes and Heston models, extending up to five dimensions. We focus on two approaches: the Time Deep Gradient Flow (TDGF) method and the Deep Galerkin Method (DGM). We extend the TDGF method to handle the free-boundary partial differential equation inherent in American options. We carefully design the sampling strategy during training to enhance performance. Both TDGF and DGM achieve high accuracy while outperforming conventional Monte Carlo methods in terms of computational speed. In particular, TDGF tends to be faster during training than DGM.

Updated: 2025-07-23 15:39:39

标题: 时间深度梯度流方法用于定价美式期权

摘要: 在这项研究中,我们探索了基于神经网络的方法来定价多维度的美式看跌期权,在Black-Scholes和Heston模型下,扩展至五个维度。我们专注于两种方法:时间深度梯度流(TDGF)方法和深度Galerkin方法(DGM)。我们将TDGF方法扩展到处理美式期权中固有的自由边界偏微分方程。我们在训练过程中精心设计采样策略以提高性能。在计算速度方面,TDGF和DGM均能实现高准确性,同时在超越传统的蒙特卡洛方法方面表现出色。特别是,在训练过程中,TDGF往往比DGM更快。

更新时间: 2025-07-23 15:39:39

领域: q-fin.CP,cs.LG,math.PR,q-fin.MF,91G20, 91G60, 68T07

下载: http://arxiv.org/abs/2507.17606v1

Trusted Multi-view Learning under Noisy Supervision

Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty, which significantly restricts their applications in safety-critical scenarios. To address this, trusted multi-view learning methods estimate prediction uncertainties by learning class distributions from each instance. However, these methods heavily rely on high-quality ground-truth labels. This motivates us to delve into a new problem: how to develop a reliable multi-view learning model under the guidance of noisy labels? We propose the Trusted Multi-view Noise Refining (TMNR) method to address this challenge by modeling label noise arising from low-quality data features and easily-confused classes. TMNR employs evidential deep neural networks to construct view-specific opinions that capture both beliefs and uncertainty. These opinions are then transformed through noise correlation matrices to align with the noisy supervision, where matrix elements are constrained by sample uncertainty to reflect label reliability. Furthermore, considering the challenge of jointly optimizing the evidence network and noise correlation matrices under noisy supervision, we further propose Trusted Multi-view Noise Re-Refining (TMNR^2), which disentangles this complex co-training problem by establishing different training objectives for distinct modules. TMNR^2 identifies potentially mislabeled samples through evidence-label consistency and generates pseudo-labels from neighboring information. By assigning clean samples to optimize the evidential networks and noisy samples to guide the noise correlation matrices, respectively, TMNR^2 reduces mapping interference and stabilizes training. Experimental results demonstrate that TMNR^2 significantly outperforms baseline methods, with average accuracy improvements of 7% on datasets with 50% label noise.
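
For readers unfamiliar with evidential opinions, here is the standard subjective-logic construction such methods use: nonnegative per-class evidence from the network yields Dirichlet parameters, per-class belief masses, and an uncertainty mass that grows when evidence is scarce.

    # Evidence -> (belief, uncertainty), subjective-logic style.
    import numpy as np

    def opinion_from_evidence(e):
        e = np.asarray(e, dtype=float)    # evidence, shape (K,)
        alpha = e + 1.0                   # Dirichlet parameters
        S = alpha.sum()
        belief = e / S                    # per-class belief masses
        uncertainty = len(e) / S          # K / S
        return belief, uncertainty

    b, u = opinion_from_evidence([9.0, 1.0, 0.0])
    print(b.round(2), round(u, 2))        # ≈ [0.69 0.08 0.00], u ≈ 0.23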

Updated: 2025-07-23 15:34:21

标题: 在嘈杂监督下的可信多视图学习

摘要: 多视角学习方法通常专注于提高决策准确性,而忽略了决策不确定性,这在安全关键场景中会显著限制它们的应用。为了解决这个问题,值得信赖的多视角学习方法通过学习每个实例的类分布来估计预测不确定性。然而,这些方法严重依赖高质量的标签。这激励我们探讨一个新问题:如何在嘈杂标签的指导下开发可靠的多视角学习模型?我们提出了Trusted Multi view Noise Refining (TMNR) 方法来应对这一挑战,通过建模源自低质量数据特征和易混淆类别的标签噪声。TMNR采用证据深度神经网络构建特定视角的意见,捕捉信念和不确定性。然后,通过噪声相关矩阵将这些意见转化为与嘈杂监督一致的形式,其中矩阵元素受样本不确定性约束以反映标签可靠性。此外,考虑到在嘈杂监督下联合优化证据网络和噪声相关矩阵的挑战,我们进一步提出了Trusted Multi-view Noise Re-Refining (TMNR^2) 方法,通过为不同模块建立不同的训练目标来解决这个复杂的共训练问题。TMNR^2通过证据-标签一致性识别潜在的误标样本,并从周围信息生成伪标签。通过将干净样本分配给优化证据网络和嘈杂样本用于指导噪声相关矩阵,TMNR^2降低了映射干扰,实现了训练的稳定。实验结果表明,TMNR^2在50%标签噪声的数据集上平均准确率提高了7%,明显优于基准方法。

更新时间: 2025-07-23 15:34:21

领域: cs.LG,I.2.6

下载: http://arxiv.org/abs/2404.11944v3

Citation Recommendation using Deep Canonical Correlation Analysis

Recent advances in citation recommendation have improved accuracy by leveraging multi-view representation learning to integrate the various modalities present in scholarly documents. However, effectively combining multiple data views requires fusion techniques that can capture complementary information while preserving the unique characteristics of each modality. We propose a novel citation recommendation algorithm that improves upon linear Canonical Correlation Analysis (CCA) methods by applying Deep CCA (DCCA), a neural network extension capable of capturing complex, non-linear relationships between distributed textual and graph-based representations of scientific articles. Experiments on the large-scale DBLP (Digital Bibliography & Library Project) citation network dataset demonstrate that our approach outperforms state-of-the-art CCA-based methods, achieving relative improvements of over 11% in Mean Average Precision@10, 5% in Precision@10, and 7% in Recall@10. These gains reflect more relevant citation recommendations and enhanced ranking quality, suggesting that DCCA's non-linear transformations yield more expressive latent representations than CCA's linear projections.

Updated: 2025-07-23 15:34:07

标题: 利用深度典型相关分析进行引文推荐

摘要: 最近在引文推荐方面取得的进展通过利用多视图表示学习来提高准确性,以整合学术文档中存在的各种模态。然而,有效地结合多个数据视图需要融合技术,可以捕获互补信息同时保留每种模态的独特特性。我们提出了一种新颖的引文推荐算法,通过应用Deep CCA(DCCA)改进了线性典型相关分析(CCA)方法,DCCA是一种能够捕获科学文章的分布式文本和基于图的表示之间复杂非线性关系的神经网络扩展。在大规模DBLP(数字文献和图书馆项目)引文网络数据集上的实验表明,我们的方法优于最先进的基于CCA的方法,在Mean Average Precision@10上实现了超过11%的相对改进,在Precision@10上提高了5%,在Recall@10上提高了7%。这些收益反映了更相关的引文推荐和增强的排名质量,表明DCCA的非线性转换比CCA的线性投影产生更具表现力的潜在表示。

更新时间: 2025-07-23 15:34:07

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2507.17603v1

HyDRA: A Hybrid-Driven Reasoning Architecture for Verifiable Knowledge Graphs

The synergy between symbolic knowledge, often represented by Knowledge Graphs (KGs), and the generative capabilities of neural networks is central to advancing neurosymbolic AI. A primary bottleneck in realizing this potential is the difficulty of automating KG construction, which faces challenges related to output reliability, consistency, and verifiability. These issues can manifest as structural inconsistencies within the generated graphs, such as the formation of disconnected "isolated islands" of data or the inaccurate conflation of abstract classes with specific instances. To address these challenges, we propose HyDRA, a Hybrid-Driven Reasoning Architecture designed for verifiable KG automation. Given a domain or an initial set of documents, HyDRA first constructs an ontology via a panel of collaborative neurosymbolic agents. These agents collaboratively agree on a set of competency questions (CQs) that define the scope and requirements the ontology must be able to answer. Given these CQs, we build an ontology graph that subsequently guides the automated extraction of triplets for KG generation from arbitrary documents. Inspired by design-by-contract (DbC) principles, our method leverages verifiable contracts as the primary control mechanism to steer the generative process of Large Language Models (LLMs). To verify the output of our approach, we extend beyond standard benchmarks and propose an evaluation framework that assesses the functional correctness of the resulting KG by leveraging symbolic verifications as described by the neurosymbolic AI framework SymbolicAI. This work contributes a hybrid-driven architecture for improving the reliability of automated KG construction and the exploration of evaluation methods for measuring the functional integrity of its output. The code is publicly available.

Updated: 2025-07-23 15:32:44

标题: HyDRA:可验证知识图的混合驱动推理架构

摘要: 符号知识(通常由知识图谱(KG)表示)与神经网络的生成能力之间的协同作用对推动神经符号AI的发展至关重要。实现这一潜力的主要瓶颈是自动化知识图谱构建的困难,面临与输出可靠性、一致性和可验证性相关的挑战。这些问题可能表现为生成图中的结构不一致,例如数据的孤立岛的形成或抽象类与具体实例的不准确混淆。为了解决这些挑战,我们提出了HyDRA,一个专为可验证知识图谱自动化设计的混合驱动推理架构。给定一个领域或一组初始文档,HyDRA首先通过一组协作的神经符号代理构建本体论。这些代理共同同意一组定义本体论必须能够回答的能力问题(CQs)。在给定这些CQs的情况下,我们构建一个本体图,随后指导从任意文档中自动提取三元组以生成知识图谱。受设计按合同(DbC)原则的启发,我们的方法利用可验证的合同作为主要控制机制来引导大型语言模型(LLMs)的生成过程。为了验证我们方法的输出,我们超越了标准基准,并提出了一个评估框架,通过利用神经符号AI框架SymbolicAI所描述的符号验证来评估生成知识图谱的功能正确性。这项工作为提高自动化知识图谱构建的可靠性和探索衡量其输出功能完整性的评估方法做出了贡献。代码公开可用。

更新时间: 2025-07-23 15:32:44

领域: cs.LG

下载: http://arxiv.org/abs/2507.15917v2

Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism

High-resolution (HR) precipitation prediction is essential for reducing damage from stationary and localized heavy rainfall; however, HR precipitation forecasting using process-driven numerical weather prediction models remains challenging. This study proposes using Wasserstein Generative Adversarial Network (WGAN) to perform precipitation downscaling with an optimal transport cost. In contrast to a conventional neural network trained with mean squared error, the WGAN generated visually realistic precipitation fields with fine-scale structures even though the WGAN exhibited slightly lower performance on conventional evaluation metrics. The learned critic of WGAN correlated well with human perceptual realism. Case-based analysis revealed that large discrepancies in critic scores can help identify both unrealistic WGAN outputs and potential artifacts in the reference data. These findings suggest that the WGAN framework not only improves perceptual realism in precipitation downscaling but also offers a new perspective for evaluating and quality-controlling precipitation datasets.
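
A minimal PyTorch sketch of the Wasserstein critic objective underlying this setup (networks and data are placeholders, and the paper may enforce the Lipschitz constraint differently, e.g. with a gradient penalty):

    # Core of the WGAN objective: the critic approximates a Wasserstein-1
    # (optimal transport) distance between real and generated fields.
    import torch

    def critic_loss(critic, real_hr, fake_hr):
        # Critic maximizes E[f(real)] - E[f(fake)]; negate to minimize.
        return -(critic(real_hr).mean() - critic(fake_hr).mean())

    def generator_loss(critic, fake_hr):
        return -critic(fake_hr).mean()

    def clip_weights(critic, c=0.01):
        # Crude Lipschitz enforcement via weight clipping after each step.
        with torch.no_grad():
            for p in critic.parameters():
                p.clamp_(-c, c)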

Updated: 2025-07-23 15:29:34

标题: 使用Wasserstein GAN和最优输运进行降水降尺度以增强感知逼真度

摘要: 高分辨率(HR)降水预测对于减少由静止和局部暴雨造成的损害至关重要;然而,使用基于过程的数值天气预测模型进行HR降水预测仍然具有挑战性。本研究提出使用Wasserstein生成对抗网络(WGAN)进行具有最优传输成本的降水降尺度。与传统的神经网络训练的均方误差相比,尽管WGAN在传统评估指标上表现略低,但生成的降水场具有视觉上逼真的细小结构。WGAN学习的评论家与人类感知的逼真度良好相关。基于案例的分析揭示了评论家评分的差异可以帮助识别不真实的WGAN输出和参考数据中的潜在人为现象。这些发现表明,WGAN框架不仅改善了降水降尺度中的感知逼真度,还为评估和质量控制降水数据集提供了新的视角。

更新时间: 2025-07-23 15:29:34

领域: cs.LG

下载: http://arxiv.org/abs/2507.17798v1

PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories from raw pixel inputs directly. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be at https://maxiuw.github.io/prix.

Updated: 2025-07-23 15:28:23

标题: PRIX:从原始像素学习规划以实现端到端自动驾驶

摘要: 尽管端到端自动驾驶模型显示出有希望的结果,但它们在实际部署中常常受到模型大小较大、依赖昂贵的激光雷达传感器和计算密集的BEV特征表示的限制。这限制了它们的可扩展性,特别是对于只配备摄像头的大众市场车辆而言。为了解决这些挑战,我们提出了PRIX(Plan from Raw Pixels)。我们的新颖高效的端到端驾驶架构仅使用摄像头数据运行,无需显式的BEV表示并放弃了对激光雷达的需求。PRIX利用视觉特征提取器结合生成式规划头部直接从原始像素输入中预测安全轨迹。我们架构的核心组件是Context-aware Recalibration Transformer(CaRT),这是一个设计用于有效增强多层次视觉特征以实现更稳健规划的新颖模块。我们通过全面的实验展示,PRIX在NavSim和nuScenes基准测试中实现了最先进的性能,与更大的、多模态扩散规划器的能力相匹配,同时在推理速度和模型大小方面显著更有效,使其成为实际部署的实用解决方案。我们的工作是开源的,代码将在https://maxiuw.github.io/prix 上提供。

更新时间: 2025-07-23 15:28:23

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2507.17596v1

First, Learn What You Don't Know: Active Information Gathering for Driving at the Limits of Handling

Combining data-driven models that adapt online and model predictive control (MPC) has enabled effective control of nonlinear systems. However, when deployed on unstable systems, online adaptation may not be fast enough to ensure reliable simultaneous learning and control. For example, a controller on a vehicle executing highly dynamic maneuvers--such as drifting to avoid an obstacle--may push the vehicle's tires to their friction limits, destabilizing the vehicle and allowing modeling errors to quickly compound and cause a loss of control. To address this challenge, we present an active information gathering framework for identifying vehicle dynamics as quickly as possible. We propose an expressive vehicle dynamics model that leverages Bayesian last-layer meta-learning to enable rapid online adaptation. The model's uncertainty estimates are used to guide informative data collection and quickly improve the model prior to deployment. Dynamic drifting experiments on a Toyota Supra show that (i) the framework enables reliable control of a vehicle at the edge of stability, (ii) online adaptation alone may not suffice for zero-shot control and can lead to undesirable transient errors or spin-outs, and (iii) active data collection helps achieve reliable performance.

Updated: 2025-07-23 15:24:16

标题: 首先,了解你不知道的:驾驶极限处理能力的积极信息收集

摘要: 将数据驱动模型与在线自适应和模型预测控制(MPC)相结合,已经实现了对非线性系统的有效控制。然而,当应用于不稳定系统时,在线自适应可能不够快速,无法确保可靠的同时学习和控制。例如,在执行高度动态机动的车辆上,如漂移以避开障碍物,控制器可能会将车辆的轮胎推向摩擦限制,使车辆失稳,并导致建模误差快速累积,造成失控。为了解决这一挑战,我们提出了一个主动信息收集框架,以尽快识别车辆动力学。我们提出了一个富有表现力的车辆动力学模型,利用贝叶斯末层元学习来实现快速在线自适应。模型的不确定性估计被用来指导信息收集,并在部署之前快速改进模型。在一辆Toyota Supra上进行的动态漂移实验表明,(i)该框架能够可靠地控制车辆在稳定边缘,(ii)仅依靠在线自适应可能不足以实现零-shot控制,并可能导致不良的瞬态错误或旋转,(iii)主动数据收集有助于实现可靠的性能。

更新时间: 2025-07-23 15:24:16

领域: cs.RO,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.00107v2

Encrypted-State Quantum Compilation Scheme Based on Quantum Circuit Obfuscation

With the rapid advancement of quantum computing, quantum compilation has become a crucial layer connecting high-level algorithms with physical hardware. In quantum cloud computing, compilation is performed on the cloud side, which exposes user circuits to potential risks such as structural leakage and output predictability. To address these issues, we propose the encrypted-state quantum compilation scheme based on quantum circuit obfuscation (ECQCO), the first secure compilation framework tailored for the co-location of compilers and quantum hardware. It applies quantum homomorphic encryption to conceal output states and instantiates a structure obfuscation mechanism based on quantum indistinguishability obfuscation, effectively protecting both functionality and topology of the circuit. Additionally, an adaptive decoupling obfuscation algorithm is designed to suppress potential idle errors while inserting pulse operations. The proposed scheme achieves information-theoretic security and guarantees computational indistinguishability under the quantum random oracle model. Experimental results on benchmark datasets show that ECQCO achieves a TVD of up to 0.7 and a normalized GED of 0.88, enhancing compilation-stage security. Moreover, it introduces only a slight increase in circuit depth, while keeping the average fidelity change within 1%, thus achieving a practical balance between security and efficiency.

Updated: 2025-07-23 15:23:18

标题: 基于量子电路混淆的加密态量子编译方案

摘要: 随着量子计算的快速发展,量子编译已经成为连接高级算法和物理硬件的关键层。在量子云计算中,编译是在云端执行的,这暴露了用户电路面临的潜在风险,如结构泄露和输出可预测性。为了解决这些问题,我们提出了基于量子电路混淆的加密状态量子编译方案(ECQCO),这是第一个专门为编译器和量子硬件共存而设计的安全编译框架。它应用量子同态加密来隐藏输出状态,并实现了基于量子不可区分性混淆的结构混淆机制,有效保护电路的功能性和拓扑结构。此外,设计了一个自适应解耦混淆算法来抑制潜在的空闲错误,同时插入脉冲操作。所提出的方案实现了信息论安全,并在量子随机预言机模型下保证了计算不可区分性。对基准数据集的实验结果显示,ECQCO实现了高达0.7的TVD和0.88的归一化GED,增强了编译阶段的安全性。此外,它仅对电路深度产生轻微增加,同时使平均保真度变化保持在1%以内,因此实现了安全性和效率之间的实际平衡。

更新时间: 2025-07-23 15:23:18

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2507.17589v1

Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning

With the rapid advancement of Quantum Machine Learning (QML), the critical need to enhance security measures against adversarial attacks and protect QML models becomes increasingly evident. In this work, we outline the connection between quantum noise channels and differential privacy (DP), by constructing a family of noise channels which are inherently $\epsilon$-DP: $(\alpha, \gamma)$-channels. Through this approach, we successfully replicate the $\epsilon$-DP bounds observed for depolarizing and random rotation channels, thereby affirming the broad generality of our framework. Additionally, we use a semi-definite program to construct an optimally robust channel. In a small-scale experimental evaluation, we demonstrate the benefits of using our optimal noise channel over depolarizing noise, particularly in enhancing adversarial accuracy. Moreover, we assess how the variables $\alpha$ and $\gamma$ affect the certifiable robustness and investigate how different encoding methods impact the classifier's robustness.
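
As a concrete instance of the channel/DP connection, the depolarizing channel admits a short $\epsilon$-DP bound (a standard calculation, shown here only to fix intuition): for $\mathcal{N}_p(\rho) = (1-p)\rho + p\,I/d$ and any POVM element $0 \le M \le I$,

    $$ \Pr[m \mid \rho] = (1-p)\,\mathrm{tr}(M\rho) + \tfrac{p}{d}\,\mathrm{tr}(M), $$

so for any two input states $\rho, \sigma$, using $\mathrm{tr}(M\rho) \le \mathrm{tr}(M)$ and $\mathrm{tr}(M\sigma) \ge 0$,

    $$ \frac{\Pr[m \mid \rho]}{\Pr[m \mid \sigma]} \le \frac{(1-p) + p/d}{p/d} = 1 + \frac{d(1-p)}{p}, \qquad \text{i.e.} \quad \epsilon \le \ln\!\Big(1 + \frac{d(1-p)}{p}\Big). $$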

Updated: 2025-07-23 15:23:03

标题: 构建优化的噪声通道以增强量子机器学习的鲁棒性

摘要: 随着量子机器学习(QML)的快速发展,加强安全措施以抵御敌对攻击并保护QML模型的关键性需求日益显现。在这项工作中,我们通过构建一系列本质上是$\epsilon$-DP的噪声通道($\alpha,\gamma$)-通道,概述了量子噪声通道与差分隐私(DP)之间的关系。通过这种方法,我们成功地复制了极化和随机旋转通道观察到的$\epsilon$-DP界限,从而确认了我们框架的广泛普适性。此外,我们使用半定规划来构建一个最优鲁棒通道。在小规模实验评估中,我们展示了使用我们的最优噪声通道相对于极化噪声的益处,特别是在增强敌对准确性方面。此外,我们评估了变量$\alpha$和$\gamma$如何影响可证实的鲁棒性,并调查不同编码方法如何影响分类器的鲁棒性。

更新时间: 2025-07-23 15:23:03

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.16417v2

GenSelect: A Generative Approach to Best-of-N

Generative reward models with parallel sampling have enabled effective test-time scaling for reasoning tasks. Current approaches employ pointwise scoring of individual solutions or pairwise comparisons. However, pointwise methods underutilize LLMs' comparative abilities, while pairwise methods scale inefficiently with larger sampling budgets. We introduce GenSelect, where the LLM uses long reasoning to select the best solution among N candidates. This leverages LLMs' comparative strengths while scaling efficiently across parallel sampling budgets. For math reasoning, we demonstrate that reasoning models, such as QwQ and DeepSeek-R1-0528, excel at GenSelect, outperforming existing scoring approaches with simple prompting.
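
A minimal sketch of generative best-of-N selection: instead of scoring candidates pointwise, one long-reasoning prompt asks the model to compare all N candidates and name the best. Here `llm` is an assumed completion callable and the prompt wording is illustrative, not the paper's.

    # Generative best-of-N selection (sketch).
    import re

    def gen_select(problem, candidates, llm):
        listing = "\n\n".join(f"[Solution {i + 1}]\n{c}"
                              for i, c in enumerate(candidates))
        prompt = (f"Problem:\n{problem}\n\n{listing}\n\n"
                  "Reason step by step about which solution is correct, "
                  "then end with a line of the form 'Best: <number>'.")
        match = re.search(r"Best:\s*(\d+)", llm(prompt))
        idx = int(match.group(1)) - 1 if match else 0   # fall back to first
        return candidates[min(max(idx, 0), len(candidates) - 1)]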

Updated: 2025-07-23 15:22:51

标题: GenSelect:一种最佳N选择的生成方法

摘要: 生成式奖励模型与并行抽样已经实现了推理任务的有效测试时间扩展。目前的方法采用对单个解决方案的逐点评分或成对比较。然而,逐点方法未充分利用LLMs的比较能力,而成对方法在较大的抽样预算下效率低下。我们引入了GenSelect,其中LLM使用长时间推理来从N个候选中选择最佳解决方案。这利用了LLMs的比较优势,同时在并行抽样预算下高效扩展。对于数学推理,我们演示了推理模型(如QwQ和DeepSeek-R1-0528)在GenSelect上表现出色,优于现有的简单提示评分方法。

更新时间: 2025-07-23 15:22:51

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2507.17797v1

SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics

Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice, to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data through capturing and integrating multi-scale information.

Updated: 2025-07-23 15:22:26

标题: SToFM:空间转录组学的多尺度基础模型

摘要: 空间转录组学(ST)技术通过保留细胞的空间上下文,为生物学家提供了对单细胞生物学的丰富见解。为ST构建基础模型可以显著增强对庞大而复杂数据源的分析,解锁对生物组织复杂性的新视角。然而,对ST数据建模在本质上是具有挑战性的,因为需要从包含大量细胞的组织切片中提取多尺度信息。这个过程需要整合宏观尺度组织形态、微观尺度细胞微环境和基因尺度基因表达谱。为了解决这一挑战,我们提出了SToFM,一个多尺度空间转录组学基础模型。SToFM首先在每个ST切片上执行多尺度信息提取,构建一组聚合了宏观、微观和基因尺度信息的ST子切片。然后使用SE(2)变换器从子切片中获得高质量的细胞表示。此外,我们构建了SToCorpus-88M,这是最大的高分辨率空间转录组学语料库,用于预训练。SToFM在各种下游任务中取得了出色的表现,如组织区域语义分割和细胞类型注释,通过捕获和整合多尺度信息展示了其对ST数据的全面理解。

更新时间: 2025-07-23 15:22:26

领域: q-bio.GN,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.11588v2

Enhancing Quantum Federated Learning with Fisher Information-Based Optimization

Federated Learning (FL) has become increasingly popular across different sectors, offering a way for clients to work together to train a global model without sharing sensitive data. It involves multiple rounds of communication between the global model and participating clients, which introduces several challenges like high communication costs, heterogeneous client data, prolonged processing times, and increased vulnerability to privacy threats. In recent years, the convergence of federated learning and parameterized quantum circuits has sparked significant research interest, with promising implications for fields such as healthcare and finance. By enabling decentralized training of quantum models, it allows clients or institutions to collaboratively enhance model performance and outcomes while preserving data privacy. Fisher information quantifies the amount of information that a quantum state carries under parameter changes, thereby providing insight into its geometric and statistical properties, and we leverage this property to address the aforementioned challenges. In this work, we propose a Quantum Federated Learning (QFL) algorithm that makes use of the Fisher information computed on local client models, with data distributed across heterogeneous partitions. This approach identifies the critical parameters that significantly influence the quantum model's performance, ensuring they are preserved during the aggregation process. Our research assessed the effectiveness and feasibility of QFL by comparing its performance against other variants, and exploring the benefits of incorporating Fisher information in QFL settings. Experimental results on ADNI and MNIST datasets demonstrate the effectiveness of our approach in achieving better performance and robustness against the quantum federated averaging method.
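
A sketch of Fisher-guided aggregation in the simplest possible form (a simplification of the paper's scheme, with NumPy arrays standing in for quantum-circuit parameters): weight each client's parameters by its diagonal Fisher information, so that parameters a client deems critical dominate the merge.

    # Fisher-weighted parameter aggregation (sketch).
    import numpy as np

    def fisher_weighted_average(params_per_client, fisher_per_client, eps=1e-8):
        params = np.stack(params_per_client)   # (n_clients, n_params)
        fisher = np.stack(fisher_per_client)   # (n_clients, n_params), >= 0
        weights = fisher / (fisher.sum(axis=0, keepdims=True) + eps)
        return (weights * params).sum(axis=0)

    theta = fisher_weighted_average(
        [np.array([0.10, 0.50]), np.array([0.30, 0.52])],
        [np.array([9.0, 1.0]), np.array([1.0, 1.0])])
    print(theta)  # first coordinate pulled toward the high-Fisher client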

Updated: 2025-07-23 15:14:53

标题: 用基于费舍尔信息的优化增强量子联邦学习

摘要: 联邦学习(FL)在不同领域日益受欢迎,为客户提供一种共同训练全球模型而不共享敏感数据的方式。它涉及全球模型和参与客户之间多轮通信,引入了诸多挑战,如高通信成本、异构客户数据、延长处理时间和增加隐私威胁。近年来,联邦学习和参数化量子电路的融合引发了重要的研究兴趣,对医疗保健和金融等领域具有有希望的影响。通过实现量子模型的分散训练,它允许客户或机构共同提高模型性能和结果,同时保护数据隐私。认识到费舍尔信息可以量化量子状态在参数改变下携带的信息量,从而提供对其几何和统计特性的洞察。我们打算利用这一属性来解决前述挑战。在本研究中,我们提出了一种利用在分布在异构分区的本地客户模型上计算的费舍尔信息的量子联邦学习(QFL)算法。这种方法识别出显著影响量子模型性能的关键参数,在聚合过程中确保它们得到保留。我们通过将其性能与其他变体进行比较,探索在QFL设置中引入费舍尔信息的好处,评估了QFL的有效性和可行性。在ADNI和MNIST数据集上的实验结果展示了我们的方法在实现更好性能和对抗量子联邦平均方法的鲁棒性方面的有效性。

更新时间: 2025-07-23 15:14:53

领域: cs.LG,cs.AI,cs.DC,cs.ET,quant-ph

下载: http://arxiv.org/abs/2507.17580v1

Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors

One of the most practical and challenging types of black-box adversarial attacks is the hard-label attack, where only the top-1 predicted label is available. One effective approach is to search for the optimal ray direction from the benign image that minimizes the $\ell_p$-norm distance to the adversarial region. The unique advantage of this approach is that it transforms the hard-label attack into a continuous optimization problem. The objective function value is the ray's radius, which can be obtained via binary search at a high query cost. Existing methods use a "sign trick" in gradient estimation to reduce the number of queries. In this paper, we theoretically analyze the quality of this gradient estimation and propose a novel prior-guided approach to improve ray search efficiency both theoretically and empirically. Specifically, we utilize the transfer-based priors from surrogate models, and our gradient estimators appropriately integrate them by approximating the projection of the true gradient onto the subspace spanned by these priors and random directions, in a query-efficient manner. We theoretically derive the expected cosine similarities between the obtained gradient estimators and the true gradient, and demonstrate the improvement achieved by incorporating priors. Extensive experiments on the ImageNet and CIFAR-10 datasets show that our approach significantly outperforms 11 state-of-the-art methods in terms of query efficiency.
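
The continuous-optimization view the paper builds on is easy to state in code: for a direction theta, the objective $g(\theta)$ is the radius at which the point $x + r\,\theta/\lVert\theta\rVert$ first becomes adversarial, found by binary search against the top-1 label oracle. Here `is_adversarial` is an assumed hard-label query costing one model call per invocation.

    # Binary search for the ray radius g(theta) (sketch).
    import numpy as np

    def ray_radius(x, theta, is_adversarial, r_max=100.0, tol=1e-3):
        d = theta / np.linalg.norm(theta)
        lo, hi = 0.0, r_max
        if not is_adversarial(x + hi * d):
            return np.inf                 # no boundary found along this ray
        while hi - lo > tol:              # each step costs one query
            mid = (lo + hi) / 2
            if is_adversarial(x + mid * d):
                hi = mid
            else:
                lo = mid
        return hi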

Updated: 2025-07-23 15:11:25

标题: 用基于转移的先验知识增强硬标签攻击的射线搜索过程

摘要: 黑盒对抗攻击中最实用且具挑战性的类型之一是硬标签攻击,其中仅有顶部预测标签可用。一种有效的方法是从良性图像中搜索最优的射线方向,以最小化到对抗性区域的 $\ell_p$-范数距离。这种方法的独特优势在于将硬标签攻击转化为连续优化问题。目标函数值是射线的半径,可以通过高查询成本的二分搜索获得。现有方法在梯度估计中使用“符号技巧”来减少查询次数。本文在理论上分析了这种梯度估计的质量,并提出了一种新颖的先验引导方法,从理论和经验上提高射线搜索效率。具体来说,我们利用来自替代模型的基于转移的先验,我们的梯度估计器适当地通过近似将真实梯度投影到由这些先验和随机方向张成的子空间中,以一种查询高效的方式进行集成。我们在理论上推导了获得的梯度估计器与真实梯度之间的期望余弦相似度,并展示了通过整合先验实现的改进。在ImageNet和CIFAR-10数据集上的大量实验表明,我们的方法在查询效率方面明显优于11种最先进的方法。

更新时间: 2025-07-23 15:11:25

领域: cs.CV,cs.CR,cs.LG,I.2.6; I.5.1; G.1.6

下载: http://arxiv.org/abs/2507.17577v1

Turing Test 2.0: The General Intelligence Threshold

With the rise of artificial intelligence (A.I.) and large language models like ChatGPT, a new race for achieving artificial general intelligence (A.G.I) has started. While many speculate how and when A.I. will achieve A.G.I., there is no clear agreement on how A.G.I. can be detected in A.I. models, even when popular tools like the Turing test (and its modern variations) are used to measure their intelligence. In this work, we discuss why traditional methods like the Turing test do not suffice for measuring or detecting A.G.I. and provide a new, practical method that can be used to decide if a system (computer or any other) has reached or surpassed A.G.I. To achieve this, we make two new contributions. First, we present a clear definition for general intelligence (G.I.) and set a G.I. Threshold (G.I.T.) that can be used to distinguish between systems that achieve A.G.I. and systems that do not. Second, we present a new framework on how to construct tests that can detect if a system has achieved G.I. in a simple, comprehensive, and clear-cut fail/pass way. We call this novel framework the Turing test 2.0. We then demonstrate real-life examples of applying tests that follow our Turing test 2.0 framework on modern A.I. models.

Updated: 2025-07-23 15:09:33

标题: Turing Test 2.0:通用智能门槛

摘要: 随着人工智能(A.I.)和大型语言模型如ChatGPT的兴起,一个追求人工通用智能(A.G.I)的新竞赛已经开始。虽然许多人猜测人工智能将如何以及何时实现人工通用智能,但对于如何在人工智能模型中检测A.G.I.没有明确的共识,即使使用像图灵测试(及其现代变体)这样的流行工具来衡量其智能也不行。在这项工作中,我们讨论了为什么传统方法如图灵测试不足以衡量或检测A.G.I.并提出了一种可以用来判断系统(计算机或其他任何系统)是否已经达到或超过A.G.I.的新实用方法。为了实现这一目标,我们做出了两项新贡献。首先,我们提出了通用智能(G.I.)的明确定义,并设定了一个通用智能阈值(G.I.T.),可用于区分达到A.G.I.的系统和未达到A.G.I.的系统。其次,我们提出了一个新框架,说明如何构建测试,可以简单、全面和明确地检测一个系统是否已经达到通用智能。我们将这一新颖框架称为图灵测试2.0。然后,我们展示了将遵循我们的图灵测试2.0框架对现代人工智能模型应用测试的实例。

更新时间: 2025-07-23 15:09:33

领域: cs.AI

下载: http://arxiv.org/abs/2505.19550v4

Fairness Evaluation of Large Language Models in Academic Library Reference Services

As libraries explore large language models (LLMs) for use in virtual reference services, a key question arises: Can LLMs serve all users equitably, regardless of demographics or social status? While they offer great potential for scalable support, LLMs may also reproduce societal biases embedded in their training data, risking the integrity of libraries' commitment to equitable service. To address this concern, we evaluate whether LLMs differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role. We found no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model. LLMs demonstrated nuanced accommodation of institutional roles through the use of linguistic choices related to formality, politeness, and domain-specific vocabularies, reflecting professional norms rather than discriminatory treatment. These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.

Updated: 2025-07-23 15:08:40

标题: 大型语言模型在学术图书馆参考服务中的公平性评估

摘要: 随着图书馆探索大型语言模型(LLMs)用于虚拟参考服务,一个关键问题出现了:LLMs是否能够公平地为所有用户提供服务,而不考虑其人口统计特征或社会地位?虽然它们具有可扩展支持的巨大潜力,但LLMs可能也会复制嵌入其训练数据中的社会偏见,从而危及图书馆对公平服务的承诺。为了解决这一问题,我们评估了LLMs是否会根据用户身份的不同而区别对待,通过促使六种最先进的LLMs帮助性别、种族/族裔和机构角色不同的读者。我们发现没有种族或族裔的差异化迹象,只有一个模型中对女性的刻板印象偏见的轻微证据。LLMs通过使用与形式、礼貌和领域特定词汇相关的语言选择,展示了对机构角色的微妙适应,反映了专业规范而非歧视性对待。这些发现表明,当前的LLMs表现出一定程度的准备就绪,以支持学术图书馆参考服务中公平且与环境相适应的沟通。

更新时间: 2025-07-23 15:08:40

领域: cs.CL,cs.AI,cs.DL

下载: http://arxiv.org/abs/2507.04224v2

Photonic Fabric Platform for AI Accelerators

This paper presents the Photonic FabricTM and the Photonic Fabric ApplianceTM (PFA), a photonic-enabled switch and memory subsystem that delivers low latency, high bandwidth, and low per-bit energy. By integrating high-bandwidth HBM3E memory, an on-module photonic switch, and external DDR5 in a 2.5D electro-optical system-in-package, the PFA offers up to 32 TB of shared memory alongside 115 Tbps of all-to-all digital switching. The Photonic FabricTM enables distributed AI training and inference to execute parallelism strategies more efficiently. The Photonic Fabric removes the silicon beachfront constraint that limits the fixed memory-to-compute ratio observed in virtually all current XPU accelerator designs. Replacing a local HBM stack on an XPU with a chiplet that connects to the Photonic Fabric increases its memory capacity and correspondingly its memory bandwidth by offering a flexible path to scaling well beyond the limitations of on-package HBM alone. We introduce CelestiSim, a lightweight analytical simulator validated on NVIDIA H100 and H200 systems. It is used to evaluate the performance of LLM reference and energy savings on PFA, without any significant change to the GPU core design. With the PFA, the simulation results show that up to 3.66x throughput and 1.40x latency improvements in LLM inference at 405B parameters, up to 7.04x throughput and 1.41x latency improvements at 1T parameters, and 60-90% energy savings in data movement for heavy collective operations in all LLM training scenarios. While these results are shown for NVIDIA GPUs, they can be applied similarly to other AI accelerator designs (XPUs) that share the same fundamental limitation of fixed memory to compute.

Updated: 2025-07-23 15:07:06

标题: 光子织物平台用于人工智能加速器

摘要: 这篇论文介绍了光子织物TM和光子织物设备TM(PFA),这是一种光子使能的交换机和内存子系统,提供低延迟、高带宽和低每比特能量。通过在2.5D电光系统封装中集成高带宽HBM3E内存、模块上的光子交换机和外部DDR5,PFA提供高达32TB的共享内存和115Tbps的全对全数字交换。光子织物TM使分布式AI训练和推理能够更有效地执行并行策略。光子织物消除了硅片边缘(silicon beachfront)约束,这一约束造成了几乎所有当前XPU加速器设计中观察到的固定内存与计算比。通过使用连接到光子织物的小芯片(chiplet)替换XPU上的本地HBM堆栈,可以增加其内存容量并相应提升内存带宽,提供一条灵活的扩展路径,远超仅靠封装内HBM的限制。我们引入了CelestiSim,这是一个轻量级的分析模拟器,在NVIDIA H100和H200系统上进行了验证。它用于评估PFA上LLM推理性能和节能效果,而无需对GPU核心设计进行任何重大更改。通过PFA,模拟结果显示,在405B参数下LLM推理的吞吐量最多提高3.66倍、延迟改善1.40倍,在1T参数下吞吐量最多提高7.04倍、延迟改善1.41倍,并且在所有LLM训练场景中,重型集合通信操作的数据搬移节能60-90%。尽管这些结果是针对NVIDIA GPU展示的,但同样适用于其他受固定内存与计算比这一根本限制的AI加速器设计(XPU)。

更新时间: 2025-07-23 15:07:06

领域: cs.PF,cs.AI,C.4

下载: http://arxiv.org/abs/2507.14000v3

Application of YOLOv8 in monocular downward multiple Car Target detection

Autonomous driving technology is progressively transforming traditional car driving methods, marking a significant milestone in modern transportation. Object detection serves as a cornerstone of autonomous systems, playing a vital role in enhancing driving safety, enabling autonomous functionality, improving traffic efficiency, and facilitating effective emergency responses. However, current technologies such as radar for environmental perception, cameras for road perception, and vehicle sensor networks face notable challenges, including high costs, vulnerability to weather and lighting conditions, and limited resolution.To address these limitations, this paper presents an improved autonomous target detection network based on YOLOv8. By integrating structural reparameterization technology, a bidirectional pyramid structure network model, and a novel detection pipeline into the YOLOv8 framework, the proposed approach achieves highly efficient and precise detection of multi-scale, small, and remote objects. Experimental results demonstrate that the enhanced model can effectively detect both large and small objects with a detection accuracy of 65%, showcasing significant advancements over traditional methods.This improved model holds substantial potential for real-world applications and is well-suited for autonomous driving competitions, such as the Formula Student Autonomous China (FSAC), particularly excelling in scenarios involving single-target and small-object detection.

Updated: 2025-07-23 15:01:56

标题: YOLOv8在单目向下多车目标检测中的应用

摘要: 自动驾驶技术正在逐渐改变传统的汽车驾驶方法,在现代交通领域标志着重要的里程碑。目标检测作为自主系统的基石,在提升驾驶安全、实现自主功能、改善交通效率和促进有效应急响应方面发挥着至关重要的作用。然而,当前的技术,如用于环境感知的雷达、用于道路感知的摄像头和车辆传感器网络,面临着诸多挑战,包括高成本、对天气和光照条件的脆弱性以及分辨率有限。为了解决这些限制,本文介绍了一种基于YOLOv8的改进型自主目标检测网络。通过将结构重参数化技术、双向金字塔结构网络模型和新颖检测管道集成到YOLOv8框架中,所提出的方法实现了对多尺度、小型和远程目标的高效精确检测。实验结果表明,增强型模型可以有效检测大型和小型目标,检测准确率达到65%,较传统方法取得了显著进展。这一改进型模型在现实世界应用中具有巨大潜力,特别适用于自动驾驶竞赛,如中国大学生自动驾驶比赛(FSAC),在涉及单一目标和小型物体检测的场景中表现突出。

更新时间: 2025-07-23 15:01:56

领域: cs.CV,cs.AI,I.4.8; I.2.10

下载: http://arxiv.org/abs/2505.10016v2

ORL-LDM: Offline Reinforcement Learning Guided Latent Diffusion Model Super-Resolution Reconstruction

With the rapid advancement of remote sensing technology, super-resolution image reconstruction is of great research and practical significance. Existing deep learning methods have made progress but still face limitations in handling complex scenes and preserving image details. This paper proposes a reinforcement learning-based latent diffusion model (LDM) fine-tuning method for remote sensing image super-resolution. The method constructs a reinforcement learning environment with states, actions, and rewards, optimizing decision objectives through proximal policy optimization (PPO) during the reverse denoising process of the LDM model. Experiments on the RESISC45 dataset show significant improvements over the baseline model in PSNR, SSIM, and LPIPS, with PSNR increasing by 3-4dB, SSIM improving by 0.08-0.11, and LPIPS reducing by 0.06-0.10, particularly in structured and complex natural scenes. The results demonstrate the method's effectiveness in enhancing super-resolution quality and adaptability across scenes.

Updated: 2025-07-23 15:01:44

标题: ORL-LDM: 离线强化学习引导的潜在扩散模型超分辨率重建

摘要: 随着遥感技术的快速发展,超分辨率图像重建具有重要的研究和实际意义。现有的深度学习方法取得了一定进展,但在处理复杂场景和保留图像细节方面仍然存在局限性。本文提出了一种基于强化学习的潜在扩散模型(LDM)微调方法,用于遥感图像超分辨率。该方法通过状态、动作和奖励构建了一个强化学习环境,在LDM模型的逆去噪过程中通过近端策略优化(PPO)优化决策目标。在RESISC45数据集上的实验结果显示,在PSNR、SSIM和LPIPS方面,与基准模型相比取得了显著改进,其中PSNR提高了3-4dB,SSIM提高了0.08-0.11,LPIPS减少了0.06-0.10,特别是在结构化和复杂的自然场景中。结果表明该方法在提高超分辨率质量和适应各种场景方面的有效性。

更新时间: 2025-07-23 15:01:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2505.10027v2

Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning

Federated Learning (FL), a privacy-aware approach in distributed deep learning environments, enables many clients to collaboratively train a model without sharing sensitive data, thereby reducing privacy risks. However, enabling human trust and control over FL systems requires understanding the evolving behaviour of clients, whether beneficial or detrimental for the training, which still represents a key challenge in the current literature. To address this challenge, we introduce Federated Behavioural Planes (FBPs), a novel method to analyse, visualise, and explain the dynamics of FL systems, showing how clients behave under two different lenses: predictive performance (error behavioural space) and decision-making processes (counterfactual behavioural space). Our experiments demonstrate that FBPs provide informative trajectories describing the evolving states of clients and their contributions to the global model, thereby enabling the identification of clusters of clients with similar behaviours. Leveraging the patterns identified by FBPs, we propose a robust aggregation technique named Federated Behavioural Shields to detect malicious or noisy client models, thereby enhancing security and surpassing the efficacy of existing state-of-the-art FL defense mechanisms. Our code is publicly available on GitHub.

Updated: 2025-07-23 14:57:55

标题: 联合行为平面:解释联合学习中客户行为的演变

摘要: 联邦学习(FL)是在分布式深度学习环境中的隐私意识方法,使许多客户能够共同训练模型而不共享敏感数据,从而降低隐私风险。然而,使人类信任和控制FL系统需要了解客户的演变行为,无论是对训练有益还是有害,这仍然是当前文献中的一个关键挑战。为了解决这一挑战,我们引入了联邦行为平面(FBPs),这是一种新颖的方法,用于分析、可视化和解释FL系统的动态,展示客户在两种不同视角下的行为:预测性能(错误行为空间)和决策过程(反事实行为空间)。我们的实验表明,FBPs提供了描述客户不断演变状态和其对全局模型贡献的信息轨迹,从而实现了识别具有相似行为的客户簇。利用FBPs识别出的模式,我们提出了一种强大的聚合技术,称为联邦行为护盾,用于检测恶意或嘈杂的客户模型,从而增强安全性并超越现有最先进的FL防御机制的效力。我们的代码可在GitHub上公开获取。

更新时间: 2025-07-23 14:57:55

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.15632v3

A Physically Driven Long Short Term Memory Model for Estimating Snow Water Equivalent over the Continental United States

Snow is an essential input for various land surface models. Seasonal snow estimates are available as snow water equivalent (SWE) from process-based reanalysis products or locally from in situ measurements. While the reanalysis products are computationally expensive and available at only fixed spatial and temporal resolutions, the in situ measurements are highly localized and sparse. To address these issues and enable the analysis of the effect of a large suite of physical, morphological, and geological conditions on the presence and amount of snow, we build a Long Short-Term Memory (LSTM) network, which is able to estimate the SWE based on time series input of the various physical/meteorological factors as well static spatial/morphological factors. Specifically, this model breaks down the SWE estimation into two separate tasks: (i) a classification task that indicates the presence/absence of snow on a specific day and (ii) a regression task that indicates the height of the SWE on a specific day in the case of snow presence. The model is trained using physical/in situ SWE measurements from the SNOw TELemetry (SNOTEL) snow pillows in the western United States. We will show that trained LSTM models have a classification accuracy of $\geq 93\%$ for the presence of snow and a coefficient of correlation of $\sim 0.9$ concerning their SWE estimates. We will also demonstrate that the models can generalize both spatially and temporally to previously unseen data.
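
A minimal PyTorch sketch of the two-task design described above; layer sizes, the feature split, and head wiring are hypothetical, not the paper's exact architecture:

import torch
import torch.nn as nn

class SWELSTM(nn.Module):
    def __init__(self, n_dynamic, n_static, hidden=64):
        super().__init__()
        # dynamic meteorological series and static site descriptors are
        # concatenated at every time step before entering the LSTM
        self.lstm = nn.LSTM(n_dynamic + n_static, hidden, batch_first=True)
        self.presence = nn.Linear(hidden, 1)  # task (i): snow/no-snow logit
        self.swe = nn.Linear(hidden, 1)       # task (ii): SWE height if snow

    def forward(self, dynamic, static):
        # dynamic: (B, T, n_dynamic); static: (B, n_static)
        static_rep = static.unsqueeze(1).expand(-1, dynamic.size(1), -1)
        h, _ = self.lstm(torch.cat([dynamic, static_rep], dim=-1))
        last = h[:, -1]
        return self.presence(last), self.swe(last)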

Updated: 2025-07-23 14:53:46

标题: 一个基于物理驱动的长短期记忆模型,用于估算美国大陆雪水当量

摘要: 雪是各种陆地表面模型的重要输入。季节性雪量估计可作为雪水当量(SWE)从基于过程的再分析产品中获得,或者从就地测量中获取。虽然再分析产品在计算方面昂贵,并且仅以固定的空间和时间分辨率提供,就地测量则高度局部化且稀疏。为了解决这些问题并能够分析各种物理、形态和地质条件对雪的存在和数量的影响,我们构建了一个长短期记忆(LSTM)网络,能够根据各种物理/气象因素的时间序列输入以及静态空间/形态因素来估计SWE。具体来说,该模型将SWE估计分解为两个独立任务:(i)分类任务,指示特定日期的雪的存在/不存在;(ii)回归任务,指示特定日期的SWE高度在雪存在的情况下。该模型使用来自美国西部的SNOw TELemetry(SNOTEL)雪枕的物理/就地SWE测量进行训练。我们将展示经过训练的LSTM模型在雪的存在方面具有$\geq 93\%$的分类准确度,并且在其SWE估计方面具有$\sim 0.9$的相关系数。我们还将证明这些模型可以在空间和时间上泛化到以前看不见的数据。

更新时间: 2025-07-23 14:53:46

领域: physics.ao-ph,cs.LG

下载: http://arxiv.org/abs/2504.20129v2

Quantum-Safe Hybrid Key Exchanges with KEM-Based Authentication

Authenticated Key Exchange (AKE) between any two entities is one of the most important security protocols available for securing our digital networks and infrastructures. In PQCrypto 2023, Bruckner, Ramacher and Striecks proposed a novel hybrid AKE (HAKE) protocol, dubbed Muckle+, that is particularly useful in large quantum-safe networks consisting of a large number of nodes. Their protocol is hybrid in the sense that it allows key material from conventional and post-quantum primitives, as well as from quantum key distribution, to be incorporated into a single end-to-end shared key. To achieve the desired authentication properties, Muckle+ utilizes post-quantum digital signatures. However, available instantiations of such signatures schemes are not yet efficient enough compared to their post-quantum key-encapsulation mechanism (KEM) counterparts, particularly in large networks with potentially several connections in a short period of time. To mitigate this gap, we propose Muckle# that pushes the efficiency boundaries of currently known HAKE constructions. Muckle# uses post-quantum key-encapsulating mechanisms for implicit authentication inspired by recent works done in the area of Transport Layer Security (TLS) protocols, particularly, in KEMTLS (CCS'20). We port those ideas to the HAKE framework and develop novel proof techniques on the way. Due to our novel KEM-based approach, the resulting protocol has a slightly different message flow compared to prior work that we carefully align with the HAKE framework and which makes our changes to the Muckle+ non-trivial.

Updated: 2025-07-23 14:41:48

标题: 具有基于KEM认证的量子安全混合密钥交换

摘要: 任何两个实体之间的认证密钥交换(AKE)是保护数字网络和基础设施安全的最重要的安全协议之一。在PQCrypto 2023年会议上,Bruckner、Ramacher和Striecks提出了一种名为Muckle+的新型混合AKE(HAKE)协议,特别适用于由大量节点组成的大型量子安全网络。他们的协议是混合的,因为它允许将传统和后量子基元以及量子密钥分发中的密钥材料合并到单个端到端共享密钥中。 为了实现所需的认证属性,Muckle+利用后量子数字签名。然而,与后量子密钥封装机制(KEM)同类方案相比,目前可用的此类签名方案的实例化尚不够高效,特别是在可能在短时间内建立多个连接的大型网络中。 为了弥补这一差距,我们提出了Muckle#,它推动了当前已知HAKE构造的效率边界。Muckle#使用后量子密钥封装机制进行隐式认证,灵感来自最近在传输层安全协议(TLS)领域进行的研究,特别是KEMTLS(CCS'20)。 我们将这些想法移植到HAKE框架中,并在此过程中开发了新颖的证明技术。由于我们采用了基于KEM的新方法,所得协议的消息流与之前的工作略有不同;我们仔细将其与HAKE框架对齐,这使得我们对Muckle+的改动并非轻而易举。

更新时间: 2025-07-23 14:41:48

领域: cs.CR,quant-ph,94A60, 81P94, 94A62,E.3

下载: http://arxiv.org/abs/2411.04030v2

Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and Baseline

Stickers are increasingly used in social media to express sentiment and intent. Despite their significant impact on sentiment analysis and intent recognition, little research has been conducted in this area. To address this gap, we propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS). Additionally, we introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms. Our dataset includes paired data with the same text but different stickers, the same sticker but different contexts, and various stickers consisting of the same images with different texts, allowing us to better understand the impact of stickers on chat sentiment and intent. We also propose an effective multimodal joint model, MMSAIR, featuring differential vector construction and cascaded attention mechanisms for enhanced multimodal fusion. Our experiments demonstrate the necessity and effectiveness of jointly modeling sentiment and intent, as they mutually reinforce each other's recognition accuracy. MMSAIR significantly outperforms traditional models and advanced MLLMs, demonstrating the challenge and uniqueness of sticker interpretation in social media. Our dataset and code are available on https://github.com/FakerBoom/MSAIRS-Dataset.

Updated: 2025-07-23 14:35:12

标题: 社交媒体中贴纸对多模情感和意图的影响:一个新的任务、数据集和基线

摘要: 贴纸越来越被广泛应用于社交媒体以表达情感和意图。尽管贴纸对情感分析和意图识别有着重要影响,但在这一领域中进行的研究甚少。为了填补这一空白,我们提出了一个新任务:包含贴纸的多模态聊天情感分析和意图识别(MSAIRS)。此外,我们引入了一个包含中国聊天记录和摘自几个主流社交媒体平台的贴纸的新颖多模态数据集。我们的数据集包含了相同文本但不同贴纸、相同贴纸但不同上下文以及包含相同图像但不同文本的各种贴纸的配对数据,使我们能够更好地理解贴纸对聊天情感和意图的影响。我们还提出了一种有效的多模态联合模型MMSAIR,具有差异化向量构建和级联注意机制,以增强多模态融合。我们的实验表明,共同建模情感和意图的必要性和有效性,因为它们相互增强了对方的识别准确性。MMSAIR明显优于传统模型和先进的MLLMs,展示了社交媒体中贴纸解释的挑战和独特性。我们的数据集和代码可在https://github.com/FakerBoom/MSAIRS-Dataset 上获取。

更新时间: 2025-07-23 14:35:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.08427v2

Scalable DC Optimization via Adaptive Frank-Wolfe Algorithms

We consider the problem of minimizing a difference of (smooth) convex functions over a compact convex feasible region $P$, i.e., $\min_{x \in P} f(x) - g(x)$, with smooth $f$ and Lipschitz continuous $g$. This computational study builds upon and complements the framework of Maskan et al. [2025] by integrating advanced Frank-Wolfe variants to reduce computational overhead. We empirically show that constrained DC problems can be efficiently solved using a combination of the Blended Pairwise Conditional Gradients (BPCG) algorithm [Tsuji et al., 2022] with warm-starting and the adaptive error bound from Maskan et al. [2025]. The result is a highly efficient and scalable projection-free algorithm for constrained DC optimization.
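
A schematic of the surrogate loop for min_{x in P} f(x) - g(x): each outer step linearizes the concave part -g, then minimizes the resulting convex majorizer with Frank-Wolfe. Plain FW is shown for brevity where the paper uses warm-started BPCG; grad_f, grad_g, and the linear minimization oracle over P are assumed callables:

def dc_frank_wolfe(grad_f, grad_g, lmo, x0, outer=50, inner=200):
    x = x0
    for _ in range(outer):
        gk = grad_g(x)              # linearize g at the current iterate
        # inner FW on the convex surrogate f(.) - <gk, .>, warm-started
        # implicitly since x carries over from the previous outer step
        for t in range(inner):
            s = lmo(grad_f(x) - gk)            # surrogate-gradient LMO call
            x = x + (2.0 / (t + 2)) * (s - x)  # classic FW step size
    return x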

Updated: 2025-07-23 14:22:42

标题: 可扩展的DC优化:通过自适应Frank-Wolfe算法

摘要: 我们考虑在紧凸可行区域$P$上最小化(光滑)凸函数之差的问题,即$\min_{x \in P} f(x) - g(x)$,其中$f$是光滑函数,$g$是Lipschitz连续函数。这项计算研究基于Maskan等人[2025]的框架,并结合了先进的Frank-Wolfe变体,以减少计算开销。我们经验性地展示,通过将Blended Pairwise Conditional Gradients(BPCG)算法[Tsuji等人,2022]与热启动和Maskan等人[2025]的自适应误差界相结合,可以高效地解决受约束的DC问题。结果是一个高效且可扩展的无投影算法,用于受约束的DC优化。

更新时间: 2025-07-23 14:22:42

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2507.17545v1

Optimal differentially private kernel learning with random projection

Differential privacy has become a cornerstone in the development of privacy-preserving learning algorithms. This work addresses optimizing differentially private kernel learning within the empirical risk minimization (ERM) framework. We propose a novel differentially private kernel ERM algorithm based on random projection in the reproducing kernel Hilbert space using Gaussian processes. Our method achieves minimax-optimal excess risk for both the squared loss and Lipschitz-smooth convex loss functions under a local strong convexity condition. We further show that existing approaches based on alternative dimension reduction techniques, such as random Fourier feature mappings or $\ell_2$ regularization, yield suboptimal generalization performance. Our key theoretical contribution also includes the derivation of dimension-free generalization bounds for objective perturbation-based private linear ERM -- marking the first such result that does not rely on noisy gradient-based mechanisms. Additionally, we obtain sharper generalization bounds for existing differentially private kernel ERM algorithms. Empirical evaluations support our theoretical claims, demonstrating that random projection enables statistically efficient and optimally private kernel learning. These findings provide new insights into the design of differentially private algorithms and highlight the central role of dimension reduction in balancing privacy and utility.

Updated: 2025-07-23 14:20:46

标题: 基于随机投影的最优差分隐私核学习

摘要: 差分隐私已成为隐私保护学习算法发展的基石。本文研究了在经验风险最小化(ERM)框架内优化差分隐私核学习。我们提出了一种基于随机投影在再生核希尔伯特空间中使用高斯过程的新颖差分隐私核ERM算法。我们的方法在局部强凸性条件下实现了最小最优超额风险,适用于平方损失和Lipschitz平滑凸损失函数。我们进一步展示,基于替代降维技术的现有方法,如随机傅里叶特征映射或$\ell_2$正则化,导致次优的泛化性能。我们的关键理论贡献还包括为基于目标扰动的私有线性ERM推导无维度泛化界--这是第一个不依赖于嘈杂梯度机制的结果。此外,我们为现有的差分隐私核ERM算法获得了更为精确的泛化界。实证评估支持我们的理论断言,表明随机投影实现了统计有效和最优私有核学习。这些发现为设计差分隐私算法提供了新的见解,并强调了降维在平衡隐私和效用方面的中心作用。

更新时间: 2025-07-23 14:20:46

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2507.17544v1

Clustering-based hard negative sampling for supervised contrastive speaker verification

In speaker verification, contrastive learning is gaining popularity as an alternative to the traditionally used classification-based approaches. Contrastive methods can benefit from an effective use of hard negative pairs, which are different-class samples particularly challenging for a verification model due to their similarity. In this paper, we propose CHNS - a clustering-based hard negative sampling method, dedicated for supervised contrastive speaker representation learning. Our approach clusters embeddings of similar speakers, and adjusts batch composition to obtain an optimal ratio of hard and easy negatives during contrastive loss calculation. Experimental evaluation shows that CHNS outperforms a baseline supervised contrastive approach with and without loss-based hard negative sampling, as well as a state-of-the-art classification-based approach to speaker verification by as much as 18 % relative EER and minDCF on the VoxCeleb dataset using two lightweight model architectures.
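
A rough sketch of the batch-composition idea, assuming utterance embeddings with speaker labels; the cluster count, speakers-per-batch, and hard/easy ratio are hypothetical knobs:

import numpy as np
from sklearn.cluster import KMeans

def chns_speaker_batches(emb, spk_labels, n_clusters=64, batch_spk=32, hard_ratio=0.5):
    # cluster per-speaker mean embeddings: speakers sharing a cluster act as
    # mutually hard negatives, globally sampled speakers as easy negatives
    speakers = np.unique(spk_labels)
    centroids = np.stack([emb[spk_labels == s].mean(axis=0) for s in speakers])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(centroids)
    n_hard = int(batch_spk * hard_ratio)
    while True:
        c = np.random.choice(assign)                # cluster id, size-weighted
        pool = speakers[assign == c]
        hard = np.random.choice(pool, n_hard, replace=len(pool) < n_hard)
        easy = np.random.choice(speakers, batch_spk - n_hard, replace=False)
        yield np.concatenate([hard, easy])          # speakers for one batch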

Updated: 2025-07-23 14:19:33

标题: 基于聚类的硬负采样用于监督对比说话人验证

摘要: 在说话人验证中,对比学习作为传统基于分类方法的替代方案正变得越来越流行。对比方法可以从难负样本对的有效利用中获益:这些样本对来自不同类别,却因彼此相似而对验证模型特别具有挑战性。在本文中,我们提出了CHNS,一种基于聚类的难负例采样方法,专门用于监督对比说话人表示学习。我们的方法对相似说话人的嵌入进行聚类,并调整批次组成,以在对比损失计算期间获得难易负例的最佳比例。实验评估表明,在VoxCeleb数据集上使用两种轻量级模型架构,CHNS在相对EER和minDCF上以高达18%的优势,优于带有或不带基于损失的难负例采样的基线监督对比方法,以及最先进的基于分类的说话人验证方法。

更新时间: 2025-07-23 14:19:33

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2507.17540v1

Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning

Multimodal large language models (MLLMs) demonstrate significant potential in the field of medical diagnosis. However, they face critical challenges in specialized domains such as ophthalmology, particularly the fragmentation of annotation granularity and inconsistencies in clinical reasoning logic, which hinder precise cross-modal understanding. This paper introduces FundusExpert, an ophthalmology-specific MLLM with integrated positioning-diagnosis reasoning capabilities, along with FundusGen, a dataset constructed through the intelligent Fundus-Engine system. Fundus-Engine automates localization and leverages MLLM-based semantic expansion to integrate global disease classification, local object detection, and fine-grained feature analysis within a single fundus image. Additionally, by constructing a clinically aligned cognitive chain, it guides the model to generate interpretable reasoning paths. FundusExpert, fine-tuned with instruction data from FundusGen, achieves the best performance in ophthalmic question-answering tasks, surpassing the average accuracy of the 40B MedRegA by 26.6%. It also excels in zero-shot report generation tasks, achieving a clinical consistency of 77.0%, significantly outperforming GPT-4o's 47.6%. Furthermore, we reveal a scaling law between data quality and model capability ($L \propto N^{0.068}$), demonstrating that the cognitive alignment annotations in FundusGen enhance data utilization efficiency. By integrating region-level localization with diagnostic reasoning chains, our work develops a scalable, clinically-aligned MLLM and explores a pathway toward bridging the visual-language gap in specific MLLMs. Our project can be found at https://github.com/MeteorElf/FundusExpert.

Updated: 2025-07-23 14:19:30

标题: 通过临床认知链推理构建面向定位-诊断协作的眼科MLLM

摘要: 多模态大型语言模型(MLLMs)在医学诊断领域展现出巨大潜力。然而,在眼科等专业领域,它们面临关键挑战,特别是标注粒度的碎片化和临床推理逻辑的不一致,这些都妨碍了精确的跨模态理解。本文介绍了FundusExpert,一种具有集成定位-诊断推理能力的眼科专用MLLM,以及通过智能Fundus-Engine系统构建的数据集FundusGen。Fundus-Engine自动执行定位,并利用基于MLLM的语义扩展,在单张眼底图像内整合全局疾病分类、局部目标检测和细粒度特征分析。此外,通过构建与临床对齐的认知链,它引导模型生成可解释的推理路径。经FundusGen指令数据微调的FundusExpert在眼科问答任务中取得最佳表现,比40B MedRegA的平均准确率高出26.6%。它还在零样本报告生成任务中表现出色,实现了77.0%的临床一致性,明显优于GPT-4o的47.6%。此外,我们揭示了数据质量和模型能力之间的规模定律($L \propto N^{0.068}$),证明了FundusGen中的认知对齐注释提高了数据利用效率。通过将区域级定位与诊断推理链集成,我们的工作开发了一种可扩展的、与临床对齐的MLLM,并探索了在专科MLLM中弥合视觉-语言差距的路径。我们的项目可以在https://github.com/MeteorElf/FundusExpert找到。

更新时间: 2025-07-23 14:19:30

领域: cs.AI,cs.CV,eess.IV

下载: http://arxiv.org/abs/2507.17539v1

CoCAI: Copula-based Conformal Anomaly Identification for Multivariate Time-Series

We propose a novel framework that harnesses the power of generative artificial intelligence and copula-based modeling to address two critical challenges in multivariate time-series analysis: delivering accurate predictions and enabling robust anomaly detection. Our method, Copula-based Conformal Anomaly Identification for Multivariate Time-Series (CoCAI), leverages a diffusion-based model to capture complex dependencies within the data, enabling high quality forecasting. The model's outputs are further calibrated using a conformal prediction technique, yielding predictive regions which are statistically valid, i.e., cover the true target values with a desired confidence level. Starting from these calibrated forecasts, robust outlier detection is performed by combining dimensionality reduction techniques with copula-based modeling, providing a statistically grounded anomaly score. CoCAI benefits from an offline calibration phase that allows for minimal overhead during deployment and delivers actionable results rooted in established theoretical foundations. Empirical tests conducted on real operational data derived from water distribution and sewerage systems confirm CoCAI's effectiveness in accurately forecasting target sequences of data and in identifying anomalous segments within them.
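
A CQR-style sketch of the calibration step on top of per-step quantile forecasts; CoCAI's exact conformal procedure and its copula-based anomaly scoring are not reproduced here:

import numpy as np

def calibrate_band(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    # conformity score: how far each calibration point falls outside its band
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(y_cal)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # widened band covers the truth with probability >= 1 - alpha
    return lo_test - q, hi_test + q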

Updated: 2025-07-23 14:15:31

标题: CoCAI:基于Copula的多变量时间序列一致性异常识别

摘要: 我们提出了一个新颖的框架,利用生成人工智能和copula模型来解决多变量时间序列分析中的两个关键挑战:提供准确的预测和实现稳健的异常检测。我们的方法,基于Copula的多变量时间序列一致异常识别(CoCAI),利用扩散模型来捕捉数据中的复杂依赖关系,实现高质量的预测。模型的输出进一步使用一致预测技术进行校准,产生统计有效的预测区域,即以所需置信水平覆盖真实目标值。从这些校准的预测开始,通过将降维技术与基于copula的建模相结合,执行稳健的异常检测,提供统计学基础的异常分数。CoCAI受益于离线校准阶段,允许在部署过程中减少开销,并提供根植于已建立理论基础的可操作结果。对从供水和污水系统中获取的实际运行数据进行的实证测试证实了CoCAI在准确预测数据目标序列和识别其中异常部分方面的有效性。

更新时间: 2025-07-23 14:15:31

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2507.17796v1

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning, ZeroTIR, training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is we demonstrate that as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations where increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable relationship between computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies. We implement a robust framework featuring a decoupled code execution environment and validate our findings across standard RL algorithms and frameworks. Experiments show ZeroTIR significantly surpasses non-tool ZeroRL baselines on challenging math benchmarks. Our findings provide a foundational understanding of how autonomous tool use is acquired and scales within Agent RL, offering a reproducible benchmark for future studies. Code is released at \href{https://github.com/yyht/openrlhf_async_pipline}{https://github.com/yyht/openrlhf\_async\_pipline}.

Updated: 2025-07-23 14:15:28

标题: 代理RL缩放定律:具有自发代码执行功能的代理RL用于数学问题求解

摘要: 大型语言模型(LLMs)通常在需要精确、可验证计算的数学推理任务中遇到困难。虽然基于结果的强化学习(RL)增强了基于文本的推理能力,但理解代理如何自主学习利用诸如代码执行之类的外部工具仍然至关重要。我们研究了基于结果的强化学习用于工具整合推理的ZeroTIR,训练基础LLMs自发生成并执行Python代码,解决数学问题,而无需监督的工具使用示例。我们的主要贡献是我们证明随着RL训练的进行,关键指标可预测地增加。具体而言,我们观察到强烈的正相关性,即增加的训练步骤导致自发代码执行频率、平均响应长度以及关键的最终任务准确度的增加。这表明在训练中投入的计算工作量与有效的、工具增强的推理策略的出现之间存在可量化的关系。我们实现了一个强大的框架,包括一个分离的代码执行环境,并验证了我们的发现跨标准RL算法和框架。实验表明ZeroTIR在具有挑战性的数学基准测试中明显优于非工具ZeroRL基线。我们的发现为如何在Agent RL中获得和扩展自主工具使用提供了基础性的理解,并为未来研究提供了可重现的基准。代码发布在\href{https://github.com/yyht/openrlhf_async_pipline}{https://github.com/yyht/openrlhf\_async\_pipline}。

更新时间: 2025-07-23 14:15:28

领域: cs.AI

下载: http://arxiv.org/abs/2505.07773v3

Federated Majorize-Minimization: Beyond Parameter Aggregation

This paper proposes a unified approach for designing stochastic optimization algorithms that robustly scale to the federated learning setting. Our work studies a class of Majorize-Minimization (MM) problems, which possesses a linearly parameterized family of majorizing surrogate functions. This framework encompasses (proximal) gradient-based algorithms for (regularized) smooth objectives, the Expectation Maximization algorithm, and many problems seen as variational surrogate MM. We show that our framework motivates a unifying algorithm called Stochastic Approximation Stochastic Surrogate MM (SSMM), which includes previous stochastic MM procedures as special instances. We then extend SSMM to the federated setting, while taking into consideration common bottlenecks such as data heterogeneity, partial participation, and communication constraints; this yields QSMM. The originality of QSMM is to learn locally and then aggregate information characterizing the surrogate majorizing function, contrary to classical algorithms which learn and aggregate the original parameter. Finally, to showcase the flexibility of this methodology beyond our theoretical setting, we use it to design an algorithm for computing optimal transport maps in the federated setting.
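
Schematically, with hypothetical notation: if each client $c$ summarizes its data by a statistic $s_c^k$ that linearly parameterizes its local majorizing surrogate, the server aggregates those statistics rather than the parameter itself,

\[
\bar{s}^{\,k} = \sum_{c=1}^{C} w_c \, s_c^{k},
\qquad
\theta^{k+1} = \operatorname*{arg\,min}_{\theta} \; \overline{\mathcal{Q}}\big(\theta ; \bar{s}^{\,k}\big),
\]

where $\overline{\mathcal{Q}}(\cdot\,; \bar{s})$ is the induced majorizer of the global objective and $w_c$ are client weights.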

Updated: 2025-07-23 14:13:19

标题: 联邦式主导极小化:超越参数聚合

摘要: 本文提出了一种统一的方法,用于设计能够稳健地扩展到联邦学习环境的随机优化算法。我们的工作研究了一类Majorize-Minimization(MM)问题,这类问题具有线性参数化的主要化替代函数族。该框架涵盖了针对(正则化)光滑目标的(近端)梯度算法、期望最大化算法,以及许多可视为变分替代MM的问题。我们展示了我们的框架引出了一个统一的算法,称为Stochastic Approximation Stochastic Surrogate MM(SSMM),它将以前的随机MM过程作为特殊实例包含在内。然后我们将SSMM扩展到联邦设置中,同时考虑到常见的瓶颈,如数据异质性、部分参与和通信约束;这产生了QSMM。QSMM的独创性在于先在本地学习、再聚合刻画主要化替代函数的信息,与经典算法学习并聚合原始参数的方式相反。最后,为了展示这种方法在我们理论设置之外的灵活性,我们使用它来设计一个在联邦设置中计算最优传输映射的算法。

更新时间: 2025-07-23 14:13:19

领域: cs.LG,cs.AI,math.OC,stat.ML

下载: http://arxiv.org/abs/2507.17534v1

HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks

Speech Enhancement techniques have become core technologies in mobile devices and voice software. Still, modern deep learning solutions often require high amount of computational resources what makes their usage on low-resource devices challenging. We present HiFi-Stream, an optimized version of recently published HiFi++ model. Our experiments demonstrate that HiFi-Stream saves most of the qualities of the original model despite its size and computational complexity improved in comparison to the original HiFi++ making it one of the smallest and fastest models available. The model is evaluated in streaming setting where it demonstrates its superior performance in comparison to modern baselines.

Updated: 2025-07-23 14:11:04

标题: HiFi-Stream:使用生成对抗网络进行流媒体语音增强

摘要: 语音增强技术已成为移动设备和语音软件中的核心技术。然而,现代深度学习解决方案通常需要大量的计算资源,这使得它们在低资源设备上的使用具有挑战性。我们提出了HiFi-Stream,这是最近发布的HiFi++模型的优化版本。我们的实验证明,HiFi-Stream在与原始模型相比尺寸和计算复杂性得到改进的情况下,仍保留了大部分原始模型的优点,使其成为当前可用的最小和最快速度的模型之一。该模型在流式设置中进行评估,并展示了与现代基线相比的卓越性能。

更新时间: 2025-07-23 14:11:04

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2503.17141v2

Channel Estimation for RIS-Assisted mmWave Systems via Diffusion Models

Reconfigurable intelligent surface (RIS) has been recognized as a promising technology for next-generation wireless communications. However, the performance of RIS-assisted systems critically depends on accurate channel state information (CSI). To address this challenge, this letter proposes a novel channel estimation method for RIS-aided millimeter-wave (mmWave) systems based on diffusion models (DMs). Specifically, the forward diffusion process of the original signal is formulated to model the received signal as a noisy observation within the framework of DMs. Subsequently, the channel estimation task is formulated as the reverse diffusion process, and a sampling algorithm based on denoising diffusion implicit models (DDIMs) is developed to enable effective inference. Furthermore, a lightweight neural network, termed BRCNet, is introduced to replace the conventional U-Net, significantly reducing the number of parameters and computational complexity. Extensive experiments conducted under various scenarios demonstrate that the proposed method consistently outperforms existing baselines.
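
For reference, a single deterministic DDIM update (the eta = 0 sampler this kind of reverse-diffusion inference builds on), written generically; abar is the cumulative noise schedule and eps_hat the network's noise estimate, with the conditioning on the received signal omitted:

def ddim_step(x_t, eps_hat, abar_t, abar_prev):
    # predict the clean signal from the noise estimate, then re-noise it
    # to the previous timestep (deterministic, eta = 0)
    x0_hat = (x_t - (1 - abar_t) ** 0.5 * eps_hat) / abar_t ** 0.5
    return abar_prev ** 0.5 * x0_hat + (1 - abar_prev) ** 0.5 * eps_hat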

Updated: 2025-07-23 14:10:03

标题: 通过扩散模型进行RIS辅助毫米波系统的信道估计

摘要: Reconfigurable intelligent surface (RIS)被认为是下一代无线通信的有前途的技术。然而,RIS辅助系统的性能关键取决于准确的信道状态信息(CSI)。为了解决这一挑战,本信函提出了一种基于扩散模型(DMs)的RIS辅助毫米波(mmWave)系统的新颖信道估计方法。具体而言,原始信号的正向扩散过程被制定为模拟接收信号作为DMs框架内的嘈杂观察。随后,信道估计任务被制定为反向扩散过程,并且基于去噪扩散隐式模型(DDIMs)的采样算法被开发以实现有效的推理。此外,引入了一种轻量级神经网络,称为BRCNet,以取代传统的U-Net,大大减少了参数数量和计算复杂性。在各种场景下进行的大量实验表明,所提出的方法始终优于现有基准线。

更新时间: 2025-07-23 14:10:03

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2506.07770v2

Sampling-enabled scalable manifold learning unveils discriminative cluster structure of high-dimensional data

As a pivotal branch of machine learning, manifold learning uncovers the intrinsic low-dimensional structure within complex nonlinear manifolds in high-dimensional space for visualization, classification, clustering, and gaining key insights. Although existing techniques have achieved remarkable successes, they suffer from extensive distortions of cluster structure, which hinders the understanding of underlying patterns. Scalability issues also limit their applicability for handling large-scale data. We hence propose a sampling-based Scalable manifold learning technique that enables Uniform and Discriminative Embedding, namely SUDE, for large-scale and high-dimensional data. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data, and then incorporates the non-landmarks into the learned space based on the constrained locally linear embedding (CLLE). We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks, and applied it to analyze single-cell data and detect anomalies in electrocardiogram (ECG) signals. SUDE exhibits distinct advantage in scalability with respect to data size and embedding dimension, and has promising performance in cluster separation, integrity, and global structure preservation. The experiments also demonstrate notable robustness in embedding quality as the sampling rate decreases.

Updated: 2025-07-23 14:08:03

标题: 采样启用的可扩展流形学习揭示高维数据的判别性聚类结构

摘要: 作为机器学习的一个重要分支,流形学习揭示了复杂非线性流形空间中固有的低维结构,用于可视化、分类、聚类和获取关键见解。尽管现有技术取得了显著成功,但它们存在严重的簇结构扭曲问题,这阻碍了潜在模式的理解。可扩展性问题也限制了它们处理大规模数据的适用性。因此,我们提出了一种基于抽样的可扩展流形学习技术,实现了均匀和具有区分性的嵌入,即SUDE,用于大规模和高维数据。它首先寻找一组地标来构建整个数据的低维骨架,然后基于受限局部线性嵌入(CLLE)将非地标合并到学习空间中。我们对SUDE在合成数据集和现实世界基准上的有效性进行了实证验证,并将其应用于分析单细胞数据和检测心电图(ECG)信号中的异常。SUDE在数据大小和嵌入维度的可扩展性方面具有明显优势,并在簇分离、完整性和全局结构保持方面表现出有希望的性能。实验还表明,随着采样率的降低,嵌入质量具有显著的稳健性。

更新时间: 2025-07-23 14:08:03

领域: cs.LG,I.5.3

下载: http://arxiv.org/abs/2401.01100v3

Generalized Advantage Estimation for Distributional Policy Gradients

Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimation of the advantage function to reduce the variance in policy gradient estimates. Despite its effectiveness, GAE is not designed to handle value distributions integral to distributional RL, which can capture the inherent stochasticity in systems and is hence more robust to system noises. To address this gap, we propose a novel approach that utilizes the optimal transport theory to introduce a Wasserstein-like directional metric, which measures both the distance and the directional discrepancies between probability distributions. Using the exponentially weighted estimation, we leverage this Wasserstein-like directional metric to derive distributional GAE (DGAE). Similar to traditional GAE, our proposed DGAE provides a low-variance advantage estimate with controlled bias, making it well-suited for policy gradient algorithms that rely on advantage estimation for policy updates. We integrated DGAE into three different policy gradient methods. Algorithms were evaluated across various OpenAI Gym environments and compared with the baselines with traditional GAE to assess the performance.
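
For reference, the classical estimator that DGAE generalizes: with TD residual $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, GAE is the exponentially weighted sum

\[
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} \;=\; \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta_{t+l},
\]

and, per the abstract, DGAE applies the same exponential weighting to a Wasserstein-like directional discrepancy between return distributions in place of the scalar residual.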

Updated: 2025-07-23 14:07:56

标题: 面向分布型策略梯度的广义优势估计

摘要: 广义优势估计(GAE)通过对优势函数进行指数加权估计来降低策略梯度估计的方差,从而减轻强化学习(RL)的计算复杂性。尽管GAE行之有效,但它并非为处理分布型RL所依赖的值分布而设计;值分布能够捕捉系统固有的随机性,因而对系统噪声更具鲁棒性。为了弥补这一差距,我们提出了一种新方法,利用最优输运理论引入一种类Wasserstein方向度量,该度量同时衡量概率分布之间的距离和方向差异。借助指数加权估计,我们利用这种类Wasserstein方向度量推导出分布型GAE(DGAE)。与传统GAE类似,我们提出的DGAE提供了偏差受控的低方差优势估计,非常适用于依赖优势估计进行策略更新的策略梯度算法。我们将DGAE集成到三种不同的策略梯度方法中,在各种OpenAI Gym环境中评估这些算法,并与采用传统GAE的基线进行比较以评估性能。

更新时间: 2025-07-23 14:07:56

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2507.17530v1

Generalized Low-Rank Matrix Contextual Bandits with Graph Information

The matrix contextual bandit (CB), as an extension of the well-known multi-armed bandit, is a powerful framework that has been widely applied in sequential decision-making scenarios involving low-rank structure. In many real-world scenarios, such as online advertising and recommender systems, additional graph information often exists beyond the low-rank structure, that is, the similar relationships among users/items can be naturally captured through the connectivity among nodes in the corresponding graphs. However, existing matrix CB methods fail to explore such graph information, and thereby making them difficult to generate effective decision-making policies. To fill in this void, we propose in this paper a novel matrix CB algorithmic framework that builds upon the classical upper confidence bound (UCB) framework. This new framework can effectively integrate both the low-rank structure and graph information in a unified manner. Specifically, it involves first solving a joint nuclear norm and matrix Laplacian regularization problem, followed by the implementation of a graph-based generalized linear version of the UCB algorithm. Rigorous theoretical analysis demonstrates that our procedure outperforms several popular alternatives in terms of cumulative regret bound, owing to the effective utilization of graph information. A series of synthetic and real-world data experiments are conducted to further illustrate the merits of our procedure.
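
A generic form of the joint estimation step described above, with hypothetical notation: given a loss $\mathcal{L}$, the nuclear norm $\|\cdot\|_*$ encouraging low rank, and the graph Laplacian $L$ encoding node similarity,

\[
\hat{\Theta} \;=\; \operatorname*{arg\,min}_{\Theta} \; \mathcal{L}(\Theta) \;+\; \lambda_{1}\,\|\Theta\|_{*} \;+\; \lambda_{2}\, \operatorname{tr}\!\big(\Theta^{\top} L\, \Theta\big),
\]

after which a graph-based generalized linear UCB rule selects actions using the resulting estimate.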

Updated: 2025-07-23 14:07:47

标题: 具有图信息的广义低秩矩阵上下文多臂老虎机

摘要: 矩阵上下文赌博机(CB)作为著名多臂赌博机的延伸,是一个强大的框架,在涉及低秩结构的顺序决策场景中被广泛应用。在许多现实世界的场景中,如在线广告和推荐系统,除了低秩结构外,通常还存在额外的图信息,即用户/项目之间的相似关系可以通过相应图中节点之间的连接性自然捕捉到。然而,现有的矩阵CB方法未能探索这种图信息,因此很难生成有效的决策策略。为了填补这一空白,本文提出了一种新颖的矩阵CB算法框架,该框架建立在经典的置信上界(UCB)框架之上。这种新框架可以有效地统一集成低秩结构和图信息。具体而言,它首先解决一个联合核范数和矩阵拉普拉斯正则化问题,然后实施基于图的广义线性版本的UCB算法。严格的理论分析表明,我们的方法在累积遗憾值方面优于几种流行的替代方法,这归功于对图信息的有效利用。进行了一系列合成和真实世界数据实验,进一步说明了我们方法的优点。

更新时间: 2025-07-23 14:07:47

领域: cs.LG

下载: http://arxiv.org/abs/2507.17528v1

Integrating Physics-Based and Data-Driven Approaches for Probabilistic Building Energy Modeling

Building energy modeling is a key tool for optimizing the performance of building energy systems. Historically, a wide spectrum of methods has been explored -- ranging from conventional physics-based models to purely data-driven techniques. Recently, hybrid approaches that combine the strengths of both paradigms have gained attention. These include strategies such as learning surrogates for physics-based models, modeling residuals between simulated and observed data, fine-tuning surrogates with real-world measurements, using physics-based outputs as additional inputs for data-driven models, and integrating the physics-based output into the loss function the data-driven model. Despite this progress, two significant research gaps remain. First, most hybrid methods focus on deterministic modeling, often neglecting the inherent uncertainties caused by factors like weather fluctuations and occupant behavior. Second, there has been little systematic comparison within a probabilistic modeling framework. This study addresses these gaps by evaluating five representative hybrid approaches for probabilistic building energy modeling, focusing on quantile predictions of building thermodynamics in a real-world case study. Our results highlight two main findings. First, the performance of hybrid approaches varies across different building room types, but residual learning with a Feedforward Neural Network performs best on average. Notably, the residual approach is the only model that produces physically intuitive predictions when applied to out-of-distribution test data. Second, Quantile Conformal Prediction is an effective procedure for calibrating quantile predictions in case of indoor temperature modeling.
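
A minimal sketch of the variant reported to perform best on average (residual learning with a feedforward network); all interfaces are hypothetical:

import torch
import torch.nn as nn

class PhysicsResidual(nn.Module):
    # learns the gap between the physics-based simulation and measurements;
    # the final prediction is the physics output plus the learned residual
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features, physics_pred):
        return physics_pred + self.net(features).squeeze(-1)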

Updated: 2025-07-23 14:07:33

标题: 整合基于物理和数据驱动方法进行概率建筑能耗建模

摘要: 建筑能源建模是优化建筑能源系统性能的关键工具。在历史上,探索了一系列方法,从传统的基于物理的模型到纯粹的数据驱动技术。最近,结合这两种范式优势的混合方法受到关注。这些方法包括学习基于物理的模型替代品、建模模拟和观测数据之间的残差、用实际测量数据微调替代品、使用基于物理的输出作为数据驱动模型的额外输入,并将基于物理的输出整合到数据驱动模型的损失函数中。尽管取得了进展,但仍存在两个重要的研究空白。首先,大多数混合方法侧重于确定性建模,通常忽略了由天气波动和居住者行为等因素引起的固有不确定性。其次,在概率建模框架内几乎没有系统比较。本研究通过评估五种代表性的混合方法,针对建筑热力学的分位数预测在一个真实案例研究中进行了探讨,以填补这些空白。我们的研究结果突出了两个主要发现。首先,不同建筑房间类型的混合方法性能有所不同,但使用前馈神经网络进行残差学习在平均性能上表现最佳。值得注意的是,在应用于分布之外的测试数据时,残差方法是唯一能够产生物理直观预测的模型。第二,量化符合预测是在室内温度建模的情况下校准分位数预测的有效程序。

更新时间: 2025-07-23 14:07:33

领域: eess.SY,cs.AI,cs.LG,cs.SY

下载: http://arxiv.org/abs/2507.17526v1

LSDM: LLM-Enhanced Spatio-temporal Diffusion Model for Service-Level Mobile Traffic Prediction

Service-level mobile traffic prediction for individual users is essential for network efficiency and quality of service enhancement. However, current prediction methods are limited in their adaptability across different urban environments and produce inaccurate results due to the high uncertainty in personal traffic patterns, the lack of detailed environmental context, and the complex dependencies among different network services. These challenges demand advanced modeling techniques that can capture dynamic traffic distributions and rich environmental features. Inspired by the recent success of diffusion models in distribution modeling and Large Language Models (LLMs) in contextual understanding, we propose an LLM-Enhanced Spatio-temporal Diffusion Model (LSDM). LSDM integrates the generative power of diffusion models with the adaptive learning capabilities of transformers, augmented by the ability to capture multimodal environmental information for modeling service-level patterns and dynamics. Extensive evaluations on real-world service-level datasets demonstrate that the model excels in traffic usage predictions, showing outstanding generalization and adaptability. After incorporating contextual information via LLM, the performance improves by at least 2.83% in terms of the coefficient of determination. Compared to models of a similar type, such as CSDI, the root mean squared error can be reduced by at least 8.29%. The code and dataset will be available at: https://github.com/SoftYuaneR/LSDM.

Updated: 2025-07-23 14:01:16

标题: LSDM: 服务水平移动流量预测的LLM增强时空扩散模型

摘要: 个人用户的服务级移动流量预测对于网络效率和服务质量的提升至关重要。然而,当前的预测方法在不同城市环境之间的适应性有限,并且由于个人流量模式的高度不确定性、缺乏详细的环境背景以及不同网络服务之间的复杂依赖关系,产生了不准确的结果。这些挑战需要先进的建模技术,可以捕捉动态流量分布和丰富的环境特征。受扩散模型在分布建模和大型语言模型(LLMs)在情境理解方面的最近成功启发,我们提出了一个LLM增强的时空扩散模型(LSDM)。LSDM将扩散模型的生成能力与Transformer的自适应学习能力相结合,增强了捕捉多模态环境信息以建模服务级模式和动态的能力。对真实服务级数据集的广泛评估表明,该模型在流量使用预测方面表现出色,具有出色的泛化和适应性。通过LLM引入情境信息后,性能在确定系数方面至少提高了2.83%。与类似类型的模型(如CSDI)相比,均方根误差至少可以降低8.29%。代码和数据集将可在以下网址获得:https://github.com/SoftYuaneR/LSDM。

更新时间: 2025-07-23 14:01:16

领域: cs.LG

下载: http://arxiv.org/abs/2507.17795v1

Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear--Quadratic Reinforcement Learning Problems

We study reinforcement learning (RL) for the same class of continuous-time stochastic linear--quadratic (LQ) control problems as in Huang et al. [2024], where volatilities depend on both states and controls while states are scalar-valued and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic and policy variance by the actor. Unlike the constant or deterministic exploration schedules employed in Huang et al. [2024], which require extensive tuning for implementations and ignore learning progress during iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived only with fixed exploration schedules. Numerical experiments demonstrate that adaptive explorations accelerate convergence and improve regret performance compared to the non-adaptive model-free and model-based counterparts.

Updated: 2025-07-23 14:00:39

标题: 基于数据驱动的连续时间不定性线性二次强化学习问题探索

摘要: 我们研究了强化学习(RL)在与Huang et al. [2024]相同类别的连续时间随机线性二次(LQ)控制问题中的应用,其中波动性取决于状态和控制,而状态为标量值,且不存在运行控制奖励。我们提出了一种无模型、数据驱动的探索机制,通过批评家自适应调整熵正则化,通过演员自适应调整策略方差。与Huang et al. [2024]中采用的常数或确定性探索计划不同,这种计划需要广泛的调整以进行实施,并忽略迭代过程中的学习进展,我们的自适应探索方法通过最小化调整提高了学习效率。尽管方法灵活,但我们的方法实现了与此类LQ问题的最佳已知无模型结果相匹配的次线性遗憾界,这些结果先前仅通过固定探索计划推导得出。数值实验表明,与非自适应无模型和基于模型的对照方法相比,自适应探索加速了收敛速度并提高了遗憾性能。

更新时间: 2025-07-23 14:00:39

领域: cs.LG,cs.AI,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2507.00358v2

Enhancing Sequential Recommender with Large Language Models for Joint Video and Comment Recommendation

Nowadays, reading or writing comments on captivating videos has emerged as a critical part of the viewing experience on online video platforms. However, existing recommender systems primarily focus on users' interaction behaviors with videos, neglecting comment content and interaction in user preference modeling. In this paper, we propose a novel recommendation approach called LSVCR that utilizes user interaction histories with both videos and comments to jointly perform personalized video and comment recommendation. Specifically, our approach comprises two key components: sequential recommendation (SR) model and supplemental large language model (LLM) recommender. The SR model functions as the primary recommendation backbone (retained in deployment) of our method for efficient user preference modeling. Concurrently, we employ a LLM as the supplemental recommender (discarded in deployment) to better capture underlying user preferences derived from heterogeneous interaction behaviors. In order to integrate the strengths of the SR model and the supplemental LLM recommender, we introduce a two-stage training paradigm. The first stage, personalized preference alignment, aims to align the preference representations from both components, thereby enhancing the semantics of the SR model. The second stage, recommendation-oriented fine-tuning, involves fine-tuning the alignment-enhanced SR model according to specific objectives. Extensive experiments in both video and comment recommendation tasks demonstrate the effectiveness of LSVCR. Moreover, online A/B testing on KuaiShou platform verifies the practical benefits of our approach. In particular, we attain a cumulative gain of 4.13% in comment watch time.

Updated: 2025-07-23 13:55:50

标题: 利用大型语言模型增强顺序推荐器,实现视频和评论的联合推荐

摘要: 如今,在引人入胜的视频下阅读或撰写评论已成为在线视频平台观看体验的关键组成部分。然而,现有的推荐系统主要关注用户与视频的交互行为,在用户偏好建模中忽视了评论内容与评论交互。在本文中,我们提出了一种名为LSVCR的新颖推荐方法,该方法利用用户与视频和评论的交互历史来共同执行个性化视频和评论推荐。具体来说,我们的方法包括两个关键组件:顺序推荐(SR)模型和补充大型语言模型(LLM)推荐器。SR模型作为我们方法的主要推荐骨干(在部署中保留)来进行有效的用户偏好建模。同时,我们使用LLM作为补充推荐器(在部署中丢弃)以更好地捕捉来自异构交互行为的潜在用户偏好。为了整合SR模型和补充LLM推荐器的优势,我们引入了一个两阶段训练范式。第一阶段,个性化偏好对齐,旨在对齐两个组件的偏好表示,从而增强SR模型的语义。第二阶段,面向推荐的微调,涉及根据具体目标微调经过对齐增强的SR模型。在视频和评论推荐任务中进行了大量实验,证明了LSVCR的有效性。此外,快手平台上的在线A/B测试验证了我们方法的实际益处。特别是,我们在评论观看时间上获得了4.13%的累积增益。

更新时间: 2025-07-23 13:55:50

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2403.13574v2

Enabling Cyber Security Education through Digital Twins and Generative AI

Digital Twins (DTs) are gaining prominence in cybersecurity for their ability to replicate complex IT (Information Technology), OT (Operational Technology), and IoT (Internet of Things) infrastructures, allowing for real time monitoring, threat analysis, and system simulation. This study investigates how integrating DTs with penetration testing tools and Large Language Models (LLMs) can enhance cybersecurity education and operational readiness. By simulating realistic cyber environments, this approach offers a practical, interactive framework for exploring vulnerabilities and defensive strategies. At the core of this research is the Red Team Knife (RTK), a custom penetration testing toolkit aligned with the Cyber Kill Chain model. RTK is designed to guide learners through key phases of cyberattacks, including reconnaissance, exploitation, and response within a DT powered ecosystem. The incorporation of Large Language Models (LLMs) further enriches the experience by providing intelligent, real-time feedback, natural language threat explanations, and adaptive learning support during training exercises. This combined DT LLM framework is currently being piloted in academic settings to develop hands on skills in vulnerability assessment, threat detection, and security operations. Initial findings suggest that the integration significantly improves the effectiveness and relevance of cybersecurity training, bridging the gap between theoretical knowledge and real-world application. Ultimately, the research demonstrates how DTs and LLMs together can transform cybersecurity education to meet evolving industry demands.

Updated: 2025-07-23 13:55:35

标题: 通过数字孪生技术和生成式人工智能促进网络安全教育

摘要: 数字孪生(DTs)因其能够复制复杂的信息技术(IT)、运营技术(OT)和物联网(IoT)基础设施而在网络安全领域备受关注,这使得实时监测、威胁分析和系统模拟成为可能。本研究探讨了如何将DTs与渗透测试工具和大型语言模型(LLMs)相结合,以增强网络安全教育和操作准备能力。通过模拟真实的网络环境,这种方法为探索漏洞和防御策略提供了一个实用、互动性的框架。这项研究的核心是Red Team Knife(RTK),这是一个与网络攻击链模型相符合的定制渗透测试工具包。RTK旨在引导学习者通过网络攻击的关键阶段,包括侦察、利用和在由DT提供支持的生态系统内的响应。通过整合大型语言模型(LLMs),进一步丰富了体验,提供智能的、实时的反馈、自然语言的威胁解释和在培训练习过程中的适应性学习支持。目前,这种结合的DT LLM框架正在学术环境中进行试点,以培养漏洞评估、威胁检测和安全操作方面的实际技能。初步研究结果表明,这种整合显著提高了网络安全培训的有效性和相关性,弥合了理论知识与实际应用之间的差距。最终,这项研究展示了DTs和LLMs如何共同改变网络安全教育,以满足不断发展的行业需求。

更新时间: 2025-07-23 13:55:35

领域: cs.CR,cs.AI,cs.CY,cs.HC,cs.SE

下载: http://arxiv.org/abs/2507.17518v1

Frequency Estimation of Correlated Multi-attribute Data under Local Differential Privacy

Large-scale data collection, from national censuses to IoT-enabled smart homes, routinely gathers dozens of attributes per individual. These multi-attribute datasets are vital for analytics but pose significant privacy risks. Local Differential Privacy (LDP) is a powerful tool to protect user data privacy by allowing users to locally perturb their records before releasing to an untrusted data aggregator. However, existing LDP mechanisms either split the privacy budget across all attributes or treat each attribute independently, ignoring natural inter-attribute correlations. This leads to excessive noise or fragmented budgets, resulting in significant utility loss, particularly in high-dimensional settings. To overcome these limitations, we propose Correlated Randomized Response (Corr-RR), a novel LDP mechanism that leverages correlations among attributes to substantially improve utility while maintaining rigorous LDP guarantees. Corr-RR allocates the full privacy budget to perturb a single, randomly selected attribute and reconstructs the remaining attributes using estimated interattribute dependencies, without incurring additional privacy cost. To enable this, Corr-RR operates in two phases: (1) a subset of users apply standard LDP mechanisms to estimate correlations, and (2) each remaining user perturbs one attribute and infers the others using the learned correlations. We theoretically prove that Corr-RR satisfies $\epsilon$-LDP, and extensive experiments on synthetic and real-world datasets demonstrate that Corr-RR consistently outperforms state-of-the-art LDP mechanisms, particularly in scenarios with many attributes and strong inter-attribute correlations.
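
A toy sketch of the phase-2 reporting step for binary attributes; phase-1 correlation estimation and the aggregator-side reconstruction are omitted, and the interface is hypothetical:

import numpy as np

def corr_rr_report(record, eps):
    # spend the ENTIRE budget eps on one randomly chosen attribute via
    # standard binary randomized response; the unreported attributes are
    # later inferred by the aggregator from phase-1 correlation estimates
    d = len(record)
    j = np.random.randint(d)
    p_keep = np.exp(eps) / (np.exp(eps) + 1.0)  # eps-LDP keep probability
    value = record[j] if np.random.rand() < p_keep else 1 - record[j]
    return j, value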

Updated: 2025-07-23 13:52:45

标题: 局部差分隐私条件下相关多属性数据频率估计

摘要: 大规模数据收集,从国家普查到物联网智能家居,通常会收集每个个体几十个属性。这些多属性数据集对分析至关重要,但也带来了重大的隐私风险。本地差分隐私(LDP)是一种保护用户数据隐私的强大工具,允许用户在发布给不受信任的数据聚合器之前在本地扰动其记录。然而,现有的LDP机制要么将隐私预算分配给所有属性,要么将每个属性视为独立的,忽略自然的属性间相关性。这导致了过多的噪声或碎片化的预算,尤其在高维设置中导致了显著的效用损失。 为了克服这些局限性,我们提出了相关随机响应(Corr-RR),一种利用属性间相关性显著提高效用并保持严格LDP保证的新型LDP机制。Corr-RR将整个隐私预算分配给扰动一个随机选择的属性,并使用估计的属性间依赖关系重建其余属性,而不会产生额外的隐私成本。为了实现这一点,Corr-RR分为两个阶段:(1)一部分用户应用标准LDP机制来估计相关性,(2)每个剩余用户扰动一个属性并利用学到的相关性推断其他属性。我们在理论上证明了Corr-RR满足ε-LDP,并对合成和现实世界数据集进行了广泛实验,结果表明Corr-RR在许多属性和强属性间相关性的情况下始终优于最先进的LDP机制。

更新时间: 2025-07-23 13:52:45

领域: cs.CR

下载: http://arxiv.org/abs/2507.17516v1

Towards Unifying Quantitative Security Benchmarking for Multi Agent Systems

Evolving AI systems increasingly deploy multi-agent architectures where autonomous agents collaborate, share information, and delegate tasks through developing protocols. This connectivity, while powerful, introduces novel security risks. One such risk is a cascading risk: a breach in one agent can cascade through the system, compromising others by exploiting inter-agent trust. In tandem with OWASP's initiative for an Agentic AI Vulnerability Scoring System we define an attack vector, Agent Cascading Injection, analogous to Agent Impact Chain and Blast Radius, operating across networks of agents. In an ACI attack, a malicious input or tool exploit injected at one agent leads to cascading compromises and amplified downstream effects across agents that trust its outputs. We formalize this attack with an adversarial goal equation and key variables (compromised agent, injected exploit, polluted observations, etc.), capturing how a localized vulnerability can escalate into system-wide failure. We then analyze ACI's properties -- propagation chains, amplification factors, and inter-agent compound effects -- and map these to OWASP's emerging Agentic AI risk categories (e.g. Impact Chain and Orchestration Exploits). Finally, we argue that ACI highlights a critical need for quantitative benchmarking frameworks to evaluate the security of agent-to-agent communication protocols. We outline a methodology for stress-testing multi-agent systems (using architectures such as Google's A2A and Anthropic's MCP) against cascading trust failures, developing upon groundwork for measurable, standardized agent-to-agent security evaluation. Our work provides the necessary apparatus for engineers to benchmark system resilience, make data-driven architectural trade-offs, and develop robust defenses against a new generation of agentic threats.

Updated: 2025-07-23 13:51:28

标题: 朝向统一多Agent系统的定量安全基准化

摘要: 不断发展的人工智能系统越来越多地采用多代理体系架构,其中自主代理通过不断演进的协议进行协作、共享信息并委托任务。虽然这种连接性非常强大,但也引入了新的安全风险。其中一种风险是级联风险:一个代理被攻破后,可能利用代理间的信任关系级联至整个系统,危及其他代理。与OWASP对代理人工智能漏洞评分系统的倡议相结合,我们定义了一种攻击向量,即代理级联注入(ACI),类似于代理影响链和爆炸半径,跨代理网络运作。在ACI攻击中,一个恶意输入或工具漏洞注入到一个代理中,会导致级联的妥协,并在信任其输出的代理之间产生放大的下游效应。我们通过对抗性目标方程和关键变量(受损代理、注入漏洞、污染观察等)来形式化这种攻击,捕捉了一个局部漏洞如何升级为系统范围的失败。然后我们分析了ACI的属性--传播链、放大因子和代理之间的复合效应--并将其映射到OWASP新兴的代理人工智能风险类别(如影响链和编排利用)。最后,我们认为ACI突出了对量化基准测试框架的迫切需求,以评估代理到代理通信协议的安全性。我们概述了一种针对多代理系统进行压力测试的方法论(使用架构如Google的A2A和Anthropic的MCP),以对抗级联信任失败,建立在可衡量、标准化的代理到代理安全评估基础上。我们的工作为工程师提供了必要的工具,以评估系统的弹性,进行基于数据的架构权衡,并构建针对新一代代理威胁的强健防御。

更新时间: 2025-07-23 13:51:28

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.21146v1

TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment

This paper introduces the TAI Scan Tool, a RAG-based TAI self-assessment tool with minimalistic input. The current version of the tool supports the legal TAI assessment, with a particular emphasis on facilitating compliance with the AI Act. It involves a two-step approach with a pre-screening and an assessment phase. The assessment output of the system includes insight regarding the risk-level of the AI system according to the AI Act, while at the same time retrieving relevant articles to aid with compliance and notify on their obligations. Our qualitative evaluation using use-case scenarios yields promising results, correctly predicting risk levels while retrieving relevant articles across three distinct semantic groups. Furthermore, interpretation of results shows that the tool's reasoning relies on comparison with the setting of high-risk systems, a behaviour attributed to their deployment requiring careful consideration, and therefore frequently presented within the AI Act.

Updated: 2025-07-23 13:51:23

标题: TAI扫描工具:一种基于RAG的工具,具有最简输入,用于可信的AI自我评估

摘要: 本文介绍了TAI扫描工具,这是一种基于RAG的具有极简输入的TAI自我评估工具。该工具的当前版本支持法律层面的TAI评估,特别强调促进与AI法案的合规性。它采用了一个两步方法,包括预筛选和评估阶段。系统的评估输出包括根据AI法案对AI系统的风险级别的见解,同时检索相关条款以帮助合规并提示相应义务。我们使用用例场景进行的定性评估产生了令人鼓舞的结果,正确预测了风险级别,并在三个不同的语义组中检索到了相关条款。此外,结果的解释显示,该工具的推理依赖于与高风险系统设置的比较,这种行为被归因于其部署需要仔细考虑,因此在AI法案中经常提到。

更新时间: 2025-07-23 13:51:23

领域: cs.AI

下载: http://arxiv.org/abs/2507.17514v1

HOTA: Hamiltonian framework for Optimal Transport Advection

Optimal transport (OT) has become a natural framework for guiding the probability flows. Yet, the majority of recent generative models assume trivial geometry (e.g., Euclidean) and rely on strong density-estimation assumptions, yielding trajectories that do not respect the true principles of optimality in the underlying manifold. We present Hamiltonian Optimal Transport Advection (HOTA), a Hamilton-Jacobi-Bellman based method that tackles the dual dynamical OT problem explicitly through Kantorovich potentials, enabling efficient and scalable trajectory optimization. Our approach effectively evades the need for explicit density modeling, performing even when the cost functionals are non-smooth. Empirically, HOTA outperforms all baselines in standard benchmarks, as well as in custom datasets with non-differentiable costs, both in terms of feasibility and optimality.
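
For context, the dual dynamical OT problem that HJB-based methods address can be written (in a standard form, not necessarily the paper's exact notation) as

    \[
    \sup_{\varphi}\;\int \varphi(1,x)\,d\mu_1(x)-\int \varphi(0,x)\,d\mu_0(x)
    \quad\text{s.t.}\quad \partial_t\varphi + H(x,\nabla_x\varphi)\le 0,
    \]

where $H(x,p)=\sup_v\,[\langle p,v\rangle - c(x,v)]$ is the Hamiltonian induced by the cost $c$; for the quadratic cost $c(x,v)=\tfrac{1}{2}\|v\|^2$ the constraint becomes the Hamilton-Jacobi-Bellman equation $\partial_t\varphi+\tfrac{1}{2}\|\nabla_x\varphi\|^2=0$, and the optimal $\varphi$ plays the role of the Kantorovich potential.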

Updated: 2025-07-23 13:51:06

标题: HOTA:汉密尔顿框架下的最优输运对流

摘要: 最优输运(OT)已成为引导概率流的自然框架。然而,大多数最近的生成模型假设几何结构是平凡的(例如,欧几里得空间),依赖于强密度估计假设,导致生成的轨迹不符合底层流形的真实最优原则。我们提出了基于哈密顿-雅各比-贝尔曼的哈密顿最优输运对流(HOTA)方法,通过Kantorovich势明确处理双重动力OT问题,实现了高效和可扩展的轨迹优化。我们的方法有效地避免了对显式密度建模的需求,即使在成本函数是非光滑的情况下也能运行。在经验上,HOTA在标准基准测试以及具有不可微成本的自定义数据集上均优于所有基线方法,在可行性和最优性方面都表现出色。

更新时间: 2025-07-23 13:51:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17513v1

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing research has predominantly concentrated on isolated reasoning domains such as mathematical problem-solving, coding tasks, or logical reasoning. However, real world reasoning scenarios inherently demand an integrated application of multiple cognitive skills. Despite this, the interplay among these reasoning skills under reinforcement learning remains poorly understood. To bridge this gap, we present a systematic investigation of multi-domain reasoning within the RLVR framework, explicitly focusing on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. We conduct a comprehensive study comprising four key components: (1) Leveraging the GRPO algorithm and the Qwen-2.5-7B model family, our study thoroughly evaluates the models' in-domain improvements and cross-domain generalization capabilities when trained on single-domain datasets. (2) Additionally, we examine the intricate interactions including mutual enhancements and conflicts that emerge during combined cross-domain training. (3) To further understand the influence of SFT on RL, we also analyze and compare performance differences between base and instruct models under identical RL configurations. (4) Furthermore, we delve into critical RL training details, systematically exploring the impacts of curriculum learning strategies, variations in reward design, and language-specific factors. Through extensive experiments, our results offer significant insights into the dynamics governing domain interactions, revealing key factors influencing both specialized and generalizable reasoning performance. These findings provide valuable guidance for optimizing RL methodologies to foster comprehensive, multi-domain reasoning capabilities in LLMs.
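
For reference, the group-relative advantage at the core of GRPO is simple to state; the sketch below shows the standard formulation (the paper's exact hyperparameters and reward checks are not reproduced here):

    import numpy as np

    def grpo_advantages(rewards, eps=1e-6):
        """Group-relative advantages as used by GRPO: each sampled completion
        is scored against the mean/std of its own group of rollouts, removing
        the need for a learned value baseline. Verifiable rewards here would
        be 0/1 checks, e.g. exact-match on a math answer or passing unit tests."""
        rewards = np.asarray(rewards, dtype=float)
        return (rewards - rewards.mean()) / (rewards.std() + eps)

    # Eight rollouts for one prompt, rewarded by a verifier (1 = correct).
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]))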

Updated: 2025-07-23 13:51:04

标题: 一个领域能否帮助其他领域?基于强化学习的多领域推理的数据中心研究

摘要: 强化学习与可验证奖励(RLVR)已经成为增强LLM推理能力的强大范式。现有研究主要集中在孤立的推理领域,如数学问题求解、编码任务或逻辑推理。然而,现实世界的推理场景本质上需要多种认知技能的综合应用。尽管如此,在强化学习下这些推理技能之间的相互作用仍不明确。为了弥补这一差距,我们在RLVR框架内对多领域推理进行了系统调查,明确关注三个主要领域:数学推理、代码生成和逻辑谜题解决。我们进行了一项全面的研究,包括四个关键组成部分:(1)利用GRPO算法和Qwen-2.5-7B模型系列,我们的研究全面评估了模型在单一领域数据集上训练时的领域内改进和跨领域泛化能力。(2)此外,我们还研究了在联合跨领域训练过程中出现的相互增强和冲突等复杂交互作用。(3)为了进一步了解SFT对RL的影响,我们还分析和比较了在相同RL配置下基础模型和指导模型之间的性能差异。(4)此外,我们深入研究了关键的RL训练细节,系统地探讨了课程学习策略、奖励设计的变化和语言特定因素的影响。通过大量实验,我们的结果为控制领域间相互作用的动态提供了重要见解,揭示了影响专业和可泛化推理性能的关键因素。这些发现为优化RL方法论以促进LLM中全面的、多领域推理能力提供了宝贵的指导。

更新时间: 2025-07-23 13:51:04

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.17512v1

The Bright Side of Timed Opacity

Timed automata (TAs) are an extension of finite automata that can measure and react to the passage of time, providing the ability to handle real-time constraints using clocks. In 2009, Franck Cassez showed that the timed opacity problem, where an attacker can observe some actions with their timestamps and attempts to deduce information, is undecidable for TAs. Moreover, he showed that the undecidability holds even for subclasses such as event-recording automata. In this article, we consider the same definition of opacity, by restricting either the system or the attacker. Our first contribution is to prove the inter-reducibility of two variants of opacity: full opacity (for which the observations should be the same regardless of the visit of a private location) and weak opacity (for which it suffices that the attacker cannot deduce whether the private location was visited, but for which it is harmless to deduce that it was not visited); we also prove further results including a connection with timed language inclusion. Our second contribution is to study opacity for several subclasses of TAs: with restrictions on the number of clocks, the number of actions, the nature of time, or a new subclass called observable event-recording automata. We show that opacity is mostly decidable in these cases, except for one-action TAs and for one-clock TAs with $\epsilon$-transitions, for which undecidability remains. Our third (and arguably main) contribution is to propose a new definition of opacity in which the number of observations made by the attacker is limited to the first $N$ observations, or to a set of $N$ timestamps after which the attacker observes the first action that follows immediately. This set can be defined either a priori or at runtime; all three versions yield decidability for the whole TA class.

Updated: 2025-07-23 13:51:01

标题: 计时不透明的光明面

摘要: 定时自动机(TAs)是有限自动机的扩展,可以测量并对时间的流逝做出反应,通过使用时钟处理实时约束。在2009年,Franck Cassez表明,对于TAs,定时不透明问题是不可判定的,即攻击者可以观察一些带有时间戳的动作并试图推断信息。此外,他还表明即使对于事件记录自动机等子类,不可判定性也成立。在本文中,我们考虑相同的不透明性定义,但对系统或攻击者加以限制。我们的第一个贡献是证明了两种不透明性变体之间的互相可归约性:完全不透明性(观察结果应该是相同的,无论私有位置是否被访问)和弱不透明性(攻击者只需无法推断私有位置是否被访问,而推断出其未被访问则是无害的);我们还证明了进一步的结果,包括与定时语言包含性的联系。我们的第二个贡献是研究TAs的几个子类的不透明性:对于时钟数量、动作数量、时间性质的限制,或一种称为可观察事件记录自动机的新子类。我们表明在这些情况下,不透明性大多是可判定的,除了只有一个动作的TAs和具有$\epsilon$-转换的一个时钟的TAs,其中不可判定性仍然存在。我们的第三个(可能是主要的)贡献是提出了一个新的不透明性定义,其中攻击者所做的观察次数被限制为前$N$次观察,或限制为一组$N$个时间戳,攻击者在每个时间戳之后观察紧随其后的第一个动作。这个集合可以预先定义,也可以在运行时定义;所有三个版本都为整个TA类提供了可判定性。

更新时间: 2025-07-23 13:51:01

领域: cs.LO,cs.CR,cs.FL

下载: http://arxiv.org/abs/2408.12240v4

Fake or Real: The Impostor Hunt in Texts for Space Operations

The "Fake or Real" competition hosted on Kaggle (https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt ) is the second part of a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (https://assurance-ai.space-codev.org/ ). The competition idea is based on two real-life AI security threats identified within the project -- data poisoning and overreliance in Large Language Models. The task is to distinguish between the proper output from LLM and the output generated under malicious modification of the LLM. As this problem was not extensively researched, participants are required to develop new techniques to address this issue or adjust already existing ones to this problem's statement.

Updated: 2025-07-23 13:48:01

标题: 假的还是真的:太空操作文本中的冒名者追踪

摘要: “Fake or Real”竞赛由Kaggle主办(https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt ),是与由欧洲航天局资助的“空间领域AI应用保障”项目(https://assurance-ai.space-codev.org/ )相关的一系列后续竞赛和编程马拉松的第二部分。该竞赛的理念基于项目中确定的两种真实AI安全威胁--数据投毒和对大型语言模型的过度依赖。任务是区分LLM的正常输出和在LLM被恶意修改后生成的输出。由于这个问题尚未被广泛研究,参与者需要开发新技术来解决这一问题,或针对该问题的设定调整已有技术。

更新时间: 2025-07-23 13:48:01

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2507.13508v3

Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems

We present a graph-based deep learning framework for predicting the magnetic properties of quasi-one-dimensional Ising spin systems. The lattice geometry is encoded as a graph and processed by a graph neural network (GNN) followed by fully connected layers. The model is trained on Monte Carlo simulation data and accurately reproduces key features of the magnetization curve, including plateaus, critical transition points, and the effects of geometric frustration. It captures both local motifs and global symmetries, demonstrating that GNNs can infer magnetic behavior directly from structural connectivity. The proposed approach enables efficient prediction of magnetization without the need for additional Monte Carlo simulations.
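
A minimal sketch of the general recipe, assuming a dense adjacency matrix and mean-neighbor aggregation; this illustrates the GNN-plus-fully-connected readout, not the paper's exact architecture or feature set:

    import torch
    import torch.nn as nn

    class SpinGNN(nn.Module):
        def __init__(self, in_dim=4, hidden=64, layers=3):
            super().__init__()
            self.embed = nn.Linear(in_dim, hidden)
            self.msg = nn.ModuleList([nn.Linear(2 * hidden, hidden) for _ in range(layers)])
            self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x, adj):               # x: [N, in_dim], adj: [N, N] of 0/1
            h = torch.relu(self.embed(x))
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            for lin in self.msg:
                agg = (adj @ h) / deg            # mean over lattice neighbors
                h = torch.relu(lin(torch.cat([h, agg], dim=-1)))
            return self.head(h.mean(0))          # graph-level readout -> magnetization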

Updated: 2025-07-23 13:47:38

标题: 图神经网络方法用于预测准一维伊辛系统中的磁化

摘要: 我们提出了一种基于图的深度学习框架,用于预测准一维Ising自旋系统的磁性特性。晶格几何结构被编码为一个图,并由图神经网络(GNN)和随后的全连接层进行处理。该模型在蒙特卡洛模拟数据上进行训练,并准确地重现了磁化曲线的关键特征,包括平台、临界转变点以及几何阻挫的影响。它捕捉了局部模式和全局对称性,表明GNN可以直接从结构连接中推断磁性行为。所提出的方法使得在不需要额外的蒙特卡洛模拟的情况下,能够高效地预测磁化。

更新时间: 2025-07-23 13:47:38

领域: cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2507.17509v1

Joint Multi-Target Detection-Tracking in Cognitive Massive MIMO Radar via POMCP

This correspondence presents a power-aware cognitive radar framework for joint detection and tracking of multiple targets in a massive multiple-input multiple-output (MIMO) radar environment. Building on a previous single-target algorithm based on Partially Observable Monte Carlo Planning (POMCP), we extend it to the multi-target case by assigning each target an independent POMCP tree, enabling scalable and efficient planning. Departing from uniform power allocation-which is often suboptimal with varying signal-to-noise ratios (SNRs)-our approach predicts each target's future angular position and expected received power, based on its estimated range and radar cross-section (RCS). These predictions guide adaptive waveform design via a constrained optimization problem that allocates transmit energy to enhance the detectability of weaker or distant targets, while ensuring sufficient power for high-SNR targets. The reward function in the underlying partially observable Markov decision process (POMDP) is also modified to prioritize accurate spatial and power estimation. Simulations involving multiple targets with different SNRs confirm the effectiveness of our method. The proposed framework for the cognitive radar improves detection probability for low-SNR targets and achieves more accurate tracking compared to approaches using uniform or orthogonal waveforms. These results demonstrate the potential of the POMCP-based framework for adaptive, efficient multi-target radar systems.

Updated: 2025-07-23 13:43:29

标题: 通过POMCP在认知大规模MIMO雷达中进行联合多目标检测跟踪

摘要: 这份通讯介绍了一种面向功耗的认知雷达框架,用于在大规模多输入多输出(MIMO)雷达环境中联合检测和跟踪多个目标。基于先前基于部分可观测蒙特卡洛规划(POMCP)的单目标算法,我们通过为每个目标分配独立的POMCP树将其扩展到多目标情况,实现可扩展和高效的规划。 与通常在信噪比(SNR)变化时不太优化的均匀功率分配不同,我们的方法基于估计的距离和雷达截面积(RCS)预测每个目标的未来角位置和预期接收功率。这些预测通过一个受约束的优化问题引导自适应波形设计,该问题将传输能量分配到增强较弱或较远目标的可检测性,同时确保高SNR目标的足够功率。基础部分可观测马尔可夫决策过程(POMDP)中的奖励函数也被修改,以优先考虑准确的空间和功率估计。 涉及不同SNR的多目标的模拟验证了我们方法的有效性。认知雷达的提出框架提高了低SNR目标的检测概率,与使用均匀或正交波形的方法相比,实现了更准确的跟踪。这些结果展示了基于POMCP的框架对于自适应、高效的多目标雷达系统的潜力。

更新时间: 2025-07-23 13:43:29

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2507.17506v1

DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD

Transformers have become the de facto backbone of modern deep learning, yet their training typically demands an advanced optimizer with adaptive learning rate like AdamW, rather than a momentum SGDW (mSGDW). Previous works show that it is mainly due to a heavy-tailed distribution of the gradients. In this paper, we introduce a Deeply Normalized Transformer (DNT), which is meticulously engineered to overcome this limitation, enabling seamless training with vanilla mSGDW while yielding comparable performance to the Transformers trained via AdamW. To be specific, in DNT, we strategically integrate normalization techniques at proper positions in the Transformers to effectively modulate the Jacobian matrices of each layer, balance the influence of weights, activations, and their interactions, and thus concentrate the distributions of the gradients. We provide both theoretical justifications of the normalization technique used in our DNT and extensive empirical evaluation on two popular Transformer architectures to validate that: a) DNT outperforms its counterparts (i.e., ViT and GPT), and b) DNT can be effectively trained with vanilla mSGDW.

Updated: 2025-07-23 13:37:23

标题: DNT:一种可以通过动量SGD进行训练的深度归一化Transformer

摘要: Transformers已经成为现代深度学习的事实支柱,然而它们的训练通常需要一种具有自适应学习率的高级优化器,如AdamW,而不是动量SGDW(mSGDW)。先前的研究表明,这主要是由于梯度的重尾分布。在本文中,我们介绍了一种深度规范化Transformer(DNT),它经过精心设计,能够克服这一限制,实现使用普通mSGDW进行无缝训练,同时产生与通过AdamW训练的Transformers相媲美的性能。具体来说,在DNT中,我们在Transformer的适当位置策略性地集成规范化技术,有效调节每一层的雅可比矩阵,平衡权重、激活和它们的相互作用的影响,从而使梯度的分布集中。我们提供了对我们的DNT中使用的规范化技术的理论证明,并对两种流行的Transformer架构进行了广泛的实证评估,以验证:a)DNT优于其对手(即ViT和GPT),b)DNT可以有效地使用普通mSGDW进行训练。

更新时间: 2025-07-23 13:37:23

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2507.17501v1

Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature

In applied Bayesian inference scenarios, users may have access to a large number of pre-existing model evaluations, for example from maximum-a-posteriori (MAP) optimization runs. However, traditional approximate inference techniques make little to no use of this available information. We propose the framework of post-process Bayesian inference as a means to obtain a quick posterior approximation from existing target density evaluations, with no further model calls. Within this framework, we introduce Variational Sparse Bayesian Quadrature (VSBQ), a method for post-process approximate inference for models with black-box and potentially noisy likelihoods. VSBQ reuses existing target density evaluations to build a sparse Gaussian process (GP) surrogate model of the log posterior density function. Subsequently, we leverage sparse-GP Bayesian quadrature combined with variational inference to achieve fast approximate posterior inference over the surrogate. We validate our method on challenging synthetic scenarios and real-world applications from computational neuroscience. The experiments show that VSBQ builds high-quality posterior approximations by post-processing existing optimization traces, with no further model evaluations.
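
The post-process idea in miniature, under strong simplifications (a toy 2-D target, an off-the-shelf scikit-learn GP instead of the paper's sparse GP, and grid-based importance weights instead of variational Bayesian quadrature):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    rng = np.random.default_rng(0)
    thetas = rng.uniform(-3, 3, size=(200, 2))       # stand-in for MAP-run traces
    log_post = -0.5 * (thetas ** 2).sum(axis=1)      # existing target evaluations

    # Surrogate of the log posterior, fitted with no further model calls.
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(),
                                  normalize_y=True).fit(thetas, log_post)

    # Cheap approximate inference on the surrogate: importance weights on samples.
    grid = rng.uniform(-3, 3, size=(5000, 2))
    lp = gp.predict(grid)
    w = np.exp(lp - lp.max())
    print((grid * (w / w.sum())[:, None]).sum(axis=0))   # ~ posterior mean near 0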

Updated: 2025-07-23 13:27:03

标题: 使用变分稀疏贝叶斯积分进行快速后处理贝叶斯推断

摘要: 在应用贝叶斯推断场景中,用户可能可以访问大量现有的模型评估结果,例如来自最大后验(MAP)优化运行。然而,传统的近似推断技术很少或根本不利用这些可用信息。我们提出了后处理贝叶斯推断的框架,作为一种从现有目标密度评估中获得快速后验近似的手段,而无需进一步的模型调用。在这个框架内,我们引入了变分稀疏贝叶斯积分(VSBQ),一种针对具有黑盒和潜在嘈杂似然的模型的后处理近似推断方法。VSBQ重复利用现有的目标密度评估来构建对数后验密度函数的稀疏高斯过程(GP)代理模型。随后,我们利用稀疏GP贝叶斯积分结合变分推断,实现对代理模型的快速近似后验推断。我们在具有挑战性的合成场景和计算神经科学的实际应用中验证了我们的方法。实验证明,VSBQ通过后处理现有的优化轨迹建立高质量的后验近似,而无需进一步的模型评估。

更新时间: 2025-07-23 13:27:03

领域: stat.ML,cs.LG,stat.CO,stat.ME

下载: http://arxiv.org/abs/2303.05263v4

To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks

In next-generation communications and networks, machine learning (ML) models are expected to deliver not only accurate predictions but also well-calibrated confidence scores that reflect the true likelihood of correct decisions. This paper studies the calibration performance of an ML-based outage predictor within a single-user, multi-resource allocation framework. We first establish key theoretical properties of this system's outage probability (OP) under perfect calibration. Importantly, we show that as the number of resources grows, the OP of a perfectly calibrated predictor approaches the expected output conditioned on it being below the classification threshold. In contrast, when only one resource is available, the system's OP equals the model's overall expected output. We then derive the OP conditions for a perfectly calibrated predictor. These findings guide the choice of the classification threshold to achieve a desired OP, helping system designers meet specific reliability requirements. We also demonstrate that post-processing calibration cannot improve the system's minimum achievable OP, as it does not introduce new information about future channel states. Additionally, we show that well-calibrated models are part of a broader class of predictors that necessarily improve OP. In particular, we establish a monotonicity condition that the accuracy-confidence function must satisfy for such improvement to occur. To demonstrate these theoretical properties, we conduct a rigorous simulation-based analysis using post-processing calibration techniques: Platt scaling and isotonic regression. As part of this framework, the predictor is trained using an outage loss function specifically designed for this system. Furthermore, this analysis is performed on Rayleigh fading channels with temporal correlation captured by Clarke's 2D model, which accounts for receiver mobility.
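
Both post-processing calibrators used in the simulation study are standard; a toy sketch with scikit-learn (the synthetic scores and labels below are illustrative, not the paper's channel model):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(1)
    scores = rng.uniform(0, 1, 2000)                   # raw outage-predictor scores
    outage = (rng.uniform(0, 1, 2000) < scores ** 2).astype(int)  # miscalibrated toy truth

    platt = LogisticRegression().fit(scores.reshape(-1, 1), outage)   # Platt scaling
    iso = IsotonicRegression(out_of_bounds="clip").fit(scores, outage)

    x = np.array([0.2, 0.5, 0.8])
    print(platt.predict_proba(x.reshape(-1, 1))[:, 1])  # calibrated P(outage)
    print(iso.predict(x))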

Updated: 2025-07-23 13:23:43

标题: 信任与否:关于无线网络中基于机器学习资源分配的校准

摘要: 在下一代通信和网络中,机器学习(ML)模型被期望能够提供准确的预测,同时还能提供良好校准的置信度分数,以反映正确决策的真实可能性。本文研究了基于ML的中断预测器在单用户、多资源分配框架中的校准性能。我们首先建立了系统的中断概率(OP)在完美校准下的关键理论特性。重要的是,我们表明随着资源数量的增加,完美校准预测器的OP逼近于在其低于分类阈值的条件下的期望输出。相比之下,当只有一个资源可用时,系统的OP等于模型的整体期望输出。然后我们推导了完美校准预测器的OP条件。这些发现指导了选择分类阈值以实现所需的OP,帮助系统设计者满足特定的可靠性要求。我们还证明后处理校准不能提高系统的最小可达到的OP,因为它不会引入有关未来信道状态的新信息。此外,我们表明校准良好的模型是一类更广泛的预测器中的一部分,这些预测器必然提高了OP。特别是,我们建立了准确度-置信度函数必须满足的单调性条件,以实现这种改进。为了展示这些理论特性,我们使用后处理校准技术Platt缩放和保序回归进行了严格的基于模拟的分析。作为这个框架的一部分,预测器使用专门设计用于该系统的中断损失函数进行训练。此外,这个分析是在由克拉克的2D模型捕捉到的具有时间相关性的瑞利衰落信道上进行的,该模型考虑了接收器的移动性。

更新时间: 2025-07-23 13:23:43

领域: stat.ML,cs.AI,cs.LG,eess.SP

下载: http://arxiv.org/abs/2507.17494v1

Automated Hybrid Grounding Using Structural and Data-Driven Heuristics

The grounding bottleneck poses one of the key challenges that hinders the widespread adoption of Answer Set Programming in industry. Hybrid Grounding is a step in alleviating the bottleneck by combining the strength of standard bottom-up grounding with recently proposed techniques where rule bodies are decoupled during grounding. However, it has remained unclear when hybrid grounding shall use body-decoupled grounding and when to use standard bottom-up grounding. In this paper, we address this issue by developing automated hybrid grounding: we introduce a splitting algorithm based on data-structural heuristics that detects when to use body-decoupled grounding and when standard grounding is beneficial. We base our heuristics on the structure of rules and an estimation procedure that incorporates the data of the instance. The experiments conducted on our prototypical implementation demonstrate promising results, which show an improvement on hard-to-ground scenarios, whereas on hard-to-solve instances we approach state-of-the-art performance.

Updated: 2025-07-23 13:19:02

标题: 使用结构与数据驱动启发式方法的自动混合接地

摘要: 接地瓶颈是阻碍工业中广泛采用答案集编程的关键挑战之一。混合接地是缓解瓶颈的一种方法,它结合了标准自下而上接地的优势和最近提出的规则体在接地过程中被解耦的技术。然而,目前仍不清楚何时应该使用体解耦接地,何时应该使用标准自下而上接地。本文通过开发自动混合接地来解决这个问题:我们引入了一个基于数据结构启发式的分裂算法,它可以检测何时使用体解耦接地,何时使用标准接地是有益的。我们的启发式基于规则的结构和结合实例数据的估算过程。我们在我们的原型实现上进行的实验显示了有希望的结果,表明在难以接地的情况下有所改善,而在难以解决的实例上,我们接近最先进的性能水平。

更新时间: 2025-07-23 13:19:02

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2507.17493v1

Active Attack Resilience in 5G: A New Take on Authentication and Key Agreement

As 5G networks expand into critical infrastructure, secure and efficient user authentication is more important than ever. The 5G-AKA protocol, standardized by 3GPP in TS 33.501, is central to authentication in current 5G deployments. It provides mutual authentication, user privacy, and key secrecy. However, despite its adoption, 5G-AKA has known limitations in both security and performance. While it focuses on protecting privacy against passive attackers, recent studies show its vulnerabilities to active attacks. It also relies on a sequence number mechanism to prevent replay attacks, requiring perfect synchronization between the device and the core network. This stateful design adds complexity, causes desynchronization, and incurs extra communication overhead. More critically, 5G-AKA lacks Perfect Forward Secrecy (PFS), exposing past communications if long-term keys are compromised-an increasing concern amid sophisticated threats. This paper proposes an enhanced authentication protocol that builds on 5G-AKA's design while addressing its shortcomings. First, we introduce a stateless version that removes sequence number reliance, reducing complexity while staying compatible with existing SIM cards and infrastructure. We then extend this design to add PFS with minimal cryptographic overhead. Both protocols are rigorously analyzed using ProVerif, confirming their compliance with all major security requirements, including resistance to passive and active attacks, as well as those defined by 3GPP and academic studies. We also prototype both protocols and evaluate their performance against 5G-AKA and 5G-AKA' (USENIX'21). Our results show the proposed protocols offer stronger security with only minor computational overhead, making them practical, future-ready solutions for 5G and beyond.
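
Ephemeral Diffie-Hellman is the usual ingredient for adding PFS to an AKA-style handshake; below is a minimal sketch with the `cryptography` package, assuming X25519 and HKDF (illustrative, not the paper's exact protocol or message flow):

    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives import hashes

    # Fresh ephemeral key pairs for the UE and the home network, per session.
    ue_eph, hn_eph = X25519PrivateKey.generate(), X25519PrivateKey.generate()

    # Mix the ephemeral shared secret with the long-term key K, so a later
    # compromise of K does not reveal past session keys (forward secrecy).
    long_term_k = b"\x00" * 32  # placeholder for the USIM's long-term key
    shared = ue_eph.exchange(hn_eph.public_key())
    session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=long_term_k,
                       info=b"5g-aka-pfs-demo").derive(shared)
    print(session_key.hex())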

Updated: 2025-07-23 13:18:44

标题: 5G中的主动攻击韧性:身份验证和密钥协议的新方法

摘要: 随着5G网络扩展至关键基础设施,安全高效的用户认证比以往任何时候都更为重要。由3GPP在TS 33.501中标准化的5G-AKA协议在当前5G部署中的认证中起着核心作用。它提供了相互认证、用户隐私和密钥机密性。然而,尽管被采用,5G-AKA在安全性和性能方面存在已知的局限性。虽然它专注于保护隐私免受被动攻击者的侵害,但最近的研究表明其容易受到主动攻击。它还依赖于序列号机制来防止重放攻击,需要设备与核心网络之间的完美同步。这种有状态的设计增加了复杂性,导致不同步,并产生额外的通信开销。更为关键的是,5G-AKA缺乏完美前向保密性(PFS),如果长期密钥受到威胁,会暴露过去的通信内容,这是在面对复杂威胁时日益引起关注的问题。本文提出了一种增强认证协议,基于5G-AKA的设计并解决了其缺点。首先,我们介绍了一个去除序列号依赖的无状态版本,降低了复杂性,同时与现有SIM卡和基础设施兼容。然后,我们扩展了这一设计,以最小的加密开销增加了PFS。这两种协议都经过ProVerif的严格分析,确认它们符合所有主要安全要求,包括抵抗被动和主动攻击,以及3GPP和学术研究所定义的要求。我们还对两种协议进行了原型设计,并评估了它们相对于5G-AKA和5G-AKA'(USENIX'21)的性能。我们的结果显示,提出的协议提供了更强的安全性,只有轻微的计算开销,使其成为面向5G及未来网络的实用且面向未来的解决方案。

更新时间: 2025-07-23 13:18:44

领域: cs.CR,cs.NI,68M25,C.2.2

下载: http://arxiv.org/abs/2507.17491v1

CQE under Epistemic Dependencies: Algorithms and Experiments (extended version)

We investigate Controlled Query Evaluation (CQE) over ontologies, where information disclosure is regulated by epistemic dependencies (EDs), a family of logical rules recently proposed for the CQE framework. In particular, we combine EDs with the notion of optimal GA censors, i.e. maximal sets of ground atoms that are entailed by the ontology and can be safely revealed. We focus on answering Boolean unions of conjunctive queries (BUCQs) with respect to the intersection of all optimal GA censors - an approach that has been shown in other contexts to ensure strong security guarantees with favorable computational behavior. First, we characterize the security of this intersection-based approach and identify a class of EDs (namely, full EDs) for which it remains safe. Then, for a subclass of EDs and for DL-Lite_R ontologies, we show that answering BUCQs in the above CQE semantics is in AC^0 in data complexity by presenting a suitable, detailed first-order rewriting algorithm. Finally, we report on experiments conducted in two different evaluation scenarios, showing the practical feasibility of our rewriting function.

Updated: 2025-07-23 13:10:33

标题: CQE在认知依赖条件下:算法和实验(扩展版本)

摘要: 我们研究了本体上的受控查询评估(CQE),其中信息披露受到认知依赖(EDs)的监管,这是最近为CQE框架提出的一组逻辑规则。特别是,我们将EDs与最优GA审查器的概念相结合,即由本体蕴含并可安全揭示的基原子(ground atoms)的最大集合。我们专注于相对于所有最优GA审查器的交集来回答合取查询的布尔并(BUCQs),这种方法在其他情境中已被证明可以确保强大的安全保证并具有良好的计算行为。首先,我们表征了基于交集的方法的安全性,并确定了一类EDs(即完整EDs),在这类EDs中,该方法仍然是安全的。然后,对于EDs的一个子类和DL-Lite_R本体,我们通过提供一个适当的、详细的一阶重写算法,证明了在上述CQE语义中回答BUCQs的数据复杂性在AC^0级别。最后,我们报告了在两种不同评估场景中进行的实验,展示了我们的重写函数的实际可行性。

更新时间: 2025-07-23 13:10:33

领域: cs.AI,cs.DB

下载: http://arxiv.org/abs/2507.17487v1

Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer's disease

Unsupervised anomaly detection (UAD) plays a crucial role in neuroimaging for identifying deviations from healthy subject data and thus facilitating the diagnosis of neurological disorders. In this work, we focus on Bayesian flow networks (BFNs), a novel class of generative models, which have not yet been applied to medical imaging or anomaly detection. BFNs combine the strength of diffusion frameworks and Bayesian inference. We introduce AnoBFN, an extension of BFNs for UAD, designed to: i) perform conditional image generation under high levels of spatially correlated noise, and ii) preserve subject specificity by incorporating a recursive feedback from the input image throughout the generative process. We evaluate AnoBFN on the challenging task of Alzheimer's disease-related anomaly detection in FDG PET images. Our approach outperforms other state-of-the-art methods based on VAEs (beta-VAE), GANs (f-AnoGAN), and diffusion models (AnoDDPM), demonstrating its effectiveness at detecting anomalies while reducing false positive rates.

Updated: 2025-07-23 13:09:57

标题: 使用贝叶斯流网络的无监督异常检测:在阿尔茨海默病背景下应用于脑部FDG PET

摘要: 无监督异常检测(UAD)在神经影像学中起着至关重要的作用,可以识别与健康受试者数据偏离,并有助于诊断神经系统疾病。在这项工作中,我们专注于贝叶斯流网络(BFNs),这是一种新型生成模型,尚未应用于医学影像或异常检测。BFNs结合了扩散框架和贝叶斯推断的优势。我们引入了AnoBFN,这是BFNs的一个扩展,用于UAD,旨在:i)在高度空间相关噪声下执行条件图像生成,ii)通过在整个生成过程中从输入图像中引入递归反馈来保留主体特异性。我们在FDG PET图像中评估了AnoBFN在阿尔茨海默病相关异常检测的挑战性任务上的表现。我们的方法胜过基于VAEs(beta-VAE)、GANs(f-AnoGAN)和扩散模型(AnoDDPM)的其他最先进方法,证明了其在检测异常并降低假阳性率方面的有效性。

更新时间: 2025-07-23 13:09:57

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17486v1

Infinite Video Understanding

The rapid advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have ushered in remarkable progress in video understanding. However, a fundamental challenge persists: effectively processing and comprehending video content that extends beyond minutes or hours. While recent efforts like Video-XL-2 have demonstrated novel architectural solutions for extreme efficiency, and advancements in positional encoding such as HoPE and VideoRoPE++ aim to improve spatio-temporal understanding over extensive contexts, current state-of-the-art models still encounter significant computational and memory constraints when faced with the sheer volume of visual tokens from lengthy sequences. Furthermore, maintaining temporal coherence, tracking complex events, and preserving fine-grained details over extended periods remain formidable hurdles, despite progress in agentic reasoning systems like Deep Video Discovery. This position paper posits that a logical, albeit ambitious, next frontier for multimedia research is Infinite Video Understanding -- the capability for models to continuously process, understand, and reason about video data of arbitrary, potentially never-ending duration. We argue that framing Infinite Video Understanding as a blue-sky research objective provides a vital north star for the multimedia, and the wider AI, research communities, driving innovation in areas such as streaming architectures, persistent memory mechanisms, hierarchical and adaptive representations, event-centric reasoning, and novel evaluation paradigms. Drawing inspiration from recent work on long/ultra-long video understanding and several closely related fields, we outline the core challenges and key research directions towards achieving this transformative capability.

Updated: 2025-07-23 13:06:44

标题: 无限视频理解

摘要: 大型语言模型(LLMs)及其多模态扩展(MLLMs)的快速进展已经在视频理解方面取得了显著进展。然而,一个基本挑战仍然存在:有效处理和理解超过几分钟甚至几小时的视频内容。虽然最近的努力,如Video-XL-2已经展示了极高效率的创新架构解决方案,以及位置编码的进展,如HoPE和VideoRoPE++旨在改善对广泛上下文的时空理解,但当前最先进的模型在面对漫长序列中大量视觉令牌时仍然遇到了重大的计算和内存约束。此外,尽管在代理性推理系统(如Deep Video Discovery)方面取得进展,但保持时间连贯性、跟踪复杂事件,并在长时间内保留细节仍然是艰巨的障碍。本文认为,多媒体研究的一个合乎逻辑但雄心勃勃的下一个前沿是无限视频理解 - 模型能够持续处理、理解并推理任意时长、甚至永无止境的视频数据。我们认为将无限视频理解作为一个蓝天研究目标为多媒体和更广泛的人工智能研究社区提供了一个至关重要的指导方向,推动了诸如流媒体架构、持久记忆机制、分层和自适应表示、以事件为中心的推理和新颖的评估范式等领域的创新。受长/超长视频理解和几个密切相关领域最近工作的启发,我们概述了实现这一变革能力的核心挑战和关键研究方向。

更新时间: 2025-07-23 13:06:44

领域: cs.CV,cs.AI,cs.IR,cs.LG,cs.MM

下载: http://arxiv.org/abs/2507.09068v2

Leveraging Diffusion Models for Parameterized Quantum Circuit Generation

Quantum computing holds immense potential, yet its practical success depends on multiple factors, including advances in quantum circuit design. In this paper, we introduce a generative approach based on denoising diffusion models (DMs) to synthesize parameterized quantum circuits (PQCs). Extending the recent diffusion model pipeline of Fürrutter et al. [1], our model effectively conditions the synthesis process, enabling the simultaneous generation of circuit architectures and their continuous gate parameters. We demonstrate our approach in synthesizing PQCs optimized for generating high-fidelity Greenberger-Horne-Zeilinger (GHZ) states and achieving high accuracy in quantum machine learning (QML) classification tasks. Our results indicate a strong generalization across varying gate sets and scaling qubit counts, highlighting the versatility and computational efficiency of diffusion-based methods. This work illustrates the potential of generative models as a powerful tool for accelerating and optimizing the design of PQCs, supporting the development of more practical and scalable quantum applications.

Updated: 2025-07-23 13:04:46

标题: 利用扩散模型进行参数化量子电路生成

摘要: 量子计算具有巨大潜力,但其实际成功取决于多个因素,包括量子电路设计的进展。在本文中,我们介绍了一种基于去噪扩散模型(DMs)的生成方法,用于合成参数化量子电路(PQCs)。扩展了最近Fürrutter等人的扩散模型流水线[1],我们的模型有效地调节了合成过程,实现了电路结构和连续门参数的同时生成。我们展示了我们的方法在合成优化生成高保真度Greenberger-Horne-Zeilinger(GHZ)态和在量子机器学习(QML)分类任务中实现高准确度方面的性能。我们的结果表明,在不同门集和缩放量子比特数量的情况下具有强大的泛化能力,突显了基于扩散的方法的多功能性和计算效率。这项工作展示了生成模型作为加速和优化PQCs设计的强大工具的潜力,支持更实用和可扩展的量子应用的发展。

更新时间: 2025-07-23 13:04:46

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2505.20863v3

LTLZinc: a Benchmarking Framework for Continual Learning and Neuro-Symbolic Temporal Reasoning

Neuro-symbolic artificial intelligence aims to combine neural architectures with symbolic approaches that can represent knowledge in a human-interpretable formalism. Continual learning is concerned with agents that expand their knowledge over time, improving their skills while avoiding forgetting previously learned concepts. Most of the existing approaches for neuro-symbolic artificial intelligence are applied to static scenarios only, and the challenging setting where reasoning along the temporal dimension is necessary has been seldom explored. In this work we introduce LTLZinc, a benchmarking framework that can be used to generate datasets covering a variety of different problems, against which neuro-symbolic and continual learning methods can be evaluated along the temporal and constraint-driven dimensions. Our framework generates expressive temporal reasoning and continual learning tasks from a linear temporal logic specification over MiniZinc constraints, and arbitrary image classification datasets. Fine-grained annotations allow multiple neural and neuro-symbolic training settings on the same generated datasets. Experiments on six neuro-symbolic sequence classification and four class-continual learning tasks generated by LTLZinc, demonstrate the challenging nature of temporal learning and reasoning, and highlight limitations of current state-of-the-art methods. We release the LTLZinc generator and ten ready-to-use tasks to the neuro-symbolic and continual learning communities, in the hope of fostering research towards unified temporal learning and reasoning frameworks.
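
To give a flavor of the temporal side, here is a toy checker for a small LTL fragment over a finite trace of per-step predictions; LTLZinc itself compiles full LTL specifications over MiniZinc constraints, so this is only an illustration:

    def always(pred, trace):
        return all(pred(s) for s in trace)

    def eventually(pred, trace):
        return any(pred(s) for s in trace)

    def next_implies(p, q, trace):
        # G(p -> X q): whenever p holds, q must hold at the following step.
        return all(q(trace[i + 1]) for i in range(len(trace) - 1) if p(trace[i]))

    trace = [{"digit": 3}, {"digit": 4}, {"digit": 5}]   # per-step classifier outputs
    print(next_implies(lambda s: s["digit"] == 3,
                       lambda s: s["digit"] == 4, trace))   # True
    print(always(lambda s: s["digit"] < 10, trace))         # True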

Updated: 2025-07-23 13:04:13

标题: LTLZinc:用于持续学习和神经符号时间推理的基准框架

摘要: 神经符号人工智能旨在结合能够以人类可解释形式表示知识的符号方法和神经结构。持续学习涉及代理在时间内扩展其知识,提高其技能,同时避免忘记先前学习的概念。目前大多数神经符号人工智能方法仅适用于静态场景,很少探索需要沿时间维度推理的具有挑战性的设置。在这项工作中,我们介绍了LTLZinc,一个基准框架,可用于生成涵盖各种不同问题的数据集,以评估神经符号和持续学习方法在时间和约束驱动维度上的性能。我们的框架从MiniZinc约束和任意图像分类数据集上的线性时间逻辑规范中生成具有表达性的时间推理和持续学习任务。细粒度的注释允许在同一生成的数据集上进行多个神经和神经符号训练设置。通过LTLZinc生成的六个神经符号序列分类和四个类别持续学习任务的实验,展示了时间学习和推理的挑战性质,并突显了当前最先进方法的局限性。我们发布了LTLZinc生成器和十个可供神经符号和持续学习社区使用的任务,希望促进统一的时间学习和推理框架的研究。

更新时间: 2025-07-23 13:04:13

领域: cs.AI

下载: http://arxiv.org/abs/2507.17482v1

RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning

In the pursuit of robust autonomous driving systems, models trained on real-world datasets often struggle to adapt to new environments, particularly when confronted with corner cases such as extreme weather conditions. Collecting these corner cases in the real world is non-trivial, which necessitates the use of simulators for validation. However, the high computational cost and the domain gap in data distribution have hindered the seamless transition between real and simulated driving scenarios. To tackle this challenge, we propose Retrieval-Augmented Learning for Autonomous Driving (RALAD), a novel framework designed to bridge the real-to-sim gap at a low cost. RALAD features three primary designs, including (1) domain adaptation via an enhanced Optimal Transport (OT) method that accounts for both individual and grouped image distances, (2) a simple and unified framework that can be applied to various models, and (3) efficient fine-tuning techniques that freeze the computationally expensive layers while maintaining robustness. Experimental results demonstrate that RALAD compensates for the performance degradation in simulated environments while maintaining accuracy in real-world scenarios across three different models. Taking Cross View as an example, the mIOU and mAP metrics in real-world scenarios remain stable before and after RALAD fine-tuning, while in simulated environments, the mIOU and mAP metrics are improved by 10.30% and 12.29%, respectively. Moreover, the re-training cost of our approach is reduced by approximately 88.1%. Our code is available at https://github.com/JiachengZuo/RALAD.git.
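
A sketch of the OT-based alignment step using the POT library, showing only the individual-distance term (RALAD's enhanced OT also mixes grouped, cluster-level image distances into the cost; the feature arrays and sizes below are placeholders):

    import numpy as np
    import ot  # POT: Python Optimal Transport (assumed installed)

    real_feats = np.random.randn(64, 128)   # features from real-world frames
    sim_feats = np.random.randn(80, 128)    # features from simulator frames

    M = ot.dist(real_feats, sim_feats)                  # pairwise squared distances
    a, b = np.full(64, 1 / 64), np.full(80, 1 / 80)     # uniform marginals
    G = ot.sinkhorn(a, b, M / M.max(), reg=0.05)        # entropic OT coupling
    aligned_sim = (G / G.sum(1, keepdims=True)) @ sim_feats   # barycentric mapping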

Updated: 2025-07-23 13:04:12

标题: RALAD:使用检索增强学习弥合自动驾驶中的现实到模拟领域差距

摘要: 在追求强大的自动驾驶系统时,基于真实世界数据集训练的模型往往难以适应新环境,特别是在面对极端天气等特殊情况时。收集真实世界中的这些特殊情况并不容易,这就需要使用模拟器进行验证。然而,高计算成本和数据分布中的领域差距阻碍了真实和模拟驾驶场景之间的无缝过渡。为了解决这一挑战,我们提出了一种新颖的框架——用于自动驾驶的检索增强学习(RALAD),旨在以较低成本解决真实和模拟之间的差距。RALAD具有三个主要设计,包括(1)通过增强的最优输运(OT)方法进行领域适应,考虑了个体和分组图像距离,(2)一个简单且统一的框架,可应用于各种模型,以及(3)通过冻结计算昂贵的层并保持鲁棒性的高效微调技巧。实验结果表明,RALAD在模拟环境中弥补了性能下降,同时在真实世界场景中保持了准确性,跨三种不同模型。以Cross View为例,在真实世界场景中,mIOU和mAP指标在RALAD微调前后保持稳定,而在模拟环境中,mIOU和mAP指标分别提高了10.30%和12.29%。此外,我们方法的重新训练成本减少了约88.1%。我们的代码可在https://github.com/JiachengZuo/RALAD.git 上找到。

更新时间: 2025-07-23 13:04:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2501.12296v3

The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training

We show that learning-rate schedules for large model training behave surprisingly similar to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedule with linear cooldown; in particular, the practical benefit of cooldown is reflected in the bound due to the absence of logarithmic terms. Further, we show that this surprisingly close match between optimization theory and practice can be exploited for learning-rate tuning: we achieve noticeable improvements for training 124M and 210M Llama-type models by (i) extending the schedule for continued training with optimal learning-rate, and (ii) transferring the optimal learning-rate across schedules.
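The schedule in question is easy to pin down; a minimal sketch (warmup omitted for brevity, and the 20% cooldown fraction is an illustrative choice):

    def constant_with_cooldown(step, total, base_lr, cooldown_frac=0.2):
        """Constant learning rate followed by a linear cooldown to zero -- the
        schedule whose non-smooth convex bound the paper analyzes (the bound
        loses the usual logarithmic terms, mirroring cooldown's practical gain)."""
        cooldown_start = int(total * (1 - cooldown_frac))
        if step < cooldown_start:
            return base_lr
        return base_lr * (total - step) / max(total - cooldown_start, 1)

    print([round(constant_with_cooldown(s, 10, 1.0), 2) for s in range(10)])
    # -> [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5]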

Updated: 2025-07-23 13:03:41

标题: 大型模型训练中凸优化理论与学习率调度的惊人一致性

摘要: 我们发现,用于大模型训练的学习率调度表现出与非平滑凸优化理论中的性能界限惊人相似的特点。我们为恒定调度与线性冷却提供了一个界限;特别是,由于缺乏对数项,冷却的实际好处在界限中得到体现。此外,我们展示了优化理论与实践之间这种令人惊讶的密切匹配可以用于学习率调整:我们通过(i)扩展持续训练的最佳学习率的调度,和(ii)跨调度传递最佳学习率,为训练124M和210M的Llama型模型实现了显著的改进。

更新时间: 2025-07-23 13:03:41

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2501.18965v2

SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving

Upsampling LiDAR point clouds in autonomous driving scenarios remains a significant challenge due to the inherent sparsity and complex 3D structures of the data. Recent studies have attempted to address this problem by converting the complex 3D spatial scenes into 2D image super-resolution tasks. However, due to the sparse and blurry feature representation of range images, accurately reconstructing detailed and complex spatial topologies remains a major difficulty. To tackle this, we propose a novel sparse point cloud upsampling method named SRMambaV2, which enhances the upsampling accuracy in long-range sparse regions while preserving the overall geometric reconstruction quality. Specifically, inspired by human driver visual perception, we design a biomimetic 2D selective scanning self-attention (2DSSA) mechanism to model the feature distribution in distant sparse areas. Meanwhile, we introduce a dual-branch network architecture to enhance the representation of sparse features. In addition, we introduce a progressive adaptive loss (PAL) function to further refine the reconstruction of fine-grained details during the upsampling process. Experimental results demonstrate that SRMambaV2 achieves superior performance in both qualitative and quantitative evaluations, highlighting its effectiveness and practical value in automotive sparse point cloud upsampling tasks.

Updated: 2025-07-23 13:01:19

标题: SRMambaV2:自动驾驶中用于稀疏点云上采样的仿生关注

摘要: 在自动驾驶场景中,LiDAR点云的上采样仍然是一个重要挑战,这是由于数据的固有稀疏性和复杂的3D结构所导致的。最近的研究尝试通过将复杂的3D空间场景转换为2D图像超分辨率任务来解决这个问题。然而,由于范围图像的稀疏和模糊特征表示,精确重建详细和复杂的空间拓扑仍然是一个主要困难。为了解决这个问题,我们提出了一种名为SRMambaV2的新型稀疏点云上采样方法,它在长距离稀疏区域增强了上采样精度,同时保持了整体几何重建质量。具体地,受人类驾驶员视觉感知的启发,我们设计了一种仿生2D选择性扫描自注意力(2DSSA)机制来建模远距离稀疏区域中的特征分布。同时,我们引入了一个双分支网络架构来增强稀疏特征的表示。此外,我们引入了一个渐进自适应损失(PAL)函数,在上采样过程中进一步细化细节重建。实验结果表明,SRMambaV2在定性和定量评估中均表现出优越性能,突显了其在汽车稀疏点云上采样任务中的有效性和实用价值。

更新时间: 2025-07-23 13:01:19

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.17479v1

An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models

Large Language Models (LLMs) have demonstrated remarkable progress in instruction following and general-purpose reasoning. However, achieving high-quality alignment with human intent and safety norms without human annotations remains a fundamental challenge. In this work, we propose an Uncertainty-Driven Adaptive Self-Alignment (UDASA) framework designed to improve LLM alignment in a fully automated manner. UDASA first generates multiple responses for each input and quantifies output uncertainty across three dimensions: semantics, factuality, and value alignment. Based on these uncertainty scores, the framework constructs preference pairs and categorizes training samples into three stages, conservative, moderate, and exploratory, according to their uncertainty difference. The model is then optimized progressively across these stages. In addition, we conduct a series of preliminary studies to validate the core design assumptions and provide strong empirical motivation for the proposed framework. Experimental results show that UDASA outperforms existing alignment methods across multiple tasks, including harmlessness, helpfulness, truthfulness, and controlled sentiment generation, significantly improving model performance.
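
A toy rendering of the staging step, assuming per-pair uncertainty gaps have already been computed from the semantic, factuality, and value-alignment scores; the thresholds are illustrative placeholders:

    import numpy as np

    def stage_samples(uncertainty_gaps, low=0.2, high=0.6):
        """Split preference pairs into the three training stages by the gap
        between preferred and rejected responses: large gaps are safest to
        learn from first, ambiguous pairs are deferred to the end."""
        stages = {"conservative": [], "moderate": [], "exploratory": []}
        for i, u in enumerate(uncertainty_gaps):
            if u >= high:
                stages["conservative"].append(i)   # clear preference gap first
            elif u >= low:
                stages["moderate"].append(i)
            else:
                stages["exploratory"].append(i)    # ambiguous pairs last
        return stages

    print(stage_samples(np.array([0.9, 0.15, 0.5, 0.05, 0.7])))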

Updated: 2025-07-23 13:00:00

标题: 一个基于不确定性驱动的用于大型语言模型的自适应自对齐框架

摘要: 大型语言模型(LLMs)在指令遵循和通用推理方面取得了显著进展。然而,在没有人类注释的情况下实现与人类意图和安全规范高质量对齐仍然是一个基本挑战。在这项工作中,我们提出了一种基于不确定性驱动的自适应自对齐(UDASA)框架,旨在以完全自动化的方式改善LLM的对齐。UDASA首先为每个输入生成多个响应,并在三个维度上量化输出不确定性:语义、事实性和价值对齐。基于这些不确定性分数,框架构建偏好对,并根据它们的不确定性差异将训练样本分类为保守、中等和探索性三个阶段。然后,模型在这些阶段逐渐优化。此外,我们进行了一系列初步研究来验证核心设计假设,并为所提出的框架提供强有力的实证动机。实验结果显示,UDASA在多个任务(包括无害性、有益性、真实性和受控情感生成)上优于现有的对齐方法,显著提高了模型性能。

更新时间: 2025-07-23 13:00:00

领域: cs.AI

下载: http://arxiv.org/abs/2507.17477v1

MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs

Although recent Large Language Models (LLMs) have shown rapid improvement on reasoning benchmarks in English, the evaluation of such LLMs' multilingual reasoning capability across diverse languages and cultural contexts remains limited. Existing multilingual reasoning benchmarks are typically constructed by translating existing English reasoning benchmarks, biasing these benchmarks towards reasoning problems with context in English language/cultures. In this work, we introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark designed to assess LLMs on more than 1,000 native, linguistic and culturally grounded reasoning questions written by native speakers in French, Spanish, and Chinese. MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For cultural/tradition reasoning and math reasoning with cultural relevance, we also provide English equivalent translations of the multilingual questions by manual translation from native speakers fluent in English. This set of English equivalents can provide a direct comparison of LLM reasoning capacity in other languages vs. English on the same reasoning questions. We systematically evaluate current 14 leading LLMs covering most LLM families on MultiNRC and its English equivalent set. The results show that (1) current LLMs are still not good at native multilingual reasoning, with none scoring above 50% on MultiNRC; (2) LLMs exhibit distinct strengths and weaknesses in handling linguistic, cultural, and logical reasoning tasks; (3) Most models perform substantially better in math reasoning in English compared to in original languages (+10%), indicating persistent challenges with culturally grounded knowledge.

Updated: 2025-07-23 12:56:31

标题: MultiNRC:一种具有挑战性的本地多语言推理评估基准,用于LLM

摘要: 尽管最近的大型语言模型(LLMs)在英语推理基准上显示出快速改进,但对这些LLMs在不同语言和文化背景下的多语推理能力的评估仍然有限。现有的多语推理基准通常是通过翻译现有的英语推理基准构建的,这使得这些基准偏向于具有英语语言/文化背景背景的推理问题。在这项工作中,我们引入了多语本土推理挑战(MultiNRC),这是一个旨在评估LLMs的基准,包括超过1,000个由法语、西班牙语和中文的母语者撰写的本土、语言和文化基础的推理问题。MultiNRC涵盖四个核心推理类别:语言特定的语言推理、文字游戏和谜语、文化/传统推理以及具有文化相关性的数学推理。对于文化/传统推理和具有文化相关性的数学推理,我们还通过母语流利的英语人士手工翻译提供了多语问题的英语等效翻译。这组英语等效翻译可以直接比较LLM在其他语言与英语上的推理能力。我们系统地评估了当前14个主要LLMs在MultiNRC及其英语等效集中的覆盖大多数LLM家族的情况。结果显示:(1)目前的LLMs在本土多语推理方面仍然表现不佳,没有一个在MultiNRC上得分超过50%;(2)LLMs在处理语言、文化和逻辑推理任务时表现出明显的优势和劣势;(3)大多数模型在英语数学推理中表现比原始语言中更好(+10%),这表明在文化基础知识方面仍存在挑战。

更新时间: 2025-07-23 12:56:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17476v1

BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles

Human decision-making in high-stakes domains often relies on expertise and heuristics, but is vulnerable to hard-to-detect cognitive biases that threaten fairness and long-term outcomes. This work presents a novel approach to enhancing complex decision-making workflows through the integration of hierarchical learning alongside various enhancements. Focusing on university admissions as a representative high-stakes domain, we propose BGM-HAN, an enhanced Byte-Pair Encoded, Gated Multi-head Hierarchical Attention Network, designed to effectively model semi-structured applicant data. BGM-HAN captures multi-level representations that are crucial for nuanced assessment, improving both interpretability and predictive performance. Experimental results on real admissions data demonstrate that our proposed model significantly outperforms both state-of-the-art baselines from traditional machine learning to large language models, offering a promising framework for augmenting decision-making in domains where structure, context, and fairness matter. Source code is available at: https://github.com/junhua/bgm-han.

Updated: 2025-07-23 12:52:38

标题: BGM-HAN:用于半结构化档案准确和公平决策评估的分层注意力网络

摘要: 人类在高风险领域的决策往往依赖专业知识和启发式,但容易受到难以检测的认知偏见的影响,威胁公平和长期结果。本文提出了一种增强复杂决策工作流程的新方法,通过整合分层学习和各种增强功能。以大学招生作为代表性的高风险领域,我们提出了BGM-HAN,一种增强的字节对编码、门控多头分层注意网络,旨在有效建模半结构化申请人数据。BGM-HAN捕捉了关键的多层次表示,对细致评估至关重要,提高了可解释性和预测性能。对真实招生数据的实验结果表明,我们提出的模型明显优于传统机器学习到大型语言模型的最新基线,为增强关注结构、背景和公平性的领域的决策提供了一个有前途的框架。源代码可在以下网址找到:https://github.com/junhua/bgm-han。

更新时间: 2025-07-23 12:52:38

领域: cs.LG,cs.AI,cs.IR

下载: http://arxiv.org/abs/2507.17472v1

Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors

The ongoing development of quantum processors is driving breakthroughs in scientific discovery. Despite this progress, the formidable cost of fabricating large-scale quantum processors means they will remain rare for the foreseeable future, limiting their widespread application. To address this bottleneck, we introduce the concept of predictive surrogates, which are classical learning models designed to emulate the mean-value behavior of a given quantum processor with provably computational efficiency. In particular, we propose two predictive surrogates that can substantially reduce the need for quantum processor access in diverse practical scenarios. To demonstrate their potential in advancing digital quantum simulation, we use these surrogates to emulate a quantum processor with up to 20 programmable superconducting qubits, enabling efficient pre-training of variational quantum eigensolvers for families of transverse-field Ising models and identification of non-equilibrium Floquet symmetry-protected topological phases. Experimental results reveal that the predictive surrogates not only reduce measurement overhead by orders of magnitude, but can also surpass the performance of conventional, quantum-resource-intensive approaches. Collectively, these findings establish predictive surrogates as a practical pathway to broadening the impact of advanced quantum processors.

Updated: 2025-07-23 12:51:03

标题: 大规模量子处理器的高效预测替代方案演示

摘要: 随着量子处理器的持续发展,科学发现取得了突破。尽管取得了这一进展,制造大规模量子处理器的巨大成本意味着它们在可预见的未来仍将很少见,限制了它们的广泛应用。为了解决这一瓶颈,我们引入了预测替代品的概念,这些是设计用来模拟给定量子处理器均值行为的经典学习模型,具有可证明的计算效率。特别地,我们提出了两种预测替代品,可以在不同实际场景中大幅减少对量子处理器的访问需求。为了展示它们在推进数字量子模拟方面的潜力,我们使用这些替代品来模拟具有高达20个可编程超导量子比特的量子处理器,实现对横场伊辛模型家族的变分量子本征求解器进行高效的预训练,并识别非平衡弗洛凯对称保护拓扑相。实验结果显示,预测替代品不仅可以将测量开销降低数个数量级,而且还能超越传统的量子资源密集型方法。总的来说,这些发现确立了预测替代品作为拓宽先进量子处理器影响的实际途径。

更新时间: 2025-07-23 12:51:03

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.17470v1

Probing Vision-Language Understanding through the Visual Entailment Task: promises and pitfalls

This study investigates the extent to which the Visual Entailment (VE) task serves as a reliable probe of vision-language understanding in multimodal language models, using the LLaMA 3.2 11B Vision model as a test case. Beyond reporting performance metrics, we aim to interpret what these results reveal about the underlying possibilities and limitations of the VE task. We conduct a series of experiments across zero-shot, few-shot, and fine-tuning settings, exploring how factors such as prompt design, the number and order of in-context examples and access to visual information might affect VE performance. To further probe the reasoning processes of the model, we used explanation-based evaluations. Results indicate that three-shot inference outperforms the zero-shot baselines. However, additional examples introduce more noise than they provide benefits. Additionally, the order of the labels in the prompt is a critical factor that influences the predictions. In the absence of visual information, the model has a strong tendency to hallucinate and imagine content, raising questions about the model's over-reliance on linguistic priors. Fine-tuning yields strong results, achieving an accuracy of 83.3% on the e-SNLI-VE dataset and outperforming the state-of-the-art OFA-X model. Additionally, the explanation evaluation demonstrates that the fine-tuned model provides semantically meaningful explanations similar to those of humans, with a BERTScore F1-score of 89.2%. We do, however, find comparable BERTScore results in experiments with limited vision, questioning the visual grounding of this task. Overall, our results highlight both the utility and limitations of VE as a diagnostic task for vision-language understanding and point to directions for refining multimodal evaluation methods.

Updated: 2025-07-23 12:46:51

标题: 通过视觉蕴含任务探究视觉-语言理解:前景与缺陷

摘要: 这项研究调查了视觉蕴涵(VE)任务在多模态语言模型中作为视觉-语言理解可靠探测器的程度,以LLaMA 3.2 11B Vision模型作为测试案例。除了报告性能指标外,我们旨在解释这些结果揭示了VE任务的潜在可能性和局限性。我们在零样本、少样本和微调设置下进行了一系列实验,探讨提示设计、上下文示例的数量和顺序以及访问视觉信息等因素如何影响VE的性能。为了进一步探究模型的推理过程,我们使用基于解释的评估。结果表明,三样本(three-shot)推理优于零样本基线。然而,额外的示例引入的噪音比提供的好处更多。此外,提示中标签的顺序是影响预测的关键因素。在没有视觉信息的情况下,模型有强烈的倾向产生幻觉和想象内容,引发对模型过度依赖语言先验的质疑。微调产生了强大的结果,在e-SNLI-VE数据集上实现了83.3%的准确率,并且优于最先进的OFA-X模型。此外,解释评估表明,微调模型提供了语义上有意义的解释,类似于人类的解释,BERTScore F1分数为89.2%。然而,我们发现在有限视觉情况下的实验中具有可比的BERTScore结果,质疑了该任务的视觉基础。总的来说,我们的结果突出了VE作为视觉-语言理解诊断任务的效用和局限性,并指出了改进多模态评估方法的方向。

更新时间: 2025-07-23 12:46:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17467v1

MIRA: Medical Time Series Foundation Model for Real-World Health Data

A unified foundation model for medical time series -- pretrained on open access and ethics board-approved medical corpora -- offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing generalist time series foundation models struggle to handle medical time series data due to their inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missing values. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODE that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA achieves reductions in forecasting errors by an average of 10% and 7% in out-of-distribution and in-distribution scenarios, respectively, when compared to other zero-shot and fine-tuned baselines. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.
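
To illustrate the first ingredient: a continuous-time variant of rotary encoding only requires evaluating the rotation angles at a real-valued timestamp instead of an integer position. A simplified sketch (the paper's exact parameterization may differ):

    import numpy as np

    def continuous_rope(x, t, base=10000.0):
        """Rotary positional encoding at a real-valued timestamp t: each pair
        of channels is rotated by an angle t * freq, so irregularly sampled
        measurements get consistent relative-time geometry."""
        d = x.shape[-1]
        freqs = base ** (-np.arange(0, d, 2) / d)     # per-pair rotation speeds
        cos, sin = np.cos(t * freqs), np.sin(t * freqs)
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin          # 2D rotation per pair
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    q = np.random.randn(8)
    print(continuous_rope(q, t=3.71))   # works for arbitrary, non-integer times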

Updated: 2025-07-23 12:45:18

标题: MIRA:用于真实世界健康数据的医疗时间序列基础模型

摘要: 一个统一的医学时间序列基础模型——在开放获取和伦理委员会批准的医学语料库上预训练——为减轻注释负担、最小化模型定制以及在数据稀缺或受隐私限制环境下实现跨临床机构、模态和任务的稳健迁移提供了潜力。然而,现有的通用时间序列基础模型由于其固有的挑战,包括不规则间隔、异质采样率和频繁缺失值,难以处理医学时间序列数据。为了解决这些挑战,我们引入了MIRA,一个专门为医学时间序列预测设计的统一基础模型。MIRA包含一种连续时间旋转位置编码,可以精细建模可变时间间隔,一个特定频率的专家混合层,可以在潜在频率区域之间路由计算,进一步促进时间特化,以及基于神经ODE的连续动力学外推块,模拟潜在状态的连续轨迹,从而能够在任意目标时间戳上进行准确的预测。在一个包含超过4540亿时间点的大规模和多样化医学语料库上预训练,与其他零样本和微调基线相比,MIRA在分布外和分布内场景中的预测错误平均分别降低了10%和7%。我们还引入了一个涵盖多个下游临床任务的全面基准,为未来医学时间序列建模研究奠定了基础。

更新时间: 2025-07-23 12:45:18

领域: cs.LG

下载: http://arxiv.org/abs/2506.07584v3

MolX: Enhancing Large Language Models for Molecular Understanding With A Multi-Modal Extension

Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e. SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by equipping them with a multi-modal external module, termed MolX. Instead of directly using SMILES strings to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. A hand-crafted molecular fingerprint is incorporated to leverage its embedded domain knowledge. To establish an alignment between MolX and the LLM's textual input space, the model in which the LLM is frozen, is pre-trained with a strategy including a diverse set of tasks. Experimental evaluations show that our proposed method outperforms baselines across 4 downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM, while only introducing a small number of trainable parameters--0.53% and 0.82%, respectively.
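
The hand-crafted fingerprint ingredient can be illustrated with RDKit (assuming RDKit is installed and a Morgan/ECFP-style fingerprint; the SMILES string and bit size are placeholders, and the learned SMILES/graph encoders are separate modules not shown):

    import numpy as np
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, as an example
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

    arr = np.zeros(2048, dtype=np.int8)                 # 0/1 feature vector for the LLM side
    DataStructs.ConvertToNumpyArray(fp, arr)
    print(int(arr.sum()), "bits set out of", arr.size)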

Updated: 2025-07-23 12:32:35

标题: MolX: 利用多模态扩展增强大型语言模型对分子理解的能力

摘要: 大型语言模型(LLMs)凭借其强大的任务处理能力,在一系列领域取得了显著进展,超越了自然语言理解。然而,它们在化学领域的专业性仍然受限,特别是在解决与分子相关的任务方面。这一挑战归因于它们仅使用常见的文本表示形式,即SMILES字符串,来理解分子的固有限制。在这项研究中,我们试图通过为它们配备一个名为MolX的多模态外部模块,提升LLMs理解分子的能力。我们不直接使用SMILES字符串来表示分子,而是利用特定的编码器从SMILES字符串和2D分子图表示中提取细粒度特征,以输入到LLMs中。我们还加入了手工制作的分子指纹,以利用其嵌入的领域知识。为了在MolX和LLMs的文本输入空间之间建立对齐,冻结LLMs的模型通过一个包括各种任务的策略进行预训练。实验评估表明,我们提出的方法在4个下游与分子相关的任务中表现优于基线方法,包括从分子到文本的翻译到逆合成,无论是否对LLMs进行微调,同时只引入了少量可训练参数(分别为0.53%和0.82%)。

更新时间: 2025-07-23 12:32:35

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.06777v8

Mapping of Weed Management Methods in Orchards using Sentinel-2 and PlanetScope Data

Effective weed management is crucial for improving agricultural productivity, as weeds compete with crops for vital resources like nutrients and water. Accurate maps of weed management methods are essential for policymakers to assess farmer practices, evaluate impacts on vegetation health, biodiversity, and climate, as well as ensure compliance with policies and subsidies. However, monitoring weed management methods is challenging as they commonly rely on ground-based field surveys, which are often costly, time-consuming and subject to delays. In order to tackle this problem, we leverage earth observation data and Machine Learning (ML). Specifically, we developed separate ML models using Sentinel-2 and PlanetScope satellite time series data, respectively, to classify four distinct weed management methods (Mowing, Tillage, Chemical-spraying, and No practice) in orchards. The findings demonstrate the potential of ML-driven remote sensing to enhance the efficiency and accuracy of weed management mapping in orchards.

Updated: 2025-07-23 12:31:13

标题: 果园杂草管理方法的映射:利用Sentinel-2和PlanetScope数据

摘要: 有效的杂草管理对于提高农业生产力至关重要,因为杂草与作物竞争重要资源,如营养和水。准确的杂草管理方法地图对于政策制定者来说至关重要,以评估农民的实践,评估对植被健康、生物多样性和气候的影响,以及确保遵守政策和补贴。然而,监测杂草管理方法具有挑战性,因为它们通常依赖于地面野外调查,这往往既昂贵又耗时,并且容易出现延迟。为了解决这个问题,我们利用地球观测数据和机器学习(ML)。具体而言,我们分别使用Sentinel-2和PlanetScope卫星时间序列数据开发了不同的ML模型,以在果园中对四种不同的杂草管理方法(修剪、耕作、化学喷洒和无实践)进行分类。研究结果表明,基于ML的遥感技术有潜力提高果园中杂草管理地图的效率和准确性。

更新时间: 2025-07-23 12:31:13

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.19991v2

A Deep Learning Approach for Augmenting Perceptional Understanding of Histopathology Images

In recent years, digital technologies have made significant strides in augmenting human health, cognition, and perception, particularly within the field of computational pathology. This paper presents a novel approach to enhancing the analysis of histopathology images by leveraging a multi-modal model that combines Vision Transformers (ViT) with GPT-2 for image captioning. The model is fine-tuned on the specialized ARCH dataset, which includes dense image captions derived from clinical and academic resources, to capture the complexities of pathology images such as tissue morphologies, staining variations, and pathological conditions. By generating accurate, contextually rich captions, the model augments the cognitive capabilities of healthcare professionals, enabling more efficient disease classification, segmentation, and detection. The model enhances the perception of subtle pathological features in images that might otherwise go unnoticed, thereby improving diagnostic accuracy. Our approach demonstrates the potential for digital technologies to augment human cognitive abilities in medical image analysis, providing steps toward more personalized and accurate healthcare outcomes.
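
Pairing a ViT encoder with a GPT-2 decoder is directly supported by Hugging Face's encoder-decoder wrapper; a minimal sketch (checkpoint names are illustrative, and the fine-tuning on ARCH captions is not shown):

    from transformers import (AutoImageProcessor, AutoTokenizer,
                              VisionEncoderDecoderModel)

    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        "google/vit-base-patch16-224-in21k", "gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

    tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token
    model.config.decoder_start_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    # After fine-tuning on caption pairs, inference on a slide tile would be:
    # pixels = processor(images=tile, return_tensors="pt").pixel_values
    # caption = tokenizer.decode(model.generate(pixels, max_new_tokens=40)[0])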

Updated: 2025-07-23 12:27:38

标题: 一种用于增强组织病理学图像感知理解的深度学习方法

Abstract: In recent years, digital technologies have made major progress in augmenting human health, cognition, and perception, especially in computational pathology. This paper proposes a novel approach that combines Vision Transformers (ViT) with GPT-2 for image captioning to improve the analysis of histopathology images. The model is fine-tuned on the specialized ARCH dataset, which contains dense image captions drawn from clinical and academic resources, to capture the complexities of pathology images such as tissue morphology, staining variation, and pathological conditions. By generating accurate, contextually rich captions, the model augments the cognitive capabilities of healthcare professionals, enabling more efficient disease classification, segmentation, and detection. The model also enhances the perception of subtle pathological features that might otherwise go unnoticed, improving diagnostic accuracy. Our approach demonstrates the potential of digital technologies to augment human cognition in medical image analysis, a step toward more personalized and accurate healthcare outcomes.

更新时间: 2025-07-23 12:27:38

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.06894v3

C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning

Multivariate time series forecasting has drawn increasing attention due to its practical importance. Existing approaches typically adopt either channel-mixing (CM) or channel-independence (CI) strategies. CM strategy can capture inter-variable dependencies but fails to discern variable-specific temporal patterns. CI strategy improves this aspect but fails to fully exploit cross-variable dependencies like CM. Hybrid strategies based on feature fusion offer limited generalization and interpretability. To address these issues, we propose C3RL, a novel representation learning framework that jointly models both CM and CI strategies. Motivated by contrastive learning in computer vision, C3RL treats the inputs of the two strategies as transposed views and builds a siamese network architecture: one strategy serves as the backbone, while the other complements it. By jointly optimizing contrastive and prediction losses with adaptive weighting, C3RL balances representation and forecasting performance. Extensive experiments on seven models show that C3RL boosts the best-case performance rate to 81.4\% for models based on CI strategy and to 76.3\% for models based on CM strategy, demonstrating strong generalization and effectiveness. The code will be available once the paper is accepted.
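
The "transposed views" idea can be made concrete with a small sketch: a channel-independence branch encodes each variable over time, a channel-mixing branch encodes each timestep over variables, and an InfoNCE-style loss ties their embeddings. This is one plausible reading of the setup, with toy encoders in place of the paper's backbones:

```python
# Sketch of C3RL's two transposed views with a contrastive tie (toy encoders).
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, T, D = 32, 7, 96, 128
x = torch.randn(B, C, T)                      # multivariate time-series batch

ci_enc = nn.Linear(T, D)                      # CI view: per-channel encoder over time
cm_enc = nn.Linear(C, D)                      # CM view: per-timestep encoder over channels

z_ci = ci_enc(x).mean(dim=1)                  # (B, D), pooled over channels
z_cm = cm_enc(x.transpose(1, 2)).mean(dim=1)  # (B, D), pooled over time

logits = F.normalize(z_ci, dim=-1) @ F.normalize(z_cm, dim=-1).T / 0.1
contrastive = F.cross_entropy(logits, torch.arange(B))   # InfoNCE across the batch
# total loss (the adaptive weight lam is learned in the paper):
# loss = lam * contrastive + (1 - lam) * forecasting_loss
```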

Updated: 2025-07-23 12:21:26

标题: C3RL:重新思考表示学习中通道独立性和通道混合的结合

摘要: 多变量时间序列预测因其实际重要性而受到越来越多的关注。现有方法通常采用通道混合(CM)或通道独立(CI)策略。CM策略能够捕捉变量间的依赖关系,但无法区分特定变量的时间模式。CI策略改进了这一方面,但未能充分利用像CM那样的跨变量依赖关系。基于特征融合的混合策略提供了有限的泛化性和可解释性。为了解决这些问题,我们提出了C3RL,这是一个新颖的表示学习框架,可以共同建模CM和CI策略。受计算机视觉中对比学习的启发,C3RL将两种策略的输入视为转置视图,并构建了一个连体网络架构:一种策略作为主干,而另一种策略作为补充。通过联合优化对比和预测损失,并使用自适应加权,C3RL平衡了表示和预测性能。对七种模型进行的大量实验表明,C3RL将基于CI策略的最佳性能率提高到81.4\%,将基于CM策略的最佳性能率提高到76.3\%,展示了强大的泛化性和有效性。一旦论文被接受,代码将会提供。

更新时间: 2025-07-23 12:21:26

领域: cs.LG

下载: http://arxiv.org/abs/2507.17454v1

Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state of the art, branch and bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that need to be split, it explores the space of these sub-problems in a naive "first-come-first-serve" manner, and thus suffers from inefficiency in reaching a verification conclusion. To bridge this gap, we introduce an order over the different sub-problems produced by BaB, according to their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to contain counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, this only changes the order in which sub-problems are visited and so will not lead to a performance degradation. Specifically, Oliva has two variants: $Oliva^{GR}$, a greedy strategy that always prioritizes the sub-problems that are more likely to contain counterexamples, and $Oliva^{SA}$, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning 5 models with the MNIST and CIFAR10 datasets. Compared to state-of-the-art approaches, we demonstrate a speedup of Oliva of up to 25X on MNIST and up to 80X on CIFAR10.
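
The greedy $Oliva^{GR}$ variant reduces to a priority-queue traversal of the BaB tree; a minimal sketch follows, with the scoring, splitting, and verification back-ends left as placeholders:

```python
# Sketch of order-led branch-and-bound: visit sub-problems in order of their
# estimated likelihood of containing a counterexample (greedy variant).
import heapq

def verify_with_ordering(root, score, split, check):
    """score(p): higher = more likely to contain a counterexample.
    check(p) -> 'safe' | 'unsafe' | 'unknown'; split(p) -> sub-problems."""
    heap = [(-score(root), 0, root)]
    tiebreak = 1                                  # keeps heap comparisons well-defined
    while heap:
        _, _, prob = heapq.heappop(heap)
        verdict = check(prob)
        if verdict == "unsafe":
            return "counterexample found", prob
        if verdict == "unknown":                  # inconclusive: split further
            for sub in split(prob):
                heapq.heappush(heap, (-score(sub), tiebreak, sub))
                tiebreak += 1
    return "verified", None                       # every sub-problem proved safe
```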

Updated: 2025-07-23 12:20:20

标题: 通过分支界限树的顺序引导探索实现高效的神经网络验证

摘要: 神经网络对对抗性扰动的脆弱性已经促使形式验证技术的发展,这些技术可以严格认证神经网络的质量。作为最先进的技术,分支定界(BaB)是一种“分而治之”的策略,将现成的验证器应用于它们表现更好的子问题。虽然BaB可以确定需要拆分的子问题,但它以一种天真的“先来先服务”的方式探索这些子问题的空间,因此存在效率低下以达到验证结论的问题。为了弥补这一差距,我们引入了一个关于BaB生成的不同子问题的顺序,关注它们包含反例的不同可能性。基于这个顺序,我们提出了一个新颖的验证框架Oliva,通过优先考虑更有可能找到反例的子问题来探索子问题空间,以便有效地达到验证结论。即使在任何子问题中找不到反例,它也只会改变访问不同子问题的顺序,因此不会导致性能降低。具体而言,Oliva有两个变体,包括$Oliva^{GR}$,一种总是优先考虑更有可能找到反例的子问题的贪婪策略,以及$Oliva^{SA}$,受模拟退火启发的平衡策略,逐渐从勘探转向开发,以找到全局最优的子问题。我们在跨越5个模型的690个验证问题上对Oliva的性能进行了实验评估,使用MNIST和CIFAR10数据集。与最先进的方法相比,我们展示了Oliva在MNIST上高达25倍的加速,以及在CIFAR10上高达80倍的加速。

更新时间: 2025-07-23 12:20:20

领域: cs.LG,cs.PL,cs.SE

下载: http://arxiv.org/abs/2507.17453v1

JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models

We introduce JEDI, a test-time adaptation method that enhances subject separation and compositional alignment in diffusion models without requiring retraining or external supervision. JEDI operates by minimizing semantic entanglement in attention maps using a novel Jensen-Shannon divergence based objective. To improve efficiency, we leverage adversarial optimization, reducing the number of updating steps required. JEDI is model-agnostic and applicable to architectures such as Stable Diffusion 1.5 and 3.5, consistently improving prompt alignment and disentanglement in complex scenes. Additionally, JEDI provides a lightweight, CLIP-free disentanglement score derived from internal attention distributions, offering a principled benchmark for compositional alignment under test-time conditions. Code and results are available at https://ericbill21.github.io/JEDI/.
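
The core objective can be illustrated in a few lines: compute the Jensen-Shannon divergence between two subjects' attention distributions and push them apart. Shapes and the optimization loop are illustrative, and this is an assumed simplified reading of the objective rather than the paper's exact formulation:

```python
# Sketch of a JSD objective over attention maps (illustrative shapes).
import torch

def jsd(p, q, eps=1e-8):
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps) / (b + eps)).log()).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

attn_a = torch.softmax(torch.randn(16), dim=-1)   # attention over image patches, subject A
attn_b = torch.softmax(torch.randn(16), dim=-1)   # subject B
loss = -jsd(attn_a, attn_b)   # minimizing -JSD pushes the maps apart, disentangling subjects
```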

Updated: 2025-07-23 12:14:57

标题: JEDI:Jensen-Shannon散度在解开扩散模型中的作用

摘要: 我们介绍了JEDI,这是一种在扩散模型中增强主题分离和组合对齐的测试时间适应方法,无需重新训练或外部监督。JEDI通过使用基于新颖的Jensen-Shannon散度目标来最小化注意力图中的语义纠缠来运行。为了提高效率,我们利用对抗优化,减少了所需的更新步骤数。JEDI是模型不可知的,并适用于稳定扩散1.5和3.5等架构,始终在复杂场景中提高提示对齐和解缠。此外,JEDI提供了一个轻量级的、不依赖CLIP的解缠分数,根据内部注意力分布得出,为测试时间条件下的组合对齐提供了一个有原则的基准。代码和结果可在https://ericbill21.github.io/JEDI/获取。

更新时间: 2025-07-23 12:14:57

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2505.19166v2

Persistent Patterns in Eye Movements: A Topological Approach to Emotion Recognition

We present a topological pipeline for automated multiclass emotion recognition from eye-tracking data. Delay embeddings of gaze trajectories are analyzed using persistent homology. From the resulting persistence diagrams, we extract shape-based features such as mean persistence, maximum persistence, and entropy. A random forest classifier trained on these features achieves up to $75.6\%$ accuracy on four emotion classes, which correspond to the quadrants of the Circumplex Model of Affect. The results demonstrate that persistence diagram geometry effectively encodes discriminative gaze dynamics, suggesting a promising topological approach for affective computing and human behavior analysis.
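
A compact sketch of the pipeline, assuming the `ripser` package for persistence computation; the gaze signal, embedding parameters, and exact feature definitions here are illustrative:

```python
# Sketch: delay-embed a gaze trajectory, compute H1 persistence, extract features.
import numpy as np
from ripser import ripser

def delay_embed(x, dim=3, tau=5):
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)

gaze_x = np.cumsum(np.random.randn(500))   # stand-in 1D gaze signal
cloud = delay_embed(gaze_x)
dgm_h1 = ripser(cloud)["dgms"][1]          # H1 persistence diagram (may be empty for flat signals)
pers = dgm_h1[:, 1] - dgm_h1[:, 0]         # lifetimes of topological features
p = pers / pers.sum()
feats = [pers.mean(), pers.max(), -(p * np.log(p + 1e-12)).sum()]
# [mean persistence, max persistence, persistence entropy] -> random forest classifier
```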

Updated: 2025-07-23 12:14:17

标题: 眼动中的持续模式:一种拓扑学方法用于情绪识别

摘要: 我们提出了一个用于自动多类情绪识别的拓扑管道,其中利用眼动数据。我们使用持久同调来分析注视轨迹的延迟嵌入。从得到的持久图中,我们提取基于形状的特征,如平均持久性、最大持久性和熵。一个基于这些特征训练的随机森林分类器在四种情绪类别上取得了高达75.6%的准确率,这些类别是影响的圆环模型的象限。结果表明,持久图几何有效地编码了具有区分性的注视动态,表明了一种有前途的拓扑方法,用于情感计算和人类行为分析。

更新时间: 2025-07-23 12:14:17

领域: cs.LG,55N31

下载: http://arxiv.org/abs/2507.17450v1

Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

Retrosynthesis planning, essential in organic synthesis and drug discovery, has greatly benefited from recent AI-driven advancements. Nevertheless, existing methods frequently face limitations in both applicability and explainability. Traditional graph-based and sequence-to-sequence models often lack generalized chemical knowledge, leading to predictions that are neither consistently accurate nor easily explainable. To address these challenges, we introduce RetroDFM-R, a reasoning-based large language model (LLM) designed specifically for chemical retrosynthesis. Leveraging large-scale reinforcement learning guided by chemically verifiable rewards, RetroDFM-R significantly enhances prediction accuracy and explainability. Comprehensive evaluations demonstrate that RetroDFM-R significantly outperforms state-of-the-art methods, achieving a top-1 accuracy of 65.0% on the USPTO-50K benchmark. Double-blind human assessments further validate the chemical plausibility and practical utility of RetroDFM-R's predictions. RetroDFM-R also accurately predicts multistep retrosynthetic routes reported in the literature for both real-world drug molecules and perovskite materials. Crucially, the model's explicit reasoning process provides human-interpretable insights, thereby enhancing trust and practical value in real-world retrosynthesis applications.
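
One way to ground the phrase "chemically verifiable rewards" is an exact-match check on canonicalized SMILES via RDKit; the paper's actual reward design may be richer, so treat this as an assumed minimal form:

```python
# Sketch of a verifiable retrosynthesis reward: canonical-SMILES exact match.
from rdkit import Chem

def reward(predicted_smiles: str, reference_smiles: str) -> float:
    mol_p = Chem.MolFromSmiles(predicted_smiles)
    mol_r = Chem.MolFromSmiles(reference_smiles)
    if mol_p is None or mol_r is None:    # invalid prediction earns no reward
        return 0.0
    return float(Chem.MolToSmiles(mol_p) == Chem.MolToSmiles(mol_r))
```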

Updated: 2025-07-23 12:13:06

标题: 基于推理的大语言模型通过强化学习进行逆合成预测

摘要: 回溯合成规划在有机合成和药物发现中至关重要,最近人工智能驱动的进展极大地受益。然而,现有方法经常面临适用性和可解释性方面的限制。传统的基于图的和序列到序列模型经常缺乏广义化的化学知识,导致预测既不一致准确也不易解释。为了解决这些挑战,我们引入了RetroDFM-R,一个基于推理的大型语言模型(LLM),专门针对化学回溯合成设计。借助大规模强化学习指导的化学可验证奖励,RetroDFM-R显著提高了预测准确性和可解释性。全面评估表明,RetroDFM-R显著优于最先进的方法,在USPTO-50K基准测试中实现了65.0%的一级准确性。双盲人类评估进一步验证了RetroDFM-R预测的化学合理性和实用性。RetroDFM-R还准确预测了文献报道的真实世界药物分子和钙钛矿材料的多步回溯合成路线。关键是,该模型的明确推理过程提供了人类可解释的见解,从而增强了在实际回溯合成应用中的信任和实际价值。

更新时间: 2025-07-23 12:13:06

领域: cs.CE,cs.AI,physics.chem-ph

下载: http://arxiv.org/abs/2507.17448v1

IndoorBEV: Joint Detection and Footprint Completion of Objects via Mask-based Prediction in Indoor Scenarios for Bird's-Eye View Perception

Detecting diverse objects within complex indoor 3D point clouds presents significant challenges for robotic perception, particularly with varied object shapes, clutter, and the co-existence of static and dynamic elements where traditional bounding box methods falter. To address these limitations, we propose IndoorBEV, a novel mask-based Bird's-Eye View (BEV) method for indoor mobile robots. In a BEV method, a 3D scene is projected into a 2D BEV grid, which naturally handles occlusions and provides a consistent top-down view that helps distinguish static obstacles from dynamic agents. The resulting 2D BEV output is directly usable for downstream robotic tasks like navigation, motion prediction, and planning. Our architecture utilizes an axis compact encoder and a window-based backbone to extract rich spatial features from this BEV map. A query-based decoder head then employs learned object queries to concurrently predict object classes and instance masks in the BEV space. This mask-centric formulation effectively captures the footprint of both static and dynamic objects regardless of their shape, offering a robust alternative to bounding box regression. We demonstrate the effectiveness of IndoorBEV on a custom indoor dataset featuring diverse object classes including static objects and dynamic elements like robots and miscellaneous items, showcasing its potential for robust indoor scene understanding.
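
The basic BEV rasterization step (point cloud to 2D grid) that such pipelines build on can be sketched in a few lines; grid ranges and resolution below are arbitrary choices, not the paper's:

```python
# Sketch: rasterize a 3D point cloud into a BEV height map.
import numpy as np

def points_to_bev(points, x_range=(-5, 5), y_range=(-5, 5), res=0.05):
    """points: (N, 3) array of x, y, z. Returns a height-map BEV grid."""
    w = int((x_range[1] - x_range[0]) / res)
    h = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    np.maximum.at(bev, (yi[ok], xi[ok]), points[ok, 2])   # max height per cell
    return bev
```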

Updated: 2025-07-23 12:07:21

标题: 室内BEV:通过基于掩模预测的联合检测和完整化对象的足迹,在鸟瞰视角感知室内场景

摘要: 在复杂的室内3D点云中检测各种物体对机器人感知提出了重大挑战,特别是在物体形状各异、混乱和静态动态元素共存的情况下,传统的边界框方法会出现问题。为了解决这些限制,我们提出了IndoorBEV,这是一种新颖的基于掩模的室内移动机器人鸟瞰(BEV)方法。 在BEV方法中,将3D场景投影到2D BEV网格中,处理自然遮挡并提供一致的俯视图,有助于区分静态障碍物和动态代理。获得的2D BEV结果可直接用于下游机器人任务,如导航、运动预测和规划。我们的架构利用轴紧凑编码器和基于窗口的主干来从此BEV地图中提取丰富的空间特征。然后,一个基于查询的解码器头使用学习的对象查询来同时预测BEV空间中的对象类别和实例掩模。这种以掩模为中心的公式有效地捕捉了静态和动态对象的轮廓,无论它们的形状如何,为边界框回归提供了强大的替代方案。我们在一个自定义的室内数据集上展示了IndoorBEV的有效性,该数据集包含各种静态对象和动态元素,如机器人和杂项物品,展示了其在室内场景理解方面的潜力。

更新时间: 2025-07-23 12:07:21

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2507.17445v1

EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control

Visualizing subtle vascular motions in endoscopic surgery is crucial for surgical precision and decision-making, yet remains challenging due to the complex and dynamic nature of surgical scenes. To address this, we introduce EndoControlMag, a training-free, Lagrangian-based framework with mask-conditioned vascular motion magnification tailored to endoscopic environments. Our approach features two key modules: a Periodic Reference Resetting (PRR) scheme that divides videos into short overlapping clips with dynamically updated reference frames to prevent error accumulation while maintaining temporal coherence, and a Hierarchical Tissue-aware Magnification (HTM) framework with dual-mode mask dilation. HTM first tracks vessel cores using a pretrained visual tracking model to maintain accurate localization despite occlusions and view changes. It then applies one of two adaptive softening strategies to surrounding tissues: motion-based softening that modulates magnification strength proportional to observed tissue displacement, or distance-based exponential decay that simulates biomechanical force attenuation. This dual-mode approach accommodates diverse surgical scenarios-motion-based softening excels with complex tissue deformations while distance-based softening provides stability during unreliable optical flow conditions. We evaluate EndoControlMag on our EndoVMM24 dataset spanning four different surgery types and various challenging scenarios, including occlusions, instrument disturbance, view changes, and vessel deformations. Quantitative metrics, visual assessments, and expert surgeon evaluations demonstrate that EndoControlMag significantly outperforms existing methods in both magnification accuracy and visual quality while maintaining robustness across challenging surgical conditions. The code, dataset, and video results are available at https://szupc.github.io/EndoControlMag/.

Updated: 2025-07-23 12:04:57

标题: EndoControlMag:具有周期性参考重置和分层组织感知双蒙版控制的鲁棒内窥镜血管运动放大

摘要: 在内窥手术中,可视化微小血管运动对手术精度和决策至关重要,但由于手术场景的复杂和动态性,仍然具有挑战性。为了解决这个问题,我们引入了EndoControlMag,这是一个无需训练的基于拉格朗日的框架,具有面向内窥环境定制的面具条件血管运动放大。我们的方法包括两个关键模块:周期性参考重置(PRR)方案,将视频划分为短重叠片段,并动态更新参考帧,以防止误差累积同时保持时间上的连贯性;以及具有双模面具膨胀的分层组织感知放大(HTM)框架。HTM首先使用预训练的视觉跟踪模型跟踪血管核心,以保持精确的定位,尽管存在遮挡和视角变化。然后,它应用两种自适应软化策略之一到周围组织:基于运动的软化,根据观察到的组织位移调节放大强度,或者基于距离的指数衰减,模拟生物力学力的衰减。这种双模式方法适应各种外科手术场景-基于运动的软化在复杂组织变形时表现出色,而基于距离的软化在不可靠的光流条件下提供稳定性。我们在我们的EndoVMM24数据集上评估了EndoControlMag,该数据集涵盖了四种不同的手术类型和各种具有挑战性的情景,包括遮挡、仪器干扰、视角变化和血管变形。定量指标、视觉评估和专家外科医生评估表明,EndoControlMag在放大精度和视觉质量方面明显优于现有方法,在具有挑战性的外科条件下保持了稳健性。代码、数据集和视频结果可在https://szupc.github.io/EndoControlMag/上找到。

更新时间: 2025-07-23 12:04:57

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.15292v3

Each to Their Own: Exploring the Optimal Embedding in RAG

Recently, as Large Language Models (LLMs) have fundamentally impacted various fields, the methods for incorporating up-to-date information into LLMs or adding external knowledge to construct domain-specific models have garnered wide attention. Retrieval-Augmented Generation (RAG), serving as an inference-time scaling method, is notable for its low cost and minimal effort for parameter tuning. However, due to heterogeneous training data and model architecture, the variant embedding models used in RAG exhibit different benefits across various areas, often leading to different similarity calculation results and, consequently, varying response quality from LLMs. To address this problem, we propose and examine two approaches to enhance RAG by combining the benefits of multiple embedding models, named Mixture-Embedding RAG and Confident RAG. Mixture-Embedding RAG simply sorts and selects retrievals from multiple embedding models based on standardized similarity; however, it does not outperform vanilla RAG. In contrast, Confident RAG generates responses multiple times using different embedding models and then selects the responses with the highest confidence level, demonstrating average improvements of approximately 10% and 5% over vanilla LLMs and RAG, respectively. The consistent results across different LLMs and embedding models indicate that Confident RAG is an efficient plug-and-play approach for various domains. We will release our code upon publication.
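
The Confident RAG procedure as described reduces to a small loop: run RAG once per embedding model and keep the most confident answer. In the sketch below, `retrieve`, `generate`, and `confidence` are placeholders for the reader's own stack:

```python
# Sketch of Confident RAG: one pass per embedding model, keep the most confident answer.
def confident_rag(query, embedding_models, retrieve, generate, confidence):
    best_answer, best_conf = None, float("-inf")
    for emb in embedding_models:
        docs = retrieve(query, emb)       # retrieval with this embedding model
        answer = generate(query, docs)    # LLM answer conditioned on the retrievals
        conf = confidence(answer)         # e.g. mean token log-probability
        if conf > best_conf:
            best_answer, best_conf = answer, conf
    return best_answer
```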

Updated: 2025-07-23 12:03:54

标题: 每个人都有自己的方式:探索RAG中的最佳嵌入

摘要: 最近,大型语言模型(LLMs)已经对各个领域产生了根本性影响,将最新信息整合到LLMs中或添加外部知识以构建特定领域模型的方法引起了广泛关注。检索增强生成(RAG)作为一种推理时间扩展方法,以低成本和最小的参数调整工作而著称。然而,由于异构的训练数据和模型架构,RAG中使用的变种嵌入模型在各个领域表现出不同的优势,通常导致不同的相似度计算结果,从而导致LLMs的响应质量不同。为了解决这个问题,我们提出并研究了两种增强RAG的方法,分别命名为混合嵌入RAG和自信RAG。混合嵌入RAG简单地根据标准化相似度对多个嵌入模型的检索进行排序和选择;然而,它并没有超越普通的RAG。相反,自信RAG使用不同的嵌入模型多次生成响应,然后选择具有最高信心水平的响应,分别比普通LLMs和RAG平均提高约10%和5%。不同LLMs和嵌入模型之间的一致结果表明,自信RAG是各个领域的高效即插即用方法。我们将在发表后发布我们的代码。

更新时间: 2025-07-23 12:03:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17442v1

Doubly robust outlier resistant inference on causal treatment effect

Outliers can severely distort causal effect estimation in observational studies, yet this issue has received limited attention in the literature. Their influence is especially pronounced in small sample sizes, where detecting and removing outliers becomes increasingly difficult. Therefore, it is essential to estimate treatment effects robustly without excluding these influential data points. To address this, we propose a doubly robust point estimator for the average treatment effect under a contaminated model that includes outliers. Robustness in outcome regression is achieved through a robust estimating equation, while covariate balancing propensity scores (CBPS) ensure resilience in propensity score modeling. To prevent model overfitting due to the inclusion of numerous parameters, we incorporate variable selection. All these components are unified under a penalized empirical likelihood framework. For confidence interval estimation, most existing approaches rely on asymptotic properties, which may be unreliable in finite samples. We derive an optimal finite-sample confidence interval for the average treatment effect using our proposed estimating equation, ensuring that the interval bounds remain unaffected by outliers. Through simulations and a real-world application involving hypertension data with outliers, we demonstrate that our method consistently outperforms existing approaches in both accuracy and robustness.
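
For readers unfamiliar with the starting point, the standard doubly robust (AIPW) estimator that the paper robustifies looks as follows; this sketch uses plain nuisance estimates rather than the paper's outlier-resistant ones:

```python
# Sketch of the augmented inverse-probability-weighted (AIPW) ATE estimator.
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """y: outcomes, t: binary treatment, e_hat: propensity scores,
    mu1_hat/mu0_hat: outcome-regression predictions under t=1 / t=0."""
    term1 = mu1_hat + t * (y - mu1_hat) / e_hat
    term0 = mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat)
    return np.mean(term1 - term0)   # consistent if either nuisance model is correct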

Updated: 2025-07-23 11:58:54

标题: 双重稳健的异常值抵抗因果治疗效应推断

摘要: 异常值可以严重扭曲观察性研究中的因果效应估计,然而这个问题在文献中受到了有限的关注。它们的影响在样本量较小的情况下尤为明显,在这种情况下,检测和排除异常值变得越来越困难。因此,至关重要的是在不排除这些有影响力的数据点的情况下稳健地估计处理效应。为了解决这个问题,我们提出了一个双重稳健点估计器,用于在包含异常值的混合模型下估计平均处理效应。通过稳健估计方程实现了结果回归的稳健性,而协变量平衡倾向得分(CBPS)确保了在倾向得分建模中的韧性。 为了防止由于包含大量参数而导致的模型过拟合,我们引入了变量选择。所有这些组件都统一在一个惩罚经验似然框架下。对于置信区间估计,大多数现有方法依赖于渐近性质,这在有限样本中可能不可靠。我们利用我们提出的估计方程推导出了一个关于平均处理效应的最佳有限样本置信区间,确保区间边界不受异常值影响。通过模拟和涉及具有异常值的高血压数据的实际应用,我们表明我们的方法在准确性和稳健性方面始终优于现有方法。

更新时间: 2025-07-23 11:58:54

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2507.17439v1

Gathering and Exploiting Higher-Order Information when Training Large Structured Models

When training large models, such as neural networks, the full derivatives of order 2 and beyond are usually inaccessible, due to their computational cost. Therefore, among the second-order optimization methods, it is common to bypass the computation of the Hessian by using first-order information, such as the gradient of the parameters (e.g., quasi-Newton methods) or the activations (e.g., K-FAC). In this paper, we focus on the exact and explicit computation of projections of the Hessian and higher-order derivatives on well-chosen subspaces relevant for optimization. Namely, for a given partition of the set of parameters, we compute tensors that can be seen as "higher-order derivatives according to the partition", at a reasonable cost as long as the number of subsets of the partition remains small. Then, we give some examples of how these tensors can be used. First, we show how to compute a learning rate per subset of parameters, which can be used for hyperparameter tuning. Second, we show how to use these tensors at order 2 to construct an optimization method that uses information contained in the Hessian. Third, we show how to use these tensors at order 3 (information contained in the third derivative of the loss) to regularize this optimization method. The resulting training step has several interesting properties, including: it takes into account long-range interactions between the layers of the trained neural network, which is usually not the case in similar methods (e.g., K-FAC); the trajectory of the optimization is invariant under affine layer-wise reparameterization.

Updated: 2025-07-23 11:50:49

标题: 采集和利用高阶信息在训练大型结构化模型时

摘要: 在训练大型模型,例如神经网络时,由于计算成本,通常无法访问二阶及以上的全导数。因此,在二阶优化方法中,常常通过使用一阶信息,如参数的梯度(例如拟牛顿方法)或激活值(例如K-FAC),来避免计算Hessian矩阵。本文侧重于在优化中针对精心选择的子空间精确和明确计算Hessian矩阵和高阶导数的投影。具体而言,对于给定的参数集合分区,我们计算可以被视为“根据分区的高阶导数”的张量,只要分区的子集数量保持较少,计算成本就是合理的。然后,我们给出了一些这些张量可以如何使用的示例。首先,我们展示如何为每个参数子集计算学习率,这可用于超参数调整。其次,我们展示如何在二阶时使用这些张量构建一种利用Hessian矩阵信息的优化方法。第三,我们展示如何在三阶时(损失函数的三阶导数中包含的信息)对这种优化方法进行正则化。结果的训练步骤具有几个有趣的特性,包括:考虑训练神经网络各层之间的长程相互作用,这在类似方法中通常不会发生(例如K-FAC);优化的轨迹在仿射层间重参数化下保持不变。

更新时间: 2025-07-23 11:50:49

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2312.03885v4

Leveraging Trustworthy AI for Automotive Security in Multi-Domain Operations: Towards a Responsive Human-AI Multi-Domain Task Force for Cyber Social Security

Multi-Domain Operations (MDOs) emphasize cross-domain defense against complex and synergistic threats, with civilian infrastructures like smart cities and Connected Autonomous Vehicles (CAVs) emerging as primary targets. As dual-use assets, CAVs are vulnerable to Multi-Surface Threats (MSTs), particularly from Adversarial Machine Learning (AML) which can simultaneously compromise multiple in-vehicle ML systems (e.g., Intrusion Detection Systems, Traffic Sign Recognition Systems). Therefore, this study investigates how key hyperparameters in Decision Tree-based ensemble models-Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGB)-affect the time required for a Black-Box AML attack i.e. Zeroth Order Optimization (ZOO). Findings show that parameters like the number of trees or boosting rounds significantly influence attack execution time, with RF and GB being more sensitive than XGB. Adversarial Training (AT) time is also analyzed to assess the attacker's window of opportunity. By optimizing hyperparameters, this research supports Defensive Trustworthy AI (D-TAI) practices within MST scenarios and contributes to the development of resilient ML systems for civilian and military domains, aligned with Cyber Social Security framework in MDOs and Human-AI Multi-Domain Task Forces.

Updated: 2025-07-23 11:46:52

标题: 利用可信的人工智能技术提升汽车安全在多领域操作中的应用:朝向一个响应灵敏的人工智能多领域任务部队,用于网络社会安全

摘要: 多域作战(MDOs)强调跨领域对抗复杂和协同威胁,民用基础设施如智慧城市和连接自主车辆(CAVs)正成为主要目标。作为双重用途资产,CAVs易受多表面威胁(MSTs)的影响,特别是来自对抗性机器学习(AML)的影响,后者可以同时破坏多个车载机器学习系统(例如入侵检测系统、交通标志识别系统)。因此,本研究调查了基于决策树的集成模型-随机森林(RF)、梯度提升(GB)和极端梯度提升(XGB)中的关键超参数如何影响黑盒AML攻击所需的时间,即零阶优化(ZOO)。研究结果表明,树的数量或提升轮数等参数显著影响攻击执行时间,RF和GB比XGB更敏感。攻击者的机会窗口也通过对抗性训练(AT)时间进行分析。通过优化超参数,这项研究支持MST场景中的防御可信AI(D-TAI)实践,并为民用和军用领域的韧性机器学习系统的发展做出贡献,与MDOs中的网络社会安全框架和人工智能多领域任务部队保持一致。

更新时间: 2025-07-23 11:46:52

领域: cs.CR

下载: http://arxiv.org/abs/2507.21145v1

Fair Compromises in Participatory Budgeting: a Multi-Agent Deep Reinforcement Learning Approach

Participatory budgeting is a method of collectively understanding and addressing spending priorities in which citizens vote on how a budget is spent; it is regularly run to improve the fairness of the distribution of public funds. Participatory budgeting requires voters to make decisions on projects, which can lead to ``choice overload". A multi-agent reinforcement learning approach to decision support can make decision making easier for voters by identifying voting strategies that increase the winning proportion of their vote. This novel approach can also support policymakers by highlighting aspects of election design that enable fair compromise on projects. This paper presents a novel, ethically aligned approach to decision support using multi-agent deep reinforcement learning modelling. It introduces a novel use of a branching neural network architecture to overcome, in a decentralized way, the scalability challenges of multi-agent reinforcement learning. Fair compromises are found by optimising voter actions towards greater representation of voter preferences in the winning set. Experimental evaluation with real-world participatory budgeting data reveals a pattern in fair compromise: it is achievable through projects with smaller cost.

Updated: 2025-07-23 11:46:13

标题: 参与式预算中的公平妥协:一种多Agent深度强化学习方法

摘要: 参与式预算是一种集体了解和解决支出优先事项的方法,市民投票决定预算如何支出,定期进行以改善公共资金分配的公平性。参与式预算要求选民在可能导致“选择过载”的项目上做出决策。一种多智能体强化学习方法可以帮助选民更容易地做出决策,通过识别增加其投票获胜比例的投票策略。这种新颖的方法还可以通过突出选举设计中促进项目公平妥协的方面来支持决策者。本文提出了一种新颖的、符合道德的决策支持方法,使用多智能体深度强化学习建模。本文介绍了一个新颖的使用分支神经网络架构来克服多智能体强化学习中可扩展性挑战的方法。通过优化选民行动,朝着在获胜集中更好地代表选民偏好的方向找到公平妥协。通过对真实世界参与式预算数据进行实验评估,揭示了公平妥协的模式:通过成本较低的项目是可以实现的。

更新时间: 2025-07-23 11:46:13

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2507.17433v1

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference

Recent advancements in Large Visual Language Models (LVLMs) have gained significant attention due to their remarkable reasoning capabilities and proficiency in generalization. However, processing a large number of visual tokens and generating long-context outputs impose substantial computational overhead, leading to excessive demands for key-value (KV) cache. To address this critical bottleneck, we propose AirCache, a novel KV cache compression method aimed at accelerating LVLMs inference. This work systematically investigates the correlations between visual and textual tokens within the attention mechanisms of LVLMs. Our empirical analysis reveals considerable redundancy in cached visual tokens, wherein strategically eliminating these tokens preserves model performance while significantly accelerating context generation. Inspired by these findings, we introduce an elite observation window for assessing the importance of visual components in the KV cache, focusing on stable inter-modal relevancy modeling with enhanced multi-perspective consistency. Additionally, we develop an adaptive layer-wise budget allocation strategy that capitalizes on the strength and skewness of token importance distribution, showcasing superior efficiency compared to uniform allocation. Comprehensive evaluations across multiple LVLMs and benchmarks demonstrate that our method achieves comparable performance to the full cache while retaining only 10% of visual KV cache, thereby reducing decoding latency by 29% to 66% across various batch size and prompt length of inputs. Notably, as cache retention rates decrease, our method exhibits increasing performance advantages over existing approaches.
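
The core pruning step, keeping only the most important visual entries in the KV cache, can be sketched simply; the paper's observation window, consistency scoring, and layer-wise budget allocation refine this basic operation:

```python
# Sketch: attention-guided pruning of visual KV-cache entries (keep top 10%).
import torch

def prune_visual_kv(keys, values, importance, keep_ratio=0.10):
    """keys/values: (num_visual_tokens, d); importance: per-token attention mass."""
    k = max(1, int(keep_ratio * keys.shape[0]))
    idx = torch.topk(importance, k).indices.sort().values   # preserve original token order
    return keys[idx], values[idx]
```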

Updated: 2025-07-23 11:42:03

标题: AirCache: 激活跨模态相关性KV缓存压缩,以实现高效的大型视觉语言模型推理

摘要: 最近,大型视觉语言模型(LVLMs)的最新进展引起了广泛关注,因为它们具有卓越的推理能力和泛化能力。然而,处理大量视觉标记并生成长期上下文输出会带来巨大的计算开销,导致对键-值(KV)缓存的过度需求。为了解决这一关键瓶颈,我们提出了AirCache,一种新颖的KV缓存压缩方法,旨在加速LVLMs的推理过程。这项工作系统地研究了LVLMs的注意机制中视觉和文本标记之间的相关性。我们的实证分析揭示了缓存的视觉标记中存在相当多的冗余,通过策略性地消除这些标记,可以保留模型性能同时显著加速上下文生成。受到这些发现的启发,我们引入了一个精英观察窗口,用于评估KV缓存中视觉组件的重要性,重点放在稳定的跨模态相关性建模和增强的多角度一致性上。此外,我们开发了一种自适应逐层预算分配策略,利用了标记重要性分布的强度和偏斜,展示了与均匀分配相比的卓越效率。在多个LVLMs和基准测试中的全面评估表明,我们的方法在仅保留10%的视觉KV缓存的情况下实现了与完整缓存相当的性能,从而将解码延迟在不同批处理大小和提示长度的输入中降低了29%至66%。值得注意的是,随着缓存保留率的降低,我们的方法表现出对现有方法越来越大的性能优势。

更新时间: 2025-07-23 11:42:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.23956v3

Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning

Precise modeling of microscopic vehicle trajectories is critical for traffic behavior analysis and autonomous driving systems. We propose Ctx2TrajGen, a context-aware trajectory generation framework that synthesizes realistic urban driving behaviors using GAIL. Leveraging PPO and WGAN-GP, our model addresses nonlinear interdependencies and training instability inherent in microscopic settings. By explicitly conditioning on surrounding vehicles and road geometry, Ctx2TrajGen generates interaction-aware trajectories aligned with real-world context. Experiments on the drone-captured DRIFT dataset demonstrate superior performance over existing methods in terms of realism, behavioral diversity, and contextual fidelity, offering a robust solution to data scarcity and domain shift without simulation.

Updated: 2025-07-23 11:21:27

标题: Ctx2TrajGen:使用生成对抗模仿学习的交通上下文感知微观车辆轨迹

摘要: 微观车辆轨迹的精确建模对于交通行为分析和自动驾驶系统至关重要。我们提出了一种上下文感知轨迹生成框架Ctx2TrajGen,利用GAIL合成真实的城市驾驶行为。通过利用PPO和WGAN-GP,我们的模型解决了微观环境中固有的非线性相互依赖性和训练不稳定性。通过明确地对周围车辆和道路几何进行调节,Ctx2TrajGen生成与现实环境相一致的互动感知轨迹。对无人机捕获的DRIFT数据集的实验表明,相比现有方法,我们的模型在现实性、行为多样性和上下文忠实度方面表现出优异性能,为数据稀缺和领域转移提供了稳健解决方案,无需模拟。

更新时间: 2025-07-23 11:21:27

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.17418v1

A Comprehensive Evaluation on Quantization Techniques for Large Language Models

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is a rapidly evolving research field. Though many papers have reported breakthrough performance, they may not conduct experiments on the same ground since one quantization method usually contains multiple components. In addition, analyzing the theoretical connections among existing methods is crucial for in-depth understanding. To bridge these gaps, we conduct an extensive review of state-of-the-art methods and perform comprehensive evaluations on the same ground to ensure fair comparisons. To our knowledge, this fair and extensive investigation remains critically important yet underexplored. To better understand the theoretical connections, we decouple the published quantization methods into two steps: pre-quantization transformation and quantization error mitigation. We define the former as a preprocessing step applied before quantization to reduce the impact of outliers, making the data distribution flatter and more suitable for quantization. Quantization error mitigation involves techniques that offset the errors introduced during quantization, thereby enhancing model performance. We evaluate and analyze the impact of different components of quantization methods. Additionally, we analyze and evaluate the latest MXFP4 data format and its performance. Our experimental results demonstrate that optimized rotation and scaling yield the best performance for pre-quantization transformation, and combining low-rank compensation with GPTQ occasionally outperforms using GPTQ alone for quantization error mitigation. Furthermore, we explore the potential of the latest MXFP4 quantization and reveal that the optimal pre-quantization transformation strategy for INT4 does not generalize well to MXFP4, inspiring further investigation.
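
The paper's two-step decomposition can be illustrated on a single weight matrix: a pre-quantization transformation (here a random orthogonal rotation, one common choice) followed by plain symmetric INT4 quantization. Error-mitigation techniques such as GPTQ are omitted, so this is a sketch of the decomposition rather than any evaluated method:

```python
# Sketch: pre-quantization rotation + symmetric INT4 quantization of a weight matrix.
import torch

def rotate(w):
    q, _ = torch.linalg.qr(torch.randn(w.shape[1], w.shape[1]))
    return w @ q, q                # rotation flattens outliers before quantizing

def quant_int4(w):
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0   # INT4 range: [-8, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale               # dequantized weights

w = torch.randn(64, 64)
w_rot, q = rotate(w)
w_hat = quant_int4(w_rot) @ q.T    # undo the rotation after dequantization
print((w - w_hat).pow(2).mean())   # quantization MSE
```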

Updated: 2025-07-23 11:21:21

标题: 大型语言模型的量化技术的全面评估

摘要: 对于大型语言模型(LLMs),训练后量化(PTQ)可以显著减少内存占用和计算开销。模型量化是一个快速发展的研究领域。尽管许多论文报道了突破性的表现,但它们可能没有在相同的基础上进行实验,因为一个量化方法通常包含多个组件。此外,分析现有方法之间的理论联系对于深入理解至关重要。为了弥合这些差距,我们进行了对最先进方法的广泛评估,并在相同基础上进行全面评估,以确保公平比较。据我们所知,这种公平和全面的调查仍然至关重要,但尚未得到充分探讨。为了更好地理解理论联系,我们将已发表的量化方法分解为两个步骤:预量化变换和量化误差缓解。我们将前者定义为在量化之前应用的预处理步骤,以减少异常值的影响,使数据分布更加平坦,更适合量化。量化误差缓解涉及抵消量化过程中引入的错误的技术,从而提高模型性能。我们评估并分析了量化方法的不同组件的影响。此外,我们分析和评估了最新的MXFP4数据格式及其性能。我们的实验结果表明,优化的旋转和缩放为预量化变换提供了最佳性能,将低秩补偿与GPTQ结合有时优于仅使用GPTQ进行量化误差缓解。此外,我们探索了最新的MXFP4量化的潜力,并发现INT4的最佳预量化变换策略并不适用于MXFP4,这激发了进一步的研究。

更新时间: 2025-07-23 11:21:21

领域: cs.LG

下载: http://arxiv.org/abs/2507.17417v1

Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging

The increasing volume of medical images poses challenges for radiologists in retrieving relevant cases. Content-based image retrieval (CBIR) systems offer potential for efficient access to similar cases, yet lack standardized evaluation and comprehensive studies. Building on prior studies for tumor characterization via CBIR, this study advances CBIR research for volumetric medical images through three key contributions: (1) a framework eliminating reliance on pre-segmented data and organ-specific datasets, aligning with large and unstructured image archiving systems, i.e. PACS in clinical practice; (2) introduction of C-MIR, a novel volumetric re-ranking method adapting ColBERT's contextualized late interaction mechanism for 3D medical imaging; (3) comprehensive evaluation across four tumor sites using three feature extractors and three database configurations. Our evaluations highlight the significant advantages of C-MIR. We demonstrate the successful adaptation of the late interaction principle to volumetric medical images, enabling effective context-aware re-ranking. A key finding is C-MIR's ability to effectively localize the region of interest, eliminating the need for pre-segmentation of datasets and offering a computationally efficient alternative to systems relying on expensive data enrichment steps. C-MIR demonstrates promising improvements in tumor flagging, achieving improved performance, particularly for colon and lung tumors (p<0.05). C-MIR also shows potential for improving tumor staging, warranting further exploration of its capabilities. Ultimately, our work seeks to bridge the gap between advanced retrieval techniques and their practical applications in healthcare, paving the way for improved diagnostic processes.
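
ColBERT's late interaction reduces to the MaxSim operation; adapted to volumes, each query and candidate becomes a bag of sub-volume embeddings. A minimal sketch with illustrative shapes:

```python
# Sketch of ColBERT-style late interaction (MaxSim) for re-ranking.
import torch

def maxsim(query_emb, cand_emb):
    """query_emb: (Nq, d), cand_emb: (Nc, d), both L2-normalized."""
    sim = query_emb @ cand_emb.T          # (Nq, Nc) cosine similarities
    return sim.max(dim=1).values.sum()    # best match per query vector, summed

q = torch.nn.functional.normalize(torch.randn(32, 256), dim=-1)
c = torch.nn.functional.normalize(torch.randn(48, 256), dim=-1)
score = maxsim(q, c)                      # higher = candidate ranked earlier
```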

Updated: 2025-07-23 11:12:52

标题: 基于内容的三维图像检索和受ColBERT启发的肿瘤标记和分期重新排序

摘要: 随着医学影像数量的增加,放射科医师在检索相关病例方面面临挑战。基于内容的图像检索(CBIR)系统为获取类似病例提供了高效途径,但缺乏标准化评估和全面研究。本研究在以往关于通过CBIR进行肿瘤表征的研究基础上,通过三个关键贡献推进了对体积医学图像的CBIR研究:(1)一个框架,消除了对预分割数据和器官特定数据集的依赖,与临床实践中的大型和非结构化图像归档系统(即PACS)保持一致;(2)引入了C-MIR,一种新颖的体积重排方法,适应了ColBERT的上下文化后期交互机制,用于3D医学成像;(3)通过三个特征提取器和三个数据库配置,在四个肿瘤部位进行了全面评估。我们的评估突显了C-MIR的重要优势。我们展示了后期交互原则对体积医学图像的成功适应,实现了有效的上下文感知重排。一个关键发现是C-MIR能够有效定位感兴趣区域,消除了对数据集的预分割的需求,并提供了一种计算效率高的替代方案,避免了依赖昂贵数据丰富化步骤的系统。C-MIR展现出在肿瘤标记方面有希望的改进,特别是对结肠和肺肿瘤(p <0.05)的性能改善。C-MIR还显示了改善肿瘤分期的潜力,值得进一步探索其能力。最终,我们的工作旨在弥合先进检索技术与其在医疗保健中的实际应用之间的鸿沟,为改进诊断流程铺平道路。

更新时间: 2025-07-23 11:12:52

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2507.17412v1

ACMP: Allen-Cahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks

Neural message passing is a basic feature extraction unit for graph-structured data considering neighboring node features in network propagation from one layer to the next. We model such a process by an interacting particle system with attractive and repulsive forces and the Allen-Cahn force arising in the modeling of phase transition. The dynamics of the system is a reaction-diffusion process which can separate particles without blowing up. This induces an Allen-Cahn message passing (ACMP) for graph neural networks where the numerical iteration for the particle system solution constitutes the message passing propagation. ACMP, which has a simple implementation with a neural ODE solver, can propel the network depth up to one hundred layers with a theoretically proven, strictly positive lower bound of the Dirichlet energy. It thus provides a deep model of GNNs circumventing the common GNN problem of oversmoothing. GNNs with ACMP achieve state of the art performance for real-world node classification tasks on both homophilic and heterophilic datasets. Codes are available at https://github.com/ykiiiiii/ACMP.
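
One explicit Euler step of such a system can be sketched directly; this is an assumed simplified form of the update, with signed neighbor messages for attraction/repulsion plus the Allen-Cahn reaction term $x - x^3$:

```python
# Sketch of one ACMP-style update step (simplified, shapes illustrative).
import torch

def acmp_step(x, adj, alpha=0.1, delta=0.1, dt=0.1):
    """x: (N, d) node features; adj: (N, N) adjacency with +1/-1 learned signs."""
    diff = x.unsqueeze(0) - x.unsqueeze(1)          # (N, N, d): pairwise x_j - x_i
    msg = (adj.unsqueeze(-1) * diff).sum(dim=1)     # signed aggregation: attract or repel
    reaction = delta * x * (1.0 - x.pow(2))         # Allen-Cahn force keeps features bounded
    return x + dt * (alpha * msg + reaction)        # explicit Euler step of the neural ODE
```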

Updated: 2025-07-23 11:10:40

标题: ACMP:具有吸引和排斥力的Allen-Cahn消息传递用于图神经网络

摘要: 神经消息传递是一种基本的特征提取单元,用于考虑网络传播中相邻节点特征的图结构数据,从一个层到下一个层。我们通过具有吸引力和排斥力的相互作用粒子系统以及在相变建模中出现的Allen-Cahn力来建模这种过程。系统的动力学是一个反应扩散过程,可以在不爆炸的情况下分离粒子。这引发了一种Allen-Cahn消息传递(ACMP)用于图神经网络,其中粒子系统解的数值迭代构成了消息传递传播。ACMP通过神经ODE求解器的简单实现可以将网络深度推高到一百层,理论上证明了Dirichlet能量的严格正下界。因此,它提供了一个深层GNN模型,避开了常见的GNN过度平滑问题。带有ACMP的GNN在同源和异源数据集上实现了实际节点分类任务的最新性能。代码可在https://github.com/ykiiiiii/ACMP 上找到。

更新时间: 2025-07-23 11:10:40

领域: cs.LG,cs.AI,math.AP

下载: http://arxiv.org/abs/2206.05437v4

From DDMs to DNNs: Using process data and models of decision-making to improve human-AI interactions

Over the past decades, cognitive neuroscientists and behavioral economists have recognized the value of describing the process of decision making in detail and modeling the emergence of decisions over time. For example, the time it takes to decide can reveal more about an agent's true hidden preferences than only the decision itself. Similarly, data that track the ongoing decision process such as eye movements or neural recordings contain critical information that can be exploited, even if no decision is made. Here, we argue that artificial intelligence (AI) research would benefit from a stronger focus on insights about how decisions emerge over time and incorporate related process data to improve AI predictions in general and human-AI interactions in particular. First, we introduce a highly established computational framework that assumes decisions to emerge from the noisy accumulation of evidence, and we present related empirical work in psychology, neuroscience, and economics. Next, we discuss to what extent current approaches in multi-agent AI do or do not incorporate process data and models of decision making. Finally, we outline how a more principled inclusion of the evidence-accumulation framework into the training and use of AI can help to improve human-AI interactions in the future.
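
The drift-diffusion model (DDM) referenced in the title is easy to simulate, which also shows why response times are informative: both the choice and the time-to-boundary depend on the drift. A minimal simulation with arbitrary parameter values:

```python
# Sketch: simulate a drift-diffusion model; choice and response time both carry signal.
import numpy as np

def simulate_ddm(drift=0.3, bound=1.0, noise=1.0, dt=0.001, max_t=5.0, rng=None):
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x >= bound else 0), t    # (choice, response time)

choices, rts = zip(*(simulate_ddm(rng=np.random.default_rng(i)) for i in range(1000)))
print(np.mean(choices), np.mean(rts))     # choice proportion and mean RT
```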

Updated: 2025-07-23 11:02:12

标题: 从DDMs到DNNs:利用决策过程数据和模型来改进人工智能与人类的互动

摘要: 在过去的几十年中,认知神经科学家和行为经济学家已经意识到详细描述决策过程的价值,并对决策随时间的推移建模。例如,决策所需的时间可以更多地揭示一个决策者真正的隐藏偏好,而不仅仅是决策本身。同样,跟踪决策过程的数据,如眼动或神经记录,包含关键信息,即使没有做出决策也可以被利用。在这里,我们认为人工智能(AI)研究将受益于更加关注决策如何随时间出现并整合相关过程数据,以改进AI预测以及特定的人机交互。首先,我们介绍了一个高度成熟的计算框架,假设决策是从嘈杂的证据积累中出现的,并介绍了相关的心理学、神经科学和经济学的实证工作。接下来,我们讨论了当前多智能体AI方法在多大程度上是否整合了过程数据和决策模型。最后,我们概述了如何更有原则地将证据积累框架纳入AI的训练和使用中,以帮助未来改进人机交互。

更新时间: 2025-07-23 11:02:12

领域: q-bio.NC,cs.AI

下载: http://arxiv.org/abs/2308.15225v3

Millions of $\text{GeAR}$-s: Extending GraphRAG to Millions of Documents

Recent studies have explored graph-based approaches to retrieval-augmented generation, leveraging structured or semi-structured information -- such as entities and their relations extracted from documents -- to enhance retrieval. However, these methods are typically designed to address specific tasks, such as multi-hop question answering and query-focused summarisation, and therefore, there is limited evidence of their general applicability across broader datasets. In this paper, we aim to adapt a state-of-the-art graph-based RAG solution: $\text{GeAR}$ and explore its performance and limitations on the SIGIR 2025 LiveRAG Challenge.

Updated: 2025-07-23 10:54:24

标题: 数百万个 $\text{GeAR}$-s:将GraphRAG扩展到数百万个文档

摘要: 最近的研究探讨了基于图的检索增强生成方法,利用从文档中提取的结构化或半结构化信息,如实体及其关系,来增强检索。然而,这些方法通常设计用于解决特定任务,如多跳问题回答和以查询为焦点的摘要,因此,对它们在更广泛数据集上的普适性的证据有限。在本文中,我们旨在调整最先进的基于图的RAG解决方案:GeAR,并探索其在SIGIR 2025 LiveRAG挑战赛上的性能和局限性。

更新时间: 2025-07-23 10:54:24

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2507.17399v1

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Multimodal foundation models, such as GPT-4o, have recently made remarkable progress, but it is not clear where exactly these models stand in terms of understanding vision. In this paper, we benchmark the performance of popular multimodal foundation models (GPT-4o, o4-mini, Gemini 1.5 Pro and Gemini 2.0 Flash, Claude 3.5 Sonnet, Qwen2-VL, Llama 3.2) on standard computer vision tasks (semantic segmentation, object detection, image classification, depth and surface normal prediction) using established datasets (e.g., COCO, ImageNet and its variants, etc). The main challenges to performing this are: 1) most models are trained to output text and cannot natively express versatile domains, such as segments or 3D geometry, and 2) many leading models are proprietary and accessible only at an API level, i.e., there is no weight access to adapt them. We address these challenges by translating standard vision tasks into equivalent text-promptable and API-compatible tasks via prompt chaining to create a standardized benchmarking framework. We observe that 1) the models are not close to the state-of-the-art specialist models at any task. However, 2) they are respectable generalists; this is remarkable as they are presumably trained on primarily image-text-based tasks. 3) They perform semantic tasks notably better than geometric ones. 4) While the prompt-chaining techniques affect performance, better models exhibit less sensitivity to prompt variations. 5) GPT-4o performs the best among non-reasoning models, securing the top position in 4 out of 6 tasks, 6) reasoning models, e.g. o3, show improvements in geometric tasks, and 7) a preliminary analysis of models with native image generation, like the latest GPT-4o, shows they exhibit quirks like hallucinations and spatial misalignments.

Updated: 2025-07-23 10:52:38

标题: GPT-4o对视觉理解有多好?评估多模态基础模型在标准计算机视觉任务上的表现

摘要: 多模态基础模型,如GPT-4o,最近取得了显著进展,但这些模型在理解视觉方面的确切位置尚不清楚。在本文中,我们使用已建立的数据集(如COCO、ImageNet及其变体等)对流行的多模态基础模型(GPT-4o、o4-mini、Gemini 1.5 Pro和Gemini 2.0 Flash、Claude 3.5 Sonnet、Qwen2-VL、Llama 3.2)在标准计算机视觉任务(语义分割、目标检测、图像分类、深度和表面法线预测)的性能进行基准测试。 执行此操作的主要挑战包括:1)大多数模型经过训练以输出文本,无法本机表达多样化领域,如分段或3D几何,2)许多领先模型是专有的,只能在API级别访问,即没有权重访问以对其进行调整。通过提示链接将标准视觉任务转换为等效的文本提示和API兼容任务,以创建一个标准化的基准测试框架,我们解决了这些挑战。 我们观察到:1)这些模型在任何任务上都不接近最新专家模型。然而,2)它们是受尊敬的通才;这是令人印象深刻的,因为它们据说主要是在基于图像文本的任务上训练的。3)它们在语义任务方面的表现明显优于几何任务。4)虽然提示链接技术会影响性能,但更好的模型对提示变化的敏感性较低。5)在非推理模型中,GPT-4o表现最佳,在6项任务中的4项中位居第一,6)推理模型(例如o3)在几何任务中显示出改进,7)对具有本机图像生成功能的模型进行初步分析,如最新的GPT-4o,显示出幻觉和空间错位等怪癖。

更新时间: 2025-07-23 10:52:38

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.01955v2

Graph Neural Networks for O-RAN Mobility Management: A Link Prediction Approach

Mobility performance has been a key focus in cellular networks up to 5G. To enhance handover (HO) performance, 3GPP introduced Conditional Handover (CHO) and Layer 1/Layer 2 Triggered Mobility (LTM) mechanisms in 5G. While these reactive HO strategies address the trade-off between HO failures (HOF) and ping-pong effects, they often result in inefficient radio resource utilization due to additional HO preparations. To overcome these challenges, this article proposes a proactive HO framework for mobility management in O-RAN, leveraging user-cell link predictions to identify the optimal target cell for HO. We explore various categories of Graph Neural Networks (GNNs) for link prediction and analyze the complexity of applying them to the mobility management domain. Two GNN models are compared using a real-world dataset, with experimental results demonstrating their ability to capture the dynamic and graph-structured nature of cellular networks. Finally, we present key insights from our study and outline future steps to enable the integration of GNN-based link prediction for mobility management in O-RAN networks.

Updated: 2025-07-23 10:48:02

标题: 图神经网络用于O-RAN移动性管理: 一种链路预测方法

摘要: 移动性能一直是5G之前蜂窝网络的关键焦点。为了增强切换(HO)性能,3GPP在5G中引入了有条件切换(CHO)和层1/层2触发移动性(LTM)机制。虽然这些反应性HO策略解决了HO失败(HOF)和乒乓效应之间的权衡,但它们通常会导致由于额外的HO准备而导致无效的无线资源利用。为了克服这些挑战,本文提出了一个用于在O-RAN中进行移动性管理的主动HO框架,利用用户-小区链接预测来识别最佳的HO目标小区。我们探讨了各种类别的图神经网络(GNN)用于链接预测,并分析了将它们应用于移动性管理领域的复杂性。通过使用真实数据集比较了两个GNN模型,实验结果表明它们能够捕捉蜂窝网络的动态和图结构特性。最后,我们从研究中得出了关键见解,并概述了未来步骤,以实现在O-RAN网络中基于GNN的链接预测用于移动性管理的集成。

更新时间: 2025-07-23 10:48:02

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2502.02170v2

Learning from Scratch: Structurally-masked Transformer for Next Generation Lib-free Simulation

This paper proposes a neural framework for power and timing prediction of multi-stage data paths, distinguishing itself from traditional lib-based analytical methods dependent on driver characterization and load simplifications. To the best of our knowledge, this is the first language-based, netlist-aware neural network designed explicitly for standard cells. Our approach employs two pre-trained neural models of waveform prediction and delay estimation that directly infer transient waveforms and propagation delays from SPICE netlists, conditioned on critical physical parameters such as load capacitance, input slew, and gate size. This method accurately captures both intrinsic and coupling-induced delay effects without requiring simplification or interpolation. For multi-stage timing prediction, we implement a recursive propagation strategy where predicted waveforms from each stage feed into subsequent stages, cumulatively capturing delays across the logic chain. This approach ensures precise timing alignment and complete waveform visibility throughout complex signal pathways. The waveform prediction utilizes a hybrid CNN-Transformer architecture with netlist-aware node-level encoding, addressing traditional Transformers' fixed input dimensionality constraints. Additionally, specialized subnetworks separately handle primary delay estimation and crosstalk correction. Experimental results demonstrate SPICE-level accuracy, consistently achieving RMSE below 0.0098 across diverse industrial circuits. The proposed framework provides a scalable, structurally adaptable neural alternative to conventional power and timing engines, demonstrating high fidelity to physical circuit behaviors.

Updated: 2025-07-23 10:46:25

标题: 从头学习:下一代无库模拟的结构掩盖变压器

摘要: 这篇论文提出了一个神经网络框架,用于多级数据路径的功耗和时序预测,与传统基于库的分析方法有所区别,传统方法依赖于驱动器特性和负载简化。据我们所知,这是第一个基于语言的、网表感知的神经网络,专门设计用于标准单元。我们的方法采用两个预先训练的神经模型进行波形预测和延迟估计,直接根据SPICE网表推断瞬态波形和传播延迟,条件是关键的物理参数,如负载电容、输入斜率和门尺寸。这种方法准确捕捉了本质和耦合引起的延迟效应,而无需简化或插值。对于多级时序预测,我们实现了一种递归传播策略,每个阶段的预测波形反馈到后续阶段,累积捕获逻辑链中的延迟。这种方法确保了精确的时序对齐和在复杂信号路径中的完整波形可见性。波形预测采用了混合CNN-Transformer架构,具有网表感知的节点级编码,解决了传统Transformer的固定输入维度约束。此外,专门的子网络分别处理主要延迟估计和串扰校正。实验结果表明,跨不同工业电路,保持RMSE在0.0098以下,实现了SPICE级的准确性。该提出的框架为传统功耗和时序引擎提供了一个可伸缩的、结构可调整的神经替代方案,展示了对物理电路行为的高保真度。

更新时间: 2025-07-23 10:46:25

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2507.17396v1

Solving nonconvex Hamilton--Jacobi--Isaacs equations with PINN-based policy iteration

We propose a mesh-free policy iteration framework that combines classical dynamic programming with physics-informed neural networks (PINNs) to solve high-dimensional, nonconvex Hamilton--Jacobi--Isaacs (HJI) equations arising in stochastic differential games and robust control. The method alternates between solving linear second-order PDEs under fixed feedback policies and updating the controls via pointwise minimax optimization using automatic differentiation. Under standard Lipschitz and uniform ellipticity assumptions, we prove that the value function iterates converge locally uniformly to the unique viscosity solution of the HJI equation. The analysis establishes equi-Lipschitz regularity of the iterates, enabling provable stability and convergence without requiring convexity of the Hamiltonian. Numerical experiments demonstrate the accuracy and scalability of the method. In a two-dimensional stochastic path-planning game with a moving obstacle, our method matches finite-difference benchmarks with relative $L^2$-errors below $10^{-2}$. In five- and ten-dimensional publisher-subscriber differential games with anisotropic noise, the proposed approach consistently outperforms direct PINN solvers, yielding smoother value functions and lower residuals. Our results suggest that integrating PINNs with policy iteration is a practical and theoretically grounded method for solving high-dimensional, nonconvex HJI equations, with potential applications in robotics, finance, and multi-agent reinforcement learning.

Updated: 2025-07-23 10:44:15

标题: 使用基于PINN的策略迭代解决非凸Hamilton-Jacobi-Isaacs方程

摘要: 我们提出了一个无网格策略迭代框架,将传统的动态规划与物理信息神经网络(PINNs)相结合,以解决在随机微分博弈和鲁棒控制中出现的高维、非凸的Hamilton-Jacobi-Isaacs(HJI)方程。该方法在固定反馈策略下交替解决线性二阶PDE,并通过自动微分使用点点最小最优化更新控制。在标准利普希茨和均匀椭圆性假设下,我们证明值函数迭代在局部一致收敛到HJI方程的唯一黏性解。分析确立了迭代的等利普希兹正则性,使得可以在不需要Hamiltonian凸性的情况下证明稳定性和收敛性。数值实验表明了该方法的准确性和可伸缩性。在一个带有移动障碍物的二维随机路径规划游戏中,我们的方法与有限差分基准相匹配,相对$L^2$-误差低于$10^{-2}$。在具有各向异性噪声的五维和十维发布者-订阅者微分博弈中,所提出的方法始终优于直接的PINN求解器,产生更平滑的值函数和更低的残差。我们的结果表明,将PINNs与策略迭代集成是解决高维、非凸HJI方程的一种实用且理论基础的方法,可能在机器人技术、金融和多智能体强化学习等领域有潜在应用。

更新时间: 2025-07-23 10:44:15

领域: math.NA,cs.AI,cs.NA,math.AP,49N70, 35Q93, 49L25, 68T07

下载: http://arxiv.org/abs/2507.15455v2

HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs

Video Anomaly Detection (VAD) aims to identify and locate deviations from normal patterns in video sequences. Traditional methods often struggle with substantial computational demands and a reliance on extensive labeled datasets, thereby restricting their practical applicability. To address these constraints, we propose HiProbe-VAD, a novel framework that leverages pre-trained Multimodal Large Language Models (MLLMs) for VAD without requiring fine-tuning. In this paper, we discover that the intermediate hidden states of MLLMs contain information-rich representations, exhibiting higher sensitivity and linear separability for anomalies compared to the output layer. To capitalize on this, we propose a Dynamic Layer Saliency Probing (DLSP) mechanism that intelligently identifies and extracts the most informative hidden states from the optimal intermediate layer during the MLLMs reasoning. Then a lightweight anomaly scorer and temporal localization module efficiently detects anomalies using these extracted hidden states and finally generate explanations. Experiments on the UCF-Crime and XD-Violence datasets demonstrate that HiProbe-VAD outperforms existing training-free and most traditional approaches. Furthermore, our framework exhibits remarkable cross-model generalization capabilities in different MLLMs without any tuning, unlocking the potential of pre-trained MLLMs for video anomaly detection and paving the way for more practical and scalable solutions.
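
The probing mechanic, pulling an intermediate layer's hidden states from a frozen model and scoring them with a lightweight head, can be sketched with any Transformers backbone. The model name, layer index, and text input below are placeholders (the paper works with multimodal LLMs on video):

```python
# Sketch of hidden-state probing with a frozen backbone (no fine-tuning).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # stand-in backbone
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("Describe the events in this video clip.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
h = out.hidden_states[6].mean(dim=1)      # intermediate layer, mean-pooled: (1, d)

scorer = torch.nn.Linear(h.shape[-1], 1)  # lightweight anomaly scorer (trained separately)
anomaly_score = torch.sigmoid(scorer(h))
```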

Updated: 2025-07-23 10:41:46

标题: HiProbe-VAD:通过调谐自由多模态LLM中的隐藏状态探测进行视频异常检测

摘要: 视频异常检测(VAD)旨在识别和定位视频序列中的正常模式偏差。传统方法通常难以满足较大的计算需求,并且依赖于大量标记数据集,从而限制了它们的实际适用性。为了解决这些限制,我们提出了HiProbe-VAD,这是一个利用预训练的多模态大型语言模型(MLLMs)进行VAD的新框架,而无需进行微调。在本文中,我们发现MLLMs的中间隐藏状态包含信息丰富的表示,与输出层相比,对异常具有更高的敏感性和线性可分性。为了利用这一点,我们提出了一种动态层显著性探测(DLSP)机制,智能地识别并提取MLLMs推理过程中最佳中间层中最具信息量的隐藏状态。然后,一个轻量级的异常评分器和时间定位模块有效地使用这些提取的隐藏状态来检测异常,并最终生成解释。对UCF-Crime和XD-Violence数据集的实验表明,HiProbe-VAD优于现有的无需训练和大多数传统方法。此外,我们的框架在不进行任何调整的情况下展现出在不同MLLMs中出色的跨模型泛化能力,释放了预训练MLLMs在视频异常检测中的潜力,为更实用和可扩展的解决方案铺平了道路。

更新时间: 2025-07-23 10:41:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17394v1

Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains

To gain deeper insights into a complex sensor system through the lens of causality, we present common and individual causal mechanism estimation (CICME), a novel three-step approach to inferring causal mechanisms from heterogeneous data collected across multiple domains. By leveraging the principle of Causal Transfer Learning (CTL), CICME is able to reliably detect domain-invariant causal mechanisms when provided with sufficient samples. The identified common causal mechanisms are further used to guide the estimation of the remaining causal mechanisms in each domain individually. The performance of CICME is evaluated on linear Gaussian models under scenarios inspired from a manufacturing process. Building upon existing continuous optimization-based causal discovery methods, we show that CICME leverages the benefits of applying causal discovery on the pooled data and repeatedly on data from individual domains, and it even outperforms both baseline methods under certain scenarios.

Updated: 2025-07-23 10:35:37

标题: 多传感器系统跨多个领域的因果机制估计

摘要: 为了通过因果关系的视角更深入地了解一个复杂的传感器系统,我们提出了常见和个体因果机制估计(CICME),这是一种从跨多个领域收集的异质数据中推断因果机制的新颖三步方法。通过利用因果转移学习(CTL)的原则,CICME能够在提供足够样本的情况下可靠地检测领域不变的因果机制。确定的公共因果机制进一步用于指导在每个领域中单独估计剩余的因果机制。我们在受制造过程启发的场景下评估了CICME在线性高斯模型上的性能。在现有基于连续优化的因果发现方法基础上,我们表明CICME利用了将因果发现应用于汇总数据和重复应用于各个领域数据的好处,并且在某些情况下甚至表现出色于基准方法。

更新时间: 2025-07-23 10:35:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2507.17792v1

Investigating Training Data Detection in AI Coders

Recent advances in code large language models (CodeLLMs) have made them indispensable tools in modern software engineering. However, these models occasionally produce outputs that contain proprietary or sensitive code snippets, raising concerns about potential non-compliant use of training data, and posing risks to privacy and intellectual property. To ensure responsible and compliant deployment of CodeLLMs, training data detection (TDD) has become a critical task. While recent TDD methods have shown promise in natural language settings, their effectiveness on code data remains largely underexplored. This gap is particularly important given code's structured syntax and distinct similarity criteria compared to natural language. To address this, we conduct a comprehensive empirical study of seven state-of-the-art TDD methods on source code data, evaluating their performance across eight CodeLLMs. To support this evaluation, we introduce CodeSnitch, a function-level benchmark dataset comprising 9,000 code samples in three programming languages, each explicitly labeled as either included or excluded from CodeLLM training. Beyond evaluation on the original CodeSnitch, we design targeted mutation strategies to test the robustness of TDD methods under three distinct settings. These mutation strategies are grounded in the well-established Type-1 to Type-4 code clone detection taxonomy. Our study provides a systematic assessment of current TDD techniques for code and offers insights to guide the development of more effective and robust detection methods in the future.
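
A common likelihood-based TDD baseline (Min-K%-style scoring) illustrates what such detectors compute: average the k% least-probable token log-likelihoods, with low values suggesting the sample was not in training. The backbone below is a placeholder for a CodeLLM:

```python
# Sketch of a Min-K%-style training-data-detection score (stand-in backbone).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def min_k_score(code: str, k: float = 0.2) -> float:
    ids = tok(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:, None]).squeeze(1)   # per-token log-likelihood
    worst = token_lp.topk(max(1, int(k * len(token_lp))), largest=False).values
    return worst.mean().item()   # higher = more likely seen in training

print(min_k_score("def add(a, b):\n    return a + b"))
```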

Updated: 2025-07-23 10:34:22

标题: 调查人工智能编码器中的训练数据检测

摘要: 最近在代码大型语言模型(CodeLLMs)方面取得的进展使它们成为现代软件工程中不可或缺的工具。然而,这些模型偶尔会产生包含专有或敏感代码片段的输出,引发对训练数据潜在违规使用的担忧,并对隐私和知识产权构成风险。为了确保CodeLLMs的负责和合规部署,训练数据检测(TDD)已成为一项关键任务。尽管最近的TDD方法在自然语言环境中显示出潜力,但它们在代码数据上的有效性仍然很大程度上未被探索。鉴于代码的结构化语法和与自然语言相比的明显相似性标准,这一差距尤为重要。为了解决这个问题,我们对七种最先进的TDD方法在源代码数据上进行了全面的实证研究,评估它们在八个CodeLLMs上的性能。为了支持这一评估,我们引入了CodeSnitch,一个功能级基准数据集,包含三种编程语言中的9,000个代码样本,每个样本都明确标记为包含在或排除在CodeLLM训练之外。除了对原始CodeSnitch的评估外,我们设计了有针对性的突变策略,以测试TDD方法在三种不同设置下的稳健性。这些突变策略基于已建立的类型1到类型4的代码克隆检测分类法。我们的研究对当前代码的TDD技术进行了系统评估,并提供了指导未来开发更有效和更稳健的检测方法的见解。

更新时间: 2025-07-23 10:34:22

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2507.17389v1

Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data

Helix is an open-source, extensible, Python-based software framework to facilitate reproducible and interpretable machine learning workflows for tabular data. It addresses the growing need for transparent experimental data analytics provenance, ensuring that the entire analytical process -- including decisions around data transformation and methodological choices -- is documented, accessible, reproducible, and comprehensible to relevant stakeholders. The platform comprises modules for standardised data preprocessing, visualisation, machine learning model training, evaluation, interpretation, results inspection, and model prediction for unseen data. To further empower researchers without formal training in data science to derive meaningful and actionable insights, Helix features a user-friendly interface that enables the design of computational experiments and inspection of outcomes, including a novel interpretation approach to machine learning decisions using linguistic terms, all within an integrated environment. Released under the MIT licence, Helix is accessible via GitHub and PyPI, supporting community-driven development and promoting adherence to the FAIR principles.

Updated: 2025-07-23 10:33:35

标题: Helix 1.0:一个用于在表格科学数据上可复现和可解释机器学习的开源框架

摘要: Helix是一个开源的、可扩展的、基于Python的软件框架,旨在为表格数据提供可重复和可解释的机器学习工作流。它解决了对透明实验数据分析溯源的不断增长的需求,确保整个分析过程--包括数据转换和方法选择--都被记录、可访问、可重现,并且易于理解给相关利益相关者。该平台包括标准化数据预处理、可视化、机器学习模型训练、评估、解释、结果检查和模型预测未知数据的模块。为了进一步赋予没有数据科学正式训练的研究人员获取有意义和可操作的见解的能力,Helix具有用户友好的界面,可以设计计算实验,检查结果,包括一种新颖的解释方法,使用语言术语来解释机器学习决策,所有这些都在一个集成环境中。Helix在MIT许可下发布,可通过GitHub和PyPI访问,支持社区驱动的开发,并促进遵守FAIR原则。

更新时间: 2025-07-23 10:33:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17791v1

A Zero-overhead Flow for Security Closure

In the traditional Application-Specific Integrated Circuit (ASIC) design flow, the concept of timing closure implies reaching convergence during physical synthesis such that, under a given area and power budget, the design works at the targeted frequency. However, security has been largely neglected when evaluating the Quality of Results (QoR) from physical synthesis. In general, commercial place & route tools do not understand security goals. In this work, we propose a modified ASIC design flow that is security-aware and, differently from prior research, does not degrade QoR for the sake of security improvement. Therefore, we propose a first-of-its-kind zero-overhead flow for security closure. Our flow is concerned with two distinct threat models: (i) insertion of Hardware Trojans (HTs) and (ii) physical probing/fault injection. Importantly, the flow is entirely executed within a commercial place & route engine and is scalable. In several metrics, our security-aware flow achieves the best-known results for the ISPD'22 set of benchmark circuits while incurring negligible design overheads due to security-related strategies. Finally, we open-source the entire methodology (as a set of scripts) and also share the protected circuits (as design databases) for the benefit of the hardware security community.

Updated: 2025-07-23 10:28:15

标题: 一个零开销的用于安全封闭的流程

摘要: 在传统的应用特定集成电路(ASIC)设计流程中,时序闭合的概念意味着在物理综合过程中达到收敛,使设计在给定的面积和功耗预算下能够在目标频率下工作。然而,在评估物理综合的结果质量(QoR)时,安全性往往被忽视。一般来说,商用布局和布线工具不理解安全目标。在这项工作中,我们提出了一个修改后的ASIC设计流程,具有安全意识,并且与先前的研究不同,不会为了安全改进而降低QoR。因此,我们提出了一种首创的零开销安全闭合流程。我们的流程关注两种不同的威胁模型:(i)硬件特洛伊木马(HT)的插入和(ii)物理探测/故障注入。重要的是,该流程完全在商用布局和布线引擎内执行,并且具有可扩展性。在多个指标上,我们的安全意识流程在ISPD'22一组基准电路中实现了迄今为止已知的最佳结果,同时由于与安全相关策略而产生的设计开销微乎其微。最后,我们开源整个方法论(作为一组脚本),并且也分享受保护的电路(作为设计数据库),以造福硬件安全社区。

更新时间: 2025-07-23 10:28:15

领域: cs.CR

下载: http://arxiv.org/abs/2507.17385v1

Confidence Calibration in Vision-Language-Action Models

Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present the first systematic study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural-language instructions to low-level robot motor commands. We begin with extensive benchmarking to understand the critical relationship between task success and calibration error across multiple datasets and VLA variants, finding that task performance and calibration are not in tension. Next, we introduce prompt ensembles for VLAs, a lightweight, Bayesian-inspired algorithm that averages confidence across paraphrased instructions and consistently improves calibration. We further analyze calibration over the task time horizon, showing that confidence is often most reliable after making some progress, suggesting natural points for risk-aware intervention. Finally, we reveal differential miscalibration across action dimensions and propose action-wise Platt scaling, a method to recalibrate each action dimension independently to produce better confidence estimates. Our aim in this study is to begin to develop the tools and conceptual understanding necessary to render VLAs both highly performant and highly trustworthy via reliable uncertainty quantification.

Updated: 2025-07-23 10:26:10

标题: 视觉-语言-行动模型中的置信度校准

摘要: 可信赖的机器人行为不仅需要高水平的任务成功率,还需要机器人能够可靠地量化成功的可能性。为此,我们对视觉-语言-行动(VLA)基础模型中的置信度校准进行了首次系统研究,这类模型将视觉观察和自然语言指令映射到低级机器人电机指令。我们首先进行广泛的基准测试,以了解任务成功率和校准误差在多个数据集和VLA变体之间的关键关系,发现任务性能和校准并不矛盾。接下来,我们引入了VLA的提示集成,这是一种轻量级、受贝叶斯启发的算法,通过对释义指令的置信度进行平均,持续改进校准。我们进一步分析了任务时间范围内的校准,显示在取得一定进展后,置信度通常是最可靠的,这暗示了风险意识干预的自然时机。最后,我们揭示了不同行动维度上的校准误差差异,并提出了逐行动维度的Platt缩放,这是一种独立重新校准每个行动维度以产生更好置信度估计的方法。我们在这项研究中的目标是开始发展必要的工具和概念理解,以通过可靠的不确定性量化使VLA既具有高性能又具有高可信度。

更新时间: 2025-07-23 10:26:10

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2507.17383v1
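
A minimal sketch of the prompt-ensemble idea, assuming a hypothetical vla_confidence stand-in for a real VLA model's success-probability estimate; only the averaging step reflects the method described above:

# Sketch: query a (hypothetical) VLA policy with paraphrased instructions and
# average the per-call confidences, a Bayesian-inspired calibration step.
import numpy as np

def vla_confidence(instruction: str, observation) -> float:
    # Placeholder for a real VLA model's success-probability head.
    rng = np.random.default_rng(abs(hash(instruction)) % (2**32))
    return float(rng.uniform(0.4, 0.9))

def ensemble_confidence(paraphrases, observation) -> float:
    """Average confidence over instruction paraphrases."""
    return float(np.mean([vla_confidence(p, observation) for p in paraphrases]))

paraphrases = [
    "pick up the red block",
    "grasp the red cube",
    "lift the red block off the table",
]
print(f"calibrated confidence: {ensemble_confidence(paraphrases, None):.3f}")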

Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective

Continual Generalized Category Discovery (C-GCD) faces a critical challenge: incrementally learning new classes from unlabeled data streams while preserving knowledge of old classes. Existing methods struggle with catastrophic forgetting, especially when unlabeled data mixes known and novel categories. We address this by analyzing C-GCD's forgetting dynamics through a Bayesian lens, revealing that covariance misalignment between old and new classes drives performance degradation. Building on this insight, we propose Variational Bayes C-GCD (VB-CGCD), a novel framework that integrates variational inference with covariance-aware nearest-class-mean classification. VB-CGCD adaptively aligns class distributions while suppressing pseudo-label noise via stochastic variational updates. Experiments show VB-CGCD surpasses prior art by +15.21% in overall accuracy in the final session on standard benchmarks. We also introduce a new, challenging benchmark with only 10% labeled data and extended online phases; VB-CGCD achieves a 67.86% final accuracy, significantly higher than state-of-the-art (38.55%), demonstrating its robust applicability across diverse scenarios. Code is available at: https://github.com/daihao42/VB-CGCD

Updated: 2025-07-23 10:25:27

标题: 持续的广义类别发现:从贝叶斯视角学习与遗忘

摘要: 持续广义类别发现(C-GCD)面临一个关键挑战:从未标记的数据流中增量学习新类别,同时保留对旧类别的知识。现有方法在处理灾难性遗忘时遇到困难,特别是当未标记的数据混合了已知和新颖的类别时。我们通过贝叶斯视角分析了C-GCD的遗忘动态,发现旧类别和新类别之间的协方差不匹配导致性能下降。基于这一洞察,我们提出了变分贝叶斯C-GCD(VB-CGCD),这是一个集成变分推断和协方差感知的最近类均值分类的新框架。VB-CGCD通过随机变分更新自适应地对齐类分布,同时抑制伪标签噪声。实验表明,VB-CGCD在标准基准测试中在最终会话的整体准确率上超过现有最先进方法15.21%。我们还引入了一个新的具有挑战性的基准测试,只有10%的标记数据和扩展的在线阶段,VB-CGCD实现了67.86%的最终准确率,明显高于现有技术(38.55%),展示了其在各种场景中的稳健适用性。代码可在以下链接找到:https://github.com/daihao42/VB-CGCD

更新时间: 2025-07-23 10:25:27

领域: cs.LG

下载: http://arxiv.org/abs/2507.17382v1
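
To make the covariance-aware nearest-class-mean ingredient concrete, here is a small NumPy sketch using Mahalanobis distances; the class statistics are synthetic, and this omits the variational machinery of VB-CGCD:

# Sketch: assign a sample to the class with the smallest Mahalanobis distance,
# so per-class covariance (and its misalignment) directly shapes the decision.
import numpy as np

def mahalanobis_ncm(x, means, covs):
    dists = []
    for mu, cov in zip(means, covs):
        diff = x - mu
        dists.append(float(diff @ np.linalg.inv(cov) @ diff))
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
x = rng.multivariate_normal(means[1], covs[1])
print("predicted class:", mahalanobis_ncm(x, means, covs))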

SFUOD: Source-Free Unknown Object Detection

Source-free object detection adapts a detector pre-trained on a source domain to an unlabeled target domain without requiring access to labeled source data. While this setting is practical as it eliminates the need for the source dataset during domain adaptation, it operates under the restrictive assumption that only pre-defined objects from the source domain exist in the target domain. This closed-set setting prevents the detector from detecting undefined objects. To ease this assumption, we propose Source-Free Unknown Object Detection (SFUOD), a novel scenario which enables the detector to not only recognize known objects but also detect undefined objects as unknown objects. To this end, we propose CollaPAUL (Collaborative tuning and Principal Axis-based Unknown Labeling), a novel framework for SFUOD. Collaborative tuning enhances knowledge adaptation by integrating target-dependent knowledge from the auxiliary encoder with source-dependent knowledge from the pre-trained detector through a cross-domain attention mechanism. Additionally, principal axes-based unknown labeling assigns pseudo-labels to unknown objects by estimating objectness via principal axes projection and confidence scores from model predictions. The proposed CollaPAUL achieves state-of-the-art performances on SFUOD benchmarks, and extensive experiments validate its effectiveness.

Updated: 2025-07-23 10:16:25

标题: SFUOD: 无源未知对象检测

摘要: 无源物体检测将一个在源域上预先训练好的检测器适应到一个未标记的目标域,而无需访问标记的源数据。尽管这种设置很实用,因为它在域适应过程中消除了对源数据集的需求,但它基于一个限制性假设,即目标域中只存在来自源域的预定义对象。这种封闭集设置会阻止检测器检测未定义的对象。为了放宽这种假设,我们提出了Source-Free Unknown Object Detection(SFUOD),这是一种新颖的场景,使检测器不仅能够识别已知对象,还能够将未定义的对象检测为未知对象。为此,我们提出了CollaPAUL(协作调整和基于主轴的未知标记),这是一种新颖的SFUOD框架。协作调整通过一个跨域注意机制将来自辅助编码器的目标相关知识与来自预先训练的检测器的源相关知识相结合,增强了知识适应能力。此外,基于主轴的未知标记通过主轴投影估计对象性,并结合模型预测的置信度得分为未知对象分配伪标签。所提出的CollaPAUL在SFUOD基准测试中取得了最先进的性能,广泛的实验证实了其有效性。

更新时间: 2025-07-23 10:16:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17373v1
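
A hedged NumPy sketch of principal-axis-based unknown labeling as we read it: project a candidate feature onto the principal axes of known-class features and combine the projection fit with a confidence score; the weighting and threshold here are illustrative assumptions, not the paper's exact formulation:

# Sketch: objectness via principal-axes projection plus a confidence score.
import numpy as np

rng = np.random.default_rng(1)
known_feats = rng.normal(size=(500, 16))           # features of known objects

# Principal axes via SVD of the centered feature matrix.
mean = known_feats.mean(axis=0)
_, _, vt = np.linalg.svd(known_feats - mean, full_matrices=False)
axes = vt[:4]                                       # top-4 principal axes

def pseudo_label_unknown(feat, conf, tau=0.5):
    proj = (feat - mean) @ axes.T                   # coordinates on the axes
    recon = proj @ axes + mean
    fit = 1.0 / (1.0 + np.linalg.norm(feat - recon))  # objectness via projection
    return 0.5 * fit + 0.5 * conf > tau             # pseudo-label as unknown object

cand = rng.normal(size=16)
print("labeled unknown:", pseudo_label_unknown(cand, conf=0.6))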

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.

Updated: 2025-07-23 10:11:55

标题: 朝着高效的生成式大型语言模型服务:从算法到系统的调查

摘要: 在人工智能(AI)快速发展的背景下,生成式大型语言模型(LLM)站在前沿,彻底改变了我们与数据互动的方式。然而,部署这些模型所需的计算强度和内存消耗在效率方面提出了重大挑战,特别是在需要低延迟和高吞吐量的场景中。本调查从机器学习系统(MLSys)研究的角度探讨了高效LLM服务方法的迫切需求,处于先进AI创新和实际系统优化的核心位置。我们提供了深入分析,涵盖了一系列解决方案,从尖端算法修改到系统设计的突破性变革。该调查旨在全面了解高效LLM服务的当前状态和未来方向,为研究人员和实践者提供宝贵的见解,帮助克服有效部署LLM的障碍,从而重塑AI的未来。

更新时间: 2025-07-23 10:11:55

领域: cs.LG,cs.AI,cs.DC,cs.PF

下载: http://arxiv.org/abs/2312.15234v2

ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning

Continual learning (CL) with long-tailed data distributions remains a critical challenge for real-world AI systems, where models must sequentially adapt to new classes while retaining knowledge of old ones, despite severe class imbalance. Existing methods struggle to balance stability and plasticity, often collapsing under extreme sample scarcity. To address this, we propose ViRN, a novel CL framework that integrates variational inference (VI) with distributional trilateration for robust long-tailed learning. First, we model class-conditional distributions via a Variational Autoencoder to mitigate bias toward head classes. Second, we reconstruct tail-class distributions via Wasserstein distance-based neighborhood retrieval and geometric fusion, enabling sample-efficient alignment of tail-class representations. Evaluated on six long-tailed classification benchmarks, including speech (e.g., rare acoustic events, accents) and image tasks, ViRN achieves a 10.24% average accuracy gain over state-of-the-art methods.

Updated: 2025-07-23 10:04:30

标题: ViRN: 变分推断和分布三边定位在长尾持续表示学习中的应用

摘要: 具有长尾数据分布的持续学习(CL)仍然是现实世界人工智能系统面临的一个关键挑战:模型必须依次适应新类别,同时在严重的类别不平衡下保留对旧类别的知识。现有方法往往在稳定性和可塑性之间难以平衡,在极端样本稀缺的情况下常会崩溃。为了解决这个问题,我们提出了ViRN,这是一个将变分推断(VI)与分布三边定位相结合的新型CL框架,用于稳健的长尾学习。首先,我们通过变分自动编码器对类别条件分布进行建模,以减少对头部类别的偏见。其次,我们通过基于Wasserstein距离的邻域检索和几何融合来重建尾部类别分布,实现尾部类别表示的样本高效对齐。在六个长尾分类基准测试中进行评估,包括语音(例如罕见的声学事件、口音)和图像任务,ViRN的平均准确率比现有方法提高了10.24%。

更新时间: 2025-07-23 10:04:30

领域: cs.LG

下载: http://arxiv.org/abs/2507.17368v1
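
The Wasserstein-based retrieval step has a closed form when class-conditional distributions are Gaussian; the sketch below computes the squared 2-Wasserstein distance between two Gaussians, which could rank head-class neighbours for a tail class (the statistics here are toy values, not the paper's code):

# Squared 2-Wasserstein distance between Gaussians:
# W2^2 = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(mu1, cov1, mu2, cov2) -> float:
    mean_term = np.sum((mu1 - mu2) ** 2)
    s2 = sqrtm(cov2)
    cross = sqrtm(s2 @ cov1 @ s2)
    cov_term = np.trace(cov1 + cov2 - 2.0 * np.real(cross))
    return float(mean_term + cov_term)

mu_tail, cov_tail = np.zeros(3), np.eye(3)
mu_head, cov_head = np.ones(3), 2.0 * np.eye(3)
print(f"W2^2 = {w2_gaussian(mu_tail, cov_tail, mu_head, cov_head):.3f}")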

Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis

With the rise of smart mobility and shared e-mobility services, numerous advanced technologies have been applied to this field. Cloud-based traffic simulation solutions have flourished, offering increasingly realistic representations of the evolving mobility landscape. LLMs have emerged as pioneering tools, providing robust support for various applications, including intelligent decision-making, user interaction, and real-time traffic analysis. As user demand for e-mobility continues to grow, delivering comprehensive end-to-end solutions has become crucial. In this paper, we present a cloud-based, LLM-powered shared e-mobility platform, integrated with a mobile application for personalized route recommendations. The optimization module is evaluated based on travel time and cost across different traffic scenarios. Additionally, the LLM-powered RAG framework is evaluated at the schema level for different users, using various evaluation methods. Schema-level RAG with XiYanSQL achieves an average execution accuracy of 0.81 on system operator queries and 0.98 on user queries.

Updated: 2025-07-23 10:02:51

标题: 利用RAG-LLMs进行城市移动性模拟和分析

摘要: 随着智能出行和共享电动出行服务的兴起,许多先进技术已被应用到这一领域。基于云的交通仿真解决方案蓬勃发展,对不断演变的出行格局提供了越来越逼真的刻画。LLM已经成为开创性工具,为各种应用提供强大支持,包括智能决策、用户交互和实时交通分析。随着用户对电动出行的需求不断增长,提供全面的端到端解决方案变得至关重要。在本文中,我们介绍了一个基于云的、由LLM驱动的共享电动出行平台,并集成了一个用于个性化路线推荐的移动应用程序。优化模块根据不同交通场景下的行驶时间和成本进行评估。此外,由LLM驱动的RAG框架在模式级别上针对不同用户进行评估,采用多种评估方法。采用XiYanSQL的模式级RAG在系统操作员查询上实现了0.81的平均执行准确度,在用户查询上实现了0.98。

更新时间: 2025-07-23 10:02:51

领域: cs.LG

下载: http://arxiv.org/abs/2507.10382v2

Artificial Intelligence for Green Hydrogen Yield Prediction and Site Suitability using SHAP-Based Composite Index: Focus on Oman

As nations seek sustainable alternatives to fossil fuels, green hydrogen has emerged as a promising strategic pathway toward decarbonisation, particularly in solar-rich arid regions. However, identifying optimal locations for hydrogen production requires the integration of complex environmental, atmospheric, and infrastructural factors, often compounded by limited availability of direct hydrogen yield data. This study presents a novel Artificial Intelligence (AI) framework for computing green hydrogen yield and a site suitability index using mean absolute SHAP (SHapley Additive exPlanations) values. This framework consists of a multi-stage pipeline of unsupervised multi-variable clustering, a supervised machine learning classifier, and the SHAP algorithm. The pipeline is trained on an integrated meteorological, topographic, and temporal dataset, and the results reveal distinct spatial patterns of suitability and the relative influence of the variables. With a model predictive accuracy of 98%, the results also show that water proximity, elevation, and seasonal variation are the most influential factors determining green hydrogen site suitability in Oman, with mean absolute SHAP values of 2.470891, 2.376296, and 1.273216, respectively. Given the limited availability or absence of ground-truth yield data in many countries with green hydrogen prospects and ambitions, this study offers an objective and reproducible alternative to subjective expert weightings, thus allowing the data to speak for itself and potentially discovering novel latent groupings without pre-imposed assumptions. This study offers industry stakeholders and policymakers a replicable and scalable tool for green hydrogen infrastructure planning and other decision making in data-scarce regions.

Updated: 2025-07-23 10:00:49

标题: 人工智能用于绿色氢产量预测和基于SHAP的复合指数的场地适宜性评估:以阿曼为重点

摘要: 随着各国寻求化石燃料的可持续替代方案,绿色氢气已成为迈向脱碳的一条有前景的战略途径,特别是在阳光充足的干旱地区。然而,确定适合氢气生产的最佳位置需要整合复杂的环境、大气和基础设施因素,而直接氢气产量数据的有限可用性往往使这一问题更加复杂。本研究提出了一种新颖的人工智能(AI)框架,利用平均绝对SHAP(SHapley Additive exPlanations)值计算绿色氢气产量和场地适宜性指数。该框架由无监督多变量聚类、监督机器学习分类器和SHAP算法构成的多阶段管道组成。该管道在整合的气象、地形和时间数据集上进行训练,结果显示了适宜性的明显空间模式和变量的相对影响。模型预测准确率达98%,结果还显示,水域接近度、海拔和季节变化是决定阿曼绿色氢气场地适宜性的最具影响力的因素,其平均绝对SHAP值分别为2.470891、2.376296和1.273216。鉴于许多拥有绿色氢气前景和雄心的国家缺乏或仅有有限的地面真实产量数据,本研究提供了一个客观且可重复的替代方案,避免主观的专家权重,使数据自我表达,并可能在不预设假设的情况下发现新的潜在分组。本研究为工业利益相关者和政策制定者提供了一个可复制、可扩展的工具,用于在数据匮乏地区进行绿色氢气基础设施规划和其他决策。

更新时间: 2025-07-23 10:00:49

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.14219v2
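
A minimal sketch of a mean-|SHAP| composite index, assuming the shap package and synthetic stand-ins for the meteorological and topographic features; the feature names and weighting scheme are illustrative, not the paper's exact pipeline:

# Sketch: train a classifier, take mean absolute SHAP values as feature
# weights, then score sites with a weighted sum of normalised features.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))               # water proximity, elevation, seasonality
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)

model = GradientBoostingClassifier().fit(X, y)
shap_vals = shap.TreeExplainer(model).shap_values(X)

weights = np.abs(shap_vals).mean(axis=0)      # mean |SHAP| per feature
weights /= weights.sum()                      # normalise into a composite index
suitability = X @ weights                     # per-site suitability score
print("weights:", np.round(weights, 3))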

DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning

Multi-step agentic retrieval systems based on large language models (LLMs) have demonstrated remarkable performance in complex information search tasks. However, these systems still face significant challenges in practical applications, particularly in generating factually inconsistent intermediate queries and inefficient search trajectories, which can lead to reasoning deviations or redundant computations. To address these issues, we propose DynaSearcher, an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). Specifically, our system leverages knowledge graphs as external structured knowledge to guide the search process by explicitly modeling entity relationships, thereby ensuring factual consistency in intermediate queries and mitigating biases from irrelevant information. Furthermore, we employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. This framework promotes the generation of high-quality intermediate queries and comprehensive final answers, while discouraging unnecessary exploration and minimizing information omissions or redundancy. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets, matching frontier LLMs while using only small-scale models and limited computational resources. Furthermore, our approach demonstrates strong generalization and robustness across diverse retrieval environments and larger-scale models, highlighting its broad applicability.

Updated: 2025-07-23 09:58:31

标题: DynaSearcher:通过多奖励强化学习增强的动态知识图搜索代理

摘要: 基于大型语言模型(LLMs)的多步骤代理检索系统在复杂信息搜索任务中表现出了卓越的性能。然而,这些系统在实际应用中仍然面临着重大挑战,特别是在生成事实不一致的中间查询和低效的搜索轨迹方面,这可能导致推理偏差或冗余计算。为了解决这些问题,我们提出了DynaSearcher,这是一个由动态知识图和多重奖励强化学习(RL)增强的创新搜索代理。具体来说,我们的系统利用知识图作为外部结构化知识,通过明确建模实体关系来引导搜索过程,从而确保中间查询的事实一致性,并减轻来自不相关信息的偏见。此外,我们采用多重奖励RL框架对训练目标进行精细化控制,如检索准确性、效率和响应质量。该框架促进了高质量中间查询和全面的最终答案的生成,同时抑制了不必要的探索,并最小化信息遗漏或冗余。实验结果表明,我们的方法在六个多跳问题回答数据集上实现了最先进的答案准确性,与前沿LLMs相匹配,同时仅使用小规模模型和有限的计算资源。此外,我们的方法在不同的检索环境和更大规模的模型中表现出强大的泛化性和稳健性,突显了其广泛的适用性。

更新时间: 2025-07-23 09:58:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17365v1

Adaptive Repetition for Mitigating Position Bias in LLM-Based Ranking

When using LLMs to rank items based on given criteria, or evaluate answers, the order of candidate items can influence the model's final decision. This sensitivity to item positioning in an LLM's prompt is known as position bias. Prior research shows that this bias exists even in large models, though its severity varies across models and tasks. In addition to position bias, LLMs also exhibit varying degrees of low repetition consistency, where repeating the LLM call with the same candidate ordering can lead to different rankings. To address both inconsistencies, a common approach is to prompt the model multiple times with different candidate orderings and aggregate the results via majority voting. However, this repetition strategy significantly increases computational costs. Extending prior findings, we observe that both the direction -- favoring either the earlier or later candidate in the prompt -- and magnitude of position bias across instances vary substantially, even within a single dataset. This observation highlights the need for a per-instance mitigation strategy. To this end, we introduce a dynamic early-stopping method that adaptively determines the number of repetitions required for each instance. Evaluating our approach across three LLMs of varying sizes and on two tasks, namely re-ranking and alignment, we demonstrate that transitioning to a dynamic repetition strategy reduces the number of LLM calls by an average of 81%, while preserving accuracy. Furthermore, we propose a confidence-based adaptation to our early-stopping method, reducing LLM calls by an average of 87% compared to static repetition, with only a slight accuracy trade-off relative to our original early-stopping method.

Updated: 2025-07-23 09:54:44

标题: 基于LLM的排名中减轻位置偏差的自适应重复

摘要: 使用LLMs根据给定标准对项目进行排名或评估答案时,候选项目的顺序会影响模型的最终决策。LLM提示中对项目位置的敏感性被称为位置偏差。先前的研究表明,即使在大型模型中,这种偏差也存在,尽管其严重程度在模型和任务之间变化。除了位置偏差,LLMs还表现出不同程度的低重复一致性,即使用相同的候选排序重复LLM调用可能导致不同的排名。为了解决这两种不一致性,一个常见的方法是使用不同的候选排序多次提示模型,并通过多数投票来聚合结果。然而,这种重复策略会显著增加计算成本。延伸先前的发现,我们观察到在一个数据集中,即使在同一实例中,位置偏差的方向(偏向于提示中的前一个候选项还是后一个候选项)和程度都会有显著变化。这一观察突显了对每个实例的缓解策略的需求。为此,我们引入了一种动态早停方法,自适应地确定每个实例所需的重复次数。在三种不同大小的LLM和两个任务(重新排序和对齐)上评估我们的方法,我们证明过渡到动态重复策略可将LLM调用次数平均减少81%,同时保持准确性。此外,我们提出了一种基于置信度的早停方法适应,与静态重复相比,将LLM调用次数平均减少了87%,与我们最初的早停方法相比,只有轻微的准确性折衷。

更新时间: 2025-07-23 09:54:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17788v1
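
A simplified sketch of dynamic early stopping for repeated LLM calls: stop once the leading outcome's vote margin exceeds the number of remaining repetitions, so further calls cannot change the majority. ask_llm is a simulated stand-in, and this omits the paper's confidence-based variant:

# Sketch: adaptive majority voting with an early-stopping rule.
import random
from collections import Counter

def ask_llm(candidates):
    # Placeholder: a real call would prompt the LLM with a shuffled candidate
    # order; here we simulate a position-biased, noisy ranker.
    order = candidates[:]
    random.shuffle(order)
    return order[0] if random.random() < 0.3 else min(candidates)

def adaptive_majority(candidates, max_reps=15):
    votes = Counter()
    for n in range(1, max_reps + 1):
        votes[ask_llm(candidates)] += 1
        top, top_count = votes.most_common(1)[0]
        remaining = max_reps - n
        runner_up = max((c for k, c in votes.items() if k != top), default=0)
        if top_count > runner_up + remaining:   # majority can no longer flip
            return top, n
    return votes.most_common(1)[0][0], max_reps

winner, calls = adaptive_majority(["A", "B", "C"])
print(f"winner={winner} after {calls} calls")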

Vascular Segmentation of Functional Ultrasound Images using Deep Learning

Segmentation of medical images is a fundamental task with numerous applications. While MRI, CT, and PET modalities have significantly benefited from deep learning segmentation techniques, more recent modalities, like functional ultrasound (fUS), have seen limited progress. fUS is a non-invasive imaging method that measures changes in cerebral blood volume (CBV) with high spatio-temporal resolution. However, distinguishing arterioles from venules in fUS is challenging due to opposing blood flow directions within the same pixel. Ultrasound localization microscopy (ULM) can enhance resolution by tracking microbubble contrast agents but is invasive and lacks dynamic CBV quantification. In this paper, we introduce the first deep learning-based segmentation tool for fUS images, capable of differentiating signals from different vascular compartments, based on ULM automatic annotation and enabling dynamic CBV quantification. We evaluate various UNet architectures on fUS images of rat brains, achieving competitive segmentation performance, with 90% accuracy, a 71% F1 score, and an IoU of 0.59, using only 100 temporal frames from a fUS stack. These results are comparable to those from tubular structure segmentation in other imaging modalities. Additionally, models trained on resting-state data generalize well to images captured during visual stimulation, highlighting robustness. This work offers a non-invasive, cost-effective alternative to ULM, enhancing fUS data interpretation and improving understanding of vessel function. Our pipeline shows high linear correlation coefficients between signals from predicted and actual compartments in both cortical and deeper regions, showcasing its ability to accurately capture blood flow dynamics.

Updated: 2025-07-23 09:53:03

标题: 使用深度学习进行功能性超声图像的血管分割

摘要: 医学图像分割是一项具有许多应用的基础任务。尽管MRI、CT和PET模式已经极大地受益于深度学习分割技术,但更近期的模式,如功能超声(fUS),进展有限。fUS是一种无创成像方法,能够以高时空分辨率测量脑血容量(CBV)的变化。然而,在fUS中区分动脉和静脉是具有挑战性的,因为在同一像素内存在相反的血流方向。超声定位显微镜(ULM)可以通过跟踪微泡对比剂来增强分辨率,但是具有侵入性,并且缺乏动态CBV量化。在本文中,我们介绍了第一个基于深度学习的fUS图像分割工具,能够根据ULM自动注释区分不同血管区段的信号,并实现动态CBV量化。我们评估了各种UNet架构在大鼠脑fUS图像上的表现,取得了竞争性的分割性能,准确率达到90%,F1分数为71%,IoU为0.59,仅使用来自fUS堆栈的100个时间帧。这些结果与其他成像模式中的管状结构分割结果相当。此外,训练于静息状态数据的模型在视觉刺激期间捕获的图像上表现良好,突显了其稳健性。这项工作提供了一种无创、成本效益的ULM替代方案,增强了fUS数据解释能力,并提高了对血管功能的理解。我们的流程在大脑皮层和更深部位的预测和实际区段信号之间显示出高线性相关系数,展示了其准确捕获血流动力学的能力。

更新时间: 2025-07-23 09:53:03

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.22365v2

Monitoring digestate application on agricultural crops using Sentinel-2 Satellite imagery

The widespread use of Exogenous Organic Matter in agriculture necessitates monitoring to assess its effects on soil and crop health. This study evaluates optical Sentinel-2 satellite imagery for detecting digestate application, a practice that enhances soil fertility but poses environmental risks like microplastic contamination and nitrogen losses. In the first instance, Sentinel-2 satellite image time series (SITS) analysis of specific indices (EOMI, NDVI, EVI) was used to characterize EOM's spectral behavior after application on the soils of four different crop types in Thessaly, Greece. Furthermore, Machine Learning (ML) models (namely Random Forest, k-NN, Gradient Boosting and a Feed-Forward Neural Network), were used to investigate digestate presence detection, achieving F1-scores up to 0.85. The findings highlight the potential of combining remote sensing and ML for scalable and cost-effective monitoring of EOM applications, supporting precision agriculture and sustainability.

Updated: 2025-07-23 09:50:45

标题: 使用Sentinel-2卫星图像监测农作物上的消化液施用

摘要: 农业中外源有机物质的广泛使用需要监测以评估其对土壤和作物健康的影响。本研究评估了光学Sentinel-2卫星图像用于检测消化物施用的效果,这种做法可以提高土壤肥力,但存在微塑料污染和氮损失等环境风险。首先,利用Sentinel-2卫星图像时间序列(SITS)分析特定指数(EOMI、NDVI、EVI)来表征希腊Thessaly地区四种不同作物土壤上施用EOM后的光谱行为。此外,采用机器学习(ML)模型(即随机森林、k-NN、梯度提升和前馈神经网络),用于研究消化物的存在检测,实现了F1分数高达0.85。研究结果突显了结合遥感和机器学习用于可扩展和成本效益的监测EOM施用的潜力,支持精准农业和可持续发展。

更新时间: 2025-07-23 09:50:45

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.19996v2
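
For reference, the two standard indices named above have simple band-arithmetic definitions; here is a NumPy sketch using common Sentinel-2 band conventions (B4 red, B8 NIR, B2 blue). EOMI definitions vary across the literature, so it is omitted:

# Vegetation indices from Sentinel-2 reflectances.
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-9)

def evi(nir, red, blue):
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

b4 = np.array([0.08, 0.12])   # red reflectance for two pixels
b8 = np.array([0.40, 0.25])   # near-infrared
b2 = np.array([0.05, 0.06])   # blue
print("NDVI:", np.round(ndvi(b8, b4), 3))
print("EVI:", np.round(evi(b8, b4, b2), 3))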

Hyperbolic Deep Learning for Foundation Models: A Survey

Foundation models pre-trained on massive datasets, including large language models (LLMs), vision-language models (VLMs), and large multimodal models, have demonstrated remarkable success in diverse downstream tasks. However, recent studies have shown fundamental limitations of these models: (1) limited representational capacity, (2) lower adaptability, and (3) diminishing scalability. These shortcomings raise a critical question: is Euclidean geometry truly the optimal inductive bias for all foundation models, or could incorporating alternative geometric spaces enable models to better align with the intrinsic structure of real-world data and improve reasoning processes? Hyperbolic spaces, a class of non-Euclidean manifolds characterized by exponential volume growth with respect to distance, offer a mathematically grounded solution. These spaces enable low-distortion embeddings of hierarchical structures (e.g., trees, taxonomies) and power-law distributions with substantially fewer dimensions compared to Euclidean counterparts. Recent advances have leveraged these properties to enhance foundation models, including improving LLMs' complex reasoning ability, VLMs' zero-shot generalization, and cross-modal semantic alignment, while maintaining parameter efficiency. This paper provides a comprehensive review of hyperbolic neural networks and their recent development for foundation models. We further outline key challenges and research directions to advance the field.

Updated: 2025-07-23 09:50:17

标题: 面向基础模型的双曲深度学习:一项调查

摘要: 在大规模数据集上预训练的基础模型,包括大型语言模型(LLMs)、视觉-语言模型(VLMs)和大型多模态模型,已经在各种下游任务中取得了显著的成功。然而,最近的研究表明了这些模型的基本局限性:(1)表示能力有限,(2)适应性较低,(3)可扩展性减弱。这些缺点引发了一个关键问题:欧几里得几何是否真的是所有基础模型的最佳归纳偏差,或者融合替代几何空间能够使模型更好地与真实世界数据的内在结构对齐并改善推理过程?双曲空间,一类以指数增长的体积随距离变化而特征化的非欧几里得流形,提供了一个数学上基础的解决方案。这些空间可以实现层次结构(例如树状结构、分类法)和幂律分布的低失真嵌入,与欧几里得对应物相比,所需的维度要少得多。最近的进展已经利用这些特性来增强基础模型,包括提高LLMs的复杂推理能力、VLMs的零次泛化和跨模态语义对齐,同时保持参数效率。本文对双曲神经网络及其最近对基础模型的发展进行了全面回顾。我们进一步概述了推进该领域的关键挑战和研究方向。

更新时间: 2025-07-23 09:50:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17787v1
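
The geometric intuition can be made concrete with the Poincaré-ball distance, the workhorse of many hyperbolic networks; a short NumPy sketch:

# Poincaré-ball distance: points live in the open unit ball, and distances
# blow up toward the boundary, which is what lets trees embed with low
# distortion in few dimensions.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return float(np.arccosh(arg))

root = np.array([0.0, 0.0])
child = np.array([0.6, 0.0])
leaf = np.array([0.95, 0.0])
print(poincare_distance(root, child), poincare_distance(child, leaf))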

TOC-UCO: a comprehensive repository of tabular ordinal classification datasets

An ordinal classification (OC) problem corresponds to a special type of classification characterised by the presence of a natural order relationship among the classes. This type of problem can be found in a number of real-world applications, motivating the design and development of many ordinal methodologies over the last years. However, it is important to highlight that the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets on which novel approaches in the literature can be benchmarked. In order to approach this objective, this manuscript from the University of Córdoba (UCO), which has previous experience in the OC field, provides the literature with a publicly available repository of tabular data for a robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, this repository includes a set of 46 tabular ordinal datasets, preprocessed under a common framework and ensured to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. For this, indices for 30 different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.

Updated: 2025-07-23 09:28:52

标题: TOC-UCO:一个全面的表格序数分类数据集库

摘要: 一个序数分类(OC)问题对应一种特殊类型的分类,其特点是类之间存在自然的顺序关系。这种问题在许多现实世界的应用中都可以找到,促使在过去几年中设计和开发了许多序数方法。然而,重要的是要强调OC领域的发展面临一个主要劣势:缺乏一个全面的数据集集合,以便对文献中的新方法进行基准测试。为了实现这一目标,来自科尔多瓦大学(UCO)的这篇论文提供了一个公开可用的表格数据存储库,用于对新的OC方法进行稳健验证,即TOC-UCO(UCO的表格序数分类存储库)。具体来说,这个存储库包括46个表格序数数据集,经过一个共同的框架预处理,并确保具有合理数量的样本和适当的类分布。我们还提供每个数据集的来源和预处理步骤,以及如何使用TOC-UCO存储库对新方法进行基准测试的详细信息。为此,提供了30种不同随机训练-测试划分的索引,以促进实验的可重复性。

更新时间: 2025-07-23 09:28:52

领域: cs.LG

下载: http://arxiv.org/abs/2507.17348v1

Swin-TUNA : A Novel PEFT Approach for Accurate Food Image Segmentation

In the field of food image processing, efficient semantic segmentation techniques are crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) face challenges in meeting practical deployment requirements due to their massive parameter counts and high computational resource demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture, achieving high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA lies in its hierarchical feature adaptation mechanism: it designs separable convolutions in depth and dimensional mappings of varying scales to address the differences in features between shallow and deep networks, combined with a dynamic balancing strategy for task-agnostic and task-specific features. Experiments demonstrate that this method achieves mIoU of 50.56% and 74.94% on the FoodSeg103 and UECFoodPix Complete datasets, respectively, surpassing the fully parameterized FoodSAM model while reducing the parameter count by 98.7% (to only 8.13M). Furthermore, Swin-TUNA exhibits faster convergence and stronger generalization capabilities in low-data scenarios, providing an efficient solution for lightweight food image segmentation.

Updated: 2025-07-23 09:28:25

标题: Swin-TUNA:一种用于准确食物图像分割的新型PEFT方法

摘要: 在食品图像处理领域,高效的语义分割技术对于工业应用至关重要。然而,现有的基于Transformer的大规模模型(例如FoodSAM)由于庞大的参数数量和高计算资源需求,难以满足实际部署要求。本文介绍了可调适配器模块(Swin-TUNA),这是一种参数高效微调(PEFT)方法,将多尺度可训练适配器集成到Swin Transformer架构中,通过仅更新4%的参数实现高性能的食品图像分割。Swin-TUNA的核心创新在于其分层特征适应机制:它在深度和不同尺度的维度映射中设计了可分离卷积,以解决浅层和深层网络之间的特征差异,并结合动态平衡策略来处理任务无关和任务特定的特征。实验证明,该方法在FoodSeg103和UECFoodPix Complete数据集上分别实现了50.56%和74.94%的mIoU,超越了完全参数化的FoodSAM模型,同时将参数数量减少了98.7%(仅为8.13M)。此外,在低数据场景下,Swin-TUNA表现出更快的收敛速度和更强的泛化能力,为轻量级食品图像分割提供了高效的解决方案。

更新时间: 2025-07-23 09:28:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17347v1
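
To illustrate the PEFT mechanism (though not Swin-TUNA's exact multiscale design), here is a PyTorch sketch of a bottleneck adapter added to a frozen backbone, showing how only a small fraction of parameters stays trainable:

# Sketch: residual bottleneck adapter on top of a frozen backbone.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual adaptation

backbone = nn.Sequential(nn.Linear(96, 96), nn.GELU(), nn.Linear(96, 96))
for p in backbone.parameters():
    p.requires_grad = False                           # backbone stays frozen

adapter = Adapter(dim=96)
out = adapter(backbone(torch.randn(4, 96)))
trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable fraction: {trainable / total:.2%}")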

DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD

Distributed machine learning in high end-to-end latency and low, varying bandwidth network environments undergoes severe throughput degradation. Due to its low communication requirements, distributed SGD (D-SGD) remains the mainstream optimizer in such challenging networks, but it still suffers from significant throughput reduction. To mitigate these limitations, existing approaches typically employ gradient compression and delayed aggregation to alleviate low bandwidth and high latency, respectively. To address both challenges simultaneously, these strategies are often combined, introducing a complex three-way trade-off among compression ratio, staleness (delayed synchronization steps), and model convergence rate. To achieve the balance under varying bandwidth conditions, an adaptive policy is required to dynamically adjust these parameters. Unfortunately, existing works rely on static heuristic strategies due to the lack of theoretical guidance, which prevents them from achieving this goal. This study fills in this theoretical gap by introducing a new theoretical tool, decomposing the joint optimization problem into a traditional convergence rate analysis with multiple analyzable noise terms. We are the first to reveal that staleness exponentially amplifies the negative impact of gradient compression on training performance, filling a critical gap in understanding how compressed and delayed gradients affect training. Furthermore, by integrating the convergence rate with a network-aware time minimization condition, we propose DeCo-SGD, which dynamically adjusts the compression ratio and staleness based on the real-time network condition and training task. DeCo-SGD achieves up to 5.07 and 1.37 speed-ups over D-SGD and static strategy in high-latency and low, varying bandwidth networks, respectively.

Updated: 2025-07-23 09:22:51

标题: DeCo-SGD: 分布式SGD的延迟陈旧和梯度压缩比例的联合优化

摘要: 在高端到端延迟和低、变化带宽网络环境中的分布式机器学习经历了严重的吞吐量下降。由于其低通信需求,分布式随机梯度下降(D-SGD)仍然是这种具有挑战性网络中的主流优化器,但它仍然受到显著的吞吐量减少的影响。为了缓解这些限制,现有方法通常采用梯度压缩和延迟聚合来减轻低带宽和高延迟的影响。为了同时解决这两个挑战,这些策略通常被结合起来,引入了在压缩比、过时度(延迟同步步骤)和模型收敛速度之间的复杂三方权衡。为了在不同带宽条件下实现平衡,需要一个自适应策略来动态调整这些参数。不幸的是,由于缺乏理论指导,现有的工作依赖于静态启发式策略,导致它们无法实现这一目标。本研究通过引入一种新的理论工具填补了这一理论空白,将联合优化问题分解为传统的收敛率分析和多个可分析的噪声项。我们首次揭示了过时性如何指数级地放大梯度压缩对训练性能的负面影响,填补了对压缩和延迟梯度如何影响训练的理解中的重要空白。此外,通过将收敛率与网络感知时间最小化条件相结合,我们提出了DeCo-SGD,根据实时网络条件和训练任务动态调整压缩比和过时度。DeCo-SGD在高延迟和低、变化带宽网络中分别比D-SGD和静态策略提高了5.07和1.37倍的速度。

更新时间: 2025-07-23 09:22:51

领域: cs.LG

下载: http://arxiv.org/abs/2507.17346v1
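
A toy NumPy sketch of the two quantities DeCo-SGD trades off, top-k gradient compression and staleness (delayed application of gradients); the loss and update rule are synthetic, and no claim is made about the paper's actual adaptive scheduler:

# Sketch: top-k compression plus stale application of gradients.
import numpy as np

def topk_compress(grad, ratio=0.1):
    k = max(1, int(ratio * grad.size))
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out[idx] = grad[idx]                       # keep only the k largest entries
    return out

rng = np.random.default_rng(0)
w = np.zeros(100)
buffer = []                                    # in-flight (delayed) gradients
staleness, lr = 2, 0.1
for step in range(10):
    g = w - rng.normal(size=100)               # toy gradient of 0.5*||w - x||^2
    buffer.append(topk_compress(g, ratio=0.2))
    if len(buffer) > staleness:
        w -= lr * buffer.pop(0)                # apply a stale, compressed gradient
print("norm after 10 steps:", round(float(np.linalg.norm(w)), 3))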

Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss

The task of Human-Object conTact (HOT) detection involves identifying the specific areas of the human body that are touching objects. Nevertheless, current models are restricted to just one type of image, often leading to too much segmentation in areas with little interaction, and struggling to maintain category consistency within specific regions. To tackle this issue, a HOT framework, termed P3HOT, is proposed, which blends Prompt guidance and human Proximal Perception. To begin with, we utilize a semantic-driven prompt mechanism to direct the network's attention towards the relevant regions based on the correlation between image and text. Then a human proximal perception mechanism is employed to dynamically perceive a key depth range around the human, using learnable parameters to effectively eliminate regions where interactions are not expected. Calculating depth resolves the uncertainty of the overlap between humans and objects in a 2D perspective, providing a quasi-3D viewpoint. Moreover, a Regional Joint Loss (RJLoss) has been created as a new loss to inhibit abnormal categories in the same area. A new evaluation metric called "AD-Acc." is introduced to address the shortcomings of existing methods in addressing negative samples. Comprehensive experimental results demonstrate that our approach achieves state-of-the-art performance in four metrics across two benchmark datasets. Specifically, our model achieves improvements of 0.7, 2.0, 1.6, and 11.0 points in the SC-Acc., mIoU, wIoU, and AD-Acc. metrics, respectively, on the HOT-Annotated dataset. The source code is available at https://github.com/YuxiaoWang-AI/P3HOT.

Updated: 2025-07-23 09:22:32

标题: 用于HOT预测的提示引导与人类近端感知及区域联合损失

摘要: 人体-物体接触(HOT)检测的任务涉及识别人体与物体接触的具体区域。然而,当前模型仅限于一种类型的图像,通常导致在互动较少的区域过多分割,并且难以在特定区域内保持类别一致性。为了解决这个问题,提出了一种名为P3HOT的HOT框架,它融合了Prompt引导和人类近距感知。首先,我们利用基于图像和文本之间相关性的语义驱动提示机制引导网络的注意力指向相关区域。然后,采用人类近距感知机制动态感知人体周围的关键深度范围,使用可学习参数有效消除预期未发生互动的区域。计算深度解决了人体和物体在二维视角中重叠的不确定性,提供了一种准三维视角。此外,创建了一个名为Regional Joint Loss(RJLoss)的新损失,用于抑制同一区域的异常类别。引入了一个名为“AD-Acc.”的新评估指标,以解决现有方法在处理负样本时存在的缺点。全面的实验结果表明,我们的方法在两个基准数据集上的四个指标中实现了最先进的性能。具体来说,我们的模型在HOT-Annotated数据集上的SC-Acc.、mIoU、wIoU和AD-Acc.指标分别实现了0.7↑、2.0↑、1.6↑和11.0↑的改进。源代码可在https://github.com/YuxiaoWang-AI/P3HOT 上找到。

更新时间: 2025-07-23 09:22:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.01630v2

Machine Learning-Based Modeling of the Anode Heel Effect in X-ray Beam Monte Carlo Simulations

To develop a machine learning-based framework for accurately modeling the anode heel effect in Monte Carlo simulations of X-ray imaging systems, enabling realistic beam intensity profiles with minimal experimental calibration. Multiple regression models were trained to predict spatial intensity variations along the anode-cathode axis using experimentally acquired weights derived from beam measurements across different tube potentials. These weights captured the asymmetry introduced by the anode heel effect. A systematic fine-tuning protocol was established to minimize the number of required measurements while preserving model accuracy. The models were implemented in the OpenGATE 10 and GGEMS Monte Carlo toolkits to evaluate their integration feasibility and predictive performance. Among the tested models, gradient boosting regression (GBR) delivered the highest accuracy, with prediction errors remaining below 5% across all energy levels. The optimized fine-tuning strategy required only six detector positions per energy level, reducing measurement effort by 65%. The maximum error introduced through this fine-tuning process remained below 2%. Dose actor comparisons within Monte Carlo simulations demonstrated that the GBR-based model closely replicated clinical beam profiles and significantly outperformed conventional symmetric beam models. This study presents a robust and generalizable method for incorporating the anode heel effect into Monte Carlo simulations using machine learning. By enabling accurate, energy-dependent beam modeling with limited calibration data, the approach enhances simulation realism for applications in clinical dosimetry, image quality assessment, and radiation protection.

Updated: 2025-07-23 09:16:28

标题: X射线束蒙特卡洛模拟中阳极脚跟效应的基于机器学习的建模

摘要: 为了开发一个基于机器学习的框架,准确地模拟X射线成像系统中阳极脚跟效应,在蒙特卡洛模拟中实现真实的射线强度剖面,同时最小化实验校准的需求。利用实验获得的权重训练了多元回归模型,用于预测沿着阳极-阴极轴的空间强度变化,这些权重是从在不同管电压下获得的射线测量中导出的。这些权重捕捉了阳极脚跟效应引入的不对称性。建立了一个系统的微调协议,以最小化所需测量数量,同时保持模型的准确性。这些模型已在OpenGATE 10和GGEMS蒙特卡洛工具包中实现,以评估它们的集成可行性和预测性能。在测试的模型中,梯度提升回归(GBR)提供了最高的准确性,预测误差在所有能量水平上均保持在5%以下。优化的微调策略每个能量级别仅需要六个探测器位置,将测量工作量减少了65%。通过这个微调过程引入的最大误差保持在2%以下。蒙特卡洛模拟中的剂量比较表明,基于GBR的模型紧密复制了临床射线剖面,并且明显优于传统的对称射线模型。本研究提出了一种稳健且可推广的方法,利用机器学习将阳极脚跟效应纳入蒙特卡洛模拟中。通过使用有限的校准数据实现准确的、能量相关的射线建模,该方法增强了应用于临床剂量学、图像质量评估和放射防护的模拟现实性。

更新时间: 2025-07-23 09:16:28

领域: physics.med-ph,cs.AI

下载: http://arxiv.org/abs/2504.19155v2
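
A hedged scikit-learn sketch of the regression setup described above: map (tube potential, position along the anode-cathode axis) to an intensity weight with gradient boosting; the training values below are synthetic, not measured beam data:

# Sketch: GBR mapping (kVp, axial position) -> heel-effect intensity weight.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
kvp = rng.choice([60, 80, 100, 120], size=200)        # tube potential (kV)
pos = rng.uniform(-1, 1, size=200)                    # anode (-1) to cathode (+1)
weight = 1.0 + 0.25 * pos - 0.001 * kvp * pos + rng.normal(0, 0.01, 200)

model = GradientBoostingRegressor().fit(np.column_stack([kvp, pos]), weight)
grid = np.column_stack([np.full(5, 80), np.linspace(-1, 1, 5)])
print(np.round(model.predict(grid), 3))               # heel profile at 80 kV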

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.

Updated: 2025-07-23 09:14:25

标题: 强化学习用于加速空气动力学形状优化

摘要: 我们引入了一种基于强化学习(RL)的自适应优化算法,用于空气动力学形状优化,重点是降维。在这里应用RL的形式是一种基于代理模型的演员-评论家策略评估MCMC方法,允许对部分待优化参数进行时间上的“冻结”。目标是最小化计算工作量,并利用观察到的优化结果来解释所发现的极值对实现期望流场的作用。 通过围绕作为真实基准的中间CFD模拟进行一系列局部优化参数变化,可以加快全局优化的速度,前提是(a)变化后参数所处的局部邻域足够大,能够与网格尺寸的步长及其大量模拟竞争,并且(b)这些邻域上的奖励和成本估计对于良好的逐步参数适应足够准确。我们给出了一个简单流体动力学问题的例子,在该问题上,该方法允许在特征重要性评分的意义上进行解释。

更新时间: 2025-07-23 09:14:25

领域: cs.LG

下载: http://arxiv.org/abs/2507.17786v1

Principled Multimodal Representation Learning

Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities to improve multimodal understanding. Traditional methods often depend on pairwise contrastive learning, which relies on a predefined anchor modality, restricting alignment across all modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain, such as limitations imposed by fixed anchor points and instability arising from optimizing the product of singular values. To address the challenges, in this paper, we propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities without anchor dependency in a more stable manner. Specifically, grounded in the theoretical insight that full alignment corresponds to a rank-1 Gram matrix, PMRL optimizes the dominant singular value of the representation matrix to align modalities along a shared leading direction. We propose a softmax-based loss function that treats singular values as logits to prioritize the largest singular value. Besides, instance-wise contrastive regularization on the leading eigenvectors maintains inter-instance separability and prevents representation collapse. Extensive experiments across diverse tasks demonstrate PMRL's superiority compared to baseline methods. The source code will be publicly available.

Updated: 2025-07-23 09:12:25

标题: 原则性多模态表示学习

摘要: 多模态表示学习旨在通过整合多样的数据模态来创建统一的表示空间,以提高多模态理解能力。传统方法通常依赖于成对对比学习,这依赖于预定义的锚定模态,限制了所有模态之间的对齐。最近的进展已经研究了多模态的同时对齐,但仍然存在一些挑战,例如固定锚点所带来的限制以及优化奇异值乘积所产生的不稳定性。为了解决这些挑战,在本文中,我们提出了基于原则的多模态表示学习(PMRL),这是一个新颖的框架,以更稳定的方式实现多模态的同时对齐,而不依赖于锚定。具体地,基于理论洞察力,完全对齐对应于秩-1格拉姆矩阵,PMRL优化表示矩阵的主要奇异值,以沿着共享的主导方向对齐模态。我们提出了一个基于softmax的损失函数,将奇异值视为logits以优先考虑最大奇异值。此外,对于领先特征向量的实例级对比正则化可以保持实例之间的可分性,并防止表示坍缩。广泛的实验跨越不同任务表明PMRL相比基线方法具有更好的性能。源代码将公开提供。

更新时间: 2025-07-23 09:12:25

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2507.17343v1
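
Reading the description literally, the core objective can be sketched in a few lines of PyTorch: take the singular values of the stacked per-instance representations, treat them as logits, and maximize the softmax probability of the dominant one (the instance-wise contrastive regularizer is omitted, and this is our reading, not the paper's released code):

# Sketch: softmax-over-singular-values loss pushing the Gram matrix to rank 1.
import torch

def pmrl_loss(reps):
    """reps: (num_modalities, dim) L2-normalised representations of one instance."""
    s = torch.linalg.svdvals(reps)            # singular values, descending
    return -torch.log_softmax(s, dim=0)[0]    # prioritise the dominant value

reps = torch.nn.functional.normalize(torch.randn(3, 8, requires_grad=True), dim=1)
loss = pmrl_loss(reps)
loss.backward()                               # gradients flow through the SVD
print(float(loss))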

Automated planning with ontologies under coherence update semantics (Extended Version)

Standard automated planning employs first-order formulas under closed-world semantics to achieve a goal with a given set of actions from an initial state. We follow a line of research that aims to incorporate background knowledge into automated planning problems, for example, by means of ontologies, which are usually interpreted under open-world semantics. We present a new approach for planning with DL-Lite ontologies that combines the advantages of ontology-based action conditions provided by explicit-input knowledge and action bases (eKABs) and ontology-aware action effects under the coherence update semantics. We show that the complexity of the resulting formalism is not higher than that of previous approaches and provide an implementation via a polynomial compilation into classical planning. An evaluation of existing and new benchmarks examines the performance of a planning system on different variants of our compilation.

Updated: 2025-07-23 09:09:15

标题: 使用一致性更新语义下的本体自动规划(扩展版)

摘要: 标准自动规划利用闭世界语义下的一阶公式,从初始状态出发,通过一组给定的动作达成目标。我们遵循一条旨在将背景知识纳入自动规划问题的研究路线,例如通过通常在开放世界语义下解释的本体。我们提出了一种新的基于DL-Lite本体的规划方法,它结合了由显式输入知识与行动库(eKABs)提供的基于本体的行动条件的优势,以及一致性更新语义下的本体感知行动效果。我们证明了由此产生的形式化的复杂性不高于先前方法,并通过向经典规划的多项式编译提供了一种实现。对现有和新基准的评估考察了规划系统在我们编译的不同变体上的性能。

更新时间: 2025-07-23 09:09:15

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2507.15120v2

Temporal Point-Supervised Signal Reconstruction: A Human-Annotation-Free Framework for Weak Moving Target Detection

In low-altitude surveillance and early warning systems, detecting weak moving targets remains a significant challenge due to low signal energy, small spatial extent, and complex background clutter. Existing methods struggle with extracting robust features and suffer from the lack of reliable annotations. To address these limitations, we propose a novel Temporal Point-Supervised (TPS) framework that enables high-performance detection of weak targets without any manual annotations. Instead of conventional frame-based detection, our framework reformulates the task as a pixel-wise temporal signal modeling problem, where weak targets manifest as short-duration pulse-like responses. A Temporal Signal Reconstruction Network (TSRNet) is developed under the TPS paradigm to reconstruct these transient signals. TSRNet adopts an encoder-decoder architecture and integrates a Dynamic Multi-Scale Attention (DMSAttention) module to enhance its sensitivity to diverse temporal patterns. Additionally, a graph-based trajectory mining strategy is employed to suppress false alarms and ensure temporal consistency. Extensive experiments on a purpose-built low-SNR dataset demonstrate that our framework outperforms state-of-the-art methods while requiring no human annotations. It achieves strong detection performance and operates at over 1000 FPS, underscoring its potential for real-time deployment in practical scenarios.

Updated: 2025-07-23 09:02:09

标题: 时间点监督信号重建:一种无需人工标注的弱移动目标检测框架

摘要: 在低空监视和预警系统中,由于信号能量低、空间范围小和复杂的背景杂波,检测弱移动目标仍然是一个重大挑战。现有方法在提取稳健特征方面存在困难,并且缺乏可靠的注释。为了解决这些限制,我们提出了一种新颖的时间点监督(TPS)框架,可以在没有任何手动注释的情况下实现对弱目标的高性能检测。与传统的基于帧的检测不同,我们的框架重新制定了任务,将其作为一个像素级的时间信号建模问题,其中弱目标表现为短时脉冲响应。在TPS范式下开发了一个时间信号重建网络(TSRNet)来重建这些瞬态信号。TSRNet采用编码器-解码器架构,并整合了动态多尺度注意(DMSAttention)模块,以增强其对不同时间模式的敏感性。此外,采用了基于图的轨迹挖掘策略来抑制虚警并确保时间一致性。在一个专门构建的低信噪比数据集上进行了大量实验,结果表明我们的框架优于最先进的方法,同时不需要人工注释。它实现了强大的检测性能,并在1000帧/秒以上运行,凸显了其在实际场景中实时部署的潜力。

更新时间: 2025-07-23 09:02:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17334v1

Self-similarity Analysis in Deep Neural Networks

Current research has found that some deep neural networks exhibit strong hierarchical self-similarity in feature representation or parameter distribution. However, aside from preliminary studies on how the power-law distribution of weights across different training stages affects model performance, there has been no quantitative analysis of how the self-similarity of hidden-space geometry influences model weight optimization, nor is there a clear understanding of the dynamic behavior of internal neurons. Therefore, this paper proposes a complex network modeling method based on the output features of hidden-layer neurons to investigate the self-similarity of feature networks constructed at different hidden layers, and analyzes how adjusting the degree of self-similarity in feature networks can enhance the classification performance of deep neural networks. Validated on three types of networks (MLP architectures, convolutional networks, and attention architectures), this study reveals that the degree of self-similarity exhibited by feature networks varies across different model architectures. Furthermore, embedding constraints on the self-similarity of feature networks during the training process can improve the performance of self-similar deep neural networks (MLP architectures and attention architectures) by up to 6 percentage points.

Updated: 2025-07-23 09:01:53

标题: 深度神经网络中的自相似性分析

摘要: 最近的研究发现,一些深度神经网络在特征表示或参数分布中表现出强烈的分层自相似性。然而,除了关于权重在不同训练阶段的幂律分布如何影响模型性能的初步研究外,对隐藏空间几何的自相似性如何影响模型权重优化还没有定量分析,对内部神经元的动态行为也没有清晰的理解。因此,本文提出了一种基于隐藏层神经元输出特征的复杂网络建模方法,以研究构建在不同隐藏层的特征网络的自相似性,并分析调整特征网络中自相似性程度如何增强深度神经网络的分类性能。在三种类型的网络MLP架构、卷积网络和注意力架构上进行验证,这项研究揭示了特征网络展现的自相似性程度在不同模型架构之间变化。此外,在训练过程中对特征网络的自相似性施加约束可以将自相似深度神经网络(MLP架构和注意力架构)的性能提高高达6个百分点。

更新时间: 2025-07-23 09:01:53

领域: cs.LG

下载: http://arxiv.org/abs/2507.17785v1

Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated Noise

Decentralized learning enables distributed agents to collaboratively train a shared machine learning model without a central server, through local computation and peer-to-peer communication. Although each agent retains its dataset locally, sharing local models can still expose private information about the local training datasets to adversaries. To mitigate privacy attacks, a common strategy is to inject random artificial noise at each agent before exchanging local models between neighbors. However, this often leads to utility degradation due to the negative effects of cumulated artificial noise on the learning algorithm. In this work, we introduce CorN-DSGD, a novel covariance-based framework for generating correlated privacy noise across agents, which unifies several state-of-the-art methods as special cases. By leveraging network topology and mixing weights, CorN-DSGD optimizes the noise covariance to achieve network-wide noise cancellation. Experimental results show that CorN-DSGD cancels more noise than existing pairwise correlation schemes, improving model performance under formal privacy guarantees.
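
As a point of reference for the scheme described above, the sketch below implements only the simpler pairwise-canceling special case that CorN-DSGD is said to generalize; the covariance optimization over topology and mixing weights is omitted, and all names here are illustrative:

    import numpy as np

    def pairwise_canceling_noise(n_agents, dim, sigma, edges, rng):
        """One special case of correlated privacy noise: each edge (i, j)
        shares a Gaussian draw that agent i adds and agent j subtracts,
        so the noise sums to zero over the whole network. CorN-DSGD
        generalizes this by optimizing the full noise covariance."""
        noise = np.zeros((n_agents, dim))
        for i, j in edges:
            z = rng.normal(0.0, sigma, size=dim)
            noise[i] += z
            noise[j] -= z
        return noise

    rng = np.random.default_rng(0)
    edges = [(0, 1), (1, 2), (2, 0)]            # a 3-agent ring
    noise = pairwise_canceling_noise(3, 5, sigma=1.0, edges=edges, rng=rng)
    print(noise.sum(axis=0))                    # exactly 0: cancels network-wide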

Updated: 2025-07-23 08:55:02

标题: 在去中心化学习中优化隐私-效用权衡与广义相关噪声

摘要: 分散式学习使分布式代理能够通过本地计算和点对点通信共同训练共享的机器学习模型,而无需中央服务器。虽然每个代理保留其本地数据集,但共享本地模型仍可能向对手暴露关于本地训练数据集的私人信息。为了减轻隐私攻击,一种常见策略是在代理之间交换本地模型之前在每个代理处注入随机人工噪声。然而,这通常会导致效用下降,因为累积人工噪声对学习算法的负面影响。在这项工作中,我们引入了CorN-DSGD,一个基于协方差的框架,用于在代理之间生成相关的隐私噪声,将几种最先进的方法统一为特例。通过利用网络拓扑和混合权重,CorN-DSGD优化噪声协方差以实现网络范围的噪声消除。实验结果表明,CorN-DSGD比现有的成对相关方案取消更多噪声,在正式隐私保证下改善了模型性能。

更新时间: 2025-07-23 08:55:02

领域: cs.LG,cs.AI,cs.CR,cs.DC

下载: http://arxiv.org/abs/2501.14644v2

A Learning-based Domain Decomposition Method

Recent developments in mechanical, aerospace, and structural engineering have driven a growing need for efficient ways to model and analyse structures at much larger and more complex scales than before. While established numerical methods like the Finite Element Method remain reliable, they often struggle with computational cost and scalability when dealing with large and geometrically intricate problems. In recent years, neural network-based methods have shown promise because of their ability to efficiently approximate nonlinear mappings. However, most existing neural approaches are still largely limited to simple domains, which makes them difficult to apply to real-world PDEs involving complex geometries. In this paper, we propose a learning-based domain decomposition method (L-DDM) that addresses this gap. Our approach uses a single pre-trained neural operator, originally trained on simple domains, as a surrogate model within a domain decomposition scheme, allowing us to tackle large and complicated domains efficiently. We provide a general theoretical result on the existence of neural operator approximations in the context of domain decomposition solution of abstract PDEs. We then demonstrate our method by accurately approximating solutions to elliptic PDEs with discontinuous microstructures in complex geometries, using a physics-pretrained neural operator (PPNO). Our results show that this approach not only outperforms current state-of-the-art methods on these challenging problems, but also offers resolution-invariance and strong generalization to microstructural patterns unseen during training.

Updated: 2025-07-23 08:54:36

标题: 一种基于学习的域分解方法

摘要: 最近在机械、航空航天和结构工程领域的发展推动了对比以往规模更大、更复杂结构建模和分析方法的日益增长需求。虽然已建立的数值方法如有限元法仍然可靠,但在处理大型和几何复杂问题时往往面临计算成本和可扩展性方面的困难。近年来,基于神经网络的方法显示出潜力,因为它们能够高效地逼近非线性映射关系。然而,大多数现有的神经网络方法仍然主要局限于简单领域,这使得难以应用于涉及复杂几何形状的实际PDEs。在本文中,我们提出了一种学习型域分解方法(L-DDM),以解决这一差距。我们的方法使用一个在简单领域上进行过预训练的单个神经操作符作为一个代理模型,在域分解方案中高效处理大型和复杂领域。我们在抽象PDEs域分解解决方案的背景下提供了神经操作符逼近存在性的一般理论结果。然后,我们通过使用一个经过物理预训练的神经操作符(PPNO),准确逼近在复杂几何结构中具有不连续微结构的椭圆PDEs的解来展示我们的方法。我们的结果表明,这种方法不仅在这些具有挑战性问题上胜过当前的最先进方法,还具有分辨率不变性和对训练期间未见的微结构模式的强大泛化能力。

更新时间: 2025-07-23 08:54:36

领域: cs.LG,math-ph,math.MP

下载: http://arxiv.org/abs/2507.17328v1

An Empirical Study on Virtual Reality Software Security Weaknesses

Virtual Reality (VR) has emerged as a transformative technology across industries, yet its security weaknesses, including vulnerabilities, are underinvestigated. This study investigates 334 VR projects hosted on GitHub, examining 1,681 software security weaknesses to understand: what types of weaknesses are prevalent in VR software; when and how weaknesses are introduced; how long they have survived; and how they have been removed. Due to the limited availability of VR software security weaknesses in public databases (e.g., the National Vulnerability Database or NVD), we prepare the first systematic dataset of VR software security weaknesses by introducing a novel framework to collect such weaknesses from GitHub commit data. Our empirical study on the dataset leads to useful insights, including: (i) VR weaknesses are heavily skewed toward user interface weaknesses, followed by resource-related weaknesses; (ii) VR development tools pose higher security risks than VR applications; (iii) VR security weaknesses are often introduced at the VR software birth time.

Updated: 2025-07-23 08:45:53

标题: 一项关于虚拟现实软件安全弱点的实证研究

摘要: 虚拟现实(VR)已经成为跨行业的一项革命性技术,然而其安全弱点,包括漏洞,尚未受到充分调查。本研究调查了在GitHub上托管的334个VR项目,检查了1,681个软件安全弱点,以了解:VR软件中哪些类型的弱点最为普遍;这些弱点是在何时和如何引入的;它们存在了多长时间;以及它们是如何被消除的。由于在公共数据库(如国家漏洞数据库或NVD)中有关VR软件安全弱点的信息有限,我们通过引入一种新的框架从GitHub提交数据中收集这类弱点,准备了第一个系统性的VR软件安全弱点数据集。我们对数据集进行的实证研究带来了有用的见解,包括:(i)VR弱点在很大程度上偏向于用户界面弱点,其次是与资源相关的弱点;(ii)VR开发工具比VR应用程序面临更高的安全风险;(iii)VR安全弱点通常在VR软件诞生时引入。

更新时间: 2025-07-23 08:45:53

领域: cs.CR

下载: http://arxiv.org/abs/2507.17324v1

RIS-aided Latent Space Alignment for Semantic Channel Equalization

Semantic communication systems introduce a new paradigm in wireless communications, focusing on transmitting the intended meaning rather than ensuring strict bit-level accuracy. These systems often rely on Deep Neural Networks (DNNs) to learn and encode meaning directly from data, enabling more efficient communication. However, in multi-user settings where interacting agents are trained independently, without shared context or joint optimization, divergent latent representations across AI-native devices can lead to semantic mismatches, impeding mutual understanding even in the absence of traditional transmission errors. In this work, we address semantic mismatch in Multiple-Input Multiple-Output (MIMO) channels by proposing a joint physical and semantic channel equalization framework that leverages the presence of Reconfigurable Intelligent Surfaces (RIS). The semantic equalization is implemented as a sequence of transformations: (i) a pre-equalization stage at the transmitter; (ii) propagation through the RIS-aided channel; and (iii) a post-equalization stage at the receiver. We formulate the problem as a constrained Minimum Mean Squared Error (MMSE) optimization and propose two solutions: (i) a linear semantic equalization chain, and (ii) a non-linear DNN-based semantic equalizer. Both methods are designed to operate under semantic compression in the latent space and adhere to transmit power constraints. Through extensive evaluations, we show that the proposed joint equalization strategies consistently outperform conventional, disjoint approaches to physical and semantic channel equalization across a broad range of scenarios and wireless channel conditions.
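
A minimal sketch of the linear MMSE building block underlying the first proposed solution, estimated from paired latent samples; the RIS phase optimization and the transmit-power constraint are omitted, and the shapes, channel, and regularizer below are assumptions:

    import numpy as np

    def linear_mmse_equalizer(X, Y, reg=1e-6):
        """Fit W minimizing E||x - W y||^2 from paired samples.

        X: (n, d_tx) transmitter latents, Y: (n, d_rx) received latents.
        Closed form: W = R_xy R_yy^{-1}."""
        Rxy = X.T @ Y / len(X)
        Ryy = Y.T @ Y / len(Y)
        return Rxy @ np.linalg.inv(Ryy + reg * np.eye(Ryy.shape[0]))

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 8))
    H = rng.normal(size=(8, 8))                  # stand-in channel/mismatch
    Y = X @ H.T + 0.1 * rng.normal(size=(2000, 8))
    W = linear_mmse_equalizer(X, Y)
    print(np.mean((X - Y @ W.T) ** 2))           # small residual MSE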

Updated: 2025-07-23 08:38:58

标题: RIS辅助的潜空间对齐用于语义通道均衡

摘要: 语义通信系统引入了无线通信领域的一个新范式,其重点是传输预期的含义而不是确保严格的比特级准确性。这些系统通常依赖于深度神经网络(DNNs)直接从数据中学习和编码含义,从而实现更高效的通信。然而,在多用户环境中,相互作用的代理独立训练,没有共享的背景或联合优化,AI原生设备之间的不同潜在表示可能导致语义不匹配,即使在传统传输错误不存在的情况下也会妨碍相互理解。在这项工作中,我们通过提出一个利用可重构智能表面(RIS)存在的联合物理和语义通道均衡框架来解决MIMO通道中的语义不匹配问题。语义均衡实施为一系列转换:(i)发射端的预均衡阶段;(ii)通过RIS辅助通道传播;以及(iii)接收端的后均衡阶段。我们将问题形式化为一个受限制的最小均方误差(MMSE)优化问题,并提出两种解决方案:(i)线性语义均衡链和(ii)基于非线性DNN的语义均衡器。这两种方法都设计为在潜在空间中进行语义压缩并遵守发射功率约束。通过广泛的评估,我们展示了所提出的联合均衡策略在各种场景和无线信道条件下一贯优于传统的物理和语义通道均衡的分离方法。

更新时间: 2025-07-23 08:38:58

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2507.16450v2

Towards Detecting Persuasion on Social Media: From Model Development to Insights on Persuasion Strategies

Political advertising plays a pivotal role in shaping public opinion and influencing electoral outcomes, often through subtle persuasive techniques embedded in broader propaganda strategies. Detecting these persuasive elements is crucial for enhancing voter awareness and ensuring transparency in democratic processes. This paper presents an integrated approach that bridges model development and real-world application through two interconnected studies. First, we introduce a lightweight model for persuasive text detection that achieves state-of-the-art performance in Subtask 3 of SemEval 2023 Task 3 while requiring significantly fewer computational resources and training data than existing methods. Second, we demonstrate the model's practical utility by collecting the Australian Federal Election 2022 Facebook Ads (APA22) dataset, partially annotating a subset for persuasion, and fine-tuning the model to adapt from mainstream news to social media content. We then apply the fine-tuned model to label the remainder of the APA22 dataset, revealing distinct patterns in how political campaigns leverage persuasion through different funding strategies, word choices, demographic targeting, and temporal shifts in persuasion intensity as election day approaches. Our findings not only underscore the necessity of domain-specific modeling for analyzing persuasion on social media but also show how uncovering these strategies can enhance transparency, inform voters, and promote accountability in digital campaigns.

Updated: 2025-07-23 08:38:14

标题: 朝向在社交媒体上检测说服力:从模型开发到说服策略洞察

摘要: 政治广告在塑造公众舆论和影响选举结果中发挥着至关重要的作用,通常通过嵌入在更广泛的宣传策略中的微妙说服技巧来实现。检测这些说服元素对于增强选民意识并确保民主过程的透明度至关重要。本文提出了一种整合方法,通过两个互相关联的研究,架起了模型开发与现实应用之间的桥梁。首先,我们介绍了一个轻量级的说服文本检测模型,该模型在SemEval 2023任务3的Subtask 3中取得了最先进的性能,而且相较于现有方法,所需的计算资源和训练数据明显更少。其次,我们通过收集2022年澳大利亚联邦选举的Facebook广告(APA22)数据集、对部分数据进行说服性标注,并对模型进行微调使其从主流新闻内容适配到社交媒体内容,展示了该模型的实际效用。然后,我们将微调后的模型应用于标记APA22数据集的其余部分,揭示了政治竞选如何通过不同的资金策略、用词选择、人口统计信息定位以及在选举日临近时说服强度的时间变化等方面利用说服。我们的研究结果不仅强调了社交媒体上分析说服所必需的领域特定建模的重要性,还展示了揭示这些策略如何增强透明度,告知选民,并促进数字竞选中的问责。

更新时间: 2025-07-23 08:38:14

领域: cs.CL,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2503.13844v2

Conflict Detection for Temporal Knowledge Graphs: A Fast Constraint Mining Algorithm and New Benchmarks

Temporal facts, which are used to describe events that occur during specific time periods, have become a topic of increased interest in the field of knowledge graph (KG) research. In terms of quality management, the introduction of time restrictions brings new challenges to maintaining the temporal consistency of KGs. Previous studies rely on manually enumerated temporal constraints to detect conflicts, which are labor-intensive and may have granularity issues. To address this problem, we start from the common pattern of temporal facts and propose a pattern-based temporal constraint mining method, PaTeCon. Unlike previous studies, PaTeCon uses graph patterns and statistical information relevant to the given KG to automatically generate temporal constraints, without the need for human experts. In this paper, we illustrate how this method can be optimized to achieve significant speed improvement. We also annotate Wikidata and Freebase to build two new benchmarks for conflict detection. Extensive experiments demonstrate that our pattern-based automatic constraint mining approach is highly effective in generating valuable temporal constraints.
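
A toy illustration of the kind of conflict such mined temporal constraints can flag; the constraint type (disjoint validity intervals per subject and relation), the data layout, and the relation names below are invented for illustration:

    def overlapping(a, b):
        """True if two (start, end) validity intervals intersect."""
        return a[0] <= b[1] and b[0] <= a[1]

    def detect_conflicts(facts, disjoint_relations):
        """facts: list of (subject, relation, obj, (start, end)).
        disjoint_relations: relations a mined constraint declares
        non-overlapping per subject."""
        conflicts = []
        for i in range(len(facts)):
            for j in range(i + 1, len(facts)):
                s1, r1, o1, t1 = facts[i]
                s2, r2, o2, t2 = facts[j]
                if s1 == s2 and r1 == r2 and r1 in disjoint_relations \
                        and o1 != o2 and overlapping(t1, t2):
                    conflicts.append((facts[i], facts[j]))
        return conflicts

    facts = [("Q1", "headOfState", "A", (1990, 1995)),
             ("Q1", "headOfState", "B", (1994, 1999))]
    print(detect_conflicts(facts, {"headOfState"}))   # flags the overlap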

Updated: 2025-07-23 08:36:26

标题: 冲突检测用于时间知识图谱:一种快速约束挖掘算法和新的基准测试

摘要: 时间事实,用于描述发生在特定时间段内的事件,已经成为知识图谱(KG)研究领域日益受关注的话题。在质量管理方面,引入时间限制给KG的时间一致性维护带来了新的挑战。先前的研究依赖于手动枚举的时间约束来检测冲突,这种方法劳动密集且可能存在粒度问题。为了解决这个问题,我们从时间事实的常见模式出发,提出了一种基于模式的时间约束挖掘方法PaTeCon。与先前的研究不同,PaTeCon利用图模式和与给定KG相关的统计信息自动生成时间约束,无需人类专家。本文阐述了如何优化该方法以实现显著的速度提升。我们还对Wikidata和Freebase进行了标注,建立了两个新的冲突检测基准。广泛的实验证明,我们基于模式的自动约束挖掘方法在生成有价值的时间约束方面非常有效。

更新时间: 2025-07-23 08:36:26

领域: cs.AI,cs.DB

下载: http://arxiv.org/abs/2312.11053v2

Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability

We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $\delta$, $\text{KL}(p \| \widehat{p}) \geq C\max\{K,\ln(K)\ln(1/\delta)\}/n$, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversion and suffix averaging, and show that with probability at least $1 - \delta$, $\text{KL}(p \| \widehat{p}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/\delta)) /n$. Furthermore, we also show that with sufficiently many observations relative to $\log(1/\delta)$, the maximum likelihood estimator $\bar{p}$ guarantees that with probability at least $1-\delta$, $$ 1/6 \chi^2(\bar{p}\|p) \leq 1/4 \chi^2(p\|\bar{p}) \leq \text{KL}(p\|\bar{p}) \leq C(K + \log(1/\delta))/n\,, $$ where $\chi^2$ denotes the $\chi^2$-divergence.
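
For orientation, the snippet below contrasts the maximum likelihood estimator with a simple add-constant smoothed estimator under KL loss; the smoothing rule is purely illustrative and is not the paper's $p^{\text{OTB}}$ construction:

    import numpy as np

    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    rng = np.random.default_rng(0)
    K, n = 50, 500
    p = rng.dirichlet(np.ones(K))
    counts = rng.multinomial(n, p)

    mle = counts / n                        # zero counts make KL infinite
    add_c = (counts + 0.5) / (n + 0.5 * K)  # add-constant smoothing (illustrative)

    print("KL(p || MLE):     ",
          kl(p, mle) if (counts > 0).all() else float("inf"))
    print("KL(p || smoothed):", kl(p, add_c))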

Updated: 2025-07-23 08:30:37

标题: 高概率下Kullback-Leibler散度中的近极小极大离散分布估计

摘要: 我们考虑估计具有大小为$K$的支持的离散分布$p$的问题,并在KL散度中提供了高概率的上下界。我们证明,在最坏情况下,对于任何估计器$\widehat{p}$,至少以概率$\delta$,$\text{KL}(p \| \widehat{p}) \geq C\max\{K,\ln(K)\ln(1/\delta)\}/n$,其中$n$是样本大小,$C > 0$是一个常数。我们引入了一个计算效率高的估计器$p^{\text{OTB}}$,基于在线转换为批处理和后缀平均,并表明至少以概率$1 - \delta$,$\text{KL}(p \| \widehat{p}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/\delta)) /n$。此外,我们还表明,相对于$\log(1/\delta)$有足够多观测时,最大似然估计器$\bar{p}$保证至少以概率$1-\delta$,$$ 1/6 \chi^2(\bar{p}\|p) \leq 1/4 \chi^2(p\|\bar{p}) \leq \text{KL}(p\|\bar{p}) \leq C(K + \log(1/\delta))/n\,, $$其中$\chi^2$表示$\chi^2$-散度。

更新时间: 2025-07-23 08:30:37

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.17316v1

EarthLink: Interpreting Climate Signals with Self-Evolving AI Agents

Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change.

Updated: 2025-07-23 08:29:25

标题: EarthLink:利用自进化AI代理解读气候信号

摘要: 现代地球科学正处于一个转折点。地球系统数据的庞大、碎片化和复杂性,加上日益复杂的分析需求,对快速科学发现构成了严重瓶颈。在这里,我们介绍EarthLink,这是第一个设计为地球科学家的交互式副驾驶的人工智能代理。它自动化了从规划和代码生成到多场景分析的端到端研究工作流程。与静态诊断工具不同,EarthLink可以通过动态反馈循环从用户交互中学习,不断完善其能力。我们在气候变化的若干核心科学任务上验证了其性能,范围从模型观测比较到复杂现象的诊断。在多专家评估中,EarthLink产生了科学上可靠的分析,并展示了一种被评为与人类初级研究人员工作流特定方面相媲美的分析能力。此外,其透明、可审计的工作流程和自然语言界面使科学家能够从繁琐的手动执行转变为战略监督和假设生成。EarthLink标志着地球系统研究朝着在全球变化加速时代的高效、可信赖和协作范式迈出了关键一步。

更新时间: 2025-07-23 08:29:25

领域: cs.LG,cs.AI,physics.ao-ph

下载: http://arxiv.org/abs/2507.17311v1

Confounded Causal Imitation Learning with Instrumental Variables

Imitation learning from demonstrations usually suffers from the confounding effects of unmeasured variables (i.e., unmeasured confounders) on the states and actions. If they are ignored, the policy estimate will be biased. To close this confounding gap, in this paper, we leverage the power of instrumental variables (IV) and propose a Confounded Causal Imitation Learning (C2L) model. This model accommodates confounders that influence actions across multiple timesteps, rather than being restricted to immediate temporal dependencies. We develop a two-stage imitation learning framework for valid IV identification and policy optimization. In particular, in the first stage, we construct a testing criterion based on the defined pseudo-variable, with which we identify a valid IV for the C2L models. Such a criterion entails the sufficient and necessary identifiability conditions for IV validity. In the second stage, with the identified IV, we propose two candidate policy learning approaches: one is based on a simulator, while the other is offline. Extensive experiments verify the effectiveness of identifying the valid IV as well as learning the policy.

Updated: 2025-07-23 08:23:34

标题: 基于工具变量的混杂因果模仿学习

摘要: 从示范中学习的模仿通常受到未测量变量(即未测量混淆因素)对状态和动作的混淆效应的影响。如果忽略它们,将导致对策略的偏倚估计。为了打破这种混淆差距,在本文中,我们充分利用工具变量(IV)的强大能力,提出了一个混淆因果模仿学习(C2L)模型。该模型容纳了影响多个时间步骤的动作的混淆因素,而不仅仅局限于即时时间依赖关系。我们为有效的IV识别和策略优化开发了一个两阶段模仿学习框架。特别是,在第一阶段,我们基于定义的伪变量构建了一个测试标准,通过该标准我们实现了为C2L模型识别有效的IV。这种标准包含了IV有效性的充分和必要可辨认条件。在第二阶段,通过识别的IV,我们提出了两种候选策略学习方法:一种基于模拟器,另一种是离线的。大量实验证实了识别有效的IV以及学习策略的有效性。

更新时间: 2025-07-23 08:23:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17309v1

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

Chain-of-thought (CoT) reasoning enhances the problem-solving capabilities of large language models by encouraging step-by-step intermediate reasoning during inference. While effective, CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. Existing acceleration strategies either reduce sequence length through early stopping or compressive reward designs, or improve decoding speed via speculative decoding with smaller models. However, speculative decoding suffers from limited speedup when the agreement between small and large models is low, and fails to exploit the potential advantages of small models in producing concise intermediate reasoning. In this paper, we present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference by switching between a small language model (SLM) and a large language model (LLM) along the reasoning trajectory. R-Stitch uses the SLM to generate tokens by default and delegates to the LLM only when the SLM's confidence falls below a threshold. This design avoids full-sequence rollback and selectively invokes the LLM on uncertain steps, preserving both efficiency and answer quality. R-Stitch is model-agnostic, training-free, and compatible with standard decoding pipelines. Experiments on math reasoning benchmarks demonstrate that R-Stitch achieves up to 85% reduction in inference latency with negligible accuracy drop, highlighting its practical effectiveness in accelerating CoT reasoning.
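
A minimal sketch of token-level confidence-based switching in the spirit of R-Stitch; the checkpoints (gpt2/gpt2-large share a tokenizer), the threshold value, and the greedy LLM fallback are placeholders rather than the paper's configuration:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder SLM
    slm = AutoModelForCausalLM.from_pretrained("gpt2")
    llm = AutoModelForCausalLM.from_pretrained("gpt2-large")  # placeholder LLM

    @torch.no_grad()
    def r_stitch_decode(prompt, max_new_tokens=64, tau=0.7):
        """Generate with the SLM by default; hand a single step to the
        LLM whenever the SLM's top-1 probability falls below tau."""
        ids = tok(prompt, return_tensors="pt").input_ids
        for _ in range(max_new_tokens):
            probs = torch.softmax(slm(ids).logits[0, -1], dim=-1)
            conf, token = probs.max(dim=-1)
            if conf < tau:                    # uncertain step: delegate
                token = llm(ids).logits[0, -1].argmax(dim=-1)
            ids = torch.cat([ids, token.view(1, 1)], dim=-1)
            if token.item() == tok.eos_token_id:
                break
        return tok.decode(ids[0])

    print(r_stitch_decode("The derivative of x^2 is"))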

Updated: 2025-07-23 08:14:36

标题: R-Stitch: 用于高效推理的动态轨迹拼接

摘要: 思维链(CoT)推理通过在推理过程中鼓励逐步中间推理来增强大型语言模型的问题解决能力。虽然有效,但由于其依赖于长标记序列上的自回归解码,CoT引入了相当大的计算开销。现有的加速策略要么通过提前停止或压缩奖励设计减少序列长度,要么通过使用更小的模型进行推测解码来提高解码速度。然而,当小型模型和大型模型之间的一致性较低时,推测解码会受到速度提升的限制,并且无法充分利用小型模型在生成简洁中间推理方面的潜在优势。在本文中,我们提出了R-Stitch,一个基于标记级别和置信度的混合解码框架,通过在推理轨迹上在小语言模型(SLM)和大语言模型(LLM)之间切换来加速CoT推理。R-Stitch默认使用SLM生成标记,并仅当SLM的置信度低于阈值时才委托给LLM。这种设计避免了完整序列回滚,并且仅在不确定步骤上选择性地调用LLM,既保持了效率又保持了答案质量。R-Stitch是与模型无关的、无需训练的,并且与标准解码流程兼容。对数学推理基准测试的实验表明,R-Stitch在推理延迟上实现了高达85%的减少,同时准确率几乎没有下降,突显了其在加速CoT推理方面的实际有效性。

更新时间: 2025-07-23 08:14:36

领域: cs.LG

下载: http://arxiv.org/abs/2507.17307v1

Cross-domain Multi-step Thinking: Zero-shot Fine-grained Traffic Sign Recognition in the Wild

In this study, we propose Cross-domain Multi-step Thinking (CdMT) to improve zero-shot fine-grained traffic sign recognition (TSR) performance in the wild. Zero-shot fine-grained TSR in the wild is challenging due to the cross-domain problem between clean template traffic signs and real-world counterparts, and existing approaches particularly struggle with cross-country TSR scenarios, where traffic signs typically differ between countries. The proposed CdMT framework tackles these challenges by leveraging the multi-step reasoning capabilities of large multimodal models (LMMs). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for LMMs. Context descriptions, which are enhanced by center coordinate prompt optimization, enable the precise localization of target traffic signs in complex road images and filter irrelevant responses via novel prior traffic sign hypotheses. Characteristic descriptions, which are derived from in-context learning with template traffic signs, bridge cross-domain gaps and enhance fine-grained TSR. Differential descriptions refine the multimodal reasoning ability of LMMs by distinguishing subtle differences among similar signs. CdMT is independent of training data and requires only simple and uniform instructions, enabling it to achieve cross-country TSR. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries. The proposed CdMT framework achieved superior performance compared with other state-of-the-art methods on all five datasets, with recognition accuracies of 0.93, 0.89, 0.97, 0.89, and 0.85 on the GTSRB, BTSD, TT-100K, Sapporo, and Yokohama datasets, respectively.

Updated: 2025-07-23 08:14:06

标题: 跨领域多步思考:野外零样本细粒度交通标志识别

摘要: 在这项研究中,我们提出了跨领域多步思考(CdMT)来改善野外零样本细粒度交通标志识别(TSR)的性能。野外零样本细粒度TSR具有挑战性,因为干净模板交通标志与现实世界对应物之间存在跨领域问题,现有方法在跨国家TSR场景中尤其困难,因为各国交通标志通常有所不同。所提出的CdMT框架通过利用大型多模态模型(LMMs)的多步推理能力来应对这些挑战。我们引入了上下文、特征和差异描述来设计LMMs的多个思考过程。通过中心坐标提示优化增强的上下文描述,使复杂道路图像中目标交通标志的定位更加精确,并通过新型先验交通标志假设过滤不相关的响应。由于与模板交通标志的上下文学习而衍生出的特征描述,弥合了跨领域差距,增强了细粒度TSR。差异描述通过区分相似标志之间的细微差异,提炼了LMMs的多模态推理能力。CdMT独立于训练数据,仅需要简单统一的指导,使其能够实现跨国家TSR。我们在三个基准数据集和两个来自不同国家的真实世界数据集上进行了大量实验。所提出的CdMT框架在所有五个数据集上均表现出优越的性能,分别在GTSRB、BTSD、TT-100K、Sapporo和Yokohama数据集上的识别准确率为0.93、0.89、0.97、0.89和0.85。

更新时间: 2025-07-23 08:14:06

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2409.01534v2

A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model

Multimodal large language models (MLLMs) have emerged as powerful tools for computational pathology, offering unprecedented opportunities to integrate pathological images with language context for comprehensive diagnostic analysis. These models hold particular promise for automating complex tasks that traditionally require expert interpretation by pathologists. However, current MLLM approaches in pathology demonstrate significantly constrained reasoning capabilities, primarily due to their reliance on expensive chain-of-thought annotations. Additionally, existing methods remain limited to simple applications of visual question answering (VQA) at the region-of-interest (ROI) level, failing to address the full spectrum of diagnostic needs such as ROI classification, detection, segmentation, whole-slide-image (WSI) classification and VQA in clinical practice. In this study, we present SmartPath-R1, a versatile MLLM capable of simultaneously addressing both ROI-level and WSI-level tasks while demonstrating robust pathological reasoning capability. Our framework combines scale-dependent supervised fine-tuning and task-aware reinforcement fine-tuning, which circumvents the requirement for chain-of-thought supervision by leveraging the intrinsic knowledge within MLLM. Furthermore, SmartPath-R1 integrates multiscale and multitask analysis through a mixture-of-experts mechanism, enabling dynamic processing for diverse tasks. We curate a large-scale dataset comprising 2.3M ROI samples and 188K WSI samples for training and evaluation. Extensive experiments across 72 tasks validate the effectiveness and superiority of the proposed approach. This work represents a significant step toward developing versatile, reasoning-enhanced AI systems for precision pathology.

Updated: 2025-07-23 08:09:42

标题: 通过推理增强多模态大型语言模型实现的多功能病理学辅助系统

摘要: 多模态大语言模型(MLLMs)已经成为计算病理学的强大工具,为将病理图像与语言背景整合进行全面诊断分析提供了前所未有的机会。这些模型特别有希望自动化传统上需要专家病理学家解释的复杂任务。然而,当前病理学中的MLLM方法显示出明显受限的推理能力,主要是因为它们依赖昂贵的思维链注释。此外,现有方法仍然局限于在感兴趣区域(ROI)级别简单应用视觉问答(VQA),未能解决ROI分类、检测、分割、全幻灯片图像(WSI)分类和临床实践中的VQA等全方位诊断需求。在这项研究中,我们提出了SmartPath-R1,一种多功能MLLM,能够同时处理ROI级别和WSI级别的任务,并展示出强大的病理推理能力。我们的框架结合了基于尺度的监督微调和基于任务的强化微调,通过利用MLLM内在知识来避免对思维链监督的要求。此外,SmartPath-R1通过专家混合机制集成了多尺度和多任务分析,为不同任务提供了动态处理能力。我们筛选了一个包含230万个ROI样本和18.8万个WSI样本的大规模数据集用于训练和评估。对72个任务进行了广泛实验,验证了所提出方法的有效性和优越性。这项工作代表着朝着为精准病理学开发多才多艺、推理增强的人工智能系统迈出了重要一步。

更新时间: 2025-07-23 08:09:42

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.17303v1

KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

Serving large language models (LLMs) is important for cloud providers, and caching intermediate results (KV$) after processing each request substantially improves serving throughput and latency. However, there is limited understanding of how LLM serving benefits from KV$ caching, where system design decisions like cache eviction policies are highly workload-dependent. In this paper, we present the first systematic characterization of the KV$ workload patterns from one of the leading LLM service providers. We draw observations that were not covered by previous studies focusing on synthetic workloads, including: KV$ reuses are skewed across requests, where reuses between single-turn requests are equally important as multi-turn requests; the reuse time and probability are diverse considering all requests, but for a specific request category, the pattern tends to be predictable; and the overall cache size required for an ideal cache hit ratio is moderate. Based on the characterization, we further propose a workload-aware cache eviction policy that improves the serving performance under real-world traces, especially with limited cache capacity.
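
A toy sketch of a workload-aware eviction rule in the spirit described, weighting recency by an estimated per-category reuse probability; the categories, probabilities, and scoring formula are invented for illustration:

    import time

    class WorkloadAwareCache:
        """Toy KV cache index: evict the entry whose estimated value
        (per-category reuse probability discounted by age) is lowest.
        Reuse probabilities here are invented for illustration."""

        REUSE_PROB = {"multi_turn": 0.8, "single_turn": 0.5, "batch": 0.1}

        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = {}                 # key -> (last_used, category)

        def _score(self, key, now):
            last_used, category = self.entries[key]
            age = now - last_used
            return self.REUSE_PROB.get(category, 0.5) / (1.0 + age)

        def put(self, key, category):
            now = time.monotonic()
            if key not in self.entries and len(self.entries) >= self.capacity:
                victim = min(self.entries, key=lambda k: self._score(k, now))
                del self.entries[victim]      # plain LRU would ignore category
            self.entries[key] = (now, category)

    cache = WorkloadAwareCache(capacity=2)
    cache.put("req1", "batch")
    cache.put("req2", "multi_turn")
    cache.put("req3", "single_turn")          # evicts low-reuse "req1"
    print(sorted(cache.entries))              # ['req2', 'req3']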

Updated: 2025-07-23 08:07:19

标题: KVCache缓存在野外:对大型云服务提供商的KVCache缓存进行特征化和优化

摘要: 为云服务提供商提供大型语言模型(LLMs)的服务至关重要,通过在处理每个请求后缓存中间结果(KV$)可以显著提高服务吞吐量和延迟。然而,对于LLM服务如何从KV$缓存中受益还存在有限的理解,系统设计决策如缓存驱逐策略高度依赖于工作负载。在本文中,我们首次对来自一家领先的LLM服务提供商的KV$工作负载模式进行了系统性的表征。我们得出了一些观察结果,这些观察结果不同于之前关注合成工作负载的研究,包括:KV$的重用在请求之间存在偏斜,单次请求之间的重用与多次请求一样重要;重用时间和概率在考虑所有请求时是多样的,但对于特定的请求类别,模式往往是可预测的;理想的缓存命中率所需的整体缓存大小是适度的。基于这些表征,我们进一步提出了一种工作负载感知的缓存驱逐策略,可以在真实世界的迹线下提高服务性能,特别是在缓存容量有限的情况下。

更新时间: 2025-07-23 08:07:19

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2506.02634v4

Universal Fourier Neural Operators for Micromechanics

Solving cell problems in homogenization is hard, and available deep-learning frameworks fail to match the speed and generality of traditional computational frameworks. More to the point, it is generally unclear what to expect of machine-learning approaches, let alone single out which approaches are promising. In the work at hand, we advocate Fourier Neural Operators (FNOs) for micromechanics, empowering them by insights from computational micromechanics methods based on the fast Fourier transform (FFT). We construct an FNO surrogate mimicking the basic scheme foundational for FFT-based methods and show that the resulting operator predicts solutions to cell problems with arbitrary stiffness distribution only subject to a material-contrast constraint up to a desired accuracy. In particular, there are no restrictions on the material symmetry like isotropy, on the number of phases and on the geometry of the interfaces between materials. Also, the provided fidelity is sharp and uniform, providing explicit guarantees leveraging our physical empowerment of FNOs. To show the desired universal approximation property, we construct an FNO explicitly that requires no training to begin with. Still, the obtained neural operator complies with the same memory requirements as the basic scheme and comes with runtimes proportional to classical FFT solvers. In particular, large-scale problems with more than 100 million voxels are readily handled. The goal of this work is to underline the potential of FNOs for solving micromechanical problems, linking FFT-based methods to FNOs. This connection is expected to provide a fruitful exchange between both worlds.
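
For concreteness, below is a minimal 1D spectral convolution layer of the kind FNOs are built from and that the FFT-based basic scheme shares; the channel count and mode truncation are arbitrary choices, not the paper's construction:

    import torch

    class SpectralConv1d(torch.nn.Module):
        """Minimal 1D Fourier layer: FFT, multiply the lowest `modes`
        frequencies by learned complex weights, inverse FFT."""

        def __init__(self, channels, modes):
            super().__init__()
            self.modes = modes
            scale = 1.0 / channels
            self.weight = torch.nn.Parameter(
                scale * torch.randn(channels, channels, modes,
                                    dtype=torch.cfloat))

        def forward(self, x):                  # x: (batch, channels, n)
            x_ft = torch.fft.rfft(x)
            out_ft = torch.zeros_like(x_ft)
            out_ft[:, :, :self.modes] = torch.einsum(
                "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
            return torch.fft.irfft(out_ft, n=x.size(-1))

    layer = SpectralConv1d(channels=4, modes=8)
    print(layer(torch.randn(2, 4, 64)).shape)  # torch.Size([2, 4, 64])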

Updated: 2025-07-23 08:07:08

标题: 微机械学的通用傅立叶神经算子

摘要: 在均质化过程中解决细胞问题很困难,现有的深度学习框架无法比拟传统计算框架的速度和通用性。更重要的是,机器学习方法的预期通常不明确,更不用说单独确定哪些方法有前景了。在这项工作中,我们提倡傅立叶神经算子(FNOs)用于微力学,通过基于快速傅立叶变换(FFT)的计算微力学方法的见解来增强它们。我们构建了一个FNO替代方案,模拟了基于FFT方法的基本方案,并展示了由该算子预测具有任意刚度分布的细胞问题的解,只受材料对比约束的限制,直至达到所需的精度。特别地,没有对材料对称性如各向同性、相数或材料间界面几何形状的限制。此外,所提供的准确性是尖锐且一致的,提供了明确的保证,利用了我们对FNO的物理增强。为了展示期望的通用逼近性质,我们明确构造了一个无需训练即可开始的FNO。然而,获得的神经算子符合与基本方案相同的内存需求,并且具有与经典FFT求解器成比例的运行时间。特别地,处理超过1亿个体素的大规模问题是容易的。这项工作的目标是强调FNO在解决微力学问题方面的潜力,将基于FFT方法与FNO联系起来。这种联系预计将在两个领域之间提供丰富的交流。

更新时间: 2025-07-23 08:07:08

领域: cs.CE,cs.LG

下载: http://arxiv.org/abs/2507.12233v2

Cautious Next Token Prediction

Next token prediction paradigm has been prevailing for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such approach leads to inferior performance in various NLP tasks when the model is not certain about testing questions. To this end, we propose a brand new training-free decoding strategy, dubbed as Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from the step independently and stop when encountering any punctuation. Then we select the trial with the lowest perplexity score viewed as the most probable and reliable trial path given the model's capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human beings' behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach outperforms existing standard decoding strategies consistently by a clear margin. Moreover, the integration of CNTP with self consistency can further improve over vanilla self consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding. Code is available at https://github.com/wyzjack/CNTP.
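
A rough sketch of one CNTP-style decoding decision; the entropy threshold, the trial-count schedule, and the fixed trial length (standing in for the paper's stop-at-punctuation rule) are assumptions, and gpt2 is a placeholder checkpoint:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    @torch.no_grad()
    def perplexity(ids):
        # scored over the whole sequence; prompts are identical across trials
        return model(ids, labels=ids).loss.exp().item()

    @torch.no_grad()
    def cntp_step(ids, entropy_threshold=3.0, max_trials=5, trial_len=16):
        """Greedy when confident; otherwise sample several short trials
        and keep the lowest-perplexity one."""
        probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
        if entropy < entropy_threshold:
            return torch.cat([ids, probs.argmax().view(1, 1)], dim=-1)
        n_trials = min(max_trials, 1 + int(entropy))   # less confident -> more trials
        trials = [model.generate(ids, do_sample=True,
                                 max_new_tokens=trial_len,
                                 pad_token_id=tok.eos_token_id)
                  for _ in range(n_trials)]
        return min(trials, key=perplexity)

    ids = tok("Therefore, the answer is", return_tensors="pt").input_ids
    print(tok.decode(cntp_step(ids)[0]))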

Updated: 2025-07-23 08:06:29

标题: 谨慎的下一个标记预测

摘要: 在LLM时代,预测下一个标记的范式已经在自回归模型中盛行。当前流行的LLM的默认采样选择是温度缩放和核心采样,以平衡多样性和连贯性。然而,当模型对测试问题不确定时,这种方法会导致各种NLP任务的性能不佳。因此,我们提出了一种全新的无需训练的解码策略,称为谨慎的下一个标记预测(CNTP)。在解码过程中,如果模型在某一步的预测熵相对较高,我们独立地从该步开始进行多次尝试,并在遇到任何标点符号时停止。然后我们选择具有最低困惑度分数的试验路径,视为最可能和可靠的试验路径,考虑到模型的容量。试验次数与预测置信度呈负相关,即模型越不自信,应该进行更多的试验。这与人类行为一致:在感到不确定或不自信时,人倾向于更有创造性地思考,探索多种思维路径,谨慎选择自己最有信心的路径。对LLM和MLLM进行的大量实验表明,我们提出的CNTP方法始终优于现有的标准解码策略。此外,将CNTP与自一致性集成可以进一步改善基础自一致性。我们相信我们提出的CNTP具有成为LLM解码的默认选择之一的潜力。代码可在https://github.com/wyzjack/CNTP找到。

更新时间: 2025-07-23 08:06:29

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.03038v2

Alleviating Seasickness through Brain-Computer Interface-based Attention Shift

Seasickness poses a widespread problem that adversely impacts both passenger comfort and the operational efficiency of maritime crews. Although attention shift has been proposed as a potential method to alleviate symptoms of motion sickness, its efficacy remains to be rigorously validated, especially in maritime environments. In this study, we develop an AI-driven brain-computer interface (BCI) to realize sustained and practical attention shift by incorporating tasks such as breath counting. Forty-three participants completed a real-world nautical experiment consisting of a real-feedback session, a resting session, and a pseudo-feedback session. Notably, 81.39% of the participants reported that the BCI intervention was effective. EEG analysis revealed that the proposed system can effectively regulate motion sickness EEG signatures, such as a decrease in total band power, along with an increase in theta relative power and a decrease in beta relative power. Furthermore, an indicator of attentional focus, the theta/beta ratio, exhibited a significant reduction during the real-feedback session, providing further evidence to support the effectiveness of the BCI in shifting attention. Collectively, this study presents a novel nonpharmacological, portable, and effective approach for seasickness intervention, which has the potential to open up a brand-new application domain for BCIs.
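
The attentional-focus indicator mentioned above can be computed from a power spectral density estimate as sketched below; the band edges are conventional choices and may differ from the paper's exact analysis:

    import numpy as np
    from scipy.signal import welch

    def band_power(f, psd, lo, hi):
        mask = (f >= lo) & (f < hi)
        return np.trapz(psd[mask], f[mask])

    def theta_beta_ratio(eeg, fs=250):
        """Theta (4-8 Hz) power over beta (13-30 Hz) power from a
        single-channel EEG trace sampled at fs Hz."""
        f, psd = welch(eeg, fs=fs, nperseg=fs * 2)
        return band_power(f, psd, 4, 8) / band_power(f, psd, 13, 30)

    eeg = np.random.randn(250 * 60)    # one minute of toy data at 250 Hz
    print(theta_beta_ratio(eeg))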

Updated: 2025-07-23 08:01:28

标题: 通过基于脑机接口的注意力转移缓解晕船

摘要: 晕船是一个普遍存在的问题,不仅影响乘客的舒适度,还影响海员的操作效率。尽管注意力转移被提议作为减轻晕动症状的潜在方法,但其疗效仍需得到严格验证,特别是在海上环境中。在本研究中,我们开发了一个基于人工智能的脑-计算机接口(BCI),通过加入呼吸计数等任务,实现持续且实用的注意力转移。四十三名参与者完成了一个真实世界的航海实验,包括一个真实反馈会话、一个休息会话和一个伪反馈会话。值得注意的是,81.39%的参与者报告认为BCI干预是有效的。脑电图分析显示,所提出的系统可以有效调节晕动脑电图特征,如总频带功率降低,以及θ相对功率增加和β相对功率降低。此外,注意力焦点指标θ/β比在真实反馈会话期间显著减少,进一步证明了BCI在注意力转移方面的有效性。总的来说,本研究提出了一种新颖的非药物、便携且有效的晕船干预方法,有潜力开辟一个全新的BCI应用领域。

更新时间: 2025-07-23 08:01:28

领域: cs.HC,cs.AI,eess.SP,q-bio.QM

下载: http://arxiv.org/abs/2501.08518v2

A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios

We introduce a novel machine learning model for credit risk by combining tree-boosting with a latent spatio-temporal Gaussian process model accounting for frailty correlation. This allows for modeling non-linearities and interactions among predictor variables in a flexible data-driven manner and for accounting for spatio-temporal variation that is not explained by observable predictor variables. We also show how estimation and prediction can be done in a computationally efficient manner. In an application to a large U.S. mortgage credit risk data set, we find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions obtained with our novel approach are more accurate compared to conventional independent linear hazard models and also linear spatio-temporal models. Using interpretability tools for machine learning models, we find that the likely reasons for this outperformance are strong interaction and non-linear effects in the predictor variables and the presence of spatio-temporal frailty effects.

Updated: 2025-07-23 08:01:02

标题: 一个用于抵押信贷风险的时空机器学习模型:违约概率和贷款组合

摘要: 我们介绍了一种新颖的机器学习模型,用于信用风险评估,通过将树提升与潜在的时空高斯过程模型相结合,考虑了脆弱性相关性。这使得能够以灵活的数据驱动方式对预测变量之间的非线性和相互作用进行建模,并考虑到不被可观测的预测变量解释的时空变化。我们还展示了如何以计算效率的方式进行估计和预测。在一个大型的美国抵押信贷风险数据集的应用中,我们发现,与传统的独立线性危险模型以及线性时空模型相比,使用我们的新方法获得的个人贷款预测违约概率和贷款组合预测损失分布更准确。通过对机器学习模型的可解释性工具的使用,我们发现,这种超越性的可能原因是预测变量中强烈的相互作用和非线性效应以及时空脆弱性效应的存在。

更新时间: 2025-07-23 08:01:02

领域: q-fin.RM,cs.LG,q-fin.ST

下载: http://arxiv.org/abs/2410.02846v2

Privacy Artifact ConnecTor (PACT): Embedding Enterprise Artifacts for Compliance AI Agents

Enterprise environments contain a heterogeneous, rapidly growing collection of internal artifacts related to code, data, and many different tools. Critical information for assessing privacy risk and ensuring regulatory compliance is often embedded across these varied resources, each with their own arcane discovery and extraction techniques. Therefore, large-scale privacy compliance in adherence to governmental regulations requires systems to discern the interconnected nature of diverse artifacts in a common, shared universe. We present Privacy Artifact ConnecTor (PACT), an embeddings-driven graph that links millions of artifacts spanning multiple artifact types generated by a variety of teams and projects. Powered by the state-of-the-art DRAGON embedding model, PACT uses a contrastive learning objective with light fine-tuning to link artifacts via their textual components such as raw metadata, ownership specifics, and compliance context. Experimental results show that PACT's fine-tuned model improves recall@1 from 18% to 53%, the query match rate from 9.6% to 69.7% when paired with a baseline AI agent, and the hitrate@1 from 25.7% to 44.9% for candidate selection in a standard recommender system.

Updated: 2025-07-23 08:00:20

标题: 隐私物件连接器(PACT):将企业物件嵌入合规AI代理程序

摘要: 企业环境包含了一个异构的、快速增长的内部文档集合,涉及到代码、数据和许多不同的工具。评估隐私风险和确保合规性所需的关键信息通常分布在这些不同资源中,每个资源都有自己独特的发现和提取技术。因此,依据政府法规进行大规模隐私合规性要求系统能够识别不同文档之间的相互关联性,共同存在于一个共享的宇宙中。我们提出了Privacy Artifact ConnecTor(PACT),一个基于嵌入式驱动的图,连接了由多个团队和项目生成的跨多种文档类型的数百万文档。PACT利用最先进的DRAGON嵌入模型,使用对比学习目标和轻微微调来通过原始元数据、所有权具体信息和合规情境等文本组件来连接文档。实验结果表明,PACT的微调模型将召回率从18%提高到53%,与基线人工智能代理配对时查询匹配率从9.6%提高到69.7%,在标准推荐系统中用于候选选择的命中率从25.7%提高到44.9%。

更新时间: 2025-07-23 08:00:20

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.21142v1

On Temporal Guidance and Iterative Refinement in Audio Source Separation

Spatial semantic segmentation of sound scenes (S5) involves the accurate identification of active sound classes and the precise separation of their sources from complex acoustic mixtures. Conventional systems rely on a two-stage pipeline - audio tagging followed by label-conditioned source separation - but are often constrained by the absence of fine-grained temporal information critical for effective separation. In this work, we address this limitation by introducing a novel approach for S5 that enhances the synergy between the event detection and source separation stages. Our key contributions are threefold. First, we fine-tune a pre-trained Transformer to detect active sound classes. Second, we utilize a separate instance of this fine-tuned Transformer to perform sound event detection (SED), providing the separation module with detailed, time-varying guidance. Third, we implement an iterative refinement mechanism that progressively enhances separation quality by recursively reusing the separator's output from previous iterations. These advancements lead to significant improvements in both audio tagging and source separation performance, as demonstrated by our system's second-place finish in Task 4 of the DCASE Challenge 2025. Our implementation and model checkpoints are available in our GitHub repository: https://github.com/theMoro/dcase25task4 .
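
The third contribution's refinement loop follows a simple recursive pattern, sketched here with a stand-in separator function (the real system conditions a trained separation model on SED output and its own previous estimate):

    import numpy as np

    def iterative_refinement(mixture, separator, n_iters=3):
        """Recursively feed the separator its own previous estimate as
        conditioning; `separator(mixture, prev)` stands in for the
        trained model described in the paper."""
        estimate = np.zeros_like(mixture)
        for _ in range(n_iters):
            estimate = separator(mixture, estimate)
        return estimate

    # toy separator: each pass blends the mixture with the last estimate
    separator = lambda mix, prev: 0.5 * mix + 0.5 * prev
    print(iterative_refinement(np.ones(4), separator))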

Updated: 2025-07-23 07:58:28

标题: 关于音频源分离中的时间引导和迭代细化

摘要: 声音场景(S5)的空间语义分割涉及准确识别活动声音类别,并精确地从复杂的声学混合物中分离它们的来源。传统系统依赖于一个两阶段流水线 - 音频标记后跟标签条件的源分离 - 但往往受到关键的细粒度时间信息缺失的限制,这对有效的分离至关重要。在这项工作中,我们通过引入一种新颖的S5方法来解决这一限制,增强了事件检测和源分离阶段之间的协同作用。我们的主要贡献有三个。首先,我们对预训练的Transformer进行微调,以检测活动声音类别。其次,我们利用这个经过微调的Transformer的单独实例来执行声音事件检测(SED),为分离模块提供详细的、时变的指导。第三,我们实现了一个迭代的细化机制,通过递归地重复使用先前迭代中分离器的输出,逐渐提高分离质量。这些进步显著提高了音频标记和源分离性能,我们的系统在DCASE Challenge 2025的任务4中获得了第二名。我们的实现和模型检查点可在我们的GitHub仓库中找到:https://github.com/theMoro/dcase25task4。

更新时间: 2025-07-23 07:58:28

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2507.17297v1

VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.

Updated: 2025-07-23 07:54:10

标题: VLA-Touch: 用双层触觉反馈增强视觉-语言-动作模型

摘要: 触觉反馈通常被认为对有效与物理世界的交互至关重要。然而,现代视觉-语言-动作(VLA)模型缺乏解释和使用触觉信号的能力,从而限制了它们在接触丰富任务中的有效性。将触觉反馈纳入这些系统具有挑战性,因为缺乏大型多模态数据集。我们提出了VLA-Touch,一种方法,通过不对基础VLA进行微调,增强了通用机器人策略的触觉感知。我们的方法引入了两个关键创新:(1)一个利用预训练的触觉-语言模型的流水线,为高级任务规划提供语义触觉反馈,以及(2)一种基于扩散的控制器,通过触觉信号对接触丰富的操作进行精细调整。通过真实世界实验,我们证明了我们对触觉反馈的双层集成提高了任务规划效率同时提高了执行精度。代码开源,网址为https://github.com/jxbi1010/VLA-Touch。

更新时间: 2025-07-23 07:54:10

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2507.17294v1

Data Virtualization for Machine Learning

Nowadays, machine learning (ML) teams have multiple concurrent ML workflows for different applications. Each workflow typically involves many experiments, iterations, and collaborative activities and commonly takes months and sometimes years from initial data wrangling to model deployment. Organizationally, there is a large amount of intermediate data to be stored, processed, and maintained. Data virtualization becomes a critical technology in an infrastructure to serve ML workflows. In this paper, we present the design and implementation of a data virtualization service, focusing on its service architecture and service operations. The infrastructure currently supports six ML applications, each with more than one ML workflow. The data virtualization service allows the number of applications and workflows to grow in the coming years.

Updated: 2025-07-23 07:53:56

标题: 数据虚拟化用于机器学习

摘要: 如今,机器学习团队针对不同应用拥有多个并发的机器学习工作流程。每个工作流程通常涉及许多实验、迭代和协作活动,并且通常需要数月甚至数年才能从初始数据清理到模型部署。在组织上,需要存储、处理和维护大量中间数据。数据虚拟化成为基础设施中的关键技术,用于服务机器学习工作流程。本文介绍了一个数据虚拟化服务的设计和实现,重点关注其服务架构和服务操作。该基础设施目前支持六个机器学习应用程序,每个应用程序都有多个机器学习工作流程。数据虚拟化服务允许在未来几年内应用程序和工作流程的数量增长。

更新时间: 2025-07-23 07:53:56

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2507.17293v1

Integrating Belief Domains into Probabilistic Logic Programs

Probabilistic Logic Programming (PLP) under the Distribution Semantics is a leading approach to practical reasoning under uncertainty. An advantage of the Distribution Semantics is its suitability for implementation as a Prolog or Python library, available through two well-maintained implementations, namely ProbLog and cplint/PITA. However, current formulations of the Distribution Semantics use point-probabilities, making it difficult to express epistemic uncertainty, such as arises from, for example, hierarchical classifications from computer vision models. Belief functions generalize probability measures as non-additive capacities, and address epistemic uncertainty via interval probabilities. This paper introduces interval-based Capacity Logic Programs based on an extension of the Distribution Semantics to include belief functions, and describes properties of the new framework that make it amenable to practical applications.
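
A small worked example of the interval semantics belief functions provide, computing belief/plausibility bounds for an event from a mass function (the frame and mass values below are arbitrary illustrations, not the paper's logic-programming syntax):

    def belief(mass, event):
        """Bel(A) = sum of masses of subsets of A (lower probability)."""
        return sum(m for focal, m in mass.items() if focal <= event)

    def plausibility(mass, event):
        """Pl(A) = sum of masses of sets intersecting A (upper probability)."""
        return sum(m for focal, m in mass.items() if focal & event)

    # mass function on frame {a, b, c}; mass on {a, b} encodes ignorance
    mass = {frozenset({"a"}): 0.5,
            frozenset({"a", "b"}): 0.3,
            frozenset({"a", "b", "c"}): 0.2}
    A = frozenset({"a"})
    print(belief(mass, A), plausibility(mass, A))   # interval [0.5, 1.0]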

Updated: 2025-07-23 07:52:09

标题: 将信念域整合到概率逻辑程序中

摘要: Distribution Semantics下的概率逻辑编程(PLP)是在不确定性下进行实际推理的主要方法。Distribution Semantics的一个优点是其适合作为Prolog或Python库实现,可以通过两种良好维护的实现(即ProbLog和cplint/PITA)获得。然而,当前Distribution Semantics的表述使用点概率,这使得难以表达认知不确定性,例如来自计算机视觉模型的分层分类。信念函数将概率度量广义化为非可加容量,并通过区间概率处理认知不确定性。本文介绍了基于区间的容量逻辑程序,基于Distribution Semantics的扩展来包括信念函数,并描述了新框架的特性,使其适用于实际应用。

更新时间: 2025-07-23 07:52:09

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2507.17291v1

Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments

This paper presents Compliance Brain Assistant (CBA), a conversational, agentic AI assistant designed to boost the efficiency of daily compliance tasks for personnel in enterprise environments. To strike a good balance between response quality and latency, we design a user query router that can intelligently choose between (i) FastTrack mode: to handle simple requests that only need additional relevant context retrieved from knowledge corpora; and (ii) FullAgentic mode: to handle complicated requests that need composite actions and tool invocations to proactively discover context across various compliance artifacts, and/or involving other APIs/models for accommodating requests. A typical example would be to start with a user query, use its description to find a specific entity and then use the entity's information to query other APIs for curating and enriching the final AI response. Our experimental evaluations compared CBA against an out-of-the-box LLM on various real-world privacy/compliance-related queries targeting various personas. We found that CBA substantially improved upon the vanilla LLM's performance on metrics such as average keyword match rate (83.7% vs. 41.7%) and LLM-judge pass rate (82.0% vs. 20.0%). We also compared metrics for the full routing-based design against the `fast-track only` and `full-agentic` modes and found that it had a better average match-rate and pass-rate while keeping the run-time approximately the same. This finding validated our hypothesis that the routing mechanism leads to a good trade-off between the two worlds.

Updated: 2025-07-23 07:51:10

标题: 《合规大脑助手:企业环境中用于辅助合规任务的对话式代理人人工智能》

摘要: 本文介绍了一种名为Compliance Brain Assistant (CBA)的对话式、代理式人工智能助手,旨在提高企业环境中人员日常合规任务的效率。为了在响应质量和延迟之间取得良好平衡,我们设计了一个用户查询路由器,可以智能地在以下两种模式之间选择:(i)快速模式:处理只需要从知识文献中检索额外相关上下文的简单请求;(ii)全代理模式:处理需要复合动作和工具调用以主动发现涉及各种合规工件的上下文,和/或涉及其他API/模型以满足请求的复杂请求。一个典型的例子是从用户查询开始,使用其描述找到特定实体,然后使用实体信息查询其他API以策划和丰富最终的人工智能响应。 我们对CBA与一个开箱即用的LLM在针对各种真实世界的隐私/合规相关查询的不同用户角色进行了实验评估。我们发现,CBA在平均关键词匹配率(83.7% vs. 41.7%)和LLM评分通过率(82.0% vs. 20.0%)等指标上明显优于普通LLM的性能。我们还比较了基于完整路由设计的指标与“仅快速通道”和“完全代理”模式的指标,发现它在保持运行时间大致相同的情况下具有更好的平均匹配率和通过率。这一发现验证了我们的假设,即路由机制在两个世界之间取得了良好的折衷。

更新时间: 2025-07-23 07:51:10

领域: cs.AI

下载: http://arxiv.org/abs/2507.17289v1

Decentralized Federated Learning of Probabilistic Generative Classifiers

Federated learning is a paradigm of increasing relevance in real world applications, aimed at building a global model across a network of heterogeneous users without requiring the sharing of private data. We focus on model learning over decentralized architectures, where users collaborate directly to update the global model without relying on a central server. In this context, the current paper proposes a novel approach to collaboratively learn probabilistic generative classifiers with a parametric form. The framework is composed of a communication network over a set of local nodes, each one having its own local data, and a local updating rule. The proposal involves sharing local statistics with neighboring nodes, where each node aggregates the neighbors' information and iteratively learns its own local classifier, which progressively converges to a global model. Extensive experiments demonstrate that the algorithm consistently converges to a globally competitive model across a wide range of network topologies, network sizes, local dataset sizes, and extreme non-i.i.d. data distributions.
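
A toy gossip round over per-class sufficient statistics illustrates the aggregation pattern described; the statistics format, mixing weight, and topology below are assumptions, not the paper's update rule:

    import numpy as np

    def gossip_round(stats, neighbors, weight=0.5):
        """One local update: each node mixes its per-class statistics
        (e.g. count, sum, sum of squares) with its neighbors' average."""
        new = []
        for i, s in enumerate(stats):
            nbr_avg = np.mean([stats[j] for j in neighbors[i]], axis=0)
            new.append((1 - weight) * s + weight * nbr_avg)
        return new

    # 3 nodes; per-node stats: array (n_classes, 3) of [count, sum, sumsq]
    rng = np.random.default_rng(0)
    stats = [rng.random((2, 3)) for _ in range(3)]
    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    for _ in range(30):
        stats = gossip_round(stats, neighbors)
    print(np.allclose(stats[0], stats[1]))   # True: nodes reach consensus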

Updated: 2025-07-23 07:45:20

标题: 分散式联邦学习的概率生成分类器

摘要: 联邦学习是一个在现实世界应用中日益重要的范例,旨在在异构用户网络上构建全局模型,而无需共享私人数据。我们关注分散式架构上的模型学习,在这种架构中,用户直接合作更新全局模型,而不依赖于中央服务器。在这种背景下,本文提出了一种新颖的方法,用于协作学习具有参数形式的概率生成分类器。该框架由一个通信网络和一组本地节点组成,每个节点都有自己的本地数据和本地更新规则。提议涉及与相邻节点共享本地统计信息,每个节点汇总邻居的信息并迭代地学习自己的本地分类器,逐渐收敛到全局模型。大量实验表明,该算法在各种网络拓扑结构、网络大小、本地数据集大小和极端非独立同分布数据分布下均持续收敛到一个全局竞争模型。

更新时间: 2025-07-23 07:45:20

领域: cs.LG

下载: http://arxiv.org/abs/2507.17285v1

Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression

The rapid growth in computing demands, particularly driven by artificial intelligence applications, has begun to exceed the capabilities of traditional electronic hardware. Optical computing offers a promising alternative due to its parallelism, high computational speed, and low power consumption. However, existing photonic integrated circuits are constrained by large footprints, costly electro-optical interfaces, and complex control mechanisms, limiting the practical scalability of optical neural networks (ONNs). To address these limitations, we introduce a block-circulant photonic tensor core for a structure-compressed optical neural network (StrC-ONN) architecture. The structured compression technique substantially reduces both model complexity and hardware resources without sacrificing the versatility of neural networks, and achieves accuracy comparable to uncompressed models. Additionally, we propose a hardware-aware training framework to compensate for on-chip nonidealities to improve model robustness and accuracy. Experimental validation through image processing and classification tasks demonstrates that our StrC-ONN achieves a reduction in trainable parameters of up to 74.91%, while still maintaining competitive accuracy levels. Performance analyses further indicate that this hardware-software co-design approach is expected to yield a 3.56 times improvement in power efficiency. By reducing both hardware requirements and control complexity across multiple dimensions, this work explores a new pathway toward practical and scalable ONNs, highlighting a promising route to address future computational efficiency challenges.
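
In a block-circulant layer, the weight matrix is partitioned into circulant blocks, and each block's matrix-vector product reduces to FFT-domain pointwise multiplication, which is the arithmetic such a photonic core parallelizes. A minimal numerical check of the per-block identity:

    import numpy as np

    def circulant_matvec(c, x):
        """y = C x where C is circulant with first column c, computed
        via the FFT in O(n log n) instead of O(n^2)."""
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    # verify against the dense circulant matrix
    n = 8
    c = np.random.randn(n)
    x = np.random.randn(n)
    C = np.array([np.roll(c, k) for k in range(n)]).T
    print(np.allclose(C @ x, circulant_matvec(c, x)))   # True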

Updated: 2025-07-23 07:39:55

标题: 硬件高效的光子张量核心:利用结构压缩加速深度神经网络

摘要: 随着人工智能应用驱动的计算需求快速增长,传统电子硬件的能力已经开始不足。光计算因其并行性、高计算速度和低功耗而成为一种有前途的替代方案。然而,现有的光子集成电路受到了大尺寸、昂贵的电光接口和复杂的控制机制的限制,限制了光神经网络(ONNs)的实际可扩展性。为了解决这些限制,我们引入了一种用于结构压缩光神经网络(StrC-ONN)架构的块循环光张量核。结构压缩技术大大减少了模型复杂性和硬件资源,同时又不牺牲神经网络的多功能性,并实现了与未压缩模型相媲美的准确性。此外,我们提出了一种硬件感知的训练框架,以弥补芯片上的非理想性,提高模型的稳健性和准确性。通过图像处理和分类任务的实验验证表明,我们的StrC-ONN实现了高达74.91%的可训练参数减少,同时仍保持竞争力的准确性水平。性能分析进一步表明,这种硬件-软件协同设计方法预计将使功率效率提高3.56倍。通过在多个维度上减少硬件要求和控制复杂性,这项工作探索了一条通向实用和可扩展ONNs的新路径,突显了解决未来计算效率挑战的有前途的途径。

更新时间: 2025-07-23 07:39:55

领域: cs.AR,cs.ET,cs.LG

下载: http://arxiv.org/abs/2502.01670v2

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While "aha moment" patterns--where models exhibit self-correction through reflection--are often attributed to emergent properties from RL, we first demonstrate that these patterns exist in multimodal LLMs (MLLMs) prior to RL training but may not necessarily correlate with improved reasoning performance. Building on these insights, we present a comprehensive study on enhancing multimodal reasoning through a two-stage approach: (1) supervised fine-tuning (SFT) as a cold start with structured chain-of-thought reasoning patterns, followed by (2) reinforcement learning via GRPO to further refine these capabilities. Our extensive experiments show that this combined approach consistently outperforms both SFT-only and RL-only methods across challenging multimodal reasoning benchmarks. The resulting models achieve state-of-the-art performance among open-source MLLMs at both 3B and 7B scales, with our 7B model showing substantial improvements over base models (e.g., 66.3%$\rightarrow$73.4% on MathVista, 62.9%$\rightarrow$70.4% on We-Math) and our 3B model achieving performance competitive with several 7B models. Overall, this work provides practical guidance for building advanced multimodal reasoning models. Our code is available at https://github.com/waltonfuture/RL-with-Cold-Start.

Updated: 2025-07-23 07:37:08

标题: 通过强化学习与冷启动推进多模态推理

摘要: 近期大型语言模型(LLMs)的进展展示了令人印象深刻的思维链推理能力,其中强化学习(RL)在这一进展中起着至关重要的作用。虽然“灵光一现”模式--模型通过反思展示自我修正--通常被归因于RL中出现的新兴属性,我们首先证明这些模式在多模态LLMs(MLLMs)中存在于RL训练之前,但不一定与改进的推理性能相关。基于这些见解,我们提出了一个全面研究,通过两阶段方法增强多模态推理能力:(1)监督微调(SFT)作为冷启动,具有结构化的思维链推理模式,然后(2)通过GRPO的强化学习来进一步完善这些能力。我们的广泛实验表明,这种组合方法在困难的多模态推理基准测试中始终优于仅SFT和仅RL的方法。结果模型在开源MLLMs中在3B和7B规模上实现了最先进的性能,其中我们的7B模型在MathVista上表现出较大改进(例如,66.3%→73.4%,在We-Math上62.9%→70.4%),我们的3B模型实现了与几个7B模型竞争的性能。总的来说,这项工作为构建先进的多模态推理模型提供了实际指导。我们的代码可在https://github.com/waltonfuture/RL-with-Cold-Start获得。

更新时间: 2025-07-23 07:37:08

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2505.22334v2

SegQuant: A Semantics-Aware and Generalizable Quantization Framework for Diffusion Models

Diffusion models have demonstrated exceptional generative capabilities but are computationally intensive, posing significant challenges for deployment in resource-constrained or latency-sensitive environments. Quantization offers an effective means to reduce model size and computational cost, with post-training quantization (PTQ) being particularly appealing due to its compatibility with pre-trained models without requiring retraining or training data. However, existing PTQ methods for diffusion models often rely on architecture-specific heuristics that limit their generalizability and hinder integration with industrial deployment pipelines. To address these limitations, we propose SegQuant, a unified quantization framework that adaptively combines complementary techniques to enhance cross-model versatility. SegQuant consists of a segment-aware, graph-based quantization strategy (SegLinear) that captures structural semantics and spatial heterogeneity, along with a dual-scale quantization scheme (DualScale) that preserves polarity-asymmetric activations, which is crucial for maintaining visual fidelity in generated outputs. SegQuant is broadly applicable beyond Transformer-based diffusion models, achieving strong performance while ensuring seamless compatibility with mainstream deployment tools.
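
One plausible reading of the polarity-asymmetric DualScale idea is separate quantization scales for the positive and negative ranges, as sketched below; the paper's exact scheme may well differ, and the rounding rule and test distribution here are assumptions:

    import numpy as np

    def dual_scale_quantize(x, bits=8):
        """Quantize with separate scales for positive and negative values,
        preserving polarity asymmetry (illustrative, not SegQuant's exact
        formulation). Returns the dequantized tensor."""
        qmax = 2 ** (bits - 1) - 1
        s_pos = max(x.max(), 1e-8) / qmax
        s_neg = max(-x.min(), 1e-8) / qmax
        q = np.where(x >= 0, np.round(x / s_pos), np.round(x / s_neg))
        q = np.clip(q, -qmax, qmax)
        return np.where(q >= 0, q * s_pos, q * s_neg)

    # polarity-asymmetric activations: long positive tail, short negative tail
    x = np.concatenate([np.random.exponential(2.0, 500),
                        -np.random.exponential(0.2, 500)])
    s = np.abs(x).max() / 127
    single_scale = np.round(x / s) * s
    print(np.mean((x - dual_scale_quantize(x)) ** 2),
          np.mean((x - single_scale) ** 2))   # dual-scale error is lower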

Updated: 2025-07-23 07:36:08

标题: SegQuant:一种面向语义感知和可推广的扩散模型量化框架

摘要: 扩散模型展示了出色的生成能力,但计算量大,对于资源受限或延迟敏感的环境部署提出了重大挑战。量化提供了一种有效的方法来减少模型大小和计算成本,后训练量化(PTQ)尤其吸引人,因为它与预训练模型兼容,无需重新训练或训练数据。然而,现有的用于扩散模型的PTQ方法通常依赖于特定架构的启发式方法,限制了它们的泛化能力,并阻碍了与工业部署流程的集成。为了解决这些限制,我们提出了SegQuant,一个统一的量化框架,通过自适应地结合互补技术来增强跨模型的多功能性。SegQuant包括一个基于段意识、基于图的量化策略(SegLinear),捕捉结构语义和空间异质性,以及一个保留极性不对称激活的双尺度量化方案(DualScale),这对于维持生成的输出中的视觉保真度至关重要。SegQuant不仅适用于基于Transformer的扩散模型,而且在确保与主流部署工具无缝兼容的同时,取得了良好的性能。

更新时间: 2025-07-23 07:36:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.14811v2

Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning

In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool's lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner's Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to guide policy learning toward lifespan-guided behavior. To handle the fact that RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01x in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool use strategies.
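
The RUL bookkeeping via Miner's Rule that the reward builds on is simple to state: cumulative damage is the sum of cycle fractions at each stress level. The S-N curve values and stress categories below are hypothetical:

    def miners_damage(stress_cycles, sn_curve):
        """Miner's rule: D = sum_i n_i / N_i, where n_i cycles were spent
        at stress level i and N_i is the cycles-to-failure at that level
        (from an S-N curve). The remaining-life fraction is 1 - D."""
        return sum(n / sn_curve[s] for s, n in stress_cycles.items())

    # hypothetical S-N curve: stress level -> cycles to failure
    sn_curve = {"low": 1_000_000, "mid": 50_000, "high": 2_000}
    usage = {"low": 200_000, "mid": 10_000, "high": 200}
    D = miners_damage(usage, sn_curve)
    print(f"damage={D:.3f}, remaining-life fraction={1 - D:.3f}")
    # An RL reward can then penalize actions in proportion to added damage.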

Updated: 2025-07-23 07:25:04

标题: 延长工具寿命:通过寿命引导的强化学习学习通用工具的熟练使用

摘要: 在具有不确定任务需求的无法访问的环境中,机器人通常依赖缺乏预定义使用策略的通用工具。这些工具并非针对特定操作而设计,使得它们的寿命极其敏感于如何使用。这带来了一个基本挑战:机器人如何学习一种既能完成任务又能延长工具寿命的使用策略?在这项工作中,我们通过引入一个强化学习(RL)框架来解决这一挑战,该框架在策略优化过程中将工具寿命作为一个因素。我们的框架利用有限元分析(FEA)和Miner's规则来基于累积应力估计剩余有用寿命(RUL),并将RUL整合到RL奖励中,以指导策略学习朝向寿命导向行为。为了处理RUL只能在任务执行后估计的事实,我们引入了一种自适应奖励标准化(ARN)机制,根据估计的RUL动态调整奖励缩放,确保稳定的学习信号。我们验证了我们的方法在模拟和真实世界的工具使用任务中,包括使用多种通用工具的物体移动和开门。学习到的策略持续延长工具寿命(在模拟中高达8.01倍),并有效地转移到真实世界环境,展示了学习寿命导向工具使用策略的实际价值。

更新时间: 2025-07-23 07:25:04

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2507.17275v1

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

Operating robots in open-ended scenarios with diverse tasks is a crucial research and application direction in robotics. While recent progress in natural language processing and large multimodal models has enhanced robots' ability to understand complex instructions, robot manipulation still faces the procedural skill dilemma and the declarative skill dilemma in open environments. Existing methods often compromise cognitive and executive capabilities. To address these challenges, in this paper, we propose RoBridge, a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM), an invariant operable representation (IOR) serving as a symbolic bridge, and a generalist embodied agent (GEA). RoBridge maintains the declarative skill of VLM and unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution. RoBridge demonstrates significant performance improvements over existing baselines, achieving a 75% success rate on new tasks and an 83% average success rate in sim-to-real generalization using only five real-world data samples per task. This work represents a significant step towards integrating cognitive reasoning with physical execution in robotic systems, offering a new paradigm for general robotic manipulation.

Updated: 2025-07-23 07:22:26

标题: RoBridge:一个连接认知和执行的分层架构,用于普通机器人操作

摘要: 在机器人领域,操作多样任务的开放式场景是一个至关重要的研究和应用方向。尽管最近自然语言处理和大型多模态模型的进展增强了机器人理解复杂指令的能力,但在开放环境中,机器人操作仍面临程序性技能困境和陈述性技能困境。现有方法往往会牺牲认知和执行能力。为了解决这些挑战,在本文中,我们提出了RoBridge,一个用于通用机器人操作的分层智能架构。它包括基于大规模预训练视觉-语言模型(VLM)的高级认知规划器(HCP),作为符号桥梁的不变操作表示(IOR),以及通用实体代理(GEA)。RoBridge保持了VLM的陈述技能,并释放了强化学习的程序性技能,有效地弥合了认知和执行之间的差距。RoBridge在现有基线上表现出显著的性能改进,在新任务中取得了75%的成功率,并在每项任务仅使用五个真实世界数据样本的情况下,在从模拟到实际的泛化中实现了83%的平均成功率。这项工作代表着在机器人系统中整合认知推理和物理执行的重要一步,为通用机器人操作提供了一种新范式。

更新时间: 2025-07-23 07:22:26

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2505.01709v3

An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning

Enhancing the mathematical reasoning capabilities of Large Language Models (LLMs) is of great scientific and practical significance. Researchers typically employ process-supervised reward models (PRMs) to guide the reasoning process, effectively improving the models' reasoning abilities. However, existing methods for constructing process supervision training data, such as manual annotation and per-step Monte Carlo estimation, are often costly or suffer from poor quality. To address these challenges, this paper introduces a framework called EpicPRM, which annotates each intermediate reasoning step based on its quantified contribution and uses an adaptive binary search algorithm to enhance both annotation precision and efficiency. Using this approach, we efficiently construct a high-quality process supervision training dataset named Epic50k, consisting of 50k annotated intermediate steps. Compared to other publicly available datasets, the PRM trained on Epic50k demonstrates significantly superior performance. Epic50k is available at https://github.com/xiaolizh1/EpicPRM.
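
The binary search idea can be pictured as locating the first intermediate step after which the solution is no longer recoverable, using O(log n) correctness checks instead of one per step. The sketch below assumes a monotone can_reach_answer oracle (a stand-in for rollout-based verification); it is not EpicPRM's exact procedure:

# Hedged sketch of binary-search-style step annotation. Assumes that once a
# step breaks the derivation, no longer prefix can still reach the answer.
def first_faulty_step(steps, can_reach_answer):
    """Return the index of the first step that breaks the derivation,
    or len(steps) if every prefix still leads to the correct answer."""
    if can_reach_answer(steps):            # the whole chain is fine
        return len(steps)
    lo, hi = 0, len(steps)                 # invariant: steps[:lo] is good
    while lo < hi:
        mid = (lo + hi) // 2
        if can_reach_answer(steps[:mid + 1]):
            lo = mid + 1                   # prefix up to mid is still good
        else:
            hi = mid                       # the fault is at mid or earlier
    return lo

# Toy usage: pretend every step after index 2 makes the answer unreachable.
steps = list(range(8))
print(first_faulty_step(steps, lambda p: len(p) <= 3))  # -> 3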

Updated: 2025-07-23 07:19:41

标题: 一个高效精确的训练数据构建框架,用于数学推理中的过程监督奖励模型

摘要: 增强大型语言模型(LLMs)的数学推理能力具有重要的科学和实践意义。研究人员通常使用过程监督奖励模型(PRMs)来引导推理过程,有效提高模型的推理能力。然而,目前构建过程监督训练数据的方法,如手动注释和逐步蒙特卡洛估计,往往成本高或质量较差。为了解决这些挑战,本文介绍了一个名为EpicPRM的框架,该框架基于中间推理步骤的量化贡献进行注释,并使用自适应二进制搜索算法来提高注释精度和效率。利用这种方法,我们高效构建了一个名为Epic50k的高质量过程监督训练数据集,其中包含50k个注释的中间步骤。与其他公开可用数据集相比,使用Epic50k训练的PRM表现出明显优越的性能。获取Epic50k,请访问https://github.com/xiaolizh1/EpicPRM。

更新时间: 2025-07-23 07:19:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.02382v2

Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance

Analyzing large, complex output datasets from Discrete Event Simulations (DES) of warehouse operations to identify bottlenecks and inefficiencies is a critical yet challenging task, often demanding significant manual effort or specialized analytical tools. Our framework integrates Knowledge Graphs (KGs) and Large Language Model (LLM)-based agents to analyze complex Discrete Event Simulation (DES) output data from warehouse operations. It transforms raw DES data into a semantically rich KG, capturing relationships between simulation events and entities. An LLM-based agent uses iterative reasoning, generating interdependent sub-questions. For each sub-question, it creates Cypher queries for KG interaction, extracts information, and self-reflects to correct errors. This adaptive, iterative, and self-correcting process identifies operational issues mimicking human analysis. Our DES approach for warehouse bottleneck identification, tested with equipment breakdowns and process irregularities, outperforms baseline methods. For operational questions, it achieves near-perfect pass rates in pinpointing inefficiencies. For complex investigative questions, we demonstrate its superior diagnostic ability to uncover subtle, interconnected issues. This work bridges simulation modeling and AI (KG+LLM), offering a more intuitive method for actionable insights, reducing time-to-insight, and enabling automated warehouse inefficiency evaluation and diagnosis.
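
A minimal sketch of the generate-query / execute / self-reflect loop is given below, assuming a Neo4j instance that holds the DES knowledge graph and an llm callable; the connection details, prompts, and stopping check are illustrative, not the paper's implementation:

# Hedged sketch: iterative sub-questioning over a Neo4j-backed KG. Only the
# generate / execute / self-reflect structure follows the text above.
from neo4j import GraphDatabase  # assumes a running Neo4j instance

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def answer_question(question, llm, max_rounds=5):
    findings = []
    for _ in range(max_rounds):
        sub_q = llm(f"Given findings {findings}, next sub-question for: {question}")
        cypher = llm(f"Write a Cypher query answering: {sub_q}")
        try:
            with driver.session() as session:
                rows = [r.data() for r in session.run(cypher)]
        except Exception as err:          # self-reflection on a failed query
            cypher = llm(f"The query failed with '{err}'. Fix it: {cypher}")
            with driver.session() as session:
                rows = [r.data() for r in session.run(cypher)]
        findings.append({"sub_question": sub_q, "result": rows})
        if llm(f"Do findings {findings} answer '{question}'? yes/no") == "yes":
            break
    return llm(f"Summarize the bottleneck given findings: {findings}")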

Updated: 2025-07-23 07:18:55

标题: 利用知识图谱和LLM推理识别仓库规划辅助中的操作瓶颈

摘要: 分析仓库操作的离散事件模拟(DES)产生的大规模、复杂的输出数据集,以识别瓶颈和低效率是一项关键但具有挑战性的任务,通常需要大量的手动工作或专门的分析工具。我们的框架整合了知识图谱(KGs)和基于大型语言模型(LLM)的代理,用于分析仓库操作的复杂离散事件模拟(DES)输出数据。它将原始DES数据转化为语义丰富的知识图谱,捕捉模拟事件和实体之间的关系。基于LLM的代理使用迭代推理,生成相互依赖的子问题。对于每个子问题,它创建Cypher查询用于KG交互,提取信息,并进行自我反思以纠正错误。这种自适应、迭代和自我纠正的过程识别出运营问题,模拟人类分析。我们的仓库瓶颈识别DES方法,经过设备故障和流程异常测试,优于基准方法。对于运营问题,它在指出低效率方面取得几乎完美的通过率。对于复杂的调查问题,我们展示了其优越的诊断能力,揭示微妙的、相互关联的问题。这项工作架起了模拟建模和人工智能(KG+LLM)之间的桥梁,提供了一种更直观的方法来获得可操作的见解,缩短洞察力的时间,并实现自动化的仓库低效率评估和诊断。

更新时间: 2025-07-23 07:18:55

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17273v1

Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach

Bayesian optimization based on the Gaussian process upper confidence bound (GP-UCB) offers a theoretical guarantee for optimizing black-box functions. In practice, however, black-box functions often involve input uncertainty. To handle such cases, GP-UCB can be extended to optimize evaluation criteria known as robustness measures. However, GP-UCB-based methods for robustness measures require a trade-off parameter, $\beta$, which, as in the original GP-UCB, must be set sufficiently large to ensure theoretical validity. In this study, we propose randomized robustness measure GP-UCB (RRGP-UCB), a novel method that samples $\beta$ from a chi-squared-based probability distribution. This approach eliminates the need to explicitly specify $\beta$. Notably, the expected value of $\beta$ under this distribution is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected regret between the optimal and estimated solutions. Numerical experiments demonstrate the effectiveness of the proposed method.
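
The core change relative to standard GP-UCB is small enough to show in a toy loop: instead of fixing $\beta$, each iteration samples it from a chi-squared-based distribution. The sketch below applies this to a plain black-box objective; the degrees of freedom and kernel are assumptions, and the paper's robustness-measure setting under input uncertainty is not modeled:

# Hedged sketch of a randomized GP-UCB step with beta ~ chi-squared.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x          # toy black-box objective

X = rng.uniform(-1, 2, size=(5, 1))
y = f(X).ravel() + 0.05 * rng.standard_normal(5)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=0.05**2)
candidates = np.linspace(-1, 2, 200).reshape(-1, 1)

for t in range(20):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    beta = rng.chisquare(df=2.0)                        # randomized trade-off
    x_next = candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
    X = np.vstack([X, x_next[None, :]])
    y = np.append(y, f(x_next).item() + 0.05 * rng.standard_normal())

print("best found:", candidates[np.argmax(gp.predict(candidates))].item())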

Updated: 2025-07-23 07:15:01

标题: 输入不确定性下鲁棒性度量的贝叶斯优化:一种随机化高斯过程上置信界方法

摘要: 基于高斯过程上置信界(GP-UCB)的贝叶斯优化为优化黑匣子函数提供了理论保证。然而,在实践中,黑匣子函数通常涉及输入不确定性。为了处理这种情况,GP-UCB可以扩展以优化被称为鲁棒性度量的评估标准。然而,基于鲁棒性度量的GP-UCB方法需要一个权衡参数$\beta$,就像原始的GP-UCB一样,必须设置足够大以确保理论的有效性。在本研究中,我们提出了随机鲁棒性度量GP-UCB(RRGP-UCB),这是一种从基于卡方分布的概率分布中对$\beta$进行采样的新方法。这种方法消除了明确指定$\beta$的需要。值得注意的是,在这种分布下$\beta$的期望值并不过大。此外,我们展示了RRGP-UCB为最优解和估计解之间的期望遗憾提供了严格的界限。数值实验证明了所提出方法的有效性。

更新时间: 2025-07-23 07:15:01

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2504.03172v2

PPFPL: Cross-silo Privacy-preserving Federated Prototype Learning Against Data Poisoning Attacks on Non-IID Data

Privacy-Preserving Federated Learning (PPFL) allows multiple clients to collaboratively train a deep learning model by submitting hidden model updates. Nonetheless, PPFL is vulnerable to data poisoning attacks due to the distributed training nature of clients. Existing solutions have struggled to improve the performance of cross-silo PPFL in poisoned Non-IID data. To address the issues, this paper proposes a privacy-preserving federated prototype learning framework, named PPFPL, which enhances the cross-silo FL performance in poisoned Non-IID data while effectively resisting data poisoning attacks. Specifically, we adopt prototypes as client-submitted model updates to eliminate the impact of tampered data distribution on federated learning. Moreover, we utilize two servers to achieve Byzantine-robust aggregation by secure aggregation protocol, which greatly reduces the impact of malicious clients. Theoretical analyses confirm the convergence of PPFPL, and experimental results on publicly available datasets show that PPFPL is effective for resisting data poisoning attacks with Non-IID conditions.
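
To make the prototype idea concrete, the sketch below computes per-class mean embeddings as the client's submitted update; the encoder is arbitrary, and the element-wise mean stands in for the paper's two-server Byzantine-robust secure aggregation protocol:

# Hedged sketch: prototypes as client updates instead of raw gradients.
import torch

def local_prototypes(encoder, loader, num_classes, dim):
    sums = torch.zeros(num_classes, dim)
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for x, labels in loader:
            z = encoder(x)                       # (batch, dim) embeddings
            sums.index_add_(0, labels, z)
            counts.index_add_(0, labels, torch.ones_like(labels, dtype=torch.float))
    return sums / counts.clamp(min=1).unsqueeze(1)

def aggregate(prototype_list):
    # Placeholder for the two-server secure aggregation: a plain mean here.
    return torch.stack(prototype_list).mean(dim=0)

# Toy usage with a random encoder and synthetic batches.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 32))
loader = [(torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))) for _ in range(4)]
protos = local_prototypes(encoder, loader, num_classes=10, dim=32)
global_protos = aggregate([protos, protos.clone()])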

Updated: 2025-07-23 07:02:51

标题: PPFPL:跨部门隐私保护联合原型学习对抗非IID数据中的数据毒化攻击

摘要: 隐私保护联邦学习(PPFL)允许多个客户端通过提交隐藏的模型更新来协作训练深度学习模型。然而,由于客户端的分布式训练性质,PPFL容易受到数据毒化攻击的影响。现有解决方案在毒化的非独立同分布数据中改善跨隔离PPFL的性能方面存在困难。为了解决这些问题,本文提出了一种名为PPFPL的隐私保护联邦原型学习框架,它提高了在毒化的非独立同分布数据中跨隔离FL的性能,同时有效抵抗数据毒化攻击。具体来说,我们采用原型作为客户端提交的模型更新,消除了篡改数据分布对联邦学习的影响。此外,我们利用两个服务器通过安全聚合协议实现拜占庭-鲁棒聚合,大大减少了恶意客户端的影响。理论分析证实了PPFPL的收敛性,公开可用数据集上的实验结果表明,PPFPL对于具有非独立同分布条件的数据毒化攻击具有有效性。

更新时间: 2025-07-23 07:02:51

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2504.03173v3

Understanding Prompt Programming Tasks and Questions

Prompting foundation models (FMs) like large language models (LLMs) has enabled new AI-powered software features (e.g., text summarization) that previously were only possible by fine-tuning FMs. Now, developers are embedding prompts in software, known as prompt programs. The process of prompt programming requires the developer to make many changes to their prompt. Yet, the questions developers ask when updating their prompts remain unknown, despite the answers to these questions affecting how developers plan their changes. With the growing number of research and commercial prompt programming tools, it is unclear whether prompt programmers' needs are being adequately addressed. We address these challenges by developing a taxonomy of 25 tasks prompt programmers do and 51 questions they ask, measuring the importance of each task and question. We interview 16 prompt programmers, observe 8 developers make prompt changes, and survey 50 developers. We then compare the taxonomy with 48 research and commercial tools. We find that prompt programming is not well-supported: all tasks are done manually, and 16 of the 51 questions -- including a majority of the most important ones -- remain unanswered. Based on this, we outline important opportunities for prompt programming tools.

Updated: 2025-07-23 07:01:44

标题: 理解提示性编程任务和问题

摘要: 对基础模型(FMs)如大型语言模型(LLMs)进行提示,已经实现了新的基于人工智能的软件功能(例如文本摘要),这些功能以前只能通过对FMs进行微调才能实现。现在,开发人员正在将提示嵌入软件中,称为提示程序。提示编程的过程需要开发人员对其提示进行多次更改。然而,开发人员用于更新其提示的问题尚不清楚,尽管对这些问题的回答会影响开发人员如何规划其更改。随着研究和商业提示编程工具数量的增长,尚不清楚是否充分满足了提示程序员的需求。我们通过开发一个包含25项提示程序员的任务和51个问题的分类法,并衡量每个任务和问题的重要性来解决这些挑战。我们采访了16名提示程序员,观察了8名开发人员进行提示更改,并对50名开发人员进行了调查。然后,我们将这个分类法与48种研究和商业工具进行了比较。我们发现提示编程得不到很好的支持:所有任务都是手动完成的,51个问题中有16个(包括大多数最重要的问题)仍未得到解答。基于此,我们概述了提示编程工具的重要机会。

更新时间: 2025-07-23 07:01:44

领域: cs.SE,cs.AI,cs.HC

下载: http://arxiv.org/abs/2507.17264v1

EXGnet: a single-lead explainable-AI guided multiresolution network with train-only quantitative features for trustworthy ECG arrhythmia classification

Deep learning has significantly propelled the performance of ECG arrhythmia classification, yet its clinical adoption remains hindered by challenges in interpretability and deployment on resource-constrained edge devices. To bridge this gap, we propose EXGnet, a novel and reliable ECG arrhythmia classification network tailored for single-lead signals, specifically designed to balance high accuracy, explainability, and edge compatibility. EXGnet integrates XAI supervision during training via a normalized cross-correlation based loss, directing the model's attention to clinically relevant ECG regions, similar to a cardiologist's focus. This supervision is driven by automatically generated ground truth, derived through an innovative heart rate variability-based approach, without the need for manual annotation. To enhance classification accuracy without compromising deployment simplicity, we incorporate quantitative ECG features during training. These enrich the model with multi-domain knowledge but are excluded during inference, keeping the model lightweight for edge deployment. Additionally, we introduce an innovative multiresolution block to efficiently capture both short and long-term signal features while maintaining computational efficiency. Rigorous evaluation on the Chapman and Ningbo benchmark datasets validates the supremacy of EXGnet, which achieves average five-fold accuracies of 98.762% and 96.932%, and F1-scores of 97.910% and 95.527%, respectively. Comprehensive ablation studies and both quantitative and qualitative interpretability assessment confirm that the XAI guidance is pivotal, demonstrably enhancing the model's focus and trustworthiness. Overall, EXGnet sets a new benchmark by combining high-performance arrhythmia classification with interpretability, paving the way for more trustworthy and accessible portable ECG based health monitoring systems.
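
One way to realize NCC-based attention supervision is sketched below: the model's 1-D attention map is compared against a relevance mask over the ECG, and one minus the normalized cross-correlation is added to the classification loss. How the mask is derived from heart rate variability, and the loss weight, are assumptions here:

# Hedged sketch of XAI supervision via normalized cross-correlation.
import torch

def ncc_loss(attention, mask, eps=1e-8):
    """1 - NCC between attention map and target mask, averaged over a batch.
    attention, mask: (batch, length) tensors."""
    a = attention - attention.mean(dim=1, keepdim=True)
    m = mask - mask.mean(dim=1, keepdim=True)
    ncc = (a * m).sum(dim=1) / (a.norm(dim=1) * m.norm(dim=1) + eps)
    return (1.0 - ncc).mean()

def total_loss(logits, labels, attention, mask, lam=0.5):
    # Classification loss plus weighted attention alignment (lam is assumed).
    return torch.nn.functional.cross_entropy(logits, labels) + lam * ncc_loss(attention, mask)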

Updated: 2025-07-23 06:58:51

标题: EXGnet:一种单导联可解释AI引导多分辨率网络,具有仅训练定量特征用于可信心电心律失常分类

摘要: 深度学习显著推动了心电图心律失常分类的性能,但其临床应用仍受到可解释性和在资源受限的边缘设备上部署的挑战的限制。为了弥合这一差距,我们提出了EXGnet,这是一种专门为单导联信号量身定制的新颖可靠的心电图心律失常分类网络,旨在平衡高准确性、可解释性和边缘兼容性。EXGnet在训练期间通过基于归一化的互相关损失集成了可解释的人工智能(XAI)监督,将模型的注意力引导到临床相关的心电图区域,类似于心脏病专家的关注。这种监督是由通过创新的基于心率变异性的方法自动生成的地面真相驱动的,无需手动注释。为了提高分类准确性而不影响部署简易性,我们在训练期间整合了定量心电图特征。这些特征丰富了模型的多领域知识,但在推断过程中被排除,使模型在边缘部署时保持轻量级。此外,我们引入了一种创新的多分辨率块,以高效地捕捉短期和长期信号特征,同时保持计算效率。在查普曼和宁波基准数据集上的严格评估验证了EXGnet的优越性,其平均五重准确率分别为98.762%和96.932%,F1分数为97.910%和95.527%。全面的消融研究以及定量和定性的可解释性评估证实了XAI指导的至关重要性,显著增强了模型的关注焦点和可信度。总体而言,EXGnet通过将高性能心律失常分类与可解释性相结合,为更可信赖和易于访问的便携式心电图监测系统铺平了道路,树立了新的基准。

更新时间: 2025-07-23 06:58:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2506.12404v2

Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs

Large language models (LLMs) are increasingly trained on tabular data, which, unlike unstructured text, often contains personally identifiable information (PII) in a highly structured and explicit format. As a result, privacy risks arise, since sensitive records can be inadvertently retained by the model and exposed through data extraction or membership inference attacks (MIAs). While existing MIA methods primarily target textual content, their efficacy and threat implications may differ when applied to structured data, due to its limited content, diverse data types, unique value distributions, and column-level semantics. In this paper, we present Tab-MIA, a benchmark dataset for evaluating MIAs on tabular data in LLMs and demonstrate how it can be used. Tab-MIA comprises five data collections, each represented in six different encoding formats. Using our Tab-MIA benchmark, we conduct the first evaluation of state-of-the-art MIA methods on LLMs finetuned with tabular data across multiple encoding formats. In the evaluation, we analyze the memorization behavior of pretrained LLMs on structured data derived from Wikipedia tables. Our findings show that LLMs memorize tabular data in ways that vary across encoding formats, making them susceptible to extraction via MIAs. Even when fine-tuned for as few as three epochs, models exhibit high vulnerability, with AUROC scores approaching 90% in most cases. Tab-MIA enables systematic evaluation of these risks and provides a foundation for developing privacy-preserving methods for tabular data in LLMs.

Updated: 2025-07-23 06:56:34

标题: Tab-MIA:用于LLMs中表格数据的成员推断攻击的基准数据集

摘要: 大型语言模型(LLMs)越来越多地在表格数据上进行训练,与非结构化文本不同,表格数据通常以高度结构化和明确的格式包含个人可识别信息(PII)。因此,由于敏感记录可能被模型无意中保留并通过数据提取或成员推理攻击(MIAs)暴露,隐私风险会产生。尽管现有的MIAs方法主要针对文本内容,但当应用于结构化数据时,它们的有效性和威胁影响可能会有所不同,这是由于结构化数据的内容有限、数据类型多样、唯一值分布和列级语义。在本文中,我们提出了Tab-MIA,一个用于在LLMs中评估表格数据上MIAs的基准数据集,并展示了它的用途。Tab-MIA包括五个数据集合,每个数据集以六种不同的编码格式表示。使用我们的Tab-MIA基准数据集,我们对LLMs在多个编码格式下使用表格数据进行微调的最新MIAs方法进行了首次评估。在评估中,我们分析了预训练LLMs在来源于维基百科表格的结构化数据上的记忆行为。我们的研究结果表明,LLMs以不同方式记忆表格数据,这使它们容易受到MIAs的提取。即使对模型进行了仅三个epochs的微调,模型在大多数情况下都表现出较高的脆弱性,AUROC分数接近90%。Tab-MIA使得这些风险的系统评估成为可能,并为开发LLMs中表格数据的隐私保护方法奠定了基础。

更新时间: 2025-07-23 06:56:34

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2507.17259v1

Students' Feedback Requests and Interactions with the SCRIPT Chatbot: Do They Get What They Ask For?

Building on prior research on Generative AI (GenAI) and related tools for programming education, we developed SCRIPT, a chatbot based on ChatGPT-4o-mini, to support novice learners. SCRIPT allows for open-ended interactions and structured guidance through predefined prompts. We evaluated the tool via an experiment with 136 students from an introductory programming course at a large German university and analyzed how students interacted with SCRIPT while solving programming tasks with a focus on their feedback preferences. The results reveal that students' feedback requests seem to follow a specific sequence. Moreover, the chatbot responses aligned well with students' requested feedback types (in 75%), and it adhered to the system prompt constraints. These insights inform the design of GenAI-based learning support systems and highlight challenges in balancing guidance and flexibility in AI-assisted tools.

Updated: 2025-07-23 06:56:26

标题: 学生对SCRIPT聊天机器人的反馈请求和互动:他们得到他们所要求的吗?

摘要: 在以往关于生成式人工智能(GenAI)和相关工具用于编程教育的研究基础上,我们开发了SCRIPT,这是一个基于ChatGPT-4o-mini的聊天机器人,旨在支持初学者。SCRIPT允许进行开放式互动,并通过预定义提示提供结构化指导。我们通过与德国一所大学的入门编程课程中的136名学生进行实验来评估该工具,并分析学生在解决编程任务时与SCRIPT的互动方式,重点关注他们的反馈偏好。结果显示,学生的反馈请求似乎遵循特定的顺序。此外,聊天机器人的回应与学生请求的反馈类型高度一致(达到75%),并且遵守系统提示的约束条件。这些发现对基于GenAI的学习支持系统的设计提供了有益信息,并突显了在AI辅助工具中平衡指导和灵活性的挑战。

更新时间: 2025-07-23 06:56:26

领域: cs.AI

下载: http://arxiv.org/abs/2507.17258v1

Agent Identity Evals: Measuring Agentic Identity

Central to the agentic capability and trustworthiness of language model agents (LMAs) is the extent to which they maintain a stable, reliable identity over time. However, LMAs inherit pathologies from large language models (LLMs) (statelessness, stochasticity, sensitivity to prompts, and linguistic intermediation) which can undermine their identifiability, continuity, persistence and consistency. This attrition of identity can erode their reliability, trustworthiness and utility by interfering with their agentic capabilities such as reasoning, planning and action. To address these challenges, we introduce \textit{agent identity evals} (AIE), a rigorous, statistically-driven, empirical framework for measuring the degree to which an LMA system exhibits and maintains its agentic identity over time, including its capabilities, properties and ability to recover from state perturbations. AIE comprises a set of novel metrics which can integrate with other measures of performance, capability and agentic robustness to assist in the design of optimal LMA infrastructure and scaffolding such as memory and tools. We set out formal definitions and methods that can be applied at each stage of the LMA life-cycle, and worked examples of how to apply them.

Updated: 2025-07-23 06:56:15

标题: 代理人身份评估:衡量代理人身份

摘要: 文中提到,语言模型代理(LMAs)的代理能力和可信度的核心是它们在时间上保持稳定、可靠的身份的程度。然而,LMAs继承了大型语言模型(LLMs)的病态(无状态性、随机性、对提示和语言中介的敏感性),这可能会破坏它们的可识别性、连续性、持久性和一致性。身份的蚕食可能会通过干扰其推理、规划和行动等代理能力来侵蚀它们的可靠性、可信度和实用性。为了解决这些挑战,我们引入了\textit{代理身份评估}(AIE),这是一个严格、统计驱动的实证框架,用于衡量LMA系统在时间上展现和维持其代理身份的程度,包括它们的能力、属性和恢复状态扰动的能力。AIE包括一组新颖的度量标准,可以与其他性能、能力和代理稳健性的指标相结合,以协助设计最佳的LMA基础设施和脚手架,如内存和工具。我们提出了在LMA生命周期的每个阶段应用的形式定义和方法,并提供了如何应用它们的示例。

更新时间: 2025-07-23 06:56:15

领域: cs.AI,cs.MA

下载: http://arxiv.org/abs/2507.17257v1

Knowledge Abstraction for Knowledge-based Semantic Communication: A Generative Causality Invariant Approach

In this study, we design a low-complexity and generalized AI model that can capture common knowledge to improve data reconstruction of the channel decoder for semantic communication. Specifically, we propose a generative adversarial network that leverages causality-invariant learning to extract causal and non-causal representations from the data. Causal representations are invariant and encompass crucial information to identify the data's label. They can encapsulate semantic knowledge and facilitate effective data reconstruction at the receiver. Moreover, the causal mechanism ensures that learned representations remain consistent across different domains, making the system reliable even with users collecting data from diverse domains. As user-collected data evolves over time causing knowledge divergence among users, we design sparse update protocols to improve the invariant properties of the knowledge while minimizing communication overheads. Three key observations were drawn from our empirical evaluations. Firstly, causality-invariant knowledge ensures consistency across different devices despite the diverse training data. Secondly, invariant knowledge has promising performance in classification tasks, which is pivotal for goal-oriented semantic communications. Thirdly, our knowledge-based data reconstruction highlights the robustness of our decoder, which surpasses other state-of-the-art data reconstruction and semantic compression methods in terms of Peak Signal-to-Noise Ratio (PSNR).

Updated: 2025-07-23 06:56:07

标题: 基于知识的语义通信的知识抽象:一种生成因果不变方法

摘要: 在这项研究中,我们设计了一个低复杂度和通用的人工智能模型,可以捕捉常见知识,以改善语义通信的信道解码器的数据重建。具体地,我们提出了一个生成对抗网络,利用因果不变学习从数据中提取因果和非因果表示。因果表示是不变的,并包含识别数据标签的关键信息。它们可以封装语义知识,并促进接收方的有效数据重建。此外,因果机制确保学习表示在不同领域之间保持一致,使系统即使用户从不同领域收集数据也可靠。随着用户收集数据随时间演变,导致用户之间的知识分歧,我们设计了稀疏更新协议,以改善知识的不变属性,同时最小化通信开销。从我们的实证评估中得出了三个关键观察。首先,因果不变的知识确保了不同设备之间的一致性,尽管训练数据不同。其次,不变的知识在分类任务中表现出色,这对目标导向的语义通信至关重要。第三,我们基于知识的数据重建突显了我们解码器的稳健性,其在峰值信噪比(PSNR)方面超越了其他最先进的数据重建和语义压缩方法。

更新时间: 2025-07-23 06:56:07

领域: cs.LG,68,I.2.0

下载: http://arxiv.org/abs/2507.17784v1

Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions

This paper explores the generative capabilities of Autoencoders (AEs) and establishes connections between Variational Autoencoders (VAEs) and Vector Quantized-Variational Autoencoders (VQ-VAEs) through a reformulated training framework. We demonstrate that AEs exhibit generative potential via latent space interpolation and perturbation, albeit limited by undefined regions in the encoding space. To address this, we propose a new VAE-like training method that introduces clustering centers to enhance data compactness and ensure well-defined latent spaces without relying on traditional KL divergence or reparameterization techniques. Experimental results on MNIST, CelebA, and FashionMNIST datasets show smooth interpolative transitions, though blurriness persists. Extending this approach to multiple learnable vectors, we observe a natural progression toward a VQ-VAE-like model in continuous space. However, when the encoder outputs multiple vectors, the model degenerates into a discrete Autoencoder (VQ-AE), which combines image fragments without learning semantic representations. Our findings highlight the critical role of encoding space compactness and dispersion in generative modeling and provide insights into the intrinsic connections between VAEs and VQ-VAEs, offering a new perspective on their design and limitations.
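
A minimal sketch of such KL-free, cluster-anchored training is given below: a plain autoencoder whose latents are pulled toward learnable centers, so the space between training points stays well-defined for interpolation. Layer sizes, the number of centers, and the compactness weight are illustrative assumptions:

# Hedged sketch of a VAE-like autoencoder with learnable cluster centers,
# trained without KL divergence or reparameterization.
import torch
import torch.nn as nn

class ClusterAE(nn.Module):
    def __init__(self, dim_in=784, dim_z=16, n_centers=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(), nn.Linear(256, dim_z))
        self.dec = nn.Sequential(nn.Linear(dim_z, 256), nn.ReLU(), nn.Linear(256, dim_in))
        self.centers = nn.Parameter(torch.randn(n_centers, dim_z))

    def forward(self, x):
        z = self.enc(x)
        # Pull each latent toward its nearest learnable center, encouraging
        # a compact, well-covered encoding space.
        d = torch.cdist(z, self.centers)            # (batch, n_centers)
        compact = d.min(dim=1).values.pow(2).mean()
        return self.dec(z), compact

model = ClusterAE()
x = torch.rand(32, 784)
recon, compact = model(x)
loss = nn.functional.mse_loss(recon, x) + 0.1 * compact
loss.backward()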

Updated: 2025-07-23 06:52:00

标题: 重新思考VAE:从连续到离散表示,无需概率假设

摘要: 这篇论文探讨了自动编码器(AEs)的生成能力,并通过重新构建的训练框架建立了变分自动编码器(VAEs)和向量量化-变分自动编码器(VQ-VAEs)之间的联系。我们证明了AEs通过潜在空间插值和扰动展现出生成潜力,尽管受到编码空间中未定义区域的限制。为了解决这一问题,我们提出了一种新的类似VAE的训练方法,引入聚类中心以增强数据紧凑性,并确保潜在空间明确定义,而无需依赖传统的KL散度或重参数化技术。在MNIST、CelebA和FashionMNIST数据集上的实验结果展示了平滑的插值过渡,尽管模糊性仍然存在。将这种方法扩展到多个可学习的向量,我们观察到在连续空间中向VQ-VAE模型的自然进化。然而,当编码器输出多个向量时,模型会退化为离散自动编码器(VQ-AE),将图像片段组合在一起而不学习语义表征。我们的发现突出了编码空间紧凑性和分散性在生成建模中的关键作用,并提供了关于VAEs和VQ-VAEs之间内在联系的见解,为它们的设计和局限性提供了新的视角。

更新时间: 2025-07-23 06:52:00

领域: cs.LG

下载: http://arxiv.org/abs/2507.17255v1

BadHMP: Backdoor Attack against Human Motion Prediction

Precise future human motion prediction over sub-second horizons from past observations is crucial for various safety-critical applications. To date, only a few studies have examined the vulnerability of skeleton-based neural networks to evasion and backdoor attacks. In this paper, we propose BadHMP, a novel backdoor attack that targets specifically human motion prediction tasks. Our approach involves generating poisoned training samples by embedding a localized backdoor trigger in one limb of the skeleton, causing selected joints to follow predefined motion in historical time steps. Subsequently, the future sequences are globally modified that all the joints move following the target trajectories. Our carefully designed backdoor triggers and targets guarantee the smoothness and naturalness of the poisoned samples, making them stealthy enough to evade detection by the model trainer while keeping the poisoned model unobtrusive in terms of prediction fidelity to untainted sequences. The target sequences can be successfully activated by the designed input sequences even with a low poisoned sample injection ratio. Experimental results on two datasets (Human3.6M and CMU-Mocap) and two network architectures (LTD and HRI) demonstrate the high-fidelity, effectiveness, and stealthiness of BadHMP. Robustness of our attack against fine-tuning defense is also verified.

Updated: 2025-07-23 06:48:33

标题: BadHMP:针对人体运动预测的后门攻击

摘要: 从过去的观察中精确地预测未来人类运动在亚秒级时间范围内是各种安全关键应用中至关重要的。迄今为止,只有少数研究考察了基于骨架的神经网络对逃避和后门攻击的脆弱性。在本文中,我们提出了BadHMP,一种专门针对人类运动预测任务的后门攻击。我们的方法涉及通过在骨架的一个肢体中嵌入局部后门触发器来生成毒化训练样本,导致选择的关节在历史时间步骤中遵循预定义的运动。随后,未来序列被全局修改,使所有关节都按照目标轨迹移动。我们精心设计的后门触发器和目标保证了毒化样本的平滑性和自然性,使它们足够隐蔽,以规避模型训练者的检测,同时保持毒化模型在预测忠实度方面与未被污染的序列一致。设计的输入序列可以成功激活目标序列,即使毒化样本注入比例较低。在两个数据集(Human3.6M和CMU-Mocap)和两个网络架构(LTD和HRI)上的实验结果显示了BadHMP的高忠实度、有效性和隐秘性。我们的攻击对微调防御的鲁棒性也得到了验证。

更新时间: 2025-07-23 06:48:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.19638v2

Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations

Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies-abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel and previously cumbersome interactions in MR - such as skimming, attribute-based filtering, navigating nested groups, and complex multi object selections - all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation suggests the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.

Updated: 2025-07-23 06:34:58

标题: 现实代理:通过抽象表示在MR中与现实世界对象进行流体交互

摘要: 在混合现实(MR)中与现实世界物体进行交互通常会变得困难,特别是当它们拥挤、遥远或部分被遮挡时,阻碍了直接选择和操作。我们观察到,这些困难源于直接在物理对象上进行交互,输入与它们的物理约束紧密耦合。我们的关键洞察是通过引入代理来脱离这些约束,代理是对现实世界物体的抽象表示。我们将这一概念体现在Reality Proxy中,这是一个系统,在选择过程中无缝地将交互目标从物理对象转移至它们的代理。除了促进基本选择外,Reality Proxy还利用人工智能为代理增加语义属性和对应物理对象的层次空间关系,实现了在MR中的新颖和以前繁琐的交互,如略读、基于属性的过滤、导航嵌套组和复杂的多物体选择,而无需新的手势或菜单系统。我们展示了Reality Proxy在各种场景中的多功能性,包括办公信息检索、大规模空间导航和多无人机控制。专家评估表明该系统的实用性和可用性,表明基于代理的抽象为未来MR系统提供了强大且可推广的交互范式。

更新时间: 2025-07-23 06:34:58

领域: cs.HC,cs.AI,cs.GR,H.5.2; I.3.6

下载: http://arxiv.org/abs/2507.17248v1

DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs

The Transformer architecture has revolutionized deep learning, delivering state-of-the-art performance in areas such as natural language processing, computer vision, and time series prediction. However, its core component, self-attention, has quadratic time complexity relative to input sequence length, which hinders the scalability of Transformers. Existing approaches to optimizing self-attention either discard full-context information or lack flexibility. In this work, we design DistrAttention, an efficient and flexible self-attention mechanism that preserves the full context. DistrAttention achieves this by grouping data on the embedding dimensionality, usually referred to as $d$. We realize DistrAttention with a lightweight sampling and fusion method that exploits locality-sensitive hashing to group similar data. A block-wise grouping framework is further designed to limit the errors introduced by locality-sensitive hashing. By optimizing the selection of block sizes, DistrAttention can be easily integrated with FlashAttention-2, gaining high performance on modern GPUs. We evaluate DistrAttention with extensive experiments. The results show that our method is 37% faster than FlashAttention-2 on calculating self-attention. In ViT inference, DistrAttention is the fastest and the most accurate among approximate self-attention mechanisms. In Llama3-1B, DistrAttention still achieves the lowest inference time with only 1% accuracy loss.
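
A toy rendering of the d-dimension grouping idea: embedding dimensions whose column profiles hash to the same LSH bucket are averaged before the $QK^T$ product, shrinking $d$. The signed-random-projection hash and bucket count below are assumptions; the block-wise error control and the FlashAttention-2 integration are omitted:

# Hedged sketch of LSH-based grouping along the embedding dimension.
import torch

def lsh_group_dims(Q, K, n_bits=4):
    # Hash each embedding dimension by the sign pattern of random projections
    # of its column in K: (seq, d) gives d column vectors of length seq.
    seq, d = K.shape
    planes = torch.randn(seq, n_bits)
    codes = (K.T @ planes > 0).long()                          # (d, n_bits)
    bucket = (codes * (2 ** torch.arange(n_bits))).sum(dim=1)  # (d,) bucket ids
    groups = [torch.nonzero(bucket == b).squeeze(1) for b in bucket.unique()]
    Qg = torch.stack([Q[:, g].mean(dim=1) for g in groups], dim=1)
    Kg = torch.stack([K[:, g].mean(dim=1) for g in groups], dim=1)
    return Qg, Kg                                              # (seq, n_groups)

Q, K = torch.randn(128, 64), torch.randn(128, 64)
Qg, Kg = lsh_group_dims(Q, K)
scores = Qg @ Kg.T / Qg.shape[1] ** 0.5                        # approximate QK^T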

Updated: 2025-07-23 06:29:38

标题: DistrAttention:一种在现代GPU上高效灵活的自注意力机制

摘要: Transformer架构彻底改变了深度学习,在自然语言处理、计算机视觉和时间序列预测等领域提供了最先进的性能。然而,其核心组件自注意力相对于输入序列长度具有二次时间复杂度,这限制了Transformer的可扩展性。现有的优化自注意力的方法要么丢弃了完整的上下文信息,要么缺乏灵活性。在这项工作中,我们设计了DistrAttention,一种高效灵活的自注意力机制,具有完整的上下文。DistrAttention通过在嵌入维度上分组数据来实现这一点,通常称为$d$。我们利用轻量级取样和融合方法实现了DistrAttention,利用局部敏感哈希将相似数据分组。进一步设计了一个块状分组框架,以限制局部敏感哈希引入的错误。通过优化块大小的选择,DistrAttention可以轻松与FlashAttention-2集成,实现在现代GPU上的高性能。我们通过大量实验评估了DistrAttention。结果显示,我们的方法在计算自注意力时比FlashAttention-2快37%。在ViT推断中,DistrAttention是近似自注意力机制中最快最准确的。在Llama3-1B中,DistrAttention仍然实现了最低的推断时间,仅有1%的准确率损失。

更新时间: 2025-07-23 06:29:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17245v1

DMS-Net:Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification

Ophthalmic diseases pose a significant global health challenge, yet traditional diagnosis methods and existing single-eye deep learning approaches often fail to account for binocular pathological correlations. To address this, we propose DMS-Net, a dual-modal multi-scale Siamese network for binocular fundus image classification. Our framework leverages weight-shared Siamese ResNet-152 backbones to extract deep semantic features from paired fundus images. To tackle challenges such as lesion boundary ambiguity and scattered pathological distributions, we introduce a Multi-Scale Context-Aware Module (MSCAM) that integrates adaptive pooling and attention mechanisms for multi-resolution feature aggregation. Additionally, a Dual-Modal Feature Fusion (DMFF) module enhances cross-modal interaction through spatial-semantic recalibration and bidirectional attention, effectively combining global context and local edge features. Evaluated on the ODIR-5K dataset, DMS-Net achieves state-of-the-art performance with 82.9% accuracy, 84.5% recall, and 83.2% Cohen's kappa, demonstrating superior capability in detecting symmetric pathologies and advancing clinical decision-making for ocular diseases.

Updated: 2025-07-23 06:21:52

标题: DMS-Net:双模多尺度连体网络用于双眼底图像分类

摘要: 眼科疾病构成了一个重要的全球健康挑战,然而传统的诊断方法和现有的单眼深度学习方法往往未能考虑到双眼病理学相关性。为了解决这个问题,我们提出了DMS-Net,一种用于双眼眼底图像分类的双模多尺度连体网络。我们的框架利用共享权重的Siamese ResNet-152主干网络从配对的眼底图像中提取深度语义特征。为了解决诸如病变边界模糊和散布式病理分布等挑战,我们引入了一个多尺度上下文感知模块(MSCAM),该模块整合了自适应池化和注意机制,用于多分辨率特征聚合。此外,一个双模态特征融合(DMFF)模块通过空间语义重校准和双向注意,增强了跨模态交互,有效地结合了全局上下文和局部边缘特征。在ODIR-5K数据集上评估,DMS-Net实现了82.9%的准确率,84.5%的召回率和83.2%的Cohen's kappa,展示了在检测对称病变方面的卓越能力,并推动了眼科疾病临床决策的进步。

更新时间: 2025-07-23 06:21:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.18046v2

Eco-Friendly AI: Unleashing Data Power for Green Federated Learning

The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) comes with a significant environmental impact, particularly in terms of energy consumption and carbon emissions. This pressing issue highlights the need for innovative solutions to mitigate AI's ecological footprint. One of the key factors influencing the energy consumption of ML model training is the size of the training dataset. ML models are often trained on vast amounts of data continuously generated by sensors and devices distributed across multiple locations. To reduce data transmission costs and enhance privacy, Federated Learning (FL) enables model training without the need to move or share raw data. While FL offers these advantages, it also introduces challenges due to the heterogeneity of data sources (related to volume and quality), computational node capabilities, and environmental impact. This paper contributes to the advancement of Green AI by proposing a data-centric approach to Green Federated Learning. Specifically, we focus on reducing FL's environmental impact by minimizing the volume of training data. Our methodology involves the analysis of the characteristics of federated datasets, the selecting of an optimal subset of data based on quality metrics, and the choice of the federated nodes with the lowest environmental impact. We develop a comprehensive methodology that examines the influence of data-centric factors, such as data quality and volume, on FL training performance and carbon emissions. Building on these insights, we introduce an interactive recommendation system that optimizes FL configurations through data reduction, minimizing environmental impact during training. Applying this methodology to time series classification has demonstrated promising results in reducing the environmental impact of FL tasks.

Updated: 2025-07-23 06:18:15

标题: 环保人工智能:释放数据力量以促进绿色联邦学习

摘要: 人工智能(AI)和机器学习(ML)的广泛应用带来了显著的环境影响,特别是在能源消耗和碳排放方面。这一紧迫问题凸显了减少AI生态足迹的创新解决方案的必要性。影响ML模型训练能源消耗的关键因素之一是训练数据集的大小。ML模型通常在由传感器和设备不断生成的大量数据上进行训练,这些数据分布在多个位置。为了降低数据传输成本并增强隐私性,联邦学习(FL)使模型训练无需移动或共享原始数据成为可能。尽管FL具有这些优势,但也由于数据源的异质性(与容量和质量有关)、计算节点的能力和环境影响而引入了挑战。 本文通过提出一种以数据为中心的绿色联邦学习方法,为绿色AI的进步做出了贡献。具体而言,我们专注于通过减少训练数据的数量来降低FL的环境影响。我们的方法涉及对联邦数据集特征的分析,基于质量指标选择最佳数据子集,并选择具有最低环境影响的联邦节点。我们制定了一个全面的方法,研究了数据中心因素(如数据质量和数量)对FL训练性能和碳排放的影响。基于这些见解,我们引入了一个交互式推荐系统,通过数据减少优化FL配置,降低训练过程中的环境影响。将这一方法应用于时间序列分类已经展示出减少FL任务的环境影响方面的有前途的结果。

更新时间: 2025-07-23 06:18:15

领域: cs.LG,cs.AI,cs.DB,cs.DC

下载: http://arxiv.org/abs/2507.17241v1

GTA: Grouped-head latenT Attention

Attention mechanisms underpin the success of large language models (LLMs), yet their substantial computational and memory overhead poses challenges for optimizing efficiency and performance. A critical bottleneck arises as KV cache and attention computations scale rapidly with text length, challenging deployment on hardware with limited computational and memory resources. We observe that attention mechanisms exhibit substantial redundancy, since the KV cache can be significantly compressed and attention maps across heads display high similarity, revealing that much of the computation and storage is unnecessary. Leveraging these insights, we propose \textbf{G}rouped-Head Laten\textbf{T} \textbf{A}ttention (GTA), a novel attention mechanism that reduces memory usage and computational complexity while maintaining performance. GTA comprises two components: (1) a shared attention map mechanism that reuses attention scores across multiple heads, decreasing the key cache size; and (2) a nonlinear value decoder with learned projections that compresses the value cache into a latent space, further cutting memory needs. GTA cuts attention computation FLOPs by up to \emph{62.5\%} versus Grouped-Query Attention and shrinks the KV cache by up to \emph{70\%}, all while avoiding the extra overhead of Multi-Head Latent Attention to improve LLM deployment efficiency. Consequently, GTA models achieve a \emph{2x} increase in end-to-end inference speed, with prefill benefiting from reduced computational cost and decoding benefiting from the smaller cache footprint.
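
The two components can be sketched in a few lines of PyTorch: one attention map computed per group of heads and reused within the group, and values kept in a small latent that a nonlinear decoder expands per head. All dimensions and the decoder shape below are illustrative assumptions, not the paper's exact design:

# Hedged sketch of grouped-head shared attention plus a latent value decoder.
import torch
import torch.nn as nn

class GroupedLatentAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, group=4, d_head=64, d_latent=32):
        super().__init__()
        self.n_maps = n_heads // group                  # one map per group
        self.group, self.d_head = group, d_head
        self.q = nn.Linear(d_model, self.n_maps * d_head)
        self.k = nn.Linear(d_model, self.n_maps * d_head)
        self.v = nn.Linear(d_model, d_latent)           # compressed value cache
        self.value_dec = nn.Sequential(                 # latent -> per-head values
            nn.Linear(d_latent, d_latent), nn.SiLU(),
            nn.Linear(d_latent, n_heads * d_head))
        self.out = nn.Linear(n_heads * d_head, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q(x).view(B, T, self.n_maps, self.d_head).transpose(1, 2)
        k = self.k(x).view(B, T, self.n_maps, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.d_head**0.5, dim=-1)
        attn = attn.repeat_interleave(self.group, dim=1)     # share map in group
        v = self.value_dec(self.v(x))                        # (B, T, H*dh)
        v = v.view(B, T, -1, self.d_head).transpose(1, 2)    # (B, H, T, dh)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y)

y = GroupedLatentAttention()(torch.randn(2, 16, 512))        # (2, 16, 512)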

Updated: 2025-07-23 05:57:32

标题: 《GTA: 分组头部潜在关注》

摘要: 注意机制是大型语言模型(LLMs)成功的基础,然而它们的大量计算和存储开销给优化效率和性能带来了挑战。一个关键瓶颈是随着文本长度的增加,KV缓存和注意力计算迅速扩展,使得在计算和存储资源有限的硬件上部署变得困难。我们观察到注意机制存在大量冗余,因为KV缓存可以被显著压缩,而跨头部的注意力映射显示出高度相似性,揭示出大部分计算和存储是不必要的。基于这些见解,我们提出了\textbf{G}rouped-Head Laten\textbf{T} \textbf{A}ttention(GTA),一种新颖的注意机制,可以减少内存使用和计算复杂性,同时保持性能。GTA包括两个组件:(1)一个共享的注意力映射机制,可以在多个头部之间重复使用注意力得分,减少关键缓存的大小;(2)一个具有学习投影的非线性值解码器,将值缓存压缩为潜在空间,进一步减少内存需求。与Grouped-Query Attention相比,GTA将注意力计算FLOPs减少了高达62.5%,将KV缓存缩小了高达70%,同时避免了多头潜在注意力的额外开销,以提高LLM部署效率。因此,GTA模型的推理速度端到端增加了2倍,预填充受益于降低的计算成本,解码受益于更小的缓存占用。

更新时间: 2025-07-23 05:57:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2506.17286v2

A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task

Large Language Models (LLMs) are trained on a vast amount of procedural texts, but they do not directly observe real-world phenomena. In the context of cooking recipes, this poses a challenge, as intermediate states of ingredients are often omitted, making it difficult for models to track ingredient states and understand recipes accurately. In this paper, we apply state probing, a method for evaluating a language model's understanding of the world, to the domain of cooking. We propose a new task and dataset for evaluating how well LLMs can recognize intermediate ingredient states during cooking procedures. We first construct a new Japanese recipe dataset with clear and accurate annotations of ingredient state changes, collected from well-structured and controlled recipe texts. Using this dataset, we design three novel tasks to evaluate whether LLMs can track ingredient state transitions and identify ingredients present at intermediate steps. Our experiments with widely used LLMs, such as Llama3.1-70B and Qwen2.5-72B, show that learning ingredient state knowledge improves their understanding of cooking processes, achieving performance comparable to commercial LLMs.

Updated: 2025-07-23 05:56:20

标题: 一个高度干净的食谱数据集,包含成分状态注释,用于状态探测任务

摘要: 大型语言模型(LLMs)是在大量程序文本上进行训练的,但它们并不直接观察现实世界现象。在烹饪食谱的背景下,这带来了挑战,因为食材的中间状态经常被省略,使得模型难以跟踪食材状态并准确理解食谱。在本文中,我们将状态探测(state probing)方法应用于烹饪领域,这是评估语言模型对世界理解的方法。我们提出了一个新的任务和数据集,用于评估LLMs在烹饪过程中能否识别中间食材状态。我们首先构建了一个新的日本食谱数据集,其中包含清晰准确的食材状态变化注释,这些注释来自结构良好且受控的食谱文本。利用这个数据集,我们设计了三个新颖的任务,评估LLMs是否能够跟踪食材状态转变并识别中间步骤中存在的食材。我们对广泛使用的LLMs进行实验,如Llama3.1-70B和Qwen2.5-72B,结果显示学习食材状态知识可以提高它们对烹饪过程的理解,实现了与商业LLMs相媲美的性能。

更新时间: 2025-07-23 05:56:20

领域: cs.MM,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.17232v1

SFNet: A Spatial-Frequency Domain Deep Learning Network for Efficient Alzheimer's Disease Diagnosis

Alzheimer's disease (AD) is a progressive neurodegenerative disorder that predominantly affects the elderly population and currently has no cure. Magnetic Resonance Imaging (MRI), as a non-invasive imaging technique, is essential for the early diagnosis of AD. MRI inherently contains both spatial and frequency information, as raw signals are acquired in the frequency domain and reconstructed into spatial images via the Fourier transform. However, most existing AD diagnostic models extract features from a single domain, limiting their capacity to fully capture the complex neuroimaging characteristics of the disease. While some studies have combined spatial and frequency information, they are mostly confined to 2D MRI, leaving the potential of dual-domain analysis in 3D MRI unexplored. To overcome this limitation, we propose Spatio-Frequency Network (SFNet), the first end-to-end deep learning framework that simultaneously leverages spatial and frequency domain information to enhance 3D MRI-based AD diagnosis. SFNet integrates an enhanced dense convolutional network to extract local spatial features and a global frequency module to capture global frequency-domain representations. Additionally, a novel multi-scale attention module is proposed to further refine spatial feature extraction. Experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that SFNet outperforms existing baselines and reduces computational overhead in classifying cognitively normal (CN) and AD, achieving an accuracy of 95.1%.
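
A minimal dual-branch block along these lines is sketched below: a 3-D convolution for local spatial features, and a global frequency branch built from the log-magnitude of the volume's 3-D FFT, fused by concatenation. Channel sizes and the fusion choice are assumptions; the paper's dense blocks and multi-scale attention module are omitted:

# Hedged sketch of fusing spatial- and frequency-domain features of 3-D MRI.
import torch
import torch.nn as nn

class SpatioFrequencyBlock(nn.Module):
    def __init__(self, c_in=1, c_out=16):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm3d(c_out), nn.ReLU())
        self.freq = nn.Sequential(                    # mixes log-magnitude spectra
            nn.Conv3d(c_in, c_out, kernel_size=1),
            nn.BatchNorm3d(c_out), nn.ReLU())
        self.fuse = nn.Conv3d(2 * c_out, c_out, kernel_size=1)

    def forward(self, x):                             # x: (B, C, D, H, W)
        f = torch.fft.fftn(x, dim=(-3, -2, -1))
        f = torch.log1p(torch.abs(torch.fft.fftshift(f, dim=(-3, -2, -1))))
        return self.fuse(torch.cat([self.spatial(x), self.freq(f)], dim=1))

out = SpatioFrequencyBlock()(torch.randn(2, 1, 32, 32, 32))   # (2, 16, 32, 32, 32)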

Updated: 2025-07-23 05:53:40

标题: SFNet:一种用于高效阿尔茨海默病诊断的空间频率域深度学习网络

摘要: 阿尔茨海默病(AD)是一种逐渐进展的神经退行性疾病,主要影响老年人群,目前尚无治愈方法。磁共振成像(MRI)作为一种无创成像技术,对早期诊断AD至关重要。MRI本质上包含空间和频率信息,因为原始信号是在频率域中获取的,通过傅立叶变换重建为空间图像。然而,大多数现有的AD诊断模型仅从单一领域提取特征,限制了其完全捕捉疾病的复杂神经影像特征的能力。虽然一些研究已经结合了空间和频率信息,但它们大多局限于2D MRI,未探索3D MRI中双域分析的潜力。为了克服这一限制,我们提出了Spatio-Frequency Network(SFNet),这是第一个端到端的深度学习框架,同时利用空间和频率域信息来增强基于3D MRI的AD诊断。SFNet整合了一个增强型密集卷积网络以提取局部空间特征,以及一个全局频率模块来捕获全局频率域表示。此外,提出了一个新颖的多尺度注意力模块来进一步完善空间特征提取。对阿尔茨海默病神经影像倡议(ADNI)数据集上的实验表明,SFNet优于现有基准线,并在分类认知正常(CN)和AD时降低了计算开销,实现了95.1%的准确率。

更新时间: 2025-07-23 05:53:40

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.16267v2

NeuroHD-RA: Neural-distilled Hyperdimensional Model with Rhythm Alignment

We present a novel and interpretable framework for electrocardiogram (ECG)-based disease detection that combines hyperdimensional computing (HDC) with learnable neural encoding. Unlike conventional HDC approaches that rely on static, random projections, our method introduces a rhythm-aware and trainable encoding pipeline based on RR intervals, a physiological signal segmentation strategy that aligns with cardiac cycles. The core of our design is a neural-distilled HDC architecture, featuring a learnable RR-block encoder and a BinaryLinear hyperdimensional projection layer, optimized jointly with cross-entropy and proxy-based metric loss. This hybrid framework preserves the symbolic interpretability of HDC while enabling task-adaptive representation learning. Experiments on Apnea-ECG and PTB-XL demonstrate that our model significantly outperforms traditional HDC and classical ML baselines, achieving 73.09\% precision and an F1 score of 0.626 on Apnea-ECG, with comparable robustness on PTB-XL. Our framework offers an efficient and scalable solution for edge-compatible ECG classification, with strong potential for interpretable and personalized health monitoring.

Updated: 2025-07-23 05:51:42

标题: NeuroHD-RA:具有韵律对齐的神经蒸馏超高维模型

摘要: 我们提出了一种新颖且可解释的基于心电图(ECG)的疾病检测框架,结合了高维计算(HDC)和可学习的神经编码。与依赖于静态、随机投影的传统HDC方法不同,我们的方法引入了一种基于RR间隔的节律感知和可训练的编码管道,这是一种与心脏周期对齐的生理信号分割策略。我们设计的核心是一个神经精炼的HDC架构,具有可学习的RR块编码器和一个BinaryLinear高维投影层,与交叉熵和基于代理的度量损失一起进行优化。这种混合框架在保留HDC的符号可解释性的同时,实现了任务自适应的表示学习。在Apnea-ECG和PTB-XL上的实验表明,我们的模型在传统HDC和经典ML基线的基础上取得了显著的性能提升,在Apnea-ECG上达到了73.09%的精度和0.626的F1分数,而在PTB-XL上具有可比较的鲁棒性。我们的框架为边缘兼容的ECG分类提供了一种高效且可扩展的解决方案,具有强大的解释性和个性化健康监测潜力。

更新时间: 2025-07-23 05:51:42

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.14184v3

P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices

Split Learning (SL) is an emerging privacy-preserving machine learning technique that enables resource constrained edge devices to participate in model training by partitioning a model into client-side and server-side sub-models. While SL reduces computational overhead on edge devices, it encounters significant challenges in heterogeneous environments where devices vary in computing resources, communication capabilities, environmental conditions, and privacy requirements. Although recent studies have explored heterogeneous SL frameworks that optimize split points for devices with varying resource constraints, they often neglect personalized privacy requirements and local model customization under varying environmental conditions. To address these limitations, we propose P3SL, a Personalized Privacy-Preserving Split Learning framework designed for heterogeneous, resource-constrained edge device systems. The key contributions of this work are twofold. First, we design a personalized sequential split learning pipeline that allows each client to achieve customized privacy protection and maintain personalized local models tailored to their computational resources, environmental conditions, and privacy needs. Second, we adopt a bi-level optimization technique that empowers clients to determine their own optimal personalized split points without sharing private sensitive information (i.e., computational resources, environmental conditions, privacy requirements) with the server. This approach balances energy consumption and privacy leakage risks while maintaining high model accuracy. We implement and evaluate P3SL on a testbed consisting of 7 devices including 4 Jetson Nano P3450 devices, 2 Raspberry Pis, and 1 laptop, using diverse model architectures and datasets under varying environmental conditions.

Updated: 2025-07-23 05:50:33

标题: P3SL:异构边缘设备上个性化隐私保护拆分学习

摘要: Split Learning(SL)是一种新兴的隐私保护机器学习技术,它通过将模型分割为客户端和服务器端子模型,使资源受限的边缘设备能够参与模型训练。虽然SL减少了边缘设备上的计算开销,但在异构环境中面临着重大挑战,其中设备在计算资源、通信能力、环境条件和隐私要求方面存在差异。尽管最近的研究已经探索了针对具有不同资源约束的设备优化分割点的异构SL框架,但它们经常忽略了在不同环境条件下的个性化隐私需求和本地模型定制。为了解决这些限制,我们提出了P3SL,一种专为异构、资源受限的边缘设备系统设计的个性化隐私保护分割学习框架。这项工作的关键贡献有两个。首先,我们设计了一个个性化的顺序分割学习流水线,允许每个客户端实现定制的隐私保护,并保持针对其计算资源、环境条件和隐私需求定制的个性化本地模型。其次,我们采用了一个双层优化技术,赋予客户端确定其自身最佳个性化分割点的权力,而无需与服务器共享私人敏感信息(即计算资源、环境条件、隐私要求)。这种方法在保持高模型准确度的同时平衡了能源消耗和隐私泄露风险。我们在一个由4个Jetson Nano P3450设备、2个Raspberry Pis和1台笔记本电脑组成的测试平台上实施和评估P3SL,使用不同的模型架构和数据集在不同的环境条件下。

更新时间: 2025-07-23 05:50:33

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2507.17228v1

HuiduRep: A Robust Self-Supervised Framework for Learning Neural Representations from Extracellular Spikes

Extracellular recordings are brief voltage fluctuations recorded near neurons, widely used in neuroscience as the basis for decoding brain activity at single-neuron resolution. Spike sorting, which assigns each spike to its source neuron, is a critical step in brain sensing pipelines. However, it remains challenging under low signal-to-noise ratio (SNR), electrode drift, and cross-session variability. In this paper, we propose HuiduRep, a robust self-supervised representation learning framework that extracts discriminative and generalizable features from extracellular spike waveforms. By combining contrastive learning with a denoising autoencoder, HuiduRep learns latent representations that are robust to noise and drift. Built on HuiduRep, we develop a spike sorting pipeline that clusters spike representations without supervision. Experiments on hybrid and real-world datasets demonstrate that HuiduRep achieves strong robustness and the pipeline matches or outperforms state-of-the-art tools such as KiloSort4 and MountainSort5. These findings demonstrate the potential of self-supervised spike representation learning as a foundational tool for robust and generalizable processing of extracellular recordings.

Updated: 2025-07-23 05:45:38

标题: HuiduRep:一种稳健的自监督框架,用于从细胞外尖峰中学习神经表示

摘要: 细胞外记录是记录在神经元附近的短暂电压波动,被广泛应用于神经科学中,作为解码单个神经元脑活动的基础。尖峰排序将每个尖峰分配给其源神经元,是脑感知管道中的关键步骤。然而,在低信噪比(SNR)、电极漂移和跨会话变异性下,尖峰排序仍然具有挑战性。在本文中,我们提出了HuiduRep,一个强大的自监督表示学习框架,从细胞外尖峰波形中提取具有区分性和泛化性的特征。通过将对比学习与去噪自动编码器相结合,HuiduRep学习到对噪声和漂移具有鲁棒性的潜在表示。基于HuiduRep,我们开发了一个尖峰排序管道,可以在无监督的情况下对尖峰表示进行聚类。对混合和真实世界数据集的实验证明,HuiduRep具有强大的鲁棒性,并且该管道与KiloSort4和MountainSort5等最先进工具相匹配或表现更好。这些发现表明,自监督尖峰表示学习有望成为处理细胞外记录的强大和泛化性处理的基础工具。

更新时间: 2025-07-23 05:45:38

领域: eess.SP,cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2507.17224v1

Dataset Distillation as Data Compression: A Rate-Utility Perspective

Driven by the ``scale-is-everything'' paradigm, modern machine learning increasingly demands ever-larger datasets and models, yielding prohibitive computational and storage requirements. Dataset distillation mitigates this by compressing an original dataset into a small set of synthetic samples, while preserving its full utility. Yet, existing methods either maximize performance under fixed storage budgets or pursue suitable synthetic data representations for redundancy removal, without jointly optimizing both objectives. In this work, we propose a joint rate-utility optimization method for dataset distillation. We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks. We estimate the Shannon entropy of quantized latents as the rate measure and plug any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier. To enable fair, cross-method comparisons, we introduce bits per class (bpc), a precise storage metric that accounts for sample, label, and decoder parameter costs. On CIFAR-10, CIFAR-100, and ImageNet-128, our method achieves up to $170\times$ greater compression than standard distillation at comparable accuracy. Across diverse bpc budgets, distillation losses, and backbone architectures, our approach consistently establishes better rate-utility trade-offs.
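
The joint objective can be written as utility plus a Lagrange-weighted rate term. The sketch below uses straight-through rounding for quantization and a histogram entropy estimate for the rate (non-differentiable here; a practical pipeline would use a learned entropy model), with a placeholder utility standing in for the distillation loss:

# Hedged sketch of the rate-utility Lagrangian for latent-coded samples.
import torch

def quantize_ste(z):
    # Round to integers; pass gradients straight through the rounding.
    return z + (torch.round(z) - z).detach()

def entropy_bits(q):
    # Empirical Shannon entropy of the quantized symbols, in bits per symbol.
    _, counts = torch.unique(q.detach(), return_counts=True)
    p = counts.float() / counts.sum()
    return -(p * torch.log2(p)).sum()

latents = torch.randn(100, 32, requires_grad=True)      # synthetic-sample codes
q = quantize_ste(latents)
utility = q.pow(2).mean()                               # stand-in distillation loss
rate = entropy_bits(q) * latents.numel()                # total-bits estimate
loss = utility + 1e-4 * rate                            # Lagrangian trade-off
loss.backward()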

Updated: 2025-07-23 05:40:52

标题: 数据集提炼作为数据压缩:一个速率-效用角度的视角

摘要: 受“规模即一切”的范式驱动,现代机器学习越来越需要更大的数据集和模型,从而产生了令人望而却步的计算和存储需求。数据集精炼通过将原始数据集压缩成一小组合成样本来缓解这一问题,同时保留其完整的效用。然而,现有方法要么在固定存储预算下最大化性能,要么追求适合的合成数据表示以去除冗余,而没有同时优化这两个目标。在这项工作中,我们提出了一种用于数据集精炼的联合速率-效用优化方法。我们将合成样本参数化为可优化的潜在代码,由极轻量级网络解码。我们将量化潜在代码的香农熵估计为速率度量,将任何现有的精炼损失作为效用度量,并通过拉格朗日乘子进行折衷。为了实现公平的跨方法比较,我们引入了每类比特(bpc),这是一种精确的存储度量,考虑了样本、标签和解码器参数成本。在CIFAR-10、CIFAR-100和ImageNet-128上,我们的方法在可比准确度下比标准精炼压缩高达170倍。在不同的bpc预算、精炼损失和骨干架构下,我们的方法始终建立更好的速率-效用折衷。

更新时间: 2025-07-23 05:40:52

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2507.17221v1

Onto-LLM-TAMP: Knowledge-oriented Task and Motion Planning using Large Language Models

Performing complex manipulation tasks in dynamic environments requires efficient Task and Motion Planning (TAMP) approaches that combine high-level symbolic plans with low-level motion control. Advances in Large Language Models (LLMs), such as GPT-4, are transforming task planning by offering natural language as an intuitive and flexible way to describe tasks, generate symbolic plans, and reason. However, the effectiveness of LLM-based TAMP approaches is limited due to static and template-based prompting, which limits adaptability to dynamic environments and complex task contexts. To address these limitations, this work proposes a novel Onto-LLM-TAMP framework that employs knowledge-based reasoning to refine and expand user prompts with task-contextual reasoning and knowledge-based environment state descriptions. Integrating domain-specific knowledge into the prompt ensures semantically accurate and context-aware task plans. The proposed framework demonstrates its effectiveness by resolving semantic errors in symbolic plan generation, such as maintaining logical temporal goal ordering in scenarios involving hierarchical object placement. The proposed framework is validated through both simulation and real-world scenarios, demonstrating significant improvements over the baseline approach in terms of adaptability to dynamic environments and the generation of semantically correct task plans.

Updated: 2025-07-23 05:31:07

标题: Onto-LLM-TAMP:使用大型语言模型的基于知识的任务和动作规划

摘要: 在动态环境中执行复杂操纵任务需要高效的任务和动作规划(TAMP)方法,这些方法将高层次的符号计划与低层次的运动控制结合起来。大型语言模型(LLMs)的进展,例如GPT-4,通过提供自然语言作为描述任务、生成符号计划和推理的直观灵活方式,正在改变任务规划。然而,基于LLM的TAMP方法的有效性受到静态和基于模板的提示的限制,这限制了对动态环境和复杂任务背景的适应能力。为了解决这些限制,本文提出了一个新颖的Onto-LLM-TAMP框架,该框架采用基于知识的推理,以精细化和扩展用户提示,增加任务上下文推理和基于知识的环境状态描述。将领域专业知识整合到提示中,确保任务计划在语义上准确且具有上下文意识。所提出的框架通过解决符号计划生成中的语义错误,例如在涉及分层物体放置的情景中保持逻辑时间目标排序,展示了其有效性。该框架通过仿真和现实场景进行验证,显示在适应动态环境和生成语义正确任务计划方面相比基线方法取得了显著改进。

更新时间: 2025-07-23 05:31:07

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2412.07493v2

A Low-Cost Machine Learning Approach for Timber Diameter Estimation

The wood processing industry, particularly in facilities such as sawmills and MDF production lines, requires accurate and efficient identification of species and thickness of the wood. Although traditional methods rely heavily on expert human labor, they are slow, inconsistent, and prone to error, especially when processing large volumes. This study focuses on practical and cost-effective machine learning frameworks that automate the estimation of timber log diameter using standard RGB images captured under real-world working conditions. We employ the YOLOv5 object detection algorithm, fine-tuned on a public dataset (TimberSeg 1.0), to detect individual timber logs and estimate thickness through bounding-box dimensions. Unlike previous methods that require expensive sensors or controlled environments, this model is trained on images taken in typical industrial sheds during timber delivery. Experimental results show that the model achieves a mean Average Precision (mAP@0.5) of 0.64, demonstrating reliable log detection even with modest computing resources. This lightweight, scalable solution holds promise for practical integration into existing workflows, including on-site inventory management and preliminary sorting, particularly in small and medium-sized operations.

Updated: 2025-07-23 05:29:28

标题: 一种用于木材直径估计的低成本机器学习方法

摘要: 木材加工行业,特别是在锯木厂和MDF生产线等设施中,需要准确高效地识别木材的种类和厚度。尽管传统方法在很大程度上依赖于专家人工劳动,但它们速度慢、不一致且容易出错,特别是在处理大量木材时。本研究重点研究了实用且成本效益高的机器学习框架,通过在真实工作环境下拍摄的标准RGB图像自动估计木材原木直径。我们采用了YOLOv5目标检测算法,在公共数据集(TimberSeg 1.0)上进行微调,以检测单个木材原木并通过边界框尺寸估计厚度。与以往需要昂贵传感器或受控环境的方法不同,该模型是在典型工业棚内木材运送过程中拍摄的图像上进行训练的。实验结果显示,该模型实现了平均精度(mAP@0.5)达到0.64,展示了即使在计算资源有限的情况下也能可靠检测木材原木。这种轻量级、可扩展的解决方案有望实现与现有工作流程的实际整合,包括现场库存管理和初步分类,特别是在中小型企业中。

更新时间: 2025-07-23 05:29:28

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.17219v1

The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models

People increasingly rely on Large Language Models (LLMs) for moral advice, which may influence humans' decisions. Yet, little is known about how closely LLMs align with human moral judgments. To address this, we introduce the Moral Dilemma Dataset, a benchmark of 1,618 real-world moral dilemmas paired with a distribution of human moral judgments consisting of a binary evaluation and a free-text rationale. We treat this problem as a pluralistic distributional alignment task, comparing the distributions of LLM and human judgments across dilemmas. We find that models reproduce human judgments only under high consensus; alignment deteriorates sharply when human disagreement increases. In parallel, using a 60-value taxonomy built from 3,783 value expressions extracted from rationales, we show that LLMs rely on a narrower set of moral values than humans. These findings reveal a pluralistic moral gap: a mismatch in both the distribution and diversity of values expressed. To close this gap, we introduce Dynamic Moral Profiling (DMP), a Dirichlet-based sampling method that conditions model outputs on human-derived value profiles. DMP improves alignment by 64.3% and enhances value diversity, offering a step toward more pluralistic and human-aligned moral guidance from LLMs.
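
A minimal sketch of Dirichlet-based conditioning: a human-derived value profile sets the concentration parameters, a value weighting is sampled per query, and the top-weighted values are injected into the prompt. The value names, concentration scale, and prompt wording below are illustrative assumptions:

# Hedged sketch of Dirichlet-based value-profile conditioning (DMP-like).
import numpy as np

rng = np.random.default_rng(7)
values = ["honesty", "care", "fairness", "loyalty", "autonomy"]
human_profile = np.array([0.35, 0.30, 0.20, 0.10, 0.05])   # assumed, from rationales

def dmp_prompt(dilemma, strength=20.0, k=3):
    weights = rng.dirichlet(strength * human_profile)       # sampled value weighting
    top = [values[i] for i in np.argsort(weights)[::-1][:k]]
    return (f"Judge the following dilemma, giving particular weight to "
            f"{', '.join(top)}:\n{dilemma}")

print(dmp_prompt("Is it wrong to read a partner's messages without asking?"))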

Updated: 2025-07-23 05:26:17

标题: 多元道德鸿沟:理解人类与大型语言模型之间的判断和价值差异

摘要: 人们越来越依赖大型语言模型(LLMs)来获得道德建议,这可能会影响人类的决策。然而,我们对LLMs与人类道德判断的接近程度知之甚少。为了解决这个问题,我们引入了道德困境数据集,这是一个由1,618个现实世界道德困境和人类道德判断分布组成的基准,包括一个二元评估和一个自由文本理由。我们将这个问题视为一项多元分布对齐任务,比较LLM和人类在道德困境中的判断分布。我们发现,模型仅在高度一致时才能复制人类判断;当人类分歧增加时,对齐度急剧下降。同时,使用从理由中提取的3,783个价值表达构建的60个值分类法,我们发现LLMs依赖的道德价值比人类更狭窄。这些发现揭示了一种多元道德差距:价值表达的分布和多样性不匹配。为了弥合这一差距,我们引入了动态道德剖析(DMP),这是一种基于狄利克雷分布的采样方法,使模型输出以人类衍生的价值画像为条件。DMP将对齐度提高了64.3%,并增加了价值多样性,为从LLMs获得更多多元和与人类对齐的道德指导迈出了一步。

更新时间: 2025-07-23 05:26:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17216v1

LEGO Co-builder: Exploring Fine-Grained Vision-Language Modeling for Multimodal LEGO Assembly Assistants

Vision-language models (VLMs) are facing the challenges of understanding and following multimodal assembly instructions, particularly when fine-grained spatial reasoning and precise object state detection are required. In this work, we explore LEGO Co-builder, a hybrid benchmark combining real-world LEGO assembly logic with programmatically generated multimodal scenes. The dataset captures stepwise visual states and procedural instructions, allowing controlled evaluation of instruction-following, object detection, and state detection. We introduce a unified framework and assess leading VLMs such as GPT-4o, Gemini, and Qwen-VL, under zero-shot and fine-tuned settings. Our results reveal that even advanced models like GPT-4o struggle with fine-grained assembly tasks, with a maximum F1 score of just 40.54\% on state detection, highlighting gaps in fine-grained visual understanding. We release the benchmark, codebase, and generation pipeline to support future research on multimodal assembly assistants grounded in real-world workflows.

Updated: 2025-07-23 05:20:57

标题: LEGO合作构建者:探索用于多模态乐高组装助手的细粒度视觉-语言建模

摘要: 视觉语言模型(VLMs)在理解和遵循多模态装配说明方面面临挑战,特别是在需要精细的空间推理和精确的物体状态检测时。在这项工作中,我们探讨了LEGO Co-builder,这是一个结合了真实世界LEGO组装逻辑和程序化生成的多模态场景的混合基准。该数据集捕捉了逐步的视觉状态和程序性说明,允许对指令遵循、物体检测和状态检测进行受控评估。我们引入了一个统一的框架,并评估了GPT-4o、Gemini和Qwen-VL等领先VLMs在零样本和微调设置下的表现。我们的结果显示,即使像GPT-4o这样的先进模型也在精细的组装任务中表现不佳,在状态检测上的最高F1分数仅为40.54%,突显了精细视觉理解方面的差距。我们发布了该基准、代码库和生成流水线,以支持未来基于真实工作流程的多模态组装助手研究。

更新时间: 2025-07-23 05:20:57

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2507.05515v2

Our Cars Can Talk: How IoT Brings AI to Vehicles

Bringing AI to vehicles and enabling them as sensing platforms is key to transforming maintenance from reactive to proactive. Now is the time to integrate AI copilots that speak both languages: machine and driver. This article offers a conceptual and technical perspective intended to spark interdisciplinary dialogue and guide future research and development in intelligent vehicle systems, predictive maintenance, and AI-powered user interaction.

Updated: 2025-07-23 05:12:04

标题: 我们的汽车会说话:物联网如何将人工智能引入车辆

摘要: 将人工智能引入车辆并使其成为感知平台,是将维护从被动转为主动的关键。现在正是整合既能理解机器语言、又能理解驾驶员语言的AI副驾驶的时候。本文提供了一个概念性和技术性的视角,旨在激发跨学科对话,并为智能车辆系统、预测性维护和基于人工智能的用户交互领域的未来研发提供指引。

更新时间: 2025-07-23 05:12:04

领域: cs.AI,cs.CY,cs.NI,cs.SY,eess.SY,I.2; B.8; C.2; I.5; J.7

下载: http://arxiv.org/abs/2507.17214v1

APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression. The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant. The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable. We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69% test accuracy in just 20 epochs using approximately 332K trainable parameters. The results highlight the superior expressiveness and computational efficiency of the APTx Neuron compared to traditional neurons, pointing toward a new paradigm in unified neuron design and the architectures built upon it.

Updated: 2025-07-23 05:09:48

标题: APTx神经元:集成激活和计算的统一可训练神经元架构

摘要: 我们提出了APTx神经元,这是一个新颖的、统一的神经计算单元,将非线性激活和线性变换整合为一个可训练的表达式。APTx神经元源自APTx激活函数,因此无需单独的激活层,使得架构既具有计算效率又优雅。所提出的神经元遵循函数形式$y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$,其中所有参数$\alpha_i$、$\beta_i$、$\gamma_i$和$\delta$都是可训练的。我们在MNIST数据集上验证了基于APTx神经元的架构,在仅20个epochs中使用约332K个可训练参数达到了高达96.69%的测试准确率。结果突显了APTx神经元相对于传统神经元的出色表达能力和计算效率,指向了统一神经元设计和基于其构建的架构的新范式。

更新时间: 2025-07-23 05:09:48

领域: cs.NE,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.14270v2
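
To make the functional form above concrete, here is a minimal PyTorch sketch of an APTx-style neuron and a vectorized layer of such neurons, written directly from the formula in the abstract; the class names, initialization choices, and layer-level batching are our own assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class APTxNeuron(nn.Module):
    """One APTx neuron: y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta.
    All of alpha, beta, gamma (per input dimension) and delta are trainable,
    so activation and linear transformation live in a single expression."""

    def __init__(self, in_features):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(in_features))
        self.beta = nn.Parameter(torch.ones(in_features))
        self.gamma = nn.Parameter(torch.randn(in_features) * 0.01)
        self.delta = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, in_features) -> (batch,)
        return ((self.alpha + torch.tanh(self.beta * x)) * self.gamma * x).sum(-1) + self.delta

class APTxLayer(nn.Module):
    """A layer of out_features APTx neurons, vectorized over neurons."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(out_features, in_features))
        self.beta = nn.Parameter(torch.ones(out_features, in_features))
        self.gamma = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.delta = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):  # x: (batch, in_features) -> (batch, out_features)
        x = x.unsqueeze(1)  # broadcast (batch, 1, in) against (out, in)
        return ((self.alpha + torch.tanh(self.beta * x)) * self.gamma * x).sum(-1) + self.delta
```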

Blind Source Separation of Single-Channel Mixtures via Multi-Encoder Autoencoders

The task of blind source separation (BSS) involves separating sources from a mixture without prior knowledge of the sources or the mixing system. Single-channel mixtures and non-linear mixtures are a particularly challenging problem in BSS. In this paper, we propose a novel method for addressing BSS with single-channel non-linear mixtures by leveraging the natural feature subspace specialization ability of multi-encoder autoencoders. During the training phase, our method unmixes the input into the separate encoding spaces of the multi-encoder network and then remixes these representations within the decoder for a reconstruction of the input. Then to perform source inference, we introduce a novel encoding masking technique whereby masking out all but one of the encodings enables the decoder to estimate a source signal. To this end, we also introduce a sparse mixing loss that encourages sparse remixing of source encodings throughout the decoder and a so-called zero reconstruction loss on the decoder for coherent source estimations. To analyze and evaluate our method, we conduct experiments on a toy dataset, designed to demonstrate this property of feature subspace specialization, and with real-world biosignal recordings from a polysomnography sleep study for extracting respiration from electrocardiogram and photoplethysmography signals.

Updated: 2025-07-23 05:09:18

标题: 单通道混合信号的盲源分离:通过多编码器自编码器实现

摘要: 盲源分离(BSS)的任务是在对源信号和混合系统都没有先验知识的情况下,从混合信号中分离出源信号。单通道混合和非线性混合是BSS中特别具有挑战性的问题。本文提出了一种新颖的方法,利用多编码器自编码器天然的特征子空间专门化能力,来解决单通道非线性混合的BSS问题。在训练阶段,我们的方法将输入解混到多编码器网络的各个独立编码空间中,然后在解码器内重新混合这些表示以重构输入。随后,为了进行源推断,我们引入了一种新颖的编码掩蔽技术:屏蔽除一个之外的所有编码,使解码器能够估计出单个源信号。为此,我们还引入了一种稀疏混合损失,以鼓励解码器对源编码进行稀疏的重新混合,并在解码器上引入所谓的零重构损失,以获得一致的源估计。为了分析和评估我们的方法,我们在一个旨在展示特征子空间专门化特性的玩具数据集上进行了实验,并使用来自多导睡眠研究的真实生物信号记录,从心电图和光电容积脉搏波信号中提取呼吸信号。

更新时间: 2025-07-23 05:09:18

领域: eess.SP,cs.LG,I.2.6

下载: http://arxiv.org/abs/2309.07138v4
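
The encoding-masking idea described above can be illustrated with a small PyTorch sketch: one encoder per source, a shared decoder that remixes the concatenated codes, and a masked pass for source inference. The architecture sizes and module layout are illustrative assumptions; the paper's sparse mixing and zero reconstruction losses are noted but not implemented here.

```python
import torch
import torch.nn as nn

class MultiEncoderAE(nn.Module):
    """One encoder per source; the decoder remixes the concatenated codes."""

    def __init__(self, in_dim, latent_dim, n_sources):
        super().__init__()
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
            for _ in range(n_sources)
        ])
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim * n_sources, 128), nn.ReLU(), nn.Linear(128, in_dim)
        )

    def forward(self, x, keep=None):
        zs = [enc(x) for enc in self.encoders]
        if keep is not None:  # encoding masking: silence every other source's code
            zs = [z if i == keep else torch.zeros_like(z) for i, z in enumerate(zs)]
        return self.decoder(torch.cat(zs, dim=-1))

# Training: minimize ||model(x) - x||^2 on mixtures (all encodings active).
# Inference: s_k = model(x, keep=k) estimates the k-th source.
```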

HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery

Modern scientific discovery faces growing challenges in integrating vast and heterogeneous knowledge critical to breakthroughs in biomedicine and drug development. Traditional hypothesis-driven research, though effective, is constrained by human cognitive limits, the complexity of biological systems, and the high cost of trial-and-error experimentation. Deep learning models, especially graph neural networks (GNNs), have accelerated prediction generation, but the sheer volume of outputs makes manual selection for validation unscalable. Large language models (LLMs) offer promise in filtering and hypothesis generation, yet suffer from hallucinations and lack grounding in structured knowledge, limiting their reliability. To address these issues, we propose HypoChainer, a collaborative visualization framework that integrates human expertise, LLM-driven reasoning, and knowledge graphs (KGs) to enhance hypothesis generation and validation. HypoChainer operates in three stages: First, exploration and contextualization -- experts use retrieval-augmented LLMs (RAGs) and dimensionality reduction to navigate large-scale GNN predictions, assisted by interactive explanations. Second, hypothesis chain formation -- experts iteratively examine KG relationships around predictions and semantically linked entities, refining hypotheses with LLM and KG suggestions. Third, validation prioritization -- refined hypotheses are filtered based on KG-supported evidence to identify high-priority candidates for experimentation, with visual analytics further strengthening weak links in reasoning. We demonstrate HypoChainer's effectiveness through case studies in two domains and expert interviews, highlighting its potential to support interpretable, scalable, and knowledge-grounded scientific discovery.

Updated: 2025-07-23 05:02:54

标题: HypoChainer:一种结合LLMs和知识图谱的协作系统,用于基于假设的科学发现

摘要: 现代科学发现在整合广泛和异构的知识方面面临越来越大的挑战,这对于生物医学和药物开发的突破至关重要。传统的假设驱动研究虽然有效,但受到人类认知限制、生物系统复杂性和试错实验的高成本的限制。深度学习模型,特别是图神经网络(GNNs),已加速预测生成,但输出量之大使得人工选择进行验证难以扩展。大型语言模型(LLMs)在过滤和假设生成方面具有潜力,但存在幻觉问题,并且缺乏结构化知识的依据,限制了它们的可靠性。为解决这些问题,我们提出了HypoChainer,这是一个集成人类专业知识、LLM驱动推理和知识图谱(KGs)的协作可视化框架,以增强假设的生成和验证。HypoChainer分为三个阶段:首先,探索和情境化--专家使用检索增强LLMs(RAGs)和降维技术来浏览大规模GNN预测,辅以交互式解释。其次,假设链形成--专家迭代地检查围绕预测和语义链接实体的知识图谱关系,利用LLM和KG的建议细化假设。第三,验证优先级--根据KG支持的证据对细化的假设进行筛选,确定实验的高优先级候选者,并通过视觉分析进一步加强推理中的弱连接。我们通过两个领域的案例研究和专家访谈展示了HypoChainer的有效性,突出了其支持可解释、可扩展和基于知识的科学发现的潜力。

更新时间: 2025-07-23 05:02:54

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2507.17209v1

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

In modern large language models (LLMs), LLM alignment is of crucial importance and is typically achieved through methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). However, in most existing methods for LLM alignment, all tokens in the response are optimized using a sparse, response-level reward or preference annotation. Ignoring token-level rewards may erroneously punish high-quality tokens or encourage low-quality tokens, resulting in suboptimal performance and slow convergence. To address this issue, we propose AlignDistil, an RLHF-equivalent distillation method for token-level reward optimization. Specifically, we introduce the reward learned by DPO into the RLHF objective and theoretically prove the equivalence between this objective and a token-level distillation process, where the teacher distribution linearly combines the logits from the DPO model and a reference model. On this basis, we further bridge the accuracy gap between the reward from the DPO model and the pure reward model by building a contrastive DPO reward with a normal and a reverse DPO model. Moreover, to avoid under- and over-optimization on different tokens, we design a token-adaptive logit extrapolation mechanism to construct an appropriate teacher distribution for each token. Experimental results demonstrate the superiority of our AlignDistil over existing methods and showcase fast convergence due to its token-level distributional reward optimization.

Updated: 2025-07-23 04:59:45

标题: AlignDistil: 作为自适应策略蒸馏的基于标记级语言模型对齐

摘要: 在现代大型语言模型(LLMs)中,LLM对齐至关重要,通常通过基于人类反馈的强化学习(RLHF)和直接偏好优化(DPO)等方法实现。然而,在大多数现有的LLM对齐方法中,响应中的所有标记都使用稀疏的响应级奖励或偏好注释进行优化。忽略标记级奖励可能会错误地惩罚高质量的标记或鼓励低质量的标记,导致次优性能和缓慢的收敛速度。为了解决这个问题,我们提出了AlignDistil,一种与RLHF等价的、用于标记级奖励优化的蒸馏方法。具体而言,我们将DPO学习到的奖励引入RLHF目标,并在理论上证明了该目标与一个标记级蒸馏过程等价,其中教师分布线性组合来自DPO模型和参考模型的logits。在此基础上,我们通过一个由正向和反向DPO模型构成的对比DPO奖励,进一步弥合DPO模型奖励与纯奖励模型之间的准确性差距。此外,为了避免对不同标记的欠优化和过优化,我们设计了一种标记自适应的logit外推机制,为每个标记构建适当的教师分布。实验结果表明,AlignDistil优于现有方法,并且得益于其标记级的分布式奖励优化而收敛迅速。

更新时间: 2025-07-23 04:59:45

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.02832v3
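
The teacher construction described above, a linear combination of DPO-model and reference-model logits, can be sketched in a few lines of PyTorch. The fixed extrapolation weight `beta` is a simplifying assumption: the paper makes this weight token-adaptive, and the contrastive reverse-DPO term is omitted here.

```python
import torch.nn.functional as F

def teacher_logits(dpo_logits, ref_logits, beta=1.0):
    # Linear combination of DPO and reference logits; beta > 1 extrapolates
    # beyond the DPO model (the paper adapts beta per token).
    return ref_logits + beta * (dpo_logits - ref_logits)

def token_distill_loss(student_logits, dpo_logits, ref_logits, beta=1.0):
    # KL(teacher || student), computed at every token position over the vocab.
    teacher = F.softmax(teacher_logits(dpo_logits, ref_logits, beta), dim=-1)
    return F.kl_div(F.log_softmax(student_logits, dim=-1), teacher, reduction="batchmean")
```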

ICCO: Learning an Instruction-conditioned Coordinator for Language-guided Task-aligned Multi-robot Control

Recent advances in Large Language Models (LLMs) have permitted the development of language-guided multi-robot systems, which allow robots to execute tasks based on natural language instructions. However, achieving effective coordination in distributed multi-agent environments remains challenging due to (1) misalignment between instructions and task requirements and (2) inconsistency in robot behaviors when they independently interpret ambiguous instructions. To address these challenges, we propose Instruction-Conditioned Coordinator (ICCO), a Multi-Agent Reinforcement Learning (MARL) framework designed to enhance coordination in language-guided multi-robot systems. ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions (TACI) by integrating language instructions with environmental states, ensuring task alignment and behavioral consistency. The Coordinator and Local Agents are jointly trained to optimize a reward function that balances task efficiency and instruction following. A Consistency Enhancement Term is added to the learning objective to maximize mutual information between instructions and robot behaviors, further improving coordination. Simulation and real-world experiments validate the effectiveness of ICCO in achieving language-guided task-aligned multi-robot control. The demonstration can be found at https://yanoyoshiki.github.io/ICCO/.

Updated: 2025-07-23 04:56:04

标题: ICCO: 学习一个指令条件的协调器,用于语言引导的任务对齐多机器人控制

摘要: 最近大规模语言模型(LLMs)的进展使得语言引导的多机器人系统的发展成为可能,这使得机器人能够根据自然语言指令执行任务。然而,在分布式多智能体环境中实现有效协调仍然具有挑战性,原因包括(1)指令与任务需求之间的不一致,以及(2)机器人在独立解释模糊指令时行为的不一致性。为了解决这些挑战,我们提出了Instruction-Conditioned Coordinator(ICCO),这是一个旨在增强语言引导的多机器人系统协调的多智能体强化学习(MARL)框架。ICCO包括一个协调员代理和多个本地代理,其中协调员通过将语言指令与环境状态集成,生成任务对齐和行为一致的指令(TACI),确保任务对齐和行为一致性。协调员和本地代理共同训练以优化一个平衡任务效率和指令遵循的奖励函数。为了最大化指令与机器人行为之间的互信息,进一步改善协调,学习目标中添加了一项一致性增强项。模拟和实际实验验证了ICCO在实现语言引导的任务对齐多机器人控制方面的有效性。演示可以在https://yanoyoshiki.github.io/ICCO/找到。

更新时间: 2025-07-23 04:56:04

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2503.12122v2

Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation

Effective content moderation is essential for video platforms to safeguard user experience and uphold community standards. While traditional video classification models effectively handle well-defined moderation tasks, they struggle with complicated scenarios such as implicit harmful content and contextual ambiguity. Multimodal large language models (MLLMs) offer a promising solution to these limitations with their superior cross-modal reasoning and contextual understanding. However, two key challenges hinder their industrial adoption. First, the high computational cost of MLLMs makes full-scale deployment impractical. Second, adapting generative models for discriminative classification remains an open research problem. In this paper, we first introduce an efficient method to transform a generative MLLM into a multimodal classifier using minimal discriminative training data. To enable industry-scale deployment, we then propose a router-ranking cascade system that integrates MLLMs with a lightweight router model. Offline experiments demonstrate that our MLLM-based approach improves F1 score by 66.50% over traditional classifiers while requiring only 2% of the fine-tuning data. Online evaluations show that our system increases automatic content moderation volume by 41%, while the cascading deployment reduces computational cost to only 1.5% of direct full-scale deployment.

Updated: 2025-07-23 04:52:58

标题: 过滤和精炼:基于MLLM的工业规模视频内容管理级联系统

摘要: 有效的内容审查对视频平台来说至关重要,可以保障用户体验并维护社区标准。传统的视频分类模型可以有效处理明确定义的审查任务,但在涉及含蓄有害内容和语境模糊的复杂情况下表现不佳。多模态大型语言模型(MLLMs)以其出色的跨模态推理和语境理解能力,为解决这些限制提供了有希望的解决方案。然而,两个关键挑战阻碍了它们在工业中的应用。首先,MLLMs的高计算成本使得完全部署不切实际。其次,将生成模型调整为判别分类仍然是一个未解决的研究问题。本文首先介绍了一种高效的方法,将生成式MLLM转化为多模态分类器,只需最少的判别训练数据。为了实现工业规模的部署,我们提出了一个集成MLLMs和轻量级路由器模型的路由器-排名级联系统。离线实验表明,我们基于MLLM的方法将F1分数提高了66.50%,而只需要2%的微调数据。在线评估显示,我们的系统将自动内容审查量增加了41%,而级联部署将计算成本降低到直接全面部署的1.5%。

更新时间: 2025-07-23 04:52:58

领域: cs.LG

下载: http://arxiv.org/abs/2507.17204v1
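
The router-ranking cascade cuts cost because most items never reach the MLLM. A minimal sketch of such a cascade, with hypothetical `router` and `mllm` objects and illustrative confidence thresholds (not the system's actual values), looks like this:

```python
def moderate(video, router, mllm, low=0.05, high=0.95):
    """Router-ranking cascade: a cheap router scores every video and only the
    ambiguous middle band is escalated to the expensive MLLM classifier."""
    p = router.score(video)      # lightweight model: probability of violation
    if p <= low:
        return "approve"         # confidently benign: MLLM never runs
    if p >= high:
        return "reject"          # confidently violating: MLLM never runs
    return mllm.classify(video)  # uncertain cases: refine with the MLLM
```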

DesignLab: Designing Slides Through Iterative Detection and Correction

Designing high-quality presentation slides can be challenging for non-experts due to the complexity involved in navigating various design choices. Numerous automated tools can suggest layouts and color schemes, yet often lack the ability to refine their own output, which is a key aspect in real-world workflows. We propose DesignLab, which separates the design process into two roles, the design reviewer, who identifies design-related issues, and the design contributor, who corrects them. This decomposition enables an iterative loop where the reviewer continuously detects issues and the contributor corrects them, allowing a draft to be further polished with each iteration and reach qualities that were previously unattainable. We fine-tune large language models for these roles and simulate intermediate drafts by introducing controlled perturbations, enabling the design reviewer to learn design errors and the contributor to learn how to fix them. Our experiments show that DesignLab outperforms existing design-generation methods, including a commercial tool, by embracing the iterative nature of designing, which can result in polished, professional slides.

Updated: 2025-07-23 04:49:48

标题: DesignLab:通过迭代检测和修正设计幻灯片

摘要: 对非专家而言,设计高质量的演示幻灯片可能颇具挑战,因为需要在各种复杂的设计选择之间权衡。许多自动化工具可以建议布局和配色方案,但通常缺乏对自身输出进行细化的能力,而这是现实工作流程中的关键环节。我们提出了DesignLab,将设计过程分为两个角色:负责识别设计相关问题的设计审查员,和负责纠正这些问题的设计贡献者。这种分解形成了一个迭代循环:审查员不断检测问题,贡献者不断纠正问题,使草稿在每次迭代中得到进一步打磨,达到以前无法实现的质量水平。我们为这些角色微调了大型语言模型,并通过引入受控扰动来模拟中间草稿,使设计审查员学会识别设计错误,设计贡献者学会如何修复这些错误。我们的实验表明,DesignLab通过拥抱设计的迭代本质,优于包括一款商业工具在内的现有设计生成方法,能够产生经过精细打磨的专业幻灯片。

更新时间: 2025-07-23 04:49:48

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17202v1

Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAG

LLM-powered search services have driven data integration as a significant trend. However, this trend's progress is fundamentally hindered, despite the fact that combining individual knowledge can significantly improve the relevance and quality of responses in specialized queries and make AI more professional at providing services. Two key bottlenecks are private data repositories' locality constraints and the need to maintain compatibility with mainstream search techniques, particularly Hierarchical Navigable Small World (HNSW) indexing for high-dimensional vector spaces. In this work, we develop a secure and privacy-preserving aggregated approximate nearest neighbor search (SP-A$^2$NN) with HNSW compatibility under a threshold-based searchable sharing primitive. A sharable bitgraph structure is constructed and extended to support searches and dynamical insertions over shared data without compromising the underlying graph topology. The approach reduces the complexity of a search from $O(n^2)$ to $O(n)$ compared to naive (undirected) graph-sharing approach when organizing graphs in the identical HNSW manner. On the theoretical front, we explore a novel security analytical framework that incorporates privacy analysis via reductions. The proposed leakage-guessing proof system is built upon an entirely different interactive game that is independent of existing coin-toss game design. Rather than being purely theoretical, this system is rooted in existing proof systems but goes beyond them to specifically address leakage concerns and standardize leakage analysis -- one of the most critical security challenges with AI's rapid development.

Updated: 2025-07-23 04:45:01

标题: Threshold-Protected Searchable Sharing: 隐私保护的聚合-ANN搜索用于协作RAG

摘要: 由LLM驱动的搜索服务已使数据集成成为一个重要趋势。然而,尽管融合个体知识可以显著提高专业查询响应的相关性和质量,并使AI在提供服务方面更加专业,这一趋势的进展仍受到根本性阻碍。两个关键瓶颈是私有数据仓库的本地性约束,以及需要与主流搜索技术保持兼容,特别是用于高维向量空间的分层可导航小世界(HNSW)索引。在这项工作中,我们开发了一种安全且保护隐私的聚合近似最近邻搜索(SP-A$^2$NN),在基于阈值的可搜索共享原语下与HNSW兼容。我们构建并扩展了一种可共享的比特图(bitgraph)结构,以支持在共享数据上进行搜索和动态插入,而不破坏底层图拓扑。在以相同的HNSW方式组织图时,与朴素的(无向)图共享方法相比,该方法将搜索的复杂度从$O(n^2)$降低到$O(n)$。 在理论方面,我们探索了一个通过归约进行隐私分析的新型安全分析框架。所提出的泄露猜测(leakage-guessing)证明系统建立在一种全新的交互式博弈之上,独立于现有的抛硬币博弈设计。该系统并非纯理论:它植根于现有证明系统,但又超越它们,专门处理泄露问题并将泄露分析标准化,而泄露分析正是人工智能快速发展中最关键的安全挑战之一。

更新时间: 2025-07-23 04:45:01

领域: cs.CR

下载: http://arxiv.org/abs/2507.17199v1

Dispatch-Aware Deep Neural Network for Optimal Transmission Switching: Toward Real-Time and Feasibility Guaranteed Operation

Optimal transmission switching (OTS) improves optimal power flow (OPF) by selectively opening transmission lines, but its mixed-integer formulation increases computational complexity, especially on large grids. To deal with this, we propose a dispatch-aware deep neural network (DA-DNN) that accelerates DC-OTS without relying on pre-solved labels. DA-DNN predicts line states and passes them through a differentiable DC-OPF layer, using the resulting generation cost as the loss function so that all physical network constraints are enforced throughout training and inference. In addition, we adopt a customized weight-bias initialization that keeps every forward pass feasible from the first iteration, which allows stable learning on large grids. Once trained, the proposed DA-DNN produces a provably feasible topology and dispatch pair in the same amount of time as solving the DC-OPF, whereas conventional mixed-integer solvers become intractable. As a result, the proposed method successfully captures the economic advantages of OTS while maintaining scalability.

Updated: 2025-07-23 04:39:29

标题: 调度感知深度神经网络用于最佳传输开关:实时和可行性保证运行方向

摘要: 最优输电开关(OTS)通过选择性地断开输电线路来改善最优潮流(OPF),但其混合整数形式增加了计算复杂性,尤其是在大规模电网上。为了解决这个问题,我们提出了一种调度感知深度神经网络(DA-DNN),无需依赖预先求解的标签即可加速DC-OTS。DA-DNN预测线路状态,并将其传入一个可微分的DC-OPF层,以由此得到的发电成本作为损失函数,从而在训练和推断过程中始终强制满足所有物理网络约束。此外,我们采用了一种定制的权重-偏置初始化,使每次前向传播从第一次迭代开始就保持可行,从而能够在大规模电网上稳定学习。训练完成后,所提出的DA-DNN可以在与求解DC-OPF相同的时间内产生可证明可行的拓扑-调度对,而传统的混合整数求解器在此规模下已难以处理。因此,所提出的方法在保持可扩展性的同时,成功地获得了OTS的经济优势。

更新时间: 2025-07-23 04:39:29

领域: eess.SY,cs.AI,cs.SY

下载: http://arxiv.org/abs/2507.17194v1

Spintronic Bayesian Hardware Driven by Stochastic Magnetic Domain Wall Dynamics

As artificial intelligence (AI) advances into diverse applications, ensuring reliability of AI models is increasingly critical. Conventional neural networks offer strong predictive capabilities but produce deterministic outputs without inherent uncertainty estimation, limiting their reliability in safety-critical domains. Probabilistic neural networks (PNNs), which introduce randomness, have emerged as a powerful approach for enabling intrinsic uncertainty quantification. However, traditional CMOS architectures are inherently designed for deterministic operation and actively suppress intrinsic randomness. This poses a fundamental challenge for implementing PNNs, as probabilistic processing introduces significant computational overhead. To address this challenge, we introduce a Magnetic Probabilistic Computing (MPC) platform-an energy-efficient, scalable hardware accelerator that leverages intrinsic magnetic stochasticity for uncertainty-aware computing. This physics-driven strategy utilizes spintronic systems based on magnetic domain walls (DWs) and their dynamics to establish a new paradigm of physical probabilistic computing for AI. The MPC platform integrates three key mechanisms: thermally induced DW stochasticity, voltage controlled magnetic anisotropy (VCMA), and tunneling magnetoresistance (TMR), enabling fully electrical and tunable probabilistic functionality at the device level. As a representative demonstration, we implement a Bayesian Neural Network (BNN) inference structure and validate its functionality on CIFAR-10 classification tasks. Compared to standard 28nm CMOS implementations, our approach achieves a seven orders of magnitude improvement in the overall figure of merit, with substantial gains in area efficiency, energy consumption, and speed. These results underscore the MPC platform's potential to enable reliable and trustworthy physical AI systems.

Updated: 2025-07-23 04:39:04

标题: 自旋电子贝叶斯硬件驱动的随机磁域壁动力学

摘要: 随着人工智能(AI)进入各种应用领域,确保AI模型的可靠性变得日益关键。传统的神经网络提供强大的预测能力,但产生确定性输出且无固有的不确定性估计,限制了它们在安全关键领域的可靠性。引入随机性的概率神经网络(PNNs)已经成为一种强大的方法,用于实现内在不确定性量化。然而,传统的CMOS架构天生设计用于确定性操作,并主动抑制内在随机性。这对于实现PNNs构成了基本挑战,因为概率处理引入了显着的计算开销。为了解决这一挑战,我们引入了一种磁性概率计算(MPC)平台-一种能效高、可扩展的硬件加速器,利用内在磁性随机性进行不确定性感知计算。这种基于物理的策略利用基于磁性域墙(DWs)及其动态的自旋电子系统,建立了一种新的物理概率计算范式,用于AI。MPC平台集成了三个关键机制:热诱导的DW随机性、电压控制的磁各向异性(VCMA)和隧道磁电阻(TMR),在器件级别实现了完全电气化和可调节的概率功能。作为代表性示范,我们实现了一个贝叶斯神经网络(BNN)推理结构,并在CIFAR-10分类任务上验证了其功能。与标准的28nm CMOS实现相比,我们的方法在总体评价指标上取得了七个数量级的改进,面积效率、能耗和速度方面都有显著提高。这些结果凸显了MPC平台实现可靠且可信的物理AI系统的潜力。

更新时间: 2025-07-23 04:39:04

领域: physics.app-ph,cs.LG

下载: http://arxiv.org/abs/2507.17193v1

Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems

The increasing frequency of extreme weather events due to global climate change calls for accurate weather prediction. Recently, great advances have been made by end-to-end methods, thanks to deep learning techniques, but they face limitations of representation inconsistency in multivariable integration and struggle to effectively capture the dependency between variables, which is required in complex weather systems. Treating different variables as distinct modalities and applying a two-stage training approach from multimodal models can partially alleviate this issue, but due to the mismatch in training tasks between the two stages, the results are often suboptimal. To address these challenges, we propose an implicit two-stage training method, configuring separate encoders and decoders for each variable. In detail, in the first stage, the Translator is frozen while the Encoders and Decoders learn a shared latent space; in the second stage, the Encoders and Decoders are frozen, and the Translator captures inter-variable interactions for prediction. Besides, by introducing a self-attention mechanism for multivariable fusion in the latent space, the performance achieves further improvements. Empirically, extensive experiments show the state-of-the-art performance of our method. Specifically, it reduces the MSE for near-surface air temperature and relative humidity predictions by 28.82% and 23.39%, respectively. The source code is available at https://github.com/ShremG/Met2Net.

Updated: 2025-07-23 04:26:56

标题: Met$^2$Net:一个用于复杂气象系统的解耦两阶段时空预测模型

摘要: 全球气候变化导致极端天气事件日益频繁,迫切需要准确的天气预报。近年来,得益于深度学习技术,端到端方法取得了巨大进展,但它们在多变量集成中面临表示不一致的限制,并且难以有效捕捉复杂天气系统所要求的变量间依赖关系。将不同变量视为不同的模态,并借鉴多模态模型的两阶段训练方法,可以部分缓解这个问题,但由于两个阶段之间训练任务的不匹配,结果通常不够理想。为了解决这些挑战,我们提出了一种隐式两阶段训练方法,为每个变量配置单独的编码器和解码器。具体而言,在第一阶段,翻译器被冻结,编码器和解码器学习一个共享的潜在空间;在第二阶段,编码器和解码器被冻结,翻译器捕获变量间相互作用以进行预测。此外,通过在潜在空间中引入自注意力机制进行多变量融合,性能得到进一步提升。经验上,大量实验证明了我们方法的最先进性能。具体而言,它将近地表气温和相对湿度预测的均方误差分别降低了28.82%和23.39%。源代码可在https://github.com/ShremG/Met2Net获取。

更新时间: 2025-07-23 04:26:56

领域: cs.LG

下载: http://arxiv.org/abs/2507.17189v1
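
The alternating freeze schedule described above is easy to express in PyTorch by toggling `requires_grad` per stage. The sketch below assumes one encoder/decoder per variable and a translator operating on stacked latents; the module interfaces and the plain MSE loss are our assumptions, not the released code.

```python
import torch

def train_stage(encoders, decoders, translator, loader, epochs, stage):
    # Stage 1: translator frozen; encoders/decoders learn a shared latent space.
    # Stage 2: encoders/decoders frozen; translator learns cross-variable dynamics.
    for p in translator.parameters():
        p.requires_grad = (stage == 2)
    enc_dec_params = [p for m in list(encoders) + list(decoders) for p in m.parameters()]
    for p in enc_dec_params:
        p.requires_grad = (stage == 1)
    trainable = [p for p in enc_dec_params + list(translator.parameters()) if p.requires_grad]
    opt = torch.optim.Adam(trainable)
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        for past, future in loader:  # tensors of shape (batch, n_vars, ...), one slot per variable
            z = torch.stack([enc(past[:, i]) for i, enc in enumerate(encoders)], dim=1)
            z_next = translator(z)   # inter-variable interaction in latent space
            loss = sum(mse(dec(z_next[:, i]), future[:, i]) for i, dec in enumerate(decoders))
            opt.zero_grad()
            loss.backward()
            opt.step()

# train_stage(encs, decs, translator, loader, epochs=10, stage=1)
# train_stage(encs, decs, translator, loader, epochs=10, stage=2)
```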

LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks

This work tackles the physical layer security (PLS) problem of maximizing the secrecy rate in heterogeneous UAV networks (HetUAVNs) under propulsion energy constraints. Unlike prior studies that assume uniform UAV capabilities or overlook energy-security trade-offs, we consider a realistic scenario where UAVs with diverse payloads and computation resources collaborate to serve ground terminals in the presence of eavesdroppers. To manage the complex coupling between UAV motion and communication, we propose a hierarchical optimization framework. The inner layer uses a semidefinite relaxation (SDR)-based S2DC algorithm combining penalty functions and difference-of-convex (d.c.) programming to solve the secrecy precoding problem with fixed UAV positions. The outer layer introduces a Large Language Model (LLM)-guided heuristic multi-agent reinforcement learning approach (LLM-HeMARL) for trajectory optimization. LLM-HeMARL efficiently incorporates expert heuristics policy generated by the LLM, enabling UAVs to learn energy-aware, security-driven trajectories without the inference overhead of real-time LLM calls. The simulation results show that our method outperforms existing baselines in secrecy rate and energy efficiency, with consistent robustness across varying UAV swarm sizes and random seeds.

Updated: 2025-07-23 04:22:57

标题: LLM遇见天空:用于安全异构无人机网络的启发式多智能体强化学习

摘要: 这项工作研究异构无人机网络(HetUAVNs)中的物理层安全(PLS)问题,即在推进能耗约束下最大化保密速率。与以往假设无人机能力一致或忽视能耗-安全权衡的研究不同,我们考虑一个现实场景:载荷和计算资源各异的无人机在存在窃听者的情况下协作为地面终端提供服务。为了处理无人机运动与通信之间的复杂耦合,我们提出了一个分层优化框架。内层使用基于半定松弛(SDR)的S2DC算法,结合惩罚函数和凸差(d.c.)规划,求解固定无人机位置下的保密预编码问题。外层引入一种由大型语言模型(LLM)引导的启发式多智能体强化学习方法(LLM-HeMARL)进行轨迹优化。LLM-HeMARL有效地融合了由LLM生成的专家启发式策略,使无人机能够学习能耗感知、安全驱动的轨迹,而无需实时调用LLM带来的推断开销。仿真结果表明,我们的方法在保密速率和能效方面优于现有基线,并且在不同无人机群规模和随机种子下具有一致的鲁棒性。

更新时间: 2025-07-23 04:22:57

领域: cs.NI,cs.AI,cs.CR

下载: http://arxiv.org/abs/2507.17188v1

Learning-based Privacy-Preserving Graph Publishing Against Sensitive Link Inference Attacks

Publishing graph data is widely desired to enable a variety of structural analyses and downstream tasks. However, it also potentially poses severe privacy leakage, as attackers may leverage the released graph data to launch attacks and precisely infer private information such as the existence of hidden sensitive links in the graph. Prior studies on privacy-preserving graph data publishing relied on heuristic graph modification strategies, making it difficult to determine the graph with the optimal privacy-utility trade-off for publishing. In contrast, we propose the first privacy-preserving graph structure learning framework against sensitive link inference attacks, named PPGSL, which can automatically learn a graph with the optimal privacy-utility trade-off. The PPGSL operates by first simulating a powerful surrogate attacker conducting sensitive link attacks on a given graph. It then trains a parameterized graph to defend against the simulated adversarial attacks while maintaining the favorable utility of the original graph. To learn the parameters of both parts of the PPGSL, we introduce a secure iterative training protocol. It can enhance privacy preservation and ensure stable convergence during the training process, as supported by the theoretical proof. Additionally, we incorporate multiple acceleration techniques to improve the efficiency of the PPGSL in handling large-scale graphs. The experimental results confirm that the PPGSL achieves state-of-the-art privacy-utility trade-off performance and effectively thwarts various sensitive link inference attacks.

Updated: 2025-07-23 04:19:29

标题: 基于学习的隐私保护图发布抵御敏感链接推断攻击

摘要: 发布图数据的需求十分广泛,可以支持各种结构分析和下游任务。然而,这也可能导致严重的隐私泄露:攻击者可能利用发布的图数据发动攻击,准确推断出图中是否存在隐藏的敏感链接等私人信息。以往关于隐私保护图数据发布的研究依赖启发式的图修改策略,难以确定具有最佳隐私-效用权衡的可发布图。相比之下,我们提出了第一个针对敏感链接推断攻击的隐私保护图结构学习框架,命名为PPGSL,它可以自动学习具有最佳隐私-效用权衡的图。PPGSL首先模拟一个强大的替代攻击者对给定图实施敏感链接攻击,然后训练一个参数化图来抵御模拟的对抗攻击,同时保持原始图的良好效用。为了学习PPGSL两个部分的参数,我们引入了一种安全的迭代训练协议,它可以增强隐私保护,并确保训练过程稳定收敛,这一点有理论证明支持。此外,我们结合多种加速技术,以提高PPGSL处理大规模图的效率。实验结果证实,PPGSL实现了最先进的隐私-效用权衡性能,并有效挫败了各种敏感链接推断攻击。

更新时间: 2025-07-23 04:19:29

领域: cs.CR

下载: http://arxiv.org/abs/2507.21139v1

Asymmetric Lesion Detection with Geometric Patterns and CNN-SVM Classification

In dermoscopic images, which allow visualization of surface skin structures not visible to the naked eye, lesion shape offers vital insights into skin diseases. In clinical practice, asymmetric lesion shape is one of the criteria for diagnosing melanoma. Initially, we labeled data for a non-annotated dataset with symmetry information based on clinical assessments. Subsequently, we propose a supporting technique, a supervised learning image processing algorithm, to analyze the geometrical pattern of lesion shape, aiding non-experts in understanding the criteria of an asymmetric lesion. We then utilize a pre-trained convolutional neural network (CNN) to extract shape, color, and texture features from dermoscopic images for training a multiclass support vector machine (SVM) classifier, outperforming state-of-the-art methods from the literature. In the geometry-based experiment, we achieved a 99.00% detection rate for dermatological asymmetric lesions. In the CNN-based experiment, the best performance is found with 94% Kappa Score, 95% Macro F1-score, and 97% Weighted F1-score for classifying lesion shapes (Asymmetric, Half-Symmetric, and Symmetric).

Updated: 2025-07-23 04:17:57

标题: 使用几何图案和CNN-SVM分类进行非对称病变检测

摘要: 在皮肤镜图像中,可以可视化肉眼不可见的表面皮肤结构,病变形状提供了对皮肤疾病的重要见解。在临床实践方法中,不对称的病变形状是诊断黑色素瘤的标准之一。最初,我们根据临床评估为一个非注释数据集标记了具有对称信息的数据。随后,我们提出了一种支持技术,一种监督学习图像处理算法,用于分析病变形状的几何图案,帮助非专家理解不对称病变的标准。然后,我们利用一个预训练的卷积神经网络(CNN)从皮肤镜图像中提取形状、颜色和纹理特征,用于训练一个多类支持向量机(SVM)分类器,优于文献中的最先进方法。在基于几何的实验中,我们实现了对皮肤病不对称病变的99.00%检测率。在基于CNN的实验中,最佳表现是94%的Kappa分数,95%的宏F1分数和97%的加权F1分数,用于分类病变形状(不对称、半对称和对称)。

更新时间: 2025-07-23 04:17:57

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.17185v1
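
The CNN-feature-plus-SVM pipeline in the abstract follows a standard pattern: a frozen pretrained backbone produces feature vectors, and a multiclass SVM is trained on them. A sketch assuming torchvision (>= 0.13) and scikit-learn, with ResNet-18 standing in for the unspecified backbone:

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Frozen pretrained backbone as a feature extractor; ResNet-18 is our stand-in
# for the paper's unspecified CNN.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the ImageNet classification head
backbone.eval()

@torch.no_grad()
def extract_features(batch):        # batch: (N, 3, 224, 224), ImageNet-normalized
    return backbone(batch).numpy()  # (N, 512) shape/color/texture features

# X: preprocessed dermoscopic images; y: {asymmetric, half-symmetric, symmetric}
# clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(extract_features(X), y)
# preds = clf.predict(extract_features(X_test))
```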

Regret Minimization in Population Network Games: Vanishing Heterogeneity and Convergence to Equilibria

Understanding and predicting the behavior of large-scale multi-agents in games remains a fundamental challenge in multi-agent systems. This paper examines the role of heterogeneity in equilibrium formation by analyzing how smooth regret-matching drives a large number of heterogeneous agents with diverse initial policies toward unified behavior. By modeling the system state as a probability distribution of regrets and analyzing its evolution through the continuity equation, we uncover a key phenomenon in diverse multi-agent settings: the variance of the regret distribution diminishes over time, leading to the disappearance of heterogeneity and the emergence of consensus among agents. This universal result enables us to prove convergence to quantal response equilibria in both competitive and cooperative multi-agent settings. Our work advances the theoretical understanding of multi-agent learning and offers a novel perspective on equilibrium selection in diverse game-theoretic scenarios.

Updated: 2025-07-23 04:13:56

标题: 人口网络游戏中的后悔最小化:异质性消失和收敛到均衡

摘要: 理解和预测博弈中大规模多智能体的行为仍然是多智能体系统中的一个基本挑战。本文通过分析平滑遗憾匹配如何驱动大量具有不同初始策略的异质智能体走向统一行为,研究了异质性在均衡形成中的作用。通过将系统状态建模为遗憾的概率分布,并借助连续性方程分析其演化,我们揭示了多样化多智能体环境中的一个关键现象:遗憾分布的方差随时间减小,导致异质性消失,智能体之间形成共识。这一普适结果使我们能够证明,在竞争和合作的多智能体环境中,系统都会收敛到量化反应均衡(quantal response equilibria)。我们的工作推进了对多智能体学习的理论理解,并为多样化博弈论场景中的均衡选择提供了新的视角。

更新时间: 2025-07-23 04:13:56

领域: cs.GT,cs.AI,cs.MA

下载: http://arxiv.org/abs/2507.17183v1
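
As a toy illustration of the regret-matching dynamics analyzed above, the sketch below runs a smoothed variant for one agent on a payoff matrix, taking the policy as a softmax over cumulative regrets; the smoothing choice and the stand-in random opponent are our assumptions, not the paper's exact dynamics.

```python
import numpy as np

def smooth_regret_matching(payoff, iters=5000, tau=0.5, seed=0):
    """Single-agent smooth regret matching on an n-by-m payoff matrix.
    The policy is a softmax of cumulative regrets (one common smoothing)."""
    rng = np.random.default_rng(seed)
    n, m = payoff.shape
    regret = np.zeros(n)
    for _ in range(iters):
        logits = regret / tau
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()
        a = rng.choice(n, p=policy)          # play an action
        opp = rng.integers(m)                # stand-in opponent behavior
        u = payoff[:, opp]                   # counterfactual utilities
        regret += u - u[a]                   # accumulate instantaneous regrets
    return policy

# Example: in matching pennies the policy drifts toward the mixed equilibrium.
# print(smooth_regret_matching(np.array([[1.0, -1.0], [-1.0, 1.0]])))
```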

A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis

Data perturbation-based privacy-preserving methods have been widely adopted in various scenarios due to their efficiency and the elimination of the need for a trusted third party. However, these methods primarily focus on individual statistical indicators, neglecting the overall quality of the collected data from a distributional perspective. Consequently, they often fall short of meeting the diverse statistical analysis requirements encountered in practical data analysis. As a promising sensitive-data perturbation approach, negative survey methods are able to collect the distribution of sensitive information while protecting personal privacy. Yet, existing negative survey methods are primarily designed for discrete sensitive information and are inadequate for real-valued data distributions. To bridge this gap, this paper proposes a novel real-value negative survey model, termed RVNS, for the first time in the field of real-value sensitive information collection. The RVNS model exempts users from the necessity of discretizing their data and only requires them to sample a set of data from a range that deviates from their actual sensitive details, thereby preserving the privacy of their genuine information. Moreover, to accurately capture the distribution of sensitive information, an optimization problem is formulated, and a novel approach is employed to solve it. Rigorous theoretical analysis demonstrates that the RVNS model conforms to the differential privacy model, ensuring robust privacy preservation. Comprehensive experiments conducted on both synthetic and real-world datasets further validate the efficacy of the proposed method.

Updated: 2025-07-23 04:05:33

标题: 一种用于多样化统计分析的隐私保护数据收集方法

摘要: 基于数据扰动的隐私保护方法由于其高效性且无需可信第三方而被广泛应用于各种场景。然而,这些方法主要关注个体统计指标,忽视了从分布角度看所收集数据的整体质量,因此常常无法满足实际数据分析中多样化的统计分析需求。作为一种有前景的敏感数据扰动方法,负调查(negative survey)能够在保护个人隐私的同时收集敏感信息的分布。然而,现有的负调查方法主要针对离散的敏感信息设计,难以处理实值数据分布。为了弥补这一差距,本文首次在实值敏感信息收集领域提出了一种新颖的实值负调查模型,称为RVNS。RVNS模型使用户无需对数据进行离散化,只需从一个偏离其真实敏感信息的范围中抽取一组样本,从而保护其真实信息的隐私。此外,为了准确刻画敏感信息的分布,我们构建了一个优化问题,并采用一种新颖的方法来求解。严格的理论分析表明,RVNS模型符合差分隐私模型,可确保强有力的隐私保护。在合成和真实数据集上进行的综合实验进一步验证了所提方法的有效性。

更新时间: 2025-07-23 04:05:33

领域: cs.CR

下载: http://arxiv.org/abs/2507.17180v1
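
The core user-side mechanism, sampling reports from a range that deviates from the true sensitive value, can be sketched as follows; the interval parametrization, gap, and sample count are hypothetical parameters for illustration, not the paper's calibrated design.

```python
import numpy as np

def rvns_report(true_value, low, high, gap=0.1, k=5, seed=None):
    """Illustrative real-valued negative survey response: the user reports k
    samples drawn uniformly from the part of [low, high] that stays at least
    `gap` away from the true value, so the genuine value is never revealed."""
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < k:
        x = rng.uniform(low, high)
        if abs(x - true_value) >= gap:   # exclude a neighborhood of the true value
            samples.append(x)
    return np.array(samples)

# report = rvns_report(true_value=0.42, low=0.0, high=1.0)
```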

SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs

Although large language models (LLMs) have made significant progress in understanding Structured Knowledge (SK) like KG and Table, existing evaluations for SK understanding are non-rigorous (i.e., lacking evaluations of specific capabilities) and focus on a single type of SK. Therefore, we aim to propose a more comprehensive and rigorous structured knowledge understanding benchmark to diagnose the shortcomings of LLMs. In this paper, we introduce SKA-Bench, a Structured Knowledge Augmented QA Benchmark that encompasses four widely used structured knowledge forms: KG, Table, KG+Text, and Table+Text. We utilize a three-stage pipeline to construct SKA-Bench instances, which includes a question, an answer, positive knowledge units, and noisy knowledge units. To evaluate the SK understanding capabilities of LLMs in a fine-grained manner, we expand the instances into four fundamental ability testbeds: Noise Robustness, Order Insensitivity, Information Integration, and Negative Rejection. Empirical evaluations on 8 representative LLMs, including the advanced DeepSeek-R1, indicate that existing LLMs still face significant challenges in understanding structured knowledge, and their performance is influenced by factors such as the amount of noise, the order of knowledge units, and hallucination phenomenon. Our dataset and code are available at https://github.com/Lza12a/SKA-Bench.

Updated: 2025-07-23 03:52:24

标题: SKA-Bench:用于评估LLMs结构化知识理解的细粒度基准测试

摘要: 尽管大型语言模型(LLMs)在理解结构化知识(SK)如知识图谱和表格方面取得了重大进展,但现有的SK理解评估缺乏严格性(即缺乏对具体能力的评估),并且专注于单一类型的SK。因此,我们旨在提出一个更全面且严格的结构化知识理解基准,以诊断LLMs的不足之处。在本文中,我们介绍了SKA-Bench,一个结构化知识增强问答基准,涵盖了四种广泛使用的结构化知识形式:知识图谱、表格、知识图谱+文本和表格+文本。我们利用一个三阶段流程构建SKA-Bench实例,其中包括一个问题,一个答案,正知识单元和嘈杂知识单元。为了以细粒度方式评估LLMs的SK理解能力,我们将实例扩展为四个基本能力测试平台:噪声鲁棒性、顺序不敏感性、信息整合和负面拒绝。对包括先进的DeepSeek-R1在内的8个代表性LLMs的实证评估表明,现有的LLMs在理解结构化知识方面仍面临重大挑战,其性能受到噪声量、知识单元顺序和幻觉现象等因素的影响。我们的数据集和代码可在https://github.com/Lza12a/SKA-Bench 上找到。

更新时间: 2025-07-23 03:52:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.17178v1

Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that the proposed method improves alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.

Updated: 2025-07-23 03:52:14

标题: 深思熟虑的搜索者:通过受限制的强化学习提高LLM可靠性

摘要: 提高大型语言模型(LLMs)的可靠性对于在现实世界场景中部署它们至关重要。在本文中,我们提出了审慎搜索者(Deliberative Searcher),这是第一个将确定性校准与基于检索的搜索相结合、用于开放域问答的框架。该智能体对维基百科数据进行多步反思和验证,并使用强化学习算法进行训练,在软可靠性约束下优化准确性。实证结果表明,所提出的方法改善了模型置信度与正确性之间的对齐,从而产生更可信的输出。本文将持续更新。

更新时间: 2025-07-23 03:52:14

领域: cs.AI

下载: http://arxiv.org/abs/2507.16727v2

GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP

Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce "ghosts", or duplicates of data points representing potential positional variations due to stochasticity. We define a data point's projection as (r,d)-stable if its ghosts perturbed within a circle of radius r in the initial projection remain confined within a circle of radius d for their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces runtime by up to 60% compared to an unoptimized baseline while maintaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for the effective use of our framework.

Updated: 2025-07-23 03:40:53

标题: GhostUMAP2:测量和分析UMAP的(r,d)稳定性

摘要: 尽管Uniform Manifold Approximation and Projection (UMAP)被广泛使用,但其随机优化过程对结果的影响仍未得到充分探讨。我们观察到,它经常产生不稳定的结果:数据点的投影主要由偶然因素决定,而非反映邻近结构。为了解决这一局限,我们为UMAP引入了(r,d)-稳定性:一个分析数据点在投影空间中随机定位的框架。为了评估随机因素(特别是初始投影位置和负采样)对UMAP结果的影响,我们引入了“ghosts”,即数据点的复制品,用以表示随机性可能导致的位置变化。若一个数据点的ghosts在初始投影中于半径为r的圆内受到扰动后,其最终位置仍被限制在半径为d的圆内,我们就称该数据点的投影是(r,d)-稳定的。为了高效计算ghost投影,我们开发了一种自适应丢弃方案,与未优化的基线相比可将运行时间最多缩短60%,同时保留约90%的不稳定点。我们还提供了一个可视化工具,支持对数据点(r,d)-稳定性的交互式探索。最后,我们通过检验真实世界数据集投影的稳定性展示了框架的有效性,并给出了有效使用该框架的指导方针。

更新时间: 2025-07-23 03:40:53

领域: cs.GR,cs.HC,cs.LG

下载: http://arxiv.org/abs/2507.17174v1
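
The (r,d)-stability test itself is a simple geometric check once the ghosts' final positions are known: every ghost that was perturbed within radius r at initialization must land within radius d of the point's final projection. A sketch of that check, assuming the d-ball is centered on the original point's final position:

```python
import numpy as np

def rd_stable(final_point, ghost_finals, d):
    """True if all ghosts (perturbed within radius r at initialization)
    end within radius d of the point's final 2-D projection."""
    dists = np.linalg.norm(np.asarray(ghost_finals) - np.asarray(final_point), axis=1)
    return bool((dists <= d).all())

# Unstable points are those whose r-perturbed ghosts escape the d-ball:
# unstable = [i for i in range(n) if not rd_stable(Y[i], ghosts[i], d=0.5)]
```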

Privacy-Preserving Multimodal News Recommendation through Federated Learning

Personalized News Recommendation systems (PNR) have emerged as a solution to information overload by predicting and suggesting news items tailored to individual user interests. However, traditional PNR systems face several challenges, including an overreliance on textual content, common neglect of short-term user interests, and significant privacy concerns due to centralized data storage. This paper addresses these issues by introducing a novel multimodal federated learning-based approach for news recommendation. First, it integrates both textual and visual features of news items using a multimodal model, enabling a more comprehensive representation of content. Second, it employs a time-aware model that balances users' long-term and short-term interests through multi-head self-attention networks, improving recommendation accuracy. Finally, to enhance privacy, a federated learning framework is implemented, enabling collaborative model training without sharing user data. The framework divides the recommendation model into a large server-maintained news model and a lightweight user model shared between the server and clients. The client requests news representations (vectors) and a user model from the central server, then computes gradients with user local data, and finally sends their locally computed gradients to the server for aggregation. The central server aggregates gradients to update the global user model and news model. The updated news model is further used to infer news representation by the server. To further safeguard user privacy, a secure aggregation algorithm based on Shamir's secret sharing is employed. Experiments on a real-world news dataset demonstrate strong performance compared to existing systems, representing a significant advancement in privacy-preserving personalized news recommendation.

Updated: 2025-07-23 03:40:18

标题: 隐私保护的多模态新闻推荐系统通过联邦学习实现

摘要: 个性化新闻推荐系统(PNR)通过预测并推荐符合个人用户兴趣的新闻条目,已成为应对信息过载的一种解决方案。然而,传统PNR系统面临若干挑战,包括过度依赖文本内容、常常忽视用户的短期兴趣,以及因集中式数据存储而产生的重大隐私问题。本文通过引入一种新颖的基于多模态联邦学习的新闻推荐方法来解决这些问题。首先,它使用多模态模型整合新闻条目的文本和视觉特征,实现对内容更全面的表示。其次,它采用一种时间感知模型,通过多头自注意力网络平衡用户的长期和短期兴趣,从而提高推荐准确性。最后,为了增强隐私保护,实施了一个联邦学习框架,实现协作模型训练而无需共享用户数据。该框架将推荐模型分为由服务器维护的大型新闻模型和在服务器与客户端之间共享的轻量级用户模型。客户端向中央服务器请求新闻表示(向量)和用户模型,然后用本地数据计算梯度,并将本地计算的梯度发送到服务器进行聚合。中央服务器聚合梯度以更新全局用户模型和新闻模型,更新后的新闻模型再由服务器用于推断新闻表示。为进一步保护用户隐私,采用了基于Shamir秘密共享的安全聚合算法。在一个真实新闻数据集上的实验表明,与现有系统相比,该方法性能优异,代表了隐私保护个性化新闻推荐的重大进展。

更新时间: 2025-07-23 03:40:18

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2507.15460v3
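
The secure-aggregation step relies on the additive homomorphism of Shamir shares: the server sums the clients' shares component-wise and can reconstruct only the aggregate, never an individual gradient. Below is a generic textbook Shamir sharing sketch over a prime field, not the paper's full protocol (which also covers gradient encoding and the federated update flow).

```python
import random

P = 2**61 - 1  # prime modulus for the share field

def share(secret, n, t):
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at 0 over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # modular inverse via Fermat
    return total

# Additive homomorphism: summing two clients' shares yields shares of the sum,
# so the server reconstructs only g1 + g2, never g1 or g2 alone.
g1, g2 = 42, 58  # two clients' integer-encoded gradient components
s1, s2 = share(g1, n=3, t=2), share(g2, n=3, t=2)
summed = [(x, (y1 + y2) % P) for (x, y1), (_, y2) in zip(s1, s2)]
assert reconstruct(summed[:2]) == g1 + g2
```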

Improving LLMs' Generalized Reasoning Abilities by Graph Problems

Large Language Models (LLMs) have made remarkable strides in reasoning tasks, yet their performance often falters on novel and complex problems. Domain-specific continued pretraining (CPT) methods, such as those tailored for mathematical reasoning, have shown promise but lack transferability to broader reasoning tasks. In this work, we pioneer the use of Graph Problem Reasoning (GPR) to enhance the general reasoning capabilities of LLMs. GPR tasks, spanning pathfinding, network analysis, numerical computation, and topological reasoning, require sophisticated logical and relational reasoning, making them ideal for teaching diverse reasoning patterns. To achieve this, we introduce GraphPile, the first large-scale corpus specifically designed for CPT using GPR data. Spanning 10.9 billion tokens across 23 graph tasks, the dataset includes chain-of-thought, program-of-thought, trace of execution, and real-world graph data. Using GraphPile, we train GraphMind on popular base models Llama 3 and 3.1, as well as Gemma 2, achieving up to 4.9 percent higher accuracy in mathematical reasoning and up to 21.2 percent improvement in non-mathematical reasoning tasks such as logical and commonsense reasoning. By being the first to harness GPR for enhancing reasoning patterns and introducing the first dataset of its kind, our work bridges the gap between domain-specific pretraining and universal reasoning capabilities, advancing the adaptability and robustness of LLMs.

Updated: 2025-07-23 03:19:57

标题: 通过图问题提高LLMs的泛化推理能力

摘要: 大型语言模型(LLMs)在推理任务上取得了显著进展,但它们在新颖和复杂问题上的表现常常不尽如人意。面向特定领域的持续预训练(CPT)方法,比如针对数学推理定制的方法,已显示出潜力,但缺乏向更广泛推理任务的可迁移性。在这项工作中,我们率先利用图问题推理(GPR)来增强LLMs的通用推理能力。GPR任务涵盖寻路、网络分析、数值计算和拓扑推理等,需要复杂的逻辑和关系推理,是教授多样推理模式的理想选择。为此,我们引入了GraphPile,第一个专门为使用GPR数据进行CPT而设计的大规模语料库。该数据集横跨23个图任务,共包含109亿个token,涵盖思维链、思维程序、执行轨迹和真实世界图数据。利用GraphPile,我们在流行的基础模型Llama 3和3.1以及Gemma 2上训练了GraphMind,数学推理准确率最高提升4.9%,在逻辑推理和常识推理等非数学推理任务上最高提升21.2%。通过首次利用GPR增强推理模式并推出同类中的第一个数据集,我们的工作弥合了领域特定预训练与通用推理能力之间的差距,提升了LLMs的适应性和鲁棒性。

更新时间: 2025-07-23 03:19:57

领域: cs.AI

下载: http://arxiv.org/abs/2507.17168v1

Unmasking Trees for Tabular Data

Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. On a benchmark for out-of-the-box performance on 27 small tabular datasets, UnmaskingTrees offers leading performance on imputation; state-of-the-art performance on generation given data with missingness; and competitive performance on vanilla generation given data without missingness. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.

Updated: 2025-07-23 03:16:46

标题: 揭示表格数据中的树结构

摘要: 尽管在表格数据生成和插补方面已有大量关于先进深度学习和生成建模技术的研究,传统方法在插补基准测试中依然领先。我们在此提出UnmaskingTrees,一种简单的表格插补(和生成)方法,它使用梯度提升决策树来逐步揭示(unmask)单个特征。在一个针对27个小型表格数据集开箱即用性能的基准测试中,UnmaskingTrees在插补方面表现领先;在数据含缺失值时的生成任务上达到最先进水平;在数据无缺失值时的普通生成任务上也具有竞争力。为了解决条件生成子问题,我们提出了一种表格概率预测方法BaltoBot,它拟合一棵由提升树分类器构成的平衡树。与较早的方法不同,它不需要对条件分布作参数化假设,可容纳具有多峰分布的特征;与较新的扩散方法不同,它提供快速采样、闭式密度估计以及对离散变量的灵活处理。最后,我们将这两种方法视为元算法,借助TabPFN展示了基于上下文学习的生成建模。

更新时间: 2025-07-23 03:16:46

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.05593v5
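
In the spirit of incrementally unmasking features with gradient-boosted trees, the sketch below performs chained imputation with scikit-learn's histogram GBTs (>= 1.0), predicting each missing column from the others over a few rounds. This is a generic approximation of the idea, not the authors' implementation, which also covers generation and probabilistic prediction via BaltoBot.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

def unmask_impute(X, n_rounds=3, seed=0):
    """Chained tree-based imputation: fill NaNs, then repeatedly re-predict
    each originally-missing column from all other columns."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):                  # crude mean initialization
        X[miss[:, j], j] = col_mean[j]
    for _ in range(n_rounds):
        for j in rng.permutation(X.shape[1]):    # unmask one feature at a time
            if not miss[:, j].any():
                continue
            other = np.delete(np.arange(X.shape[1]), j)
            model = HistGradientBoostingRegressor()
            model.fit(X[~miss[:, j]][:, other], X[~miss[:, j], j])
            X[miss[:, j], j] = model.predict(X[miss[:, j]][:, other])
    return X
```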

Attention-Based Multiscale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes

Fault diagnosis in multimode processes plays a critical role in ensuring the safe operation of industrial systems across multiple modes. It faces a great challenge yet to be addressed - that is, the significant distributional differences among monitoring data from multiple modes make it difficult for the models to extract shared feature representations related to system health conditions. In response to this problem, this paper introduces a novel method called attention-based multiscale temporal fusion network. The multiscale depthwise convolution and gated recurrent unit are employed to extract multiscale contextual local features and long-short-term features. Instance normalization is applied to suppress mode-specific information. Furthermore, a temporal attention mechanism is designed to focus on critical time points with higher cross-mode shared information, thereby enhancing the accuracy of fault diagnosis. The proposed model is applied to Tennessee Eastman process dataset and three-phase flow facility dataset. The experiments demonstrate that the proposed model achieves superior diagnostic performance and maintains a small model size. The source code will be available on GitHub at https://github.com/GuangqiangLi/AMTFNet.

Updated: 2025-07-23 03:13:44

标题: 基于注意力的多尺度时间融合网络用于多模式过程中的不确定模式故障诊断

摘要: 多模式过程中的故障诊断在确保工业系统安全运行方面起着关键作用。它面临一个尚未解决的巨大挑战 - 即来自多种模式监测数据的显著分布差异使模型难以提取与系统健康状态相关的共享特征表示。针对这个问题,本文介绍了一种名为基于注意力的多尺度时间融合网络的新方法。多尺度深度卷积和门控循环单元被用来提取多尺度上下文局部特征和长短期特征。实例归一化被应用于抑制特定于模式的信息。此外,设计了一个时间注意力机制,以便关注具有更高交叉模式共享信息的关键时间点,从而提高故障诊断的准确性。该模型被应用于田纳西伊斯曼过程数据集和三相流设施数据集。实验表明,所提出的模型实现了优越的诊断性能并保持了较小的模型尺寸。源代码将在GitHub上提供:https://github.com/GuangqiangLi/AMTFNet。

更新时间: 2025-07-23 03:13:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05172v3

Frequency-Dynamic Attention Modulation for Dense Prediction

Vision Transformers (ViTs) have significantly advanced computer vision, demonstrating strong performance across various tasks. However, the attention mechanism in ViTs makes each layer function as a low-pass filter, and the stacked-layer architecture in existing transformers suffers from frequency vanishing. This leads to the loss of critical details and textures. We propose a novel, circuit-theory-inspired strategy called Frequency-Dynamic Attention Modulation (FDAM), which can be easily plugged into ViTs. FDAM directly modulates the overall frequency response of ViTs and consists of two techniques: Attention Inversion (AttInv) and Frequency Dynamic Scaling (FreqScale). Since circuit theory uses low-pass filters as fundamental elements, we introduce AttInv, a method that generates complementary high-pass filtering by inverting the low-pass filter in the attention matrix, and dynamically combining the two. We further design FreqScale to weight different frequency components for fine-grained adjustments to the target response function. Through feature similarity analysis and effective rank evaluation, we demonstrate that our approach avoids representation collapse, leading to consistent performance improvements across various models, including SegFormer, DeiT, and MaskDINO. These improvements are evident in tasks such as semantic segmentation, object detection, and instance segmentation. Additionally, we apply our method to remote sensing detection, achieving state-of-the-art results in single-scale settings. The code is available at https://github.com/Linwei-Chen/FDAM.

Updated: 2025-07-23 03:10:31

标题: 频率动态注意力调节用于密集预测

摘要: Vision Transformers (ViTs)显著推动了计算机视觉的发展,在各类任务中表现出强劲性能。然而,ViTs中的注意力机制使每一层都相当于一个低通滤波器,而现有Transformer的堆叠层架构存在频率消失问题,导致关键细节和纹理的丢失。我们提出了一种新颖的、受电路理论启发的策略,称为频率动态注意力调制(FDAM),可以方便地插入ViTs中。FDAM直接调制ViTs的整体频率响应,由两种技术组成:注意力反转(AttInv)和频率动态缩放(FreqScale)。由于电路理论以低通滤波器为基本元件,我们引入了AttInv方法:通过反转注意力矩阵中的低通滤波器来生成互补的高通滤波,并将两者动态组合。我们进一步设计了FreqScale,对不同频率成分加权,以对目标响应函数进行细粒度调整。通过特征相似性分析和有效秩评估,我们证明该方法避免了表示坍缩,在SegFormer、DeiT和MaskDINO等各种模型上带来一致的性能提升。这些提升在语义分割、目标检测和实例分割等任务中都很明显。此外,我们将该方法应用于遥感检测,在单尺度设置下取得了最先进的结果。代码可在https://github.com/Linwei-Chen/FDAM 上找到。

更新时间: 2025-07-23 03:10:31

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.12006v2
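
AttInv's central move, deriving a complementary high-pass filter by inverting the attention matrix's low-pass response and mixing the two, can be sketched as below. Treating the row-stochastic attention matrix A as low-pass and I - A as its high-pass complement follows standard graph-filter reasoning; the scalar mixing weights are a simplification of the paper's learnable, frequency-dynamic weighting.

```python
import torch

def attention_inversion(attn, w_low=1.0, w_high=0.5):
    """Mix the low-pass attention response with its high-pass complement.
    attn: row-stochastic attention weights of shape (..., n, n)."""
    n = attn.size(-1)
    eye = torch.eye(n, device=attn.device, dtype=attn.dtype).expand_as(attn)
    return w_low * attn + w_high * (eye - attn)  # (I - A) acts as the high-pass filter

# out = attention_inversion(torch.softmax(scores, dim=-1)) @ values
```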

From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems

Research is a fundamental process driving the advancement of human civilization, yet it demands substantial time and effort from researchers. In recent years, the rapid development of artificial intelligence (AI) technologies has inspired researchers to explore how AI can accelerate and enhance research. To monitor relevant advancements, this paper presents a systematic review of the progress in this domain. Specifically, we organize the relevant studies into three main categories: hypothesis formulation, hypothesis validation, and manuscript publication. Hypothesis formulation involves knowledge synthesis and hypothesis generation. Hypothesis validation includes the verification of scientific claims, theorem proving, and experiment validation. Manuscript publication encompasses manuscript writing and the peer review process. Furthermore, we identify and discuss the current challenges faced in these areas, as well as potential future directions for research. Finally, we also offer a comprehensive overview of existing benchmarks and tools across various domains that support the integration of AI into the research process. We hope this paper serves as an introduction for beginners and fosters future research. Resources have been made publicly available at https://github.com/zkzhou126/AI-for-Research.

Updated: 2025-07-23 03:06:25

标题: 从假设到发表:人工智能驱动的研究支持系统的综合调查

摘要: 研究是推动人类文明进步的基本过程,但它需要研究人员投入大量的时间和精力。近年来,人工智能(AI)技术的快速发展激发了研究人员探索如何利用AI加速和增强研究。为了监测相关进展,本文提出了对该领域进展的系统性回顾。具体而言,我们将相关研究分为三个主要类别:假设制定、假设验证和文稿出版。假设制定涉及知识综合和假设生成。假设验证包括科学主张的验证、定理证明和实验验证。文稿出版包括文稿撰写和同行评审过程。此外,我们还识别和讨论了这些领域面临的当前挑战,以及未来研究的潜在方向。最后,我们还提供了跨多个领域的现有基准和工具的全面概述,支持将AI整合到研究过程中。我们希望本文为初学者提供介绍,并促进未来的研究。资源已经公开发布在https://github.com/zkzhou126/AI-for-Research。

更新时间: 2025-07-23 03:06:25

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.01424v2

Spatial Frequency Modulation for Semantic Segmentation

High spatial frequency information, including fine details like textures, significantly contributes to the accuracy of semantic segmentation. However, according to the Nyquist-Shannon Sampling Theorem, high-frequency components are vulnerable to aliasing or distortion when propagating through downsampling layers such as strided-convolution. Here, we propose a novel Spatial Frequency Modulation (SFM) that modulates high-frequency features to a lower frequency before downsampling and then demodulates them back during upsampling. Specifically, we implement modulation through adaptive resampling (ARS) and design a lightweight add-on that can densely sample the high-frequency areas to scale up the signal, thereby lowering its frequency in accordance with the Frequency Scaling Property. We also propose Multi-Scale Adaptive Upsampling (MSAU) to demodulate the modulated feature and recover high-frequency information through non-uniform upsampling. This module further improves segmentation by explicitly exploiting information interaction between densely and sparsely resampled areas at multiple scales. Both modules can seamlessly integrate with various architectures, extending from convolutional neural networks to transformers. Feature visualization and analysis confirm that our method effectively alleviates aliasing while successfully retaining details after demodulation. Finally, we validate the broad applicability and effectiveness of SFM by extending it to image classification, adversarial robustness, instance segmentation, and panoptic segmentation tasks. The code is available at https://github.com/Linwei-Chen/SFM.

Updated: 2025-07-23 03:04:09

标题: 空间频率调制用于语义分割

摘要: 高空间频率信息,包括纹理等细节,对语义分割的准确性有着显著贡献。然而,根据奈奎斯特-香农采样定理,高频成分在通过像步进卷积这样的下采样层时容易发生混叠或失真。在这里,我们提出了一种新颖的空间频率调制(SFM),在下采样之前将高频特征调制到较低频率,然后在上采样时将其解调回来。具体而言,我们通过自适应重采样(ARS)实现调制,并设计了一种轻量级附加模块,可以密集采样高频区域以放大信号,从而根据频率缩放属性降低其频率。我们还提出了多尺度自适应上采样(MSAU)来解调调制特征,并通过非均匀上采样恢复高频信息。该模块通过明确利用多个尺度上密集和稀疏重采样区域之间的信息交互,进一步改进了分割效果。这两个模块可以无缝集成到各种架构中,从卷积神经网络到Transformer。特征可视化和分析证实,我们的方法在解调后成功保留细节的同时有效减轻了混叠。最后,我们通过将SFM扩展到图像分类、对抗鲁棒性、实例分割和全景分割任务来验证其广泛适用性和有效性。代码可在https://github.com/Linwei-Chen/SFM 获取。

更新时间: 2025-07-23 03:04:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.11893v2

Tabular Diffusion based Actionable Counterfactual Explanations for Network Intrusion Detection

Modern network intrusion detection systems (NIDS) frequently utilize the predictive power of complex deep learning models. However, the "black-box" nature of such deep learning methods adds a layer of opaqueness that hinders the proper understanding of detection decisions, undermines trust in those decisions, and prevents timely countermeasures against such attacks. Explainable AI (XAI) methods provide a solution to this problem by providing insights into the causes of the predictions. The majority of the existing XAI methods provide explanations that are not convenient to convert into actionable countermeasures. In this work, we propose a novel diffusion-based counterfactual explanation framework that can provide actionable explanations for network intrusion attacks. We evaluated our proposed algorithm against several other publicly available counterfactual explanation algorithms on 3 modern network intrusion datasets. To the best of our knowledge, this work also presents the first comparative analysis of existing counterfactual explanation algorithms within the context of network intrusion detection systems. Our proposed method provides minimal, diverse counterfactual explanations among the tested counterfactual explanation algorithms in a more efficient manner by reducing the time to generate explanations. We also demonstrate how counterfactual explanations can provide actionable guidance by summarizing them to create a set of global rules. These rules are actionable not only at the instance level but also at the global level for intrusion attacks. These global counterfactual rules show the ability to effectively filter out incoming attack queries, which is crucial for efficient intrusion detection and defense mechanisms.

Updated: 2025-07-23 02:53:58

标题: 基于表格扩散的可操作性对策反事实解释用于网络入侵检测

摘要: 现代网络入侵检测系统(NIDS)经常利用复杂深度学习模型的预测能力。然而,这类深度学习方法的“黑箱”性质增加了不透明性,妨碍了对检测决策的正确理解,削弱了对这些决策的信任,并阻碍了针对攻击及时采取对策。可解释AI(XAI)方法通过提供对预测成因的洞察,为这一问题提供了解决方案。但大多数现有XAI方法给出的解释不便于转化为可执行的对策。在这项工作中,我们提出了一种新颖的基于扩散的反事实解释框架,可以为网络入侵攻击提供可执行的解释。我们在3个现代网络入侵数据集上,将所提算法与其他几种公开可用的反事实解释算法进行了对比评估。据我们所知,这项工作还首次在网络入侵检测系统的背景下对现有反事实解释算法进行了比较分析。在所测试的反事实解释算法中,我们的方法以更高的效率提供最小且多样化的反事实解释,缩短了生成解释所需的时间。我们还展示了如何通过对反事实解释进行归纳总结、形成一组全局规则来提供可执行的指导。这些规则不仅在实例级别可操作,在全局级别上对入侵攻击同样可操作。这些全局反事实规则能够有效过滤传入的攻击查询,这对高效的入侵检测和防御机制至关重要。

更新时间: 2025-07-23 02:53:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17161v1

Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs

Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance straggler resilience and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable the linear decomposition of tensor convolutions and their encoding into CDC subtasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework's effectiveness in computational efficiency, straggler resilience, and scalability across various CNN architectures.
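
The coded-redundancy idea can be illustrated without the CRME/APCP/KCCP machinery: convolution is linear in its input, so a parity shard over a batch split lets any single straggler's output be reconstructed from the others. A toy (K+1, K) sketch under that assumption, not the paper's scheme:

```python
import torch
import torch.nn.functional as F

# Coded redundancy for a linear operator: convolution distributes over
# sums, so conv(sum of shards) = sum of conv(shards). A parity shard
# therefore lets us recover any single straggler's result.
torch.manual_seed(0)
w = torch.randn(8, 3, 3, 3)
conv = lambda t: F.conv2d(t, w, padding=1)

batch = torch.randn(4, 3, 32, 32)
shards = list(batch.split(1, dim=0))          # 4 workers, one sample each
parity = sum(shards)                          # 5th (redundant) worker

outputs = [conv(s) for s in shards]           # normally run in parallel
parity_out = conv(parity)

# Suppose worker 2 straggles: reconstruct its output from the code.
recovered = parity_out - sum(o for i, o in enumerate(outputs) if i != 2)
print(torch.allclose(recovered, outputs[2], atol=1e-4))  # True
```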

Updated: 2025-07-23 02:45:38

标题: 灵活编码的分布式卷积计算:增强分布式CNN中的慢速节点韧性和数值稳定性

摘要: 在资源受限设备上部署卷积神经网络(CNNs)需要有效管理计算资源,通常通过容易受到慢速节点延迟影响的分布式环境来实现。本文介绍了灵活编码分布式卷积计算(FCDCC)框架,以提高分布式CNNs中慢速节点的韧性和数值稳定性。我们将最初为矩阵乘法提出的循环与旋转矩阵嵌入(CRME)扩展到编码分布式计算(CDC)中的高维张量卷积。对于所提出的方案,被称为数值稳定编码张量卷积(NSCTC)方案,我们还提出了两种新的编码分区方案:自适应填充编码分区(APCP)用于输入张量和核通道编码分区(KCCP)用于滤波器张量。这些策略实现了张量卷积的线性分解并将其编码为CDC子任务,将模型并行性与编码冗余结合起来,实现强大且高效的执行。理论分析确定了通信和存储成本之间的最佳折衷。实证结果验证了该框架在计算效率、慢速节点韧性和各种CNN架构的可扩展性方面的有效性。

更新时间: 2025-07-23 02:45:38

领域: cs.DC,cs.AI,cs.CV,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2411.01579v2

Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models

Recently, the diffusion model has gained significant attention as one of the most successful image generation models, which can generate high-quality images by iteratively sampling noise. However, recent studies have shown that diffusion models are vulnerable to backdoor attacks, allowing attackers to input data containing triggers to activate the backdoor and generate their desired output. Existing backdoor attack methods have primarily focused on noise-to-image and text-to-image tasks, with limited work on backdoor attacks in image-to-image tasks. Furthermore, traditional backdoor attacks often rely on a single, conspicuous trigger to generate a fixed target image, lacking concealability and flexibility. To address these limitations, we propose a novel backdoor attack method called "Parasite" for image-to-image tasks in diffusion models, which is not only the first to leverage steganography for trigger hiding, but also allows attackers to embed the target content as a backdoor trigger to achieve a more flexible attack. "Parasite" as a novel attack method effectively bypasses existing detection frameworks to execute backdoor attacks. In our experiments, "Parasite" achieved a 0 percent backdoor detection rate against the mainstream defense frameworks. In addition, in the ablation study, we discuss the influence of different hiding coefficients on the attack results. You can find our code at https://anonymous.4open.science/r/Parasite-1715/.
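
The abstract does not detail the embedding scheme, but classic least-significant-bit (LSB) steganography illustrates why such triggers evade inspection: the stego image differs from the cover by at most one intensity level per pixel. A minimal illustrative sketch, not the paper's actual construction:

```python
import numpy as np

def embed_lsb(cover: np.ndarray, payload: np.ndarray) -> np.ndarray:
    """Hide a binary payload in the least significant bits of an image.

    cover:   uint8 image
    payload: {0,1} array of the same shape
    The stego result is perceptually indistinguishable from the cover,
    the property "Parasite"-style attacks exploit for trigger hiding.
    """
    return (cover & 0xFE) | payload.astype(np.uint8)

def extract_lsb(stego: np.ndarray) -> np.ndarray:
    return stego & 0x01

cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
payload = np.random.randint(0, 2, (64, 64), dtype=np.uint8)
stego = embed_lsb(cover, payload)
assert np.array_equal(extract_lsb(stego), payload)
print(np.abs(stego.astype(int) - cover.astype(int)).max())  # <= 1
```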

Updated: 2025-07-23 02:43:00

标题: 《寄生虫:基于隐写术的扩散模型后门攻击框架》

摘要: 最近,扩散模型作为最成功的图像生成模型之一受到了重视,它可以通过迭代采样噪声生成高质量的图像。然而,最近的研究表明,扩散模型容易受到后门攻击,允许攻击者输入包含触发器的数据来激活后门并生成他们想要的输出。现有的后门攻击方法主要集中在目标噪声到图像和文本到图像任务上,对图像到图像任务的后门攻击研究较少。此外,传统的后门攻击通常依赖于单一、显眼的触发器来生成固定目标图像,缺乏隐蔽性和灵活性。为了解决这些限制,我们提出了一种名为“寄生虫”(Parasite)的新型图像到图像任务的扩散模型后门攻击方法,它不仅是第一个利用隐写术隐藏触发器的方法,还允许攻击者嵌入目标内容作为后门触发器,以实现更灵活的攻击。作为一种新型攻击方法,“寄生虫”有效地规避了现有的检测框架来执行后门攻击。在我们的实验中,“寄生虫”在主流防御框架下实现了0%的后门检测率。此外,在消融研究中,我们讨论了不同隐藏系数对攻击结果的影响。您可以在https://anonymous.4open.science/r/Parasite-1715/找到我们的代码。

更新时间: 2025-07-23 02:43:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05815v2

Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

In recent years, Diffusion Models (DMs) have demonstrated significant advances in the field of image generation. However, according to current research, DMs are vulnerable to backdoor attacks, which allow attackers to control the model's output by inputting data containing covert triggers, such as a specific visual patch or phrase. Existing defense strategies are well equipped to thwart such attacks through backdoor detection and trigger inversion because previous attack methods are constrained by limited input spaces and low-dimensional triggers. For example, visual triggers are easily observed by defenders, while text-based or attention-based triggers are more susceptible to neural network detection. To explore more possibilities of backdoor attacks in DMs, we propose Gungnir, a novel method that enables attackers to activate the backdoor in DMs through style triggers within input images. Our approach proposes using stylistic features as triggers for the first time and implements backdoor attacks successfully in image-to-image tasks by introducing Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR). Our technique generates trigger-embedded images that are perceptually indistinguishable from clean images, thus bypassing both manual inspection and automated detection neural networks. Experiments demonstrate that Gungnir can easily bypass existing defense methods. Among existing DM defense frameworks, our approach achieves a 0% backdoor detection rate (BDR). Our codes are available at https://github.com/paoche11/Gungnir.

Updated: 2025-07-23 02:38:02

标题: Gungnir:利用图像中的风格特征对扩散模型进行后门攻击

摘要: 近年来,扩散模型(DMs)在图像生成领域取得了显著进展。然而,根据当前研究,DMs容易受到后门攻击,这使得攻击者可以通过输入包含秘密触发器的数据来控制模型的输出,比如特定的视觉补丁或短语。现有的防御策略通过后门检测和触发器反转来阻止这种攻击,因为先前的攻击方法受限于有限的输入空间和低维触发器。例如,视觉触发器容易被防御者观察到,基于文本或注意力的触发器更容易被神经网络检测到。为了探索DMs中后门攻击的更多可能性,我们提出了一种新方法Gungnir,使攻击者能够通过输入图像中的样式触发器激活DMs中的后门。我们的方法首次提出使用风格特征作为触发器,并通过引入重构对抗噪声(RAN)和短期时间步保留(STTR)成功实施了在图像到图像任务中的后门攻击。我们的技术生成了嵌入触发器的图像,这些图像在感知上与干净图像无法区别,从而既绕过了手动检查,也绕过了自动检测的神经网络。实验证明,Gungnir可以轻松绕过现有的防御方法。在现有的DM防御框架中,我们的方法实现了0后门检测率(BDR)。我们的代码可以在https://github.com/paoche11/Gungnir找到。

更新时间: 2025-07-23 02:38:02

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2502.20650v4

JAM: Keypoint-Guided Joint Prediction after Classification-Aware Marginal Proposal for Multi-Agent Interaction

Predicting the future motion of road participants is a critical task in autonomous driving. In this work, we address the challenge of low-quality generation of low-probability modes in multi-agent joint prediction. To tackle this issue, we propose a two-stage multi-agent interactive prediction framework named "keypoint-guided joint prediction after classification-aware marginal proposal" (JAM). The first stage is modeled as a marginal prediction process, which classifies queries by trajectory type to encourage the model to learn all categories of trajectories, providing comprehensive mode information for the joint prediction module. The second stage is modeled as a joint prediction process, which takes the scene context and the marginal proposals from the first stage as inputs to learn the final joint distribution. We explicitly introduce key waypoints to guide the joint prediction module in better capturing and leveraging the critical information from the initial predicted trajectories. We conduct extensive experiments on the real-world Waymo Open Motion Dataset interactive prediction benchmark. The results show that our approach achieves competitive performance. In particular, in the framework comparison experiments, the proposed JAM outperforms other prediction frameworks and achieves state-of-the-art performance in interactive trajectory prediction. The code is available at https://github.com/LinFunster/JAM to facilitate future research.

Updated: 2025-07-23 02:35:04

标题: JAM:基于关键点引导的多智能体交互分类感知边际提议后的联合预测

摘要: 预测道路参与者未来运动是自动驾驶中的一个关键任务。在这项工作中,我们解决了多智能体联合预测中低概率模式生成质量低的挑战。为了解决这个问题,我们提出了一个名为“关键点引导的分类感知边际提议后联合预测”(JAM)的两阶段多智能体互动预测框架。第一阶段被建模为一个边际预测过程,通过轨迹类型对查询进行分类,以鼓励模型学习所有类别的轨迹,为联合预测模块提供全面的模式信息。第二阶段被建模为一个联合预测过程,该过程将场景背景和第一阶段的边际提议作为输入,学习最终的联合分布。我们明确引入关键路标点来指导联合预测模块更好地捕获和利用从初始预测轨迹中获取的关键信息。我们在现实世界的Waymo Open Motion数据集互动预测基准上进行了大量实验。结果显示我们的方法达到了竞争性的性能。特别是在框架比较实验中,提出的JAM优于其他预测框架,并在互动轨迹预测中实现了最先进的性能。代码可在https://github.com/LinFunster/JAM 上获得,以促进未来研究。

更新时间: 2025-07-23 02:35:04

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.17152v1

PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training

Neural operators offer a powerful paradigm for solving partial differential equations (PDEs) that cannot be solved analytically, by learning mappings between function spaces. However, there are two main bottlenecks in training neural operators: they require a significant amount of training data to learn these mappings, and this data needs to be labeled, which can only be done via expensive simulations with numerical solvers. To alleviate both of these issues simultaneously, we propose PICore, an unsupervised coreset selection framework that identifies the most informative training samples without requiring access to ground-truth PDE solutions. PICore leverages a physics-informed loss to select unlabeled inputs by their potential contribution to operator learning. After selecting a compact subset of inputs, only those samples are simulated using numerical solvers to generate labels, reducing annotation costs. We then train the neural operator on the reduced labeled dataset, significantly decreasing training time as well. Across four diverse PDE benchmarks and multiple coreset selection strategies, PICore achieves up to a 78% average increase in training efficiency relative to supervised coreset selection methods with minimal changes in accuracy. We provide code at https://github.com/Asatheesh6561/PICore.
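
A hedged sketch of the selection loop: score an unlabeled pool by how strongly the current operator's prediction violates the governing equation, then send only the top-k samples to the expensive solver for labeling. The toy 1D Poisson residual below is an assumption standing in for the paper's PDE families:

```python
import torch

def physics_residual(model, a, x):
    """Per-sample residual of a toy 1D Poisson equation u''(x) = a.

    The actual PICore criterion depends on the PDE family; this stand-in
    only measures how badly the operator's prediction violates the
    governing equation, requiring no ground-truth solutions.
    """
    x = x.clone().requires_grad_(True)
    u = model(torch.cat([a, x], dim=1))                 # (N, 1)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return ((d2u - a) ** 2).squeeze(1)                  # (N,)

# Score an unlabeled pool and keep only the k most informative inputs;
# only those k are simulated with the numerical solver to get labels.
model = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
a, x = torch.randn(500, 1), torch.rand(500, 1)
coreset_idx = torch.topk(physics_residual(model, a, x), k=50).indices
print(coreset_idx.shape)  # torch.Size([50])
```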

Updated: 2025-07-23 02:32:44

标题: PICore:物理信息未监督的Coreset选择,用于数据高效的神经算子训练

摘要: 神经算子通过学习函数空间之间的映射,为求解无法解析求解的偏微分方程(PDEs)提供了一个强大的范式。然而,在训练神经算子时存在两个主要瓶颈:它们需要大量的训练数据来学习这些映射,而且这些数据需要进行标注,标签只能通过昂贵的数值求解器模拟获得。为了同时缓解这两个问题,我们提出了PICore,一种无监督的核心集选择框架,它可以识别最具信息量的训练样本,而无需访问真实的PDE解。PICore利用一种基于物理的损失来选择未标记的输入,根据它们对算子学习的潜在贡献。在选择了一个紧凑的输入子集之后,仅使用数值求解器模拟这些样本以生成标签,从而降低了标注成本。然后我们在减少的标记数据集上训练神经算子,显著减少了训练时间。在四个不同的PDE基准测试和多个核心集选择策略中,PICore相对于监督核心集选择方法实现了高达78%的平均训练效率提升,准确性变化很小。我们在https://github.com/Asatheesh6561/PICore提供了代码。

更新时间: 2025-07-23 02:32:44

领域: cs.LG

下载: http://arxiv.org/abs/2507.17151v1

NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References

Neural View Synthesis (NVS), such as NeRF and 3D Gaussian Splatting, effectively creates photorealistic scenes from sparse viewpoints, typically evaluated by quality assessment methods like PSNR, SSIM, and LPIPS. However, these full-reference methods, which compare synthesized views to reference views, may not fully capture the perceptual quality of neurally synthesized scenes (NSS), particularly due to the limited availability of dense reference views. Furthermore, the challenges in acquiring human perceptual labels hinder the creation of extensive labeled datasets, risking model overfitting and reduced generalizability. To address these issues, we propose NVS-SQA, a NSS quality assessment method to learn no-reference quality representations through self-supervision without reliance on human labels. Traditional self-supervised learning predominantly relies on the "same instance, similar representation" assumption and extensive datasets. However, given that these conditions do not apply in NSS quality assessment, we employ heuristic cues and quality scores as learning objectives, along with a specialized contrastive pair preparation process to improve the effectiveness and efficiency of learning. The results show that NVS-SQA outperforms 17 no-reference methods by a large margin (i.e., on average 109.5% in SRCC, 98.6% in PLCC, and 91.5% in KRCC over the second best) and even exceeds 16 full-reference methods across all evaluation metrics (i.e., 22.9% in SRCC, 19.1% in PLCC, and 18.6% in KRCC over the second best).

Updated: 2025-07-23 02:32:10

标题: NVS-SQA: 探索无参考神经合成场景的自监督质量表示学习

摘要: 神经视图合成(NVS),如NeRF和3D高斯点阵,有效地从稀疏视角创建逼真的场景,通常通过PSNR、SSIM和LPIPS等质量评估方法进行评估。然而,这些全参考方法,将合成的视图与参考视图进行比较,可能无法完全捕捉神经合成场景(NSS)的感知质量,特别是由于稠密参考视图的有限可用性。此外,获取人类感知标签的挑战阻碍了创建广泛标记的数据集,可能导致模型过拟合和降低泛化能力。为了解决这些问题,我们提出了NVS-SQA,一种NSS质量评估方法,通过自监督学习学习无参考质量表示,而不依赖于人类标签。传统的自监督学习主要依赖于“相同实例,相似表示”假设和大量数据集。然而,鉴于这些条件在NSS质量评估中不适用,我们采用启发式线索和质量分数作为学习目标,以及专门的对比对准备过程,以提高学习的效果和效率。结果表明,NVS-SQA在SRCC方面的表现远远优于17种无参考方法(即平均比第二名高109.5%),在PLCC方面高出98.6%,在KRCC方面高出91.5%,甚至超过了16种全参考方法在所有评估指标上的表现(即在SRCC方面高出22.9%,在PLCC方面高出19.1%,在KRCC方面高出18.6%)。

更新时间: 2025-07-23 02:32:10

领域: cs.CV,cs.AI,cs.HC,cs.MM,eess.IV

下载: http://arxiv.org/abs/2501.06488v2

ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation

The significant morphological and distributional variability among subcellular components poses a long-standing challenge for learning-based organelle segmentation models, significantly increasing the risk of biased feature learning. Existing methods often rely on single mapping relationships, overlooking feature diversity and thereby inducing biased training. Although the Segment Anything Model (SAM) provides rich feature representations, its application to subcellular scenarios is hindered by two key challenges: (1) The variability in subcellular morphology and distribution creates gaps in the label space, leading the model to learn spurious or biased features. (2) SAM focuses on global contextual understanding and often ignores fine-grained spatial details, making it challenging to capture subtle structural alterations and cope with skewed data distributions. To address these challenges, we introduce ScSAM, a method that enhances feature robustness by fusing pre-trained SAM with Masked Autoencoder (MAE)-guided cellular prior knowledge to alleviate training bias from data imbalance. Specifically, we design a feature alignment and fusion module to align pre-trained embeddings to the same feature space and efficiently combine different representations. Moreover, we present a cosine similarity matrix-based class prompt encoder to activate class-specific features to recognize subcellular categories. Extensive experiments on diverse subcellular image datasets demonstrate that ScSAM outperforms state-of-the-art methods.
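
The cosine similarity matrix-based class prompting can be sketched compactly; the shapes and the name `class_prompts` below are illustrative assumptions, not ScSAM's actual interface:

```python
import torch
import torch.nn.functional as F

def class_prompts(features, class_embeds):
    """Cosine-similarity class prompting (illustrative of ScSAM's idea).

    features:     (N, D) pixel/patch embeddings
    class_embeds: (K, D) learned per-class embeddings
    Returns an (N, K) similarity matrix used to activate class-specific
    features for each subcellular category.
    """
    f = F.normalize(features, dim=-1)
    c = F.normalize(class_embeds, dim=-1)
    return f @ c.T

sim = class_prompts(torch.randn(16, 256), torch.randn(5, 256))
print(sim.shape, bool(sim.max() <= 1.0))  # torch.Size([16, 5]) True
```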

Updated: 2025-07-23 02:28:43

标题: ScSAM:消除亚细胞语义分割中的形态与分布变异偏差

摘要: 亚细胞组分之间显著的形态和分布变异性对基于学习的细胞器分割模型构成长期挑战,显著增加了偏见特征学习的风险。现有方法通常依赖于单一映射关系,忽视特征多样性,从而引发偏见训练。虽然Segment Anything模型(SAM)提供了丰富的特征表示,但其在亚细胞场景中的应用受到两个关键挑战的阻碍:(1)亚细胞形态和分布的可变性在标签空间中产生差距,导致模型学习虚假或偏见特征。(2)SAM专注于全局上下文理解,通常忽略细粒度的空间细节,使其难以捕捉微妙的结构变化和应对数据分布的偏斜。为了解决这些挑战,我们引入了ScSAM,这是一种通过融合预训练的SAM和基于掩模自编码器(MAE)引导的细胞先验知识来增强特征稳健性的方法,以减轻数据不平衡带来的训练偏见。具体而言,我们设计了一个特征对齐和融合模块,将预训练的嵌入对齐到相同的特征空间,并有效地组合不同的表示。此外,我们提出了一种基于余弦相似性矩阵的类别提示编码器,激活特定于类别的特征以识别亚细胞类别。对多样的亚细胞图像数据集进行的大量实验表明,ScSAM优于现有的方法。

更新时间: 2025-07-23 02:28:43

领域: cs.CV,cs.AI,cs.LG,I.4.6

下载: http://arxiv.org/abs/2507.17149v1

VGS-ATD: Robust Distributed Learning for Multi-Label Medical Image Classification Under Heterogeneous and Imbalanced Conditions

In recent years, advanced deep learning architectures have shown strong performance in medical imaging tasks. However, the traditional centralized learning paradigm poses serious privacy risks as all data is collected and trained on a single server. To mitigate this challenge, decentralized approaches such as federated learning and swarm learning have emerged, allowing model training on local nodes while sharing only model weights. While these methods enhance privacy, they struggle with heterogeneous and imbalanced data and suffer from inefficiencies due to frequent communication and the aggregation of weights. More critically, the dynamic and complex nature of clinical environments demands scalable AI systems capable of continuously learning from diverse modalities and multilabels. Yet, both centralized and decentralized models are prone to catastrophic forgetting during system expansion, often requiring full model retraining to incorporate new data. To address these limitations, we propose VGS-ATD, a novel distributed learning framework. To validate VGS-ATD, we evaluated it in experiments spanning 30 datasets and 80 independent labels across distributed nodes. VGS-ATD achieved an overall accuracy of 92.7%, outperforming centralized learning (84.9%) and swarm learning (72.99%), while federated learning failed under these conditions due to its high computational resource requirements. VGS-ATD also demonstrated strong scalability, with only a 1% drop in accuracy on existing nodes after expansion, compared to a 20% drop in centralized learning, highlighting its resilience to catastrophic forgetting. Additionally, it reduced computational costs by up to 50% relative to both centralized and swarm learning, confirming its superior efficiency and scalability.

Updated: 2025-07-23 02:27:31

标题: VGS-ATD:异构和不平衡条件下多标签医学图像分类的稳健分布式学习

摘要: 近年来,先进的深度学习架构在医学影像任务中展现出强大的性能。然而,传统的集中式学习范式存在严重的隐私风险,因为所有数据都是在单个服务器上收集和训练的。为了缓解这一挑战,出现了分散式方法,如联邦学习和群体学习,允许在本地节点上进行模型训练,同时仅分享模型权重。虽然这些方法增强了隐私性,但它们在处理异构和不平衡数据方面遇到困难,并且由于频繁通信和权重聚合而导致效率低下。更为关键的是,临床环境的动态和复杂性要求可持续学习来自各种模态和多标签的可扩展AI系统。然而,集中式和分散式模型都容易在系统扩展过程中出现灾难性遗忘,通常需要完全重新训练模型以整合新数据。为了解决这些限制,我们提出了VGS-ATD,一个新颖的分布式学习框架。为了验证VGS-ATD,我们在跨越30个数据集和80个独立标签的实验中对其进行评估,VGS-ATD实现了92.7%的整体准确率,优于集中式学习(84.9%)和群体学习(72.99%),而联邦学习在这些条件下失败,因为对计算资源有很高的需求。VGS-ATD还展现了强大的可扩展性,在扩展后现有节点上的准确率仅下降了1%,而集中式学习下降了20%,凸显了其对灾难性遗忘的弹性。此外,相对于集中式和群体学习,它将计算成本降低了多达50%,证实了其出色的效率和可扩展性。

更新时间: 2025-07-23 02:27:31

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2507.18657v1

Towards Human-level Intelligence via Human-like Whole-Body Manipulation

Building general-purpose intelligent robots has long been a fundamental goal of robotics. A promising approach is to mirror the evolutionary trajectory of humans: learning through continuous interaction with the environment, with early progress driven by the imitation of human behaviors. Achieving this goal presents three core challenges: (1) designing safe robotic hardware with human-level physical capabilities; (2) developing an intuitive and scalable whole-body teleoperation interface for data collection; and (3) creating algorithms capable of learning whole-body visuomotor policies from human demonstrations. To address these challenges in a unified framework, we propose Astribot Suite, a robot learning suite for whole-body manipulation aimed at general daily tasks across diverse environments. We demonstrate the effectiveness of our system on a wide range of activities that require whole-body coordination, extensive reachability, human-level dexterity, and agility. Our results show that Astribot's cohesive integration of embodiment, teleoperation interface, and learning pipeline marks a significant step towards real-world, general-purpose whole-body robotic manipulation, laying the groundwork for the next generation of intelligent robots.

Updated: 2025-07-23 02:23:41

标题: 朝向通过类人整体身体操纵实现人类水平智能

摘要: 构建通用智能机器人长期以来一直是机器人技术的基本目标。一种有前途的方法是模仿人类的演化轨迹:通过与环境的持续互动学习,早期进展由模仿人类行为驱动。实现这一目标面临三个核心挑战:(1)设计具有人类水平物理能力的安全机器人硬件;(2)开发直观且可扩展的全身远程操作界面用于数据收集;以及(3)创建能够从人类示范中学习全身视觉动作策略的算法。为了在统一框架中解决这些挑战,我们提出了Astribot Suite,一个面向全身操作的机器人学习套件,旨在处理各种环境中的通用日常任务。我们展示了我们系统在需要全身协调、广泛可及性、人类水平灵巧性和敏捷性的各种活动上的有效性。我们的结果显示,Astribot对实体化、远程操作界面和学习管道的完整集成标志着朝着实现真实世界通用全身机器人操作迈出了重要一步,为下一代智能机器人奠定了基础。

更新时间: 2025-07-23 02:23:41

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2507.17141v1

SADA: Stability-guided Adaptive Diffusion Acceleration

Diffusion models have achieved remarkable success in generative tasks but suffer from high computational costs due to their iterative sampling process and quadratic attention costs. Existing training-free acceleration strategies that reduce per-step computation cost, while effectively reducing sampling time, demonstrate low faithfulness compared to the original baseline. We hypothesize that this fidelity gap arises because (a) different prompts correspond to varying denoising trajectories, and (b) such methods do not consider the underlying ODE formulation and its numerical solution. In this paper, we propose Stability-guided Adaptive Diffusion Acceleration (SADA), a novel paradigm that unifies step-wise and token-wise sparsity decisions via a single stability criterion to accelerate sampling of ODE-based generative models (Diffusion and Flow-matching). For (a), SADA adaptively allocates sparsity based on the sampling trajectory. For (b), SADA introduces principled approximation schemes that leverage the precise gradient information from the numerical ODE solver. Comprehensive evaluations on SD-2, SDXL, and Flux using both EDM and DPM++ solvers reveal consistent $\ge 1.8\times$ speedups with minimal fidelity degradation (LPIPS $\leq 0.10$ and FID $\leq 4.5$) compared to unmodified baselines, significantly outperforming prior methods. Moreover, SADA adapts seamlessly to other pipelines and modalities: It accelerates ControlNet without any modifications and speeds up MusicLDM by $1.8\times$ with $\sim 0.01$ spectrogram LPIPS.
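
A toy version of the stability idea: if the denoiser's output barely changed over the last step, skip the next network call and reuse the cached prediction. This is a crude stand-in for SADA's criterion on the underlying ODE, with an invented denoiser purely for illustration:

```python
import torch

def sample_with_skips(eps_model, z, timesteps, tol=0.05):
    """Toy stability-guided sampler acceleration (not SADA itself)."""
    prev_eps, stable, evals = None, False, 0
    for t in timesteps:
        if stable and prev_eps is not None:
            eps = prev_eps                      # skipped network call
            stable = False                      # never skip twice in a row
        else:
            eps = eps_model(z, t)
            evals += 1
            if prev_eps is not None:
                rel = (eps - prev_eps).norm() / (prev_eps.norm() + 1e-8)
                stable = rel < tol              # crude stability criterion
            prev_eps = eps
        z = z - 0.05 * eps                      # toy Euler ODE update
    return z, evals

eps_model = lambda z, t: 0.1 * z                # stand-in denoiser
z, evals = sample_with_skips(eps_model, torch.randn(4, 8), range(50))
print(evals, "network calls instead of 50")
```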

Updated: 2025-07-23 02:15:45

标题: SADA:稳定性引导的自适应扩散加速

摘要: 扩散模型在生成任务中取得了显著的成功,但由于其迭代采样过程和二次注意力成本,存在高计算成本的问题。现有的无训练加速策略可以减少每步计算成本,同时有效减少采样时间,但与原始基准相比展现出较低的保真度。我们假设这种保真度差距是因为(a)不同提示对应于不同的去噪轨迹,以及(b)这种方法没有考虑基础ODE公式及其数值解。在本文中,我们提出了基于稳定性引导的自适应扩散加速(SADA),这是一种新范式,通过单一稳定性标准统一了逐步和令牌稀疏决策,以加速基于ODE的生成模型(扩散和流匹配)的采样。对于(a),SADA根据采样轨迹自适应地分配稀疏性。对于(b),SADA引入了基于原则的近似方案,利用来自数值ODE求解器的精确梯度信息。在SD-2、SDXL和Flux上对EDM和DPM++求解器进行全面评估,结果显示相对于未修改的基准,一致获得≥1.8倍的加速,并且保真度下降最小(LPIPS≤0.10,FID≤4.5),明显优于先前的方法。此外,SADA可以无缝地适应其他流水线和模态:它可以加速ControlNet而无需任何修改,并将MusicLDM加速1.8倍,其谱图LPIPS约为0.01。

更新时间: 2025-07-23 02:15:45

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.17135v1

Feature-Enhanced TResNet for Fine-Grained Food Image Classification

Food is not only essential to human health but also serves as a medium for cultural identity and emotional connection. In the context of precision nutrition, accurately identifying and classifying food images is critical for dietary monitoring, nutrient estimation, and personalized health management. However, fine-grained food classification remains challenging due to the subtle visual differences among similar dishes. To address this, we propose Feature-Enhanced TResNet (FE-TResNet), a novel deep learning model designed to improve the accuracy of food image recognition in fine-grained scenarios. Built on the TResNet architecture, FE-TResNet integrates a Style-based Recalibration Module (StyleRM) and Deep Channel-wise Attention (DCA) to enhance feature extraction and emphasize subtle distinctions between food items. Evaluated on two benchmark Chinese food datasets, ChineseFoodNet and CNFOOD-241, FE-TResNet achieved high classification accuracies of 81.37% and 80.29%, respectively. These results demonstrate its effectiveness and highlight its potential as a key enabler for intelligent dietary assessment and personalized recommendations in precision nutrition systems.
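
FE-TResNet's DCA block is not specified in the abstract; a squeeze-and-excitation-style block is a common realization of deep channel-wise attention and serves here only as a hedged sketch of the idea:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention. The actual DCA
    design in FE-TResNet may differ; this is an illustrative assumption."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # global average pool -> (B, C)
        return x * w[:, :, None, None]         # reweight channels

print(ChannelAttention(64)(torch.randn(2, 64, 8, 8)).shape)  # (2, 64, 8, 8)
```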

Updated: 2025-07-23 02:14:58

标题: Feature-Enhanced TResNet用于细粒度食物图像分类

摘要: 食物不仅对人类健康至关重要,还作为文化认同和情感联系的媒介。在精准营养的背景下,准确识别和分类食物图像对于膳食监测、营养估算和个性化健康管理至关重要。然而,由于类似菜肴之间的微小视觉差异,细粒度食物分类仍然具有挑战性。为了解决这个问题,我们提出了一种新型深度学习模型Feature-Enhanced TResNet (FE-TResNet),旨在提高细粒度场景下食物图像识别的准确性。基于TResNet架构,FE-TResNet整合了基于样式的重新校准模块(StyleRM)和深度通道注意力(DCA),以增强特征提取并强调食物项目之间的微小区别。在两个基准中国食物数据集ChineseFoodNet和CNFOOD-241上评估,FE-TResNet分别实现了81.37%和80.29%的高分类准确性。这些结果表明其有效性,并突显其作为智能膳食评估和个性化推荐在精准营养系统中的潜力。

更新时间: 2025-07-23 02:14:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.12828v2

Resilient Multi-Agent Negotiation for Medical Supply Chains:Integrating LLMs and Blockchain for Transparent Coordination

Global health emergencies, such as the COVID-19 pandemic, have exposed critical weaknesses in traditional medical supply chains, including inefficiencies in resource allocation, lack of transparency, and poor adaptability to dynamic disruptions. This paper presents a novel hybrid framework that integrates blockchain technology with a decentralized, large language model (LLM) powered multi-agent negotiation system to enhance the resilience and accountability of medical supply chains during crises. In this system, autonomous agents-representing manufacturers, distributors, and healthcare institutions-engage in structured, context-aware negotiation and decision-making processes facilitated by LLMs, enabling rapid and ethical allocation of scarce medical resources. The off-chain agent layer supports adaptive reasoning and local decision-making, while the on-chain blockchain layer ensures immutable, transparent, and auditable enforcement of decisions via smart contracts. The framework also incorporates a formal cross-layer communication protocol to bridge decentralized negotiation with institutional enforcement. A simulation environment emulating pandemic scenarios evaluates the system's performance, demonstrating improvements in negotiation efficiency, fairness of allocation, supply chain responsiveness, and auditability. This research contributes an innovative approach that synergizes blockchain trust guarantees with the adaptive intelligence of LLM-driven agents, providing a robust and scalable solution for critical supply chain coordination under uncertainty.

Updated: 2025-07-23 02:14:42

标题: 医疗物资供应链的弹性多代理谈判:集成LLMs和区块链实现透明协调

摘要: 全球卫生紧急情况,如COVID-19大流行,暴露了传统医疗供应链中的关键弱点,包括资源分配效率低、缺乏透明度和对动态干扰的适应能力不足。本文提出了一种新颖的混合框架,将区块链技术与分散式、基于大型语言模型(LLM)的多代理协商系统相结合,以增强危机期间医疗供应链的弹性和问责性。在这个系统中,代表制造商、经销商和医疗机构的自治代理参与由LLMs促进的结构化、上下文感知的协商和决策过程,实现医疗资源的快速和道德分配。离线代理层支持自适应推理和本地决策,而基于区块链的在线层通过智能合约确保不可变、透明和可审计的决策执行。该框架还包含一个正式的跨层通信协议,以连接分散式协商与机构执行。一个模拟环境模拟大流行场景评估了系统的性能,展示了协商效率、分配公平性、供应链响应能力和可审计性的改善。这项研究提供了一种创新的方法,将区块链信任保证与LLM驱动的代理的自适应智能相结合,为不确定条件下的关键供应链协调提供了稳健且可扩展的解决方案。

更新时间: 2025-07-23 02:14:42

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2507.17134v1

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. Current approaches, like offline fine-tuning and standard prompting, are insufficient because they cannot effectively adapt to new knowledge during actual operation. To address this limitation, we propose the Adaptive Reflective Interactive Agent (ARIA), an LLM agent framework designed specifically to continuously learn updated domain knowledge at test time. ARIA assesses its own uncertainty through structured self-dialogue, proactively identifying knowledge gaps and requesting targeted explanations or corrections from human experts. It then systematically updates an internal, timestamped knowledge repository with provided human guidance, detecting and resolving conflicting or outdated knowledge through comparisons and clarification queries. We evaluate ARIA on the realistic customer due diligence name screening task on TikTok Pay, alongside publicly available dynamic knowledge tasks. Results demonstrate significant improvements in adaptability and accuracy compared to baselines using standard offline fine-tuning and existing self-improving agents. ARIA is deployed within TikTok Pay serving over 150 million monthly active users, confirming its practicality and effectiveness for operational use in rapidly evolving environments.

Updated: 2025-07-23 02:12:32

标题: 通过人机协作指导,在测试时间让自我改进的代理学习

摘要: 大型语言模型(LLM)代理通常在规则和所需领域知识经常变化的环境中遇到困难,例如监管合规性和用户风险筛查。当前的方法,如离线微调和标准提示,是不足够的,因为它们不能有效地适应实际操作中的新知识。为了解决这一限制,我们提出了自适应反思互动代理(ARIA),这是一个专门设计用于在测试时间持续学习更新领域知识的LLM代理框架。ARIA通过结构化的自对话评估自己的不确定性,主动识别知识空白,并请求人类专家提供有针对性的解释或更正。然后,通过比较和澄清查询,它系统地使用提供的人类指导更新一个内部的、带有时间戳的知识存储库,检测和解决冲突或过时的知识。我们在TikTok Pay上真实的客户尽职调查名称筛查任务以及公开可用的动态知识任务上对ARIA进行了评估。结果表明,与使用标准离线微调和现有的自我改进代理的基线相比,ARIA在适应性和准确性方面取得了显著改进。ARIA已在TikTok Pay中部署,为超过1.5亿月活跃用户提供服务,证实了它在快速发展的环境中的操作使用的实用性和有效性。

更新时间: 2025-07-23 02:12:32

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17131v1

A Survey of Event Causality Identification: Taxonomy, Challenges, Assessment, and Prospects

Event Causality Identification (ECI) has emerged as a pivotal task in natural language processing (NLP), aimed at automatically detecting causal relationships between events in text. In this comprehensive survey, we systematically elucidate the foundational principles and technical frameworks of ECI, proposing a novel classification framework to categorize and clarify existing methods. We discuss associated challenges, provide quantitative evaluations, and outline future directions for this dynamic and rapidly evolving field. We first delineate key definitions, problem formalization, and evaluation protocols of ECI. Our classification framework organizes ECI methods based on two primary tasks: Sentence-level Event Causality Identification (SECI) and Document-level Event Causality Identification (DECI). For SECI, we review methods including feature pattern-based matching, machine learning-based classification, deep semantic encoding, prompt-based fine-tuning, and causal knowledge pre-training, alongside common data augmentation strategies. For DECI, we focus on techniques such as deep semantic encoding, event graph reasoning, and prompt-based fine-tuning. We dedicate specific discussions to advancements in multi-lingual and cross-lingual ECI as well as zero-shot ECI leveraging Large Language Models (LLMs). Furthermore, we analyze the strengths, limitations, and unresolved challenges of each method. Extensive quantitative evaluations are conducted on four benchmark datasets to assess various ECI methods. Finally, we explore future research directions.

Updated: 2025-07-23 02:03:22

标题: 一项事件因果性识别调查:分类、挑战、评估和展望

摘要: 事件因果识别(ECI)已经成为自然语言处理(NLP)中的一个关键任务,旨在自动检测文本中事件之间的因果关系。在这份全面的调查中,我们系统地阐明了ECI的基本原则和技术框架,提出了一个新颖的分类框架来对现有方法进行分类和澄清。我们讨论了相关挑战,提供了定量评估,并概述了这个动态和迅速发展的领域的未来方向。我们首先界定了ECI的关键定义,问题形式化和评估协议。我们的分类框架将ECI方法分为两个主要任务:句子级事件因果识别(SECI)和文档级事件因果识别(DECI)。对于SECI,我们回顾了基于特征模式匹配、基于机器学习的分类、深层语义编码、基于提示的微调和因果知识预训练等方法,以及常见的数据增强策略。对于DECI,我们关注技术,如深层语义编码、事件图推理和基于提示的微调。我们专门讨论了多语言和跨语言ECI以及利用大型语言模型(LLMs)的零样本(zero-shot)ECI的进展。此外,我们分析了每种方法的优势、限制和未解决的挑战。我们对四个基准数据集进行了广泛的定量评估,以评估各种ECI方法。最后,我们探讨未来的研究方向。

更新时间: 2025-07-23 02:03:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.10371v4

OkadaTorch: A Differentiable Programming of Okada Model to Calculate Displacements and Strains from Fault Parameters

The Okada model is a widely used analytical solution for displacements and strains caused by a point or rectangular dislocation source in a 3D elastic half-space. We present OkadaTorch, a PyTorch implementation of the Okada model, where the entire code is differentiable; gradients with respect to input can be easily computed using automatic differentiation (AD). Our work consists of two components: a direct translation of the original Okada model into PyTorch, and a convenient wrapper interface for efficiently computing gradients and Hessians with respect to either observation station coordinates or fault parameters. This differentiable framework is well suited for fault parameter inversion, including gradient-based optimization, Bayesian inference, and integration with scientific machine learning (SciML) models. Our code is available here: https://github.com/msomeya1/OkadaTorch
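
The actual OkadaTorch API lives in the linked repository, so the sketch below shows only the automatic-differentiation pattern it enables, with `displacement` as a hypothetical placeholder for the Okada forward solution:

```python
import torch

def displacement(params, station):
    """Stand-in for the Okada forward model (the real OkadaTorch API may
    differ); any differentiable map from fault params to displacement works."""
    depth, slip = params
    r = torch.sqrt(station.pow(2).sum() + depth ** 2)
    return slip / r                              # toy 1/r decay

station = torch.tensor([10.0, 5.0])
params = torch.tensor([4.0, 1.5], requires_grad=True)

u = displacement(params, station)
grad, = torch.autograd.grad(u, params)           # d(displacement)/d(params)
hess = torch.autograd.functional.hessian(
    lambda p: displacement(p, station), params
)
print(grad, hess.shape)                          # gradient, torch.Size([2, 2])
```

With the gradient and Hessian available this cheaply, gradient-based inversion or Laplace-style Bayesian approximations of fault parameters follow directly, which is the use case the paper highlights.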

Updated: 2025-07-23 02:02:39

标题: OkadaTorch:利用Okada模型进行可微分编程,计算地震断层参数引起的位移和应变

摘要: Okada模型是一种广泛应用的分析解,用于描述3D弹性半空间中由点源或矩形断层源引起的位移和应变。我们提出了OkadaTorch,这是Okada模型的一个PyTorch实现,整个代码是可微的;可以使用自动微分轻松计算关于输入的梯度。我们的工作包括两个组件:将原始Okada模型直接翻译成PyTorch,以及一个方便的包装器接口,用于有效地计算关于观测站坐标或断层参数的梯度和Hessian矩阵。这种可微分的框架非常适用于断层参数反演,包括基于梯度的优化、贝叶斯推断和与科学机器学习(SciML)模型的集成。我们的代码可以在这里找到:https://github.com/msomeya1/OkadaTorch

更新时间: 2025-07-23 02:02:39

领域: physics.geo-ph,cs.LG

下载: http://arxiv.org/abs/2507.17126v1

Model Compression Engine for Wearable Devices Skin Cancer Diagnosis

Skin cancer is one of the most prevalent and preventable types of cancer, yet its early detection remains a challenge, particularly in resource-limited settings where access to specialized healthcare is scarce. This study proposes an AI-driven diagnostic tool optimized for embedded systems to address this gap. Using transfer learning with the MobileNetV2 architecture, the model was adapted for binary classification of skin lesions into "Skin Cancer" and "Other." The TensorRT framework was employed to compress and optimize the model for deployment on the NVIDIA Jetson Orin Nano, balancing performance with energy efficiency. Comprehensive evaluations were conducted across multiple benchmarks, including model size, inference speed, throughput, and power consumption. The optimized models maintained their performance, achieving an F1-Score of 87.18% with a precision of 93.18% and recall of 81.91%. Post-compression results showed reductions in model size of up to 0.41, along with improvements in inference speed and throughput, and a decrease in energy consumption of up to 0.93 in INT8 precision. These findings validate the feasibility of deploying high-performing, energy-efficient diagnostic tools on resource-constrained edge devices. Beyond skin cancer detection, the methodologies applied in this research have broader applications in other medical diagnostics and domains requiring accessible, efficient AI solutions. This study underscores the potential of optimized AI systems to revolutionize healthcare diagnostics, thereby bridging the divide between advanced technology and underserved regions.

Updated: 2025-07-23 02:02:24

标题: 可穿戴设备皮肤癌诊断模型压缩引擎

摘要: 皮肤癌是最常见且可预防的癌症之一,但其早期检测仍然是一个挑战,特别是在资源有限的环境中,专业医疗保健的获取有限。本研究提出了一种针对嵌入式系统优化的人工智能诊断工具,以解决这一难题。利用MobileNetV2架构进行迁移学习,该模型被适应用于将皮肤病变二元分类为“皮肤癌”和“其他”。采用TensorRT框架对模型进行压缩和优化,以在NVIDIA Jetson Orin Nano上部署,平衡性能和能效。进行了全面的评估,包括模型大小、推理速度、吞吐量和功耗等多个基准。优化模型保持了其性能,实现了87.18%的F1分数,93.18%的精度和81.91%的召回率。压缩后的结果显示模型大小减少了最多0.41,推理速度和吞吐量有所提高,并且在INT8精度下能源消耗减少了最多0.93。这些发现验证了在资源受限的边缘设备上部署高性能、能效高的诊断工具的可行性。除了皮肤癌检测外,本研究中应用的方法还可以在其他医疗诊断和需要可访问、高效的人工智能解决方案的领域中有更广泛的应用。本研究强调了优化的人工智能系统改变医疗诊断的潜力,从而弥合先进技术和服务不足地区之间的鸿沟。

更新时间: 2025-07-23 02:02:24

领域: cs.LG

下载: http://arxiv.org/abs/2507.17125v1

EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion

Despite the remarkable developments achieved by recent 3D generation works, scaling these methods to geographic extents, such as modeling thousands of square kilometers of Earth's surface, remains an open challenge. We address this through a dual innovation in data infrastructure and model architecture. First, we introduce Aerial-Earth3D, the largest 3D aerial dataset to date, consisting of 50k curated scenes (each measuring 600m x 600m) captured across the U.S. mainland, comprising 45M multi-view Google Earth frames. Each scene provides pose-annotated multi-view images, depth maps, normals, semantic segmentation, and camera poses, with explicit quality control to ensure terrain diversity. Building on this foundation, we propose EarthCrafter, a tailored framework for large-scale 3D Earth generation via sparse-decoupled latent diffusion. Our architecture separates structural and textural generation: 1) Dual sparse 3D-VAEs compress high-resolution geometric voxels and textural 2D Gaussian Splats (2DGS) into compact latent spaces, largely alleviating the costly computation suffering from vast geographic scales while preserving critical information. 2) We propose condition-aware flow matching models trained on mixed inputs (semantics, images, or neither) to flexibly model latent geometry and texture features independently. Extensive experiments demonstrate that EarthCrafter performs substantially better in extremely large-scale generation. The framework further supports versatile applications, from semantic-guided urban layout generation to unconditional terrain synthesis, while maintaining geographic plausibility through our rich data priors from Aerial-Earth3D. Our project page is available at https://whiteinblue.github.io/earthcrafter/

Updated: 2025-07-23 01:59:09

标题: EarthCrafter:通过双稀疏潜在扩散实现可扩展的3D地球生成

摘要: 尽管最近的3D生成作品取得了显著的进展,但将这些方法扩展到地理范围,比如对地球表面数千平方公里的建模,仍然是一个悬而未决的挑战。我们通过数据基础设施和模型架构的双重创新来解决这个问题。首先,我们介绍了迄今为止最大的3D航空数据集Aerial-Earth3D,包括在美国本土捕捉的50,000个经过筛选的场景(每个场景尺寸为600m x 600m),涵盖了4500万个多视角Google Earth帧。每个场景提供了姿态注释的多视角图像、深度图、法线、语义分割和相机位置,通过明确的质量控制确保了地形多样性。基于这一基础,我们提出了EarthCrafter,一个专门用于通过稀疏解耦潜在扩散实现大规模3D地球生成的框架。我们的架构将结构生成和纹理生成分开:1)双稀疏3D-VAEs将高分辨率几何体素和纹理2D高斯斑块(2DGS)压缩成紧凑的潜在空间,从而在保留关键信息的同时大大减轻了面向广阔地理尺度的昂贵计算负担。2)我们提出了在混合输入(语义、图像或两者都不是)上训练的条件感知流匹配模型,以灵活地独立建模潜在几何和纹理特征。广泛的实验表明,EarthCrafter在极大规模生成中表现出色。该框架进一步支持多样化的应用,从语义引导的城市布局生成到无条件的地形合成,同时通过我们从Aerial-Earth3D获取的丰富数据先验保持地理可信性。我们的项目页面可在https://whiteinblue.github.io/earthcrafter/上获得。

更新时间: 2025-07-23 01:59:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.16535v2

Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems

The rapid diagnosis of infectious diseases, such as monkeypox, is crucial for effective containment and treatment, particularly in resource-constrained environments. This study presents an AI-driven diagnostic tool developed for deployment on the NVIDIA Jetson Orin Nano, leveraging the pre-trained MobileNetV2 architecture for binary classification. The model was trained on the open-source Monkeypox Skin Lesion Dataset, achieving a 93.07% F1-Score, which reflects a well-balanced performance in precision and recall. To optimize the model, the TensorRT framework was used to accelerate inference for FP32 and to perform post-training quantization for FP16 and INT8 formats. TensorRT's mixed-precision capabilities enabled these optimizations, which reduced the model size, increased inference speed, and lowered power consumption by approximately a factor of two, all while maintaining the original accuracy. Power consumption analysis confirmed that the optimized models used significantly less energy during inference, reinforcing their suitability for deployment in resource-constrained environments. The system was deployed with a Wi-Fi Access Point (AP) hotspot and a web-based interface, enabling users to upload and analyze images directly through connected devices such as mobile phones. This setup ensures simple access and seamless connectivity, making the tool practical for real-world applications. These advancements position the diagnostic tool as an efficient, scalable, and energy-conscious solution to address diagnosis challenges in underserved regions, paving the way for broader adoption in low-resource healthcare settings.

Updated: 2025-07-23 01:53:31

标题: 嵌入式系统上实时猴痘诊断的计算机视觉

摘要: 快速诊断传染病,如猴痘,对于有效的控制和治疗尤为关键,特别是在资源匮乏的环境中。本研究提出了一种基于人工智能的诊断工具,该工具专为部署在NVIDIA Jetson Orin Nano上而开发,利用预训练的MobileNetV2架构进行二元分类。该模型在开源的Monkeypox皮肤损伤数据集上进行训练,实现了93.07%的F1分数,反映出在精确度和召回率方面表现均衡。为了优化模型,使用了TensorRT框架来加速FP32推断,并对FP16和INT8格式进行训练后量化。TensorRT的混合精度功能使得这些优化成为可能,它们减小了模型大小,增加了推断速度,并将功耗降低了约两倍,同时保持了原始的准确性。功耗分析证实,优化后的模型在推断时使用的能量明显减少,强化了它们在资源匮乏环境中部署的适用性。该系统部署了一个Wi-Fi接入点(AP)热点和一个基于Web的界面,使用户能够通过连接设备(如手机)直接上传和分析图像。这种设置确保了简单的访问和无缝的连接性,使该工具在实际应用中更加实用。这些进展将该诊断工具定位为一种高效、可扩展和节能的解决方案,以解决欠发达地区的诊断挑战,为在低资源的医疗环境中更广泛地采用铺平道路。

更新时间: 2025-07-23 01:53:31

领域: cs.LG

下载: http://arxiv.org/abs/2507.17123v1

Robust Five-Class and binary Diabetic Retinopathy Classification Using Transfer Learning and Data Augmentation

Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early diagnosis through automated retinal image analysis can significantly reduce the risk of blindness. This paper presents a robust deep learning framework for both binary and five-class DR classification, leveraging transfer learning and extensive data augmentation to address the challenges of class imbalance and limited training data. We evaluate a range of pretrained convolutional neural network architectures, including variants of ResNet and EfficientNet, on the APTOS 2019 dataset. For binary classification, our proposed model achieves a state-of-the-art accuracy of 98.9%, with a precision of 98.6%, recall of 99.3%, F1-score of 98.9%, and an AUC of 99.4%. In the more challenging five-class severity classification task, our model obtains a competitive accuracy of 84.6% and an AUC of 94.1%, outperforming several existing approaches. Our findings also demonstrate that EfficientNet-B0 and ResNet34 offer optimal trade-offs between accuracy and computational efficiency across both tasks. These results underscore the effectiveness of combining class-balanced augmentation with transfer learning for high-performance DR diagnosis. The proposed framework provides a scalable and accurate solution for DR screening, with potential for deployment in real-world clinical environments.

Updated: 2025-07-23 01:52:27

标题: 使用迁移学习和数据增强进行强大的五类和二元糖尿病视网膜病变分类

摘要: 糖尿病视网膜病变(DR)是全球导致视力丧失的主要原因,通过自动视网膜图像分析进行早期诊断可以显著降低失明风险。本文提出了一个强大的深度学习框架,用于二元和五类DR分类,利用迁移学习和大量数据增强来解决类别不平衡和有限训练数据的挑战。我们在APTOS 2019数据集上评估了一系列预训练的卷积神经网络架构,包括ResNet和EfficientNet的变体。 对于二元分类,我们提出的模型实现了98.9%的最新准确率,精确率为98.6%,召回率为99.3%,F1分数为98.9%,AUC为99.4%。在更具挑战性的五类严重程度分类任务中,我们的模型获得了84.6%的竞争性准确率和94.1%的AUC,超过了几种现有方法。我们的研究结果还表明,EfficientNet-B0和ResNet34在两个任务中提供了最佳的准确率和计算效率的权衡。 这些结果强调了将平衡类别增强与迁移学习相结合对于高性能DR诊断的有效性。提出的框架为DR筛查提供了可扩展和准确的解决方案,并有潜力在现实临床环境中部署。

更新时间: 2025-07-23 01:52:27

领域: cs.CV,cs.LG,F.2.2; I.2.7

下载: http://arxiv.org/abs/2507.17121v1

BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving

Large language models (LLMs) have become increasingly popular in various areas, with traditional businesses gradually shifting from rule-based systems to LLM-based solutions. However, LLM inference is resource-intensive and latency-sensitive, posing significant challenges for serving systems. Existing LLM serving systems often use static or continuous batching strategies, which can lead to inefficient GPU memory utilization and increased latency, especially under heterogeneous workloads. These methods may also struggle to adapt to dynamic workload fluctuations, resulting in suboptimal throughput and potential service level objective (SLO) violations. In this paper, we introduce BucketServe, a bucket-based dynamic batching framework designed to optimize LLM inference performance. By grouping requests into size-homogeneous buckets based on sequence length, BucketServe minimizes padding overhead and optimizes GPU memory usage through real-time batch size adjustments, preventing out-of-memory (OOM) errors. It introduces adaptive bucket splitting/merging and priority-aware scheduling to mitigate resource fragmentation and ensure SLO compliance. Experiments show that BucketServe significantly outperforms UELLM in throughput, achieving up to a 3.58x improvement. It can also handle 1.93x more request load at an 80% SLO attainment compared with DistServe and demonstrates 1.975x higher system load capacity than UELLM.
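
The core bucketing step can be sketched in a few lines; the boundary values below are illustrative assumptions, and the real system additionally splits and merges buckets at runtime against a GPU-memory budget:

```python
from collections import defaultdict

def bucketize(requests, boundaries=(128, 256, 512, 1024)):
    """Group requests into size-homogeneous buckets by sequence length.

    Batching similar lengths together minimizes padding waste, since each
    batch is padded only to its bucket's boundary rather than to the
    longest request in a mixed batch.
    """
    buckets = defaultdict(list)
    for req in requests:
        # First boundary that fits, i.e. the bucket's padded length.
        cap = next((b for b in boundaries if req["len"] <= b), boundaries[-1])
        buckets[cap].append(req)
    return buckets

reqs = [{"id": i, "len": l} for i, l in enumerate([90, 130, 700, 100, 520])]
for cap, batch in sorted(bucketize(reqs).items()):
    waste = sum(cap - r["len"] for r in batch)
    print(f"bucket<={cap}: {[r['id'] for r in batch]} (padding waste {waste})")
```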

Updated: 2025-07-23 01:51:48

标题: BucketServe:基于桶的动态批处理用于智能高效的LLM推理服务

摘要: 大型语言模型(LLMs)在各个领域越来越受欢迎,传统企业逐渐从基于规则的系统转向基于LLM的解决方案。然而,LLMs的推理对资源密集或延迟敏感,给服务系统带来了重大挑战。现有的LLM服务系统通常使用静态或连续的批处理策略,这可能导致GPU内存利用效率低下和延迟增加,特别是在异构工作负载下。这些方法也可能难以适应动态工作负载波动,导致吞吐量不佳和潜在的服务水平目标(SLO)违规。在本文中,我们介绍了BucketServe,一个基于桶的动态批处理框架,旨在优化LLM推理性能。通过根据序列长度将请求分组到大小均匀的桶中,BucketServe通过实时批处理大小调整最小化填充开销,并通过防止内存溢出(OOM)错误来优化GPU内存使用。它引入了自适应桶分割/合并和优先级感知调度,以减轻资源碎片化并确保SLO合规性。实验证明,BucketServe在吞吐量方面明显优于UELLM,实现了高达3.58倍的改进。与DistServe相比,它还可以在80%的SLO达成情况下处理1.93倍更多的请求负载,并且相对于UELLM,表现出1.975倍更高的系统负载能力。

更新时间: 2025-07-23 01:51:48

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2507.17120v1

HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study

AI has become integral to safety-critical areas like autonomous driving systems (ADS) and robotics. The architectures of recent autonomous systems are trending toward end-to-end (E2E) monolithic designs such as large language models (LLMs) and vision language models (VLMs). In this paper, we review different architectural solutions and then evaluate the efficacy of common safety analyses such as failure modes and effect analysis (FMEA) and fault tree analysis (FTA). We show how these techniques can be improved to address the intricate nature of foundation models, particularly in how they form and utilize latent representations. We introduce HySAFE-AI, Hybrid Safety Architectural Analysis Framework for AI Systems, a hybrid framework that adapts traditional methods to evaluate the safety of AI systems. Lastly, we offer directions for future work and suggestions to guide the evolution of future AI safety standards.

Updated: 2025-07-23 01:41:51

标题: HySafe-AI:AI系统的混合安全架构分析框架:案例研究

摘要: 人工智能已经成为自动驾驶系统(ADS)和机器人等安全关键领域的重要组成部分。最近自主系统的架构趋向于端到端(E2E)的单体架构,如大型语言模型(LLMs)和视觉语言模型(VLMs)。本文回顾了不同的架构解决方案,然后评估了常见安全分析方法的有效性,如故障模式和效应分析(FMEA)和故障树分析(FTA)。我们展示了如何针对基础模型的复杂特性改进这些技术,特别是在它们如何形成和利用潜在表示方面。我们介绍了HySAFE-AI,一种面向AI系统的混合安全架构分析框架,它调整传统方法以评估AI系统的安全性。最后,我们提供了未来工作的方向和建议,以引导未来AI安全标准的发展。

更新时间: 2025-07-23 01:41:51

领域: cs.AI

下载: http://arxiv.org/abs/2507.17118v1

Probabilistic Graphical Models: A Concise Tutorial

Probabilistic graphical modeling is a branch of machine learning that uses probability distributions to describe the world, make predictions, and support decision-making under uncertainty. Underlying this modeling framework is an elegant body of theory that bridges two mathematical traditions: probability and graph theory. This framework provides compact yet expressive representations of joint probability distributions, yielding powerful generative models for probabilistic reasoning. This tutorial provides a concise introduction to the formalisms, methods, and applications of this modeling framework. After a review of basic probability and graph theory, we explore three dominant themes: (1) the representation of multivariate distributions in the intuitive visual language of graphs, (2) algorithms for learning model parameters and graphical structures from data, and (3) algorithms for inference, both exact and approximate.
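
As a worked micro-example of the first theme, a chain-structured Bayesian network A -> B -> C factorizes the joint as P(a, b, c) = P(a) P(b | a) P(c | b), so three small conditional tables replace a full joint table, and exact inference is brute-force marginalization:

```python
import itertools

# CPTs for the chain A -> B -> C (all variables binary).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}

def joint(a, b, c):
    # The graph's factorization: P(a, b, c) = P(a) P(b|a) P(c|b).
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

# Exact inference by summing out A and B: P(C = 1).
p_c1 = sum(joint(a, b, 1) for a, b in itertools.product((0, 1), (0, 1)))
print(round(p_c1, 4))  # 0.3
```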

Updated: 2025-07-23 01:36:44

标题: 概率图模型:简明教程

摘要: 概率图模型是机器学习的一个分支,它使用概率分布来描述世界,进行预测,并在不确定性下支持决策。在这个建模框架的基础上,存在一种优雅的理论体系,它连接了概率和图论这两个数学传统。这个框架提供了对联合概率分布的简洁而表达丰富的表示,产生了强大的概率推理生成模型。 本教程提供了对这种建模框架的形式化、方法和应用的简明介绍。在回顾基本概率和图论之后,我们探讨了三个主题:(1)在图形直观语言中表示多元分布,(2)从数据中学习模型参数和图形结构的算法,以及(3)用于推理的算法,包括精确和近似推理。

更新时间: 2025-07-23 01:36:44

领域: cs.LG

下载: http://arxiv.org/abs/2507.17116v1

Learning Neural Strategy-Proof Matching Mechanism from Examples

Designing two-sided matching mechanisms is challenging when practical demands for matching outcomes are difficult to formalize and the designed mechanism must satisfy theoretical conditions. To address this, prior work has proposed a framework that learns a matching mechanism from examples, using a parameterized family that satisfies properties such as stability. However, despite its usefulness, this framework does not guarantee strategy-proofness (SP), and cannot handle varying numbers of agents or incorporate publicly available contextual information about agents, both of which are crucial in real-world applications. In this paper, we propose a new parametrized family of matching mechanisms that always satisfy strategy-proofness, are applicable to an arbitrary number of agents, and can incorporate public contextual information about agents, based on serial dictatorship (SD). This family is represented by NeuralSD, a novel neural network architecture based on SD, where agent rankings in SD are treated as learnable parameters computed from agents' contexts using an attention-based sub-network. To enable learning, we introduce tensor serial dictatorship (TSD), a differentiable relaxation of SD using tensor operations. This allows NeuralSD to be trained end-to-end from example matchings while satisfying SP. We conducted experiments on learning matching mechanisms from examples while satisfying SP. We demonstrated that our method outperformed baselines in predicting matchings and on several metrics for the goodness of matching outcomes.
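
Classic serial dictatorship, which the parameterized family builds on, is easy to state; NeuralSD's contributions (computing the ranking from contexts with an attention sub-network, and the differentiable TSD relaxation) are not included in this sketch:

```python
def serial_dictatorship(ranking, preferences, items):
    """Serial dictatorship: agents pick, in `ranking` order, their most
    preferred item still available. Truthful reporting is optimal because
    an agent's report never affects who picks before them, which is the
    strategy-proofness NeuralSD inherits by only learning the ranking.
    """
    remaining, match = set(items), {}
    for agent in ranking:
        for item in preferences[agent]:
            if item in remaining:
                match[agent] = item
                remaining.discard(item)
                break
    return match

prefs = {"a1": ["x", "y", "z"], "a2": ["x", "z", "y"], "a3": ["y", "x", "z"]}
print(serial_dictatorship(["a2", "a1", "a3"], prefs, ["x", "y", "z"]))
# {'a2': 'x', 'a1': 'y', 'a3': 'z'}
```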

Updated: 2025-07-23 01:17:48

标题: 从示例中学习满足策略证明性的神经匹配机制

摘要: 设计双边匹配机制在实际需求难以形式化并且设计机制必须满足理论条件时是具有挑战性的。为了解决这个问题,先前的工作提出了一个框架,从示例中学习匹配机制,使用一个满足稳定性等属性的参数化家族。然而,尽管其有用性,这个框架并不保证策略证明性(SP),也无法处理不同数量的代理人或者合并有关代理人的公开可用上下文信息,这两者在实际应用中至关重要。在本文中,我们提出了一个新的参数化匹配机制家族,始终满足策略证明性,适用于任意数量的代理人,并处理代理人的公共上下文信息,基于串行独裁(SD)。这个家族由NeuralSD表示,这是一种基于SD的新颖神经网络架构,其中SD中的代理人排名被视为可学习参数,通过基于注意力的子网络从代理人的上下文中计算得出。为了实现学习,我们引入了张量串行独裁(TSD),这是使用张量操作对SD进行可微分的放松。这使得NeuralSD可以从示例匹配中端到端训练,同时满足SP。我们进行了实验,从匹配示例中学习匹配机制,同时满足SP。我们展示了我们的方法在预测匹配和匹配结果好坏的几个指标上优于基线。

更新时间: 2025-07-23 01:17:48

领域: cs.AI

下载: http://arxiv.org/abs/2410.19384v2

EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles

With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method aimed at improving weak experts, by training on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called **EnsemW2S**, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively evaluate the generalization performance of both the ensemble of weak experts and the subsequent strong student model across in-distribution (ID) and out-of-distribution (OOD) datasets. For OOD, we specifically introduce question difficulty as an additional dimension for defining distributional shifts. Our empirical results demonstrate notable improvements, achieving 4% and 3.2% gains on ID datasets, and up to 6% and 2.28% on OOD datasets, for the expert ensemble and student models respectively, underscoring the effectiveness of our proposed method in advancing W2S generalization.
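
A hedged sketch of token-level ensembling: average the experts' next-token distributions. The iterative, error-driven re-weighting that defines EnsemW2S is omitted, and the stand-in "experts" and their call signature are illustrative assumptions:

```python
import torch

def ensemble_next_token(models, inputs, weights=None):
    """Token-level ensembling of weak experts (illustrative only).

    Averages the experts' next-token distributions; EnsemW2S additionally
    re-weights experts across iterations based on earlier mistakes.
    """
    weights = weights or [1.0 / len(models)] * len(models)
    probs = None
    for w, m in zip(weights, models):
        logits = m(inputs)                     # (batch, vocab) stand-in API
        p = torch.softmax(logits, dim=-1) * w
        probs = p if probs is None else probs + p
    return probs.argmax(dim=-1)

# Stand-in "weak experts": linear maps over a bag-of-tokens encoding.
vocab = 50
experts = [torch.nn.Linear(vocab, vocab) for _ in range(3)]
x = torch.nn.functional.one_hot(torch.tensor([[7]]), vocab).float().sum(1)
print(ensemble_next_token(experts, x).shape)   # torch.Size([1])
```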

Updated: 2025-07-23 01:08:36

标题: EnsemW2S: 用大型语言模型集合增强从弱到强的泛化

摘要: 随着大型语言模型(LLMs)迅速接近并有可能超越人类水平的表现,开发能够有效监督和增强这些强大模型的方法变得至关重要,这些方法仅使用暴露于人类水平数据的较小的人类水平模型。我们通过提出一种旨在通过在相同有限的人类水平数据上训练来改进弱专家,使它们能够推广到复杂的、超越人类水平任务的新颖方法来解决这一关键的由弱到强(W2S)泛化挑战。我们的方法称为**EnsemW2S**,采用逐标记级别的集成策略,通过迭代地组合多个弱专家,系统地解决在前几次迭代中确定的缺点。通过不断完善这些弱模型,我们显著增强了它们共同监督更强大的学生模型的能力。我们对弱专家的集成和随后的强学生模型在分布内(ID)和分布外(OOD)数据集上的泛化性能进行了广泛评估。对于OOD,我们特别引入了问题难度作为定义分布偏移的额外维度。我们的实证结果表明了显著的改进,分别在ID数据集上实现了4%和3.2%的改进,在OOD数据集上分别实现了6%和2.28%的改进,强调了我们提出的方法在推进W2S泛化方面的有效性。

更新时间: 2025-07-23 01:08:36

领域: cs.LG

下载: http://arxiv.org/abs/2410.04571v3

Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models

Reinforcement learning (RL) is a key post-pretraining step for aligning large language models (LLMs) with complex tasks and human preferences. While it is often assumed that RL fine-tuning requires updating most of a model's parameters, we challenge this assumption with a surprising finding: RL fine-tuning consistently modifies only a small subnetwork (typically 5-30% of weights), leaving most parameters unchanged. We call this phenomenon RL-induced parameter update sparsity. It arises naturally, without any sparsity constraints or parameter-efficient tuning, and appears across multiple RL algorithms (e.g., PPO, DPO, SimPO, PRIME) and model families (e.g., OpenAI, Meta, and open-source LLMs). Moreover, the subnetworks updated by RL show substantial overlap across different seeds, datasets, and algorithms, far exceeding chance, suggesting a partially transferable structure in the pretrained model. We show that fine-tuning only this sparse subnetwork recovers full model performance and yields parameters nearly identical to the fully fine-tuned model. Our analysis suggests this sparsity emerges because RL operates near the model's original distribution, requiring only targeted changes. KL penalties, gradient clipping, and on-policy dynamics have limited effect on the sparsity pattern. These findings shed new light on how RL adapts models: not by shifting all weights, but by focusing training on a small, consistently updated subnetwork. This insight enables more efficient RL methods and reframes sparsity through the lens of the lottery ticket hypothesis.
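
The reported quantity is straightforward to measure for any pair of checkpoints; a small utility, assuming two models with identical parameter names:

```python
import torch

def update_sparsity(model_before, model_after, tol=0.0):
    """Fraction of parameters left untouched by fine-tuning.

    The paper reports RL typically updates only ~5-30% of weights; this
    utility simply measures that quantity for two checkpoints.
    """
    unchanged = total = 0
    after = dict(model_after.named_parameters())
    for name, p0 in model_before.named_parameters():
        diff = (after[name] - p0).abs()
        unchanged += (diff <= tol).sum().item()
        total += p0.numel()
    return unchanged / total

base = torch.nn.Linear(64, 64)
tuned = torch.nn.Linear(64, 64)
tuned.load_state_dict(base.state_dict())
with torch.no_grad():                      # touch only ~10% of the weights
    mask = torch.rand_like(tuned.weight) < 0.1
    tuned.weight += mask * 0.01
print(f"{update_sparsity(base, tuned):.2%} of parameters unchanged")
```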

Updated: 2025-07-23 01:02:17

标题: 强化学习在大型语言模型中微调稀疏子网络

摘要: 强化学习(RL)是对齐大型语言模型(LLMs)与复杂任务和人类偏好的关键后预训练步骤。通常假设RL微调需要更新模型大部分参数,但我们挑战了这一假设,发现一个令人惊讶的发现:RL微调一直只修改一个小子网络(通常是权重的5-30%),大部分参数保持不变。我们称这种现象为RL诱导的参数更新稀疏性。它自然而然地产生,没有任何稀疏约束或参数高效调整,并且出现在多个RL算法(如PPO,DPO,SimPO,PRIME)和模型系列(如OpenAI,Meta和开源LLMs)中。此外,由RL更新的子网络在不同种子、数据集和算法之间显示出明显的重叠,远远超过随机机会,表明在预训练模型中存在部分可转移的结构。我们展示只微调这个稀疏子网络可以恢复完整模型性能,并产生几乎与完全微调模型相同的参数。我们的分析表明,这种稀疏性的出现是因为RL在模型的原始分布附近运作,只需要有针对性地进行改变。KL惩罚、梯度剪切和政策动态对稀疏性模式的影响有限。这些发现揭示了RL如何调整模型的新视角:不是通过转移所有权重,而是通过将训练集中在一个小而一致更新的子网络上。这种洞察力使得RL方法更加高效,并通过彩票票据假设的视角重新定义了稀疏性。

更新时间: 2025-07-23 01:02:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.17107v1

In Reverie Together: Ten Years of Mathematical Discovery with a Machine Collaborator

We present four open conjectures in graph theory generated by the automated conjecturing system TxGraffiti. Each conjecture is concise, grounded in natural graph invariants, and empirically validated across hundreds of graphs. Despite extensive effort, these statements remain unresolved, defying both proof and counterexample. They are not only mathematical challenges but creative expressions: born of symbolic pattern recognition and mathematician-defined heuristics, refined through years of human dialogue, and now offered back to the community as collaborative artifacts. These conjectures invite not only formal proof, but also reflection on how machines can evoke wonder, spark curiosity, and contribute to the raw material of discovery. By highlighting these problems, we aim to inspire both human mathematicians and AI systems to engage with them: not only to solve them, but to reflect on what it means when machines participate meaningfully in the creative process of mathematical thought.

Updated: 2025-07-23 00:49:32

标题: 共同沉思:与机器合作者一起度过的十年数学探索

摘要: 我们提出了四个图论中的开放猜想,这些猜想是由自动猜想系统TxGraffiti生成的。每个猜想都简洁明了,基于自然图不变量,并经过数百个图的实证验证。尽管付出了大量努力,这些陈述仍未解决:既无法证明,也找不到反例。它们不仅是数学挑战,更是创造性的表达:诞生于符号模式识别和数学家定义的启发式,通过多年的人类对话不断完善,现在作为协作制品回馈给社区。这些猜想不仅邀请进行形式化证明,还促使人们思考机器如何引发惊奇,激发好奇心,并为发现的原始材料做出贡献。通过突出这些问题,我们旨在激励人类数学家和人工智能系统与之互动:不仅解决问题,而且反思当机器在数学思维的创造过程中有意义地参与时意味着什么。

更新时间: 2025-07-23 00:49:32

领域: cs.DM,cs.AI,math.CO

下载: http://arxiv.org/abs/2507.17780v1

Fragment size density estimator for shrinkage-induced fracture based on a physics-informed neural network

This paper presents a neural network (NN)-based solver for an integro-differential equation that models shrinkage-induced fragmentation. The proposed method directly maps input parameters to the corresponding probability density function without numerically solving the governing equation, thereby significantly reducing computational costs. Specifically, it enables efficient evaluation of the density function in Monte Carlo simulations while maintaining accuracy comparable to or even exceeding that of conventional finite difference schemes. Validation on synthetic data demonstrates both the method's computational efficiency and predictive reliability. This study establishes a foundation for the data-driven inverse analysis of fragmentation and suggests the potential for extending the framework beyond pre-specified model structures.

Updated: 2025-07-23 00:44:03

标题: 基于物理信息神经网络的收缩诱导断裂的碎片尺寸密度估计器

摘要: 本文提出了一种基于神经网络(NN)的求解器,用于模拟收缩引起的碎裂的积分微分方程。所提出的方法直接将输入参数映射到相应的概率密度函数,而无需数值求解控制方程,从而显著降低了计算成本。具体而言,它能够在蒙特卡洛模拟中高效评估密度函数,同时保持与传统有限差分方案相当甚至超过的准确性。在合成数据上的验证表明了该方法的计算效率和预测可靠性。这项研究为碎裂的数据驱动反向分析奠定了基础,并提示了将框架扩展到预先指定的模型结构以外的潜力。

更新时间: 2025-07-23 00:44:03

领域: physics.comp-ph,cs.AI

下载: http://arxiv.org/abs/2507.11799v2

Weather-Aware AI Systems versus Route-Optimization AI: A Comprehensive Analysis of AI Applications in Transportation Productivity

While recent research demonstrates that AI route-optimization systems improve taxi driver productivity by 14%, this study reveals that such findings capture only a fraction of AI's potential in transportation. We examine comprehensive weather-aware AI systems that integrate deep learning meteorological prediction with machine learning positioning optimization, comparing their performance against traditional operations and route-only AI approaches. Using simulation data from 10,000 taxi operations across varied weather conditions, we find that weather-aware AI systems increase driver revenue by 107.3%, compared to 14% improvements from route-optimization alone. Weather prediction contributes the largest individual productivity gain, with strong correlations between meteorological conditions and demand ($r=0.575$). Economic analysis reveals annual earnings increases of 13.8 million yen per driver, with rapid payback periods and superior return on investment. These findings suggest that current AI literature significantly underestimates AI's transformative potential by focusing narrowly on routing algorithms, while weather intelligence represents an untapped $8.9 billion market opportunity. Our results indicate that future AI implementations should adopt comprehensive approaches that address multiple operational challenges simultaneously rather than optimizing isolated functions.

Updated: 2025-07-23 00:30:09

Subjects: econ.GN,cs.AI,q-fin.EC

Download: http://arxiv.org/abs/2507.17099v1

ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search

We propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD). ZORMS-LfD enables the costs, constraints, and dynamics of constrained optimal control problems, in both continuous and discrete time, to be learned from expert demonstrations without requiring smoothness of the learning-loss landscape. In contrast, existing state-of-the-art first-order methods require the existence and computation of gradients of the costs, constraints, dynamics, and learning loss with respect to states, controls, and/or parameters. Most existing methods are also tailored to discrete time, with constrained problems in continuous time receiving only cursory attention. We demonstrate that ZORMS-LfD matches or surpasses the performance of state-of-the-art methods in terms of both learning loss and compute time across a variety of benchmark problems. On unconstrained continuous-time benchmark problems, ZORMS-LfD achieves loss performance similar to that of state-of-the-art first-order methods with a more than $80$\% reduction in compute time. On constrained continuous-time benchmark problems, for which no specialized state-of-the-art method exists, ZORMS-LfD outperforms the commonly used gradient-free Nelder-Mead optimization method.
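At its core, a zeroth-order random matrix search needs only loss evaluations: perturb the parameter matrix along a random direction and estimate the slope from two function calls. A minimal sketch under that reading (the toy loss, constants, and shapes are assumptions, not the paper's algorithm or benchmarks):

    # Minimal two-point zeroth-order step on a matrix-valued parameter.
    import numpy as np

    def zo_step(loss, K, rng, mu=1e-2, alpha=1e-1):
        U = rng.standard_normal(K.shape)       # random matrix search direction
        U /= np.linalg.norm(U)
        g = (loss(K + mu * U) - loss(K - mu * U)) / (2 * mu)  # directional slope
        return K - alpha * g * U               # move against the estimated slope

    # Toy imitation loss: match an expert feedback gain unknown to the learner.
    rng = np.random.default_rng(0)
    K_expert = np.array([[1.0, 0.5]])
    states = rng.standard_normal((100, 2))
    loss = lambda K: float(np.mean((states @ K.T - states @ K_expert.T) ** 2))

    K = np.zeros((1, 2))
    for _ in range(500):
        K = zo_step(loss, K, rng)
    print("learned gain:", K.round(3), "final loss:", round(loss(K), 6))

No gradient of the loss, dynamics, or constraints is ever formed, which is what lets this family of methods handle nonsmooth learning-loss landscapes.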

Updated: 2025-07-23 00:23:01

Subjects: cs.LG,cs.NA,cs.SY,eess.SY,math.NA,math.OC

Download: http://arxiv.org/abs/2507.17096v1

Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning

Reinforcement learning (RL) holds significant promise for adaptive traffic signal control. While existing RL-based methods demonstrate effectiveness in reducing vehicular congestion, their predominant focus on vehicle-centric optimization leaves pedestrian mobility needs and safety challenges unaddressed. In this paper, we present a deep RL framework for adaptive control of eight traffic signals along a real-world urban corridor, jointly optimizing both pedestrian and vehicular efficiency. Our single-agent policy is trained using real-world pedestrian and vehicle demand data derived from Wi-Fi logs and video analysis. The results demonstrate significant performance improvements over traditional fixed-time signals, reducing average wait times per pedestrian and per vehicle by up to 67% and 52% respectively, while simultaneously decreasing total wait times for both groups by up to 67% and 53%. Additionally, our results demonstrate generalization capabilities across varying traffic demands, including conditions entirely unseen during training, validating RL's potential for developing transportation systems that serve all road users.
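Joint optimization here comes down to what the agent is paid for. One common construction, sketched below, is a reward mixing the two groups' accumulated waiting times (the 0.5 weighting and the inputs are assumptions, not the paper's exact design):

    # Sketch of a joint pedestrian/vehicle reward for a signal-control agent.
    def joint_reward(ped_waits, veh_waits, w_ped=0.5):
        # Negative weighted sum of waiting times accrued since the last action,
        # so reducing delay for either group increases the reward.
        return -(w_ped * sum(ped_waits) + (1.0 - w_ped) * sum(veh_waits))

    # Example: 12 waiting pedestrians (30 s each) and 8 queued vehicles (45 s each).
    print(joint_reward([30.0] * 12, [45.0] * 8))  # -360.0

Sweeping w_ped exposes the trade-off between pedestrian and vehicle delay that fixed-time plans cannot adapt to.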

Updated: 2025-07-23 00:13:12

Subjects: cs.LG,cs.MA

Download: http://arxiv.org/abs/2504.05018v2

By Xinhai (Sean) Zou.