GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting
Real-time path tracing is rapidly becoming the standard for rendering in entertainment and professional applications. In scientific visualization, volume rendering plays a crucial role in helping researchers analyze and interpret complex 3D data. Recently, photorealistic rendering techniques have gained popularity in scientific visualization, yet they face significant challenges. One of the most prominent issues is slow rendering performance and high pixel variance caused by Monte Carlo integration. In this work, we introduce a novel radiance caching approach for path-traced volume rendering. Our method leverages advances in volumetric scene representation and adapts 3D Gaussian splatting to function as a multi-level, path-space radiance cache. This cache is designed to be trainable on the fly, dynamically adapting to changes in scene parameters such as lighting configurations and transfer functions. By incorporating our cache, we achieve less noisy, higher-quality images without increasing rendering costs. To evaluate our approach, we compare it against a baseline path tracer that supports uniform sampling and next-event estimation, as well as against the state of the art in neural radiance caching. Through both quantitative and qualitative analyses, we demonstrate that our path-space radiance cache is a robust solution that is easy to integrate and significantly enhances the rendering quality of volumetric visualization applications while maintaining comparable computational efficiency.
Updated: 2025-07-25 23:55:54
Fields: cs.GR,cs.LG
Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search
Vector databases typically rely on approximate nearest neighbor (ANN) search to retrieve the top-k closest vectors to a query in embedding space. While effective, this approach often yields semantically redundant results, missing the diversity and contextual richness required by applications such as retrieval-augmented generation (RAG), multi-hop QA, and memory-augmented agents. We introduce a new retrieval paradigm: semantic compression, which aims to select a compact, representative set of vectors that captures the broader semantic structure around a query. We formalize this objective using principles from submodular optimization and information geometry, and show that it generalizes traditional top-k retrieval by prioritizing coverage and diversity. To operationalize this idea, we propose graph-augmented vector retrieval, which overlays semantic graphs (e.g., kNN or knowledge-based links) atop vector spaces to enable multi-hop, context-aware search. We theoretically analyze the limitations of proximity-based retrieval under high-dimensional concentration and highlight how graph structures can improve semantic coverage. Our work outlines a foundation for meaning-centric vector search systems, emphasizing hybrid indexing, diversity-aware querying, and structured semantic retrieval. We make our implementation publicly available to foster future research in this area.
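To make the selection objective concrete, here is a minimal greedy sketch of semantic compression using a facility-location-style coverage term. This is a generic illustration, not the paper's exact formulation; the relevance weight `lam` and the cosine-similarity setup are our assumptions.

```python
import numpy as np

def semantic_compress(query, pool, k, lam=0.5):
    """Greedily pick k vectors from `pool` (n x d, rows L2-normalized) that
    are relevant to `query` (d,) while covering the pool's semantic structure.
    Coverage term: sum_i max_{j in S} sim(p_i, p_j) (monotone submodular)."""
    rel = pool @ query              # relevance of each candidate to the query
    sim = pool @ pool.T             # pairwise cosine similarities
    coverage = np.zeros(len(pool))  # best similarity to any selected vector
    selected = []
    for _ in range(k):
        # objective value if candidate j were added to the current selection
        gains = lam * rel + (1 - lam) * np.maximum(sim, coverage).sum(axis=1)
        gains[selected] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, sim[j])
    return selected
```

Because the coverage term is monotone submodular, this greedy loop carries the classical (1 - 1/e) approximation guarantee, which is one reason submodular formulations are attractive for diversity-aware retrieval.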
Updated: 2025-07-25 23:35:11
Fields: cs.LG
Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning
In this paper, we explore mission assignment and task offloading in an Open Radio Access Network (Open RAN)-based intelligent transportation system (ITS), where autonomous vehicles leverage mobile edge computing for efficient processing. Existing studies often overlook the intricate interdependencies between missions and the costs associated with offloading tasks to edge servers, leading to suboptimal decision-making. To bridge this gap, we introduce Oranits, a novel system model that explicitly accounts for mission dependencies and offloading costs while optimizing performance through vehicle cooperation. To achieve this, we propose a twofold optimization approach. First, we develop a metaheuristic-based evolutionary computing algorithm, namely the Chaotic Gaussian-based Global ARO (CGG-ARO), serving as a baseline for one-slot optimization. Second, we design an enhanced reward-based deep reinforcement learning (DRL) framework, referred to as the Multi-agent Double Deep Q-Network (MA-DDQN), that integrates both multi-agent coordination and multi-action selection mechanisms, significantly reducing mission assignment time and improving adaptability over baseline methods. Extensive simulations reveal that CGG-ARO improves the number of completed missions and overall benefit by approximately 7.1% and 7.7%, respectively. Meanwhile, MA-DDQN achieves even greater improvements of 11.0% in terms of mission completions and 12.5% in terms of the overall benefit. These results highlight the effectiveness of Oranits in enabling faster, more adaptive, and more efficient task processing in dynamic ITS environments.
Updated: 2025-07-25 23:13:09
Fields: cs.DC,cs.AI,cs.GT,cs.LG,cs.NI
The wall confronting large language models
We show that the scaling laws which determine the performance of large language models (LLMs) severely limit their ability to improve the uncertainty of their predictions. As a result, raising their reliability to meet the standards of scientific inquiry is intractable by any reasonable measure. We argue that the very mechanism which fuels much of the learning power of LLMs, namely the ability to generate non-Gaussian output distributions from Gaussian input ones, might well be at the roots of their propensity to produce error pileup, ensuing information catastrophes and degenerative AI behaviour. This tension between learning and accuracy is a likely candidate mechanism underlying the observed low values of the scaling components. It is substantially compounded by the deluge of spurious correlations pointed out by Calude and Longo which rapidly increase in any data set merely as a function of its size, regardless of its nature. The fact that a degenerative AI pathway is a very probable feature of the LLM landscape does not mean that it must inevitably arise in all future AI research. Its avoidance, which we also discuss in this paper, necessitates putting a much higher premium on insight and understanding of the structural characteristics of the problems being investigated.
Updated: 2025-07-25 22:48:37
Fields: cs.AI
A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks
Identifying influential nodes in complex networks is a critical task with a wide range of applications across different domains. However, existing approaches often face trade-offs between accuracy and computational efficiency. To address these challenges, we propose 1D-CGS, a lightweight and effective hybrid model that integrates the speed of one-dimensional convolutional neural networks (1D-CNN) with the topological representation power of GraphSAGE for efficient node ranking. The model uses a lightweight input representation built on two straightforward and significant topological features: node degree and average neighbor degree. These features are processed through 1D convolutions to extract local patterns, followed by GraphSAGE layers to aggregate neighborhood information. We formulate the node ranking task as a regression problem and use the Susceptible-Infected-Recovered (SIR) model to generate ground truth influence scores. 1D-CGS is initially trained on synthetic networks generated by the Barabasi-Albert model and then applied to real world networks for identifying influential nodes. Experimental evaluations on twelve real world networks demonstrate that 1D-CGS significantly outperforms traditional centrality measures and recent deep learning models in ranking accuracy, while maintaining a very fast runtime. The proposed model achieves an average improvement of 4.73% in Kendall's Tau correlation and 7.67% in Jaccard Similarity over the best-performing deep learning baselines. It also achieves an average Monotonicity Index (MI) score of 0.99 and produces near-perfect rank distributions, indicating highly unique and discriminative rankings. Furthermore, all experiments confirm that 1D-CGS operates in a highly reasonable time, running significantly faster than existing deep learning methods, making it suitable for large scale applications.
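For concreteness, a small sketch of the two input features and the SIR-based ground-truth labels is given below (networkx; the infection rate, recovery behavior, and run count are placeholder choices, not the paper's settings).

```python
import random
import networkx as nx

def node_features(G):
    """The two topological input features: degree and average neighbor degree."""
    avg_nbr = nx.average_neighbor_degree(G)
    return {v: (G.degree(v), avg_nbr[v]) for v in G}

def sir_influence(G, seed, beta=0.1, runs=100):
    """Ground-truth influence: mean final outbreak size with `seed` as patient zero.
    Recovery probability is fixed to 1 (each node stays infectious for one step)."""
    total = 0
    for _ in range(runs):
        infected, recovered = {seed}, set()
        while infected:
            new = {v for u in infected for v in G.neighbors(u)
                   if v not in infected and v not in recovered and random.random() < beta}
            recovered |= infected
            infected = new - recovered
        total += len(recovered)
    return total / runs

# Synthetic training graphs, as in the paper (sizes here are illustrative)
G = nx.barabasi_albert_graph(200, 3)
X = node_features(G)
y = {v: sir_influence(G, v) for v in G}   # regression targets
```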
Updated: 2025-07-25 22:45:56
Fields: cs.SI,cs.LG
Disjoint Generative Models
We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that helps illuminate some of the design choices that one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility for mixed-model synthesis.
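A minimal sketch of this pipeline under stated assumptions: `fit_generator` stands in for any tabular generative model exposing a `sample(n)` method (a placeholder interface, not a specific library), and the joining operation is the simplest possible one, random row alignment of independently shuffled parts.

```python
import pandas as pd

def disjoint_generate(df, column_groups, fit_generator, n_rows):
    """Train one generator per disjoint column subset, then join the synthetic
    parts post hoc without any common variables or identifiers."""
    parts = []
    for cols in column_groups:
        gen = fit_generator(df[cols])        # separate generative model instance
        synth = gen.sample(n_rows)           # assumed sampling interface
        parts.append(synth.sample(frac=1).reset_index(drop=True))  # shuffle rows
    return pd.concat(parts, axis=1)          # post hoc join by row position
```

Random alignment deliberately destroys cross-subset correlations, which is the source of the privacy gain the abstract describes; smarter joining operations can trade some of that privacy back for utility.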
Updated: 2025-07-25 22:38:06
Fields: cs.LG
NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology
Understanding where people go after visiting one business is crucial for urban planning, retail analytics, and location-based services. However, predicting these co-visitation patterns across millions of venues remains challenging due to extreme data sparsity and the complex interplay between spatial proximity and business relationships. Traditional approaches using only geographic distance fail to capture why coffee shops attract different customer flows than fine dining restaurants, even when co-located. We introduce NAICS-aware GraphSAGE, a novel graph neural network that integrates business taxonomy knowledge through learnable embeddings to predict population-scale co-visitation patterns. Our key insight is that business semantics, captured through detailed industry codes, provide crucial signals that pure spatial models cannot explain. The approach scales to massive datasets (4.2 billion potential venue pairs) through efficient state-wise decomposition while combining spatial, temporal, and socioeconomic features in an end-to-end framework. Evaluated on our POI-Graph dataset comprising 94.9 million co-visitation records across 92,486 brands and 48 US states, our method achieves significant improvements over state-of-the-art baselines: the R-squared value increases from 0.243 to 0.625 (a 157 percent improvement), with strong gains in ranking quality (32 percent improvement in NDCG at 10).
Updated: 2025-07-25 22:31:45
Fields: cs.LG
Polar Coding and Linear Decoding
Polar encoding, described by Arikan in IEEE Transactions on Information Theory, Vol. 55, No. 7, July 2009, was a milestone for telecommunications. A Polar code distributes information among high and low-capacity channels, showing the possibility of achieving perfect channel capacity. The high-capacity channels allow almost noiseless transmission of data. When these channels are not highly noisy, reliable signal transmission is achieved. Polar codes have started to compete against codes such as Low-Density Parity-Check (LDPC) codes. A Polar code can also be considered error-correcting, based on the redundancy inherent in its structure. This feature makes polar encoding also applicable to digital quantum-resistant cryptography protocols. This work explores linear decoding in a first or single trial for small losses or a small number of bit flips, and repeated transmission for medium-level losses. This is distinct from Arikan's successive probabilistic decoding, which applies probabilistic rules. Linear decoding is done directly by solving the linear equations connecting the codewords x and the received signals y after transmission via noisy channels. Numerical examples are shown. The accompanying programs are written in the Mathematica language; the code is available for copy-and-paste so Mathematica users can immediately try the described formalism.
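To make the linear-decoding idea concrete, here is a small numpy sketch (the paper itself uses Mathematica): polar encoding with Kronecker powers of Arikan's kernel, omitting the bit-reversal permutation, followed by noiseless linear decoding by inverting the generator matrix over GF(2).

```python
import numpy as np

def polar_generator(n):
    """G_N = F^(kron n) with F = [[1, 0], [1, 1]] and N = 2**n."""
    F = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, F)
    return G

n = 3
G = polar_generator(n)                          # 8 x 8 generator matrix
u = np.random.randint(0, 2, 2**n).astype(np.uint8)
x = (u @ G) % 2                                 # encoding: codeword x = u G

# Linear decoding solves u G = x. Over GF(2), F is its own inverse and
# Kronecker products preserve this, so G^{-1} = G and decoding is one product:
u_hat = (x @ G) % 2
assert np.array_equal(u, u_hat)
```

With bit flips, the same equations become a perturbed linear system; the single-trial regime in the abstract corresponds to solving it directly, and the medium-loss regime to repeated transmission.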
Updated: 2025-07-25 22:23:14
Fields: cs.IT,cs.CR,math.IT
Ultracoarse Equilibria and Ordinal-Folding Dynamics in Operator-Algebraic Models of Infinite Multi-Agent Games
We develop an operator algebraic framework for infinite games with a continuum of agents and prove that regret based learning dynamics governed by a noncommutative continuity equation converge to a unique quantal response equilibrium under mild regularity assumptions. The framework unifies functional analysis, coarse geometry and game theory by assigning to every game a von Neumann algebra that represents collective strategy evolution. A reflective regret operator within this algebra drives the flow of strategy distributions and its fixed point characterises equilibrium. We introduce the ordinal folding index, a computable ordinal valued metric that measures the self referential depth of the dynamics, and show that it bounds the transfinite time needed for convergence, collapsing to zero on coarsely amenable networks. The theory yields new invariant subalgebra rigidity results, establishes existence and uniqueness of envy free and maximin share allocations in continuum economies, and links analytic properties of regret flows with empirical stability phenomena in large language models. These contributions supply a rigorous mathematical foundation for large scale multi agent systems and demonstrate the utility of ordinal metrics for equilibrium selection.
Updated: 2025-07-25 22:20:42
Fields: math.OC,cs.AI,cs.GT,cs.MA,91A26, 47L65, 03E10, 91B32
BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning
Recent years have seen significant advancements in designing reinforcement learning (RL)-based agents for building energy management. While individual success is observed in simulated or controlled environments, the scalability of RL approaches in terms of efficiency and generalization across building dynamics and operational scenarios remains an open question. In this work, we formally characterize the generalization space for the cross-environment, multi-objective building energy management task, and formulate the multi-objective contextual RL problem. Such a formulation helps understand the challenges of transferring learned policies across varied operational contexts such as climate and heat convection dynamics under multiple control objectives such as comfort level and energy consumption. We provide a principled way to parameterize such contextual information in realistic building RL environments, and construct a novel benchmark to facilitate the evaluation of generalizable RL algorithms in practical building control tasks. Our results show that existing multi-objective RL methods are capable of achieving reasonable trade-offs between conflicting objectives. However, their performance degrades under certain environment variations, underscoring the importance of incorporating dynamics-dependent contextual information into the policy learning process.
Updated: 2025-07-25 22:18:43
Fields: cs.LG,cs.SY,eess.SY
KD-GAT: Combining Knowledge Distillation and Graph Attention Transformer for a Controller Area Network Intrusion Detection System
The Controller Area Network (CAN) protocol is widely adopted for in-vehicle communication but lacks inherent security mechanisms, making it vulnerable to cyberattacks. This paper introduces KD-GAT, an intrusion detection framework that combines Graph Attention Networks (GATs) with knowledge distillation (KD) to enhance detection accuracy while reducing computational complexity. In our approach, CAN traffic is represented as graphs using a sliding window to capture temporal and relational patterns. A multi-layer GAT with jumping knowledge aggregation acts as the teacher model, while a compact student GAT--only 6.32% the size of the teacher--is trained via a two-phase process involving supervised pretraining and knowledge distillation with both soft and hard label supervision. Experiments on three benchmark datasets--Car-Hacking, Car-Survival, and can-train-and-test--demonstrate that both teacher and student models achieve strong results, with the student model attaining 99.97% and 99.31% accuracy on Car-Hacking and Car-Survival, respectively. However, significant class imbalance in can-train-and-test has led to reduced performance for both models on this dataset. Addressing this imbalance remains an important direction for future work.
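The distillation phase combines hard- and soft-label supervision; the generic form of such a loss is sketched below in PyTorch (the temperature and mixing weight are placeholder values, not the paper's hyperparameters).

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hard-label cross-entropy plus temperature-scaled KL to the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps gradient magnitudes comparable
    return alpha * soft + (1 - alpha) * hard
```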
Updated: 2025-07-25 21:45:58
Fields: cs.LG,cs.AI
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution. Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden "emergence" due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.
Updated: 2025-07-25 21:42:43
Fields: cs.LG
Salsa as a Nonverbal Embodied Language -- The CoMPAS3D Dataset and Benchmarks
Imagine a humanoid that can safely and creatively dance with a human, adapting to its partner's proficiency, using haptic signaling as a primary form of communication. While today's AI systems excel at text or voice-based interaction with large language models, human communication extends far beyond text-it includes embodied movement, timing, and physical coordination. Modeling coupled interaction between two agents poses a formidable challenge: it is continuous, bidirectionally reactive, and shaped by individual variation. We present CoMPAS3D, the largest and most diverse motion capture dataset of improvised salsa dancing, designed as a challenging testbed for interactive, expressive humanoid AI. The dataset includes 3 hours of leader-follower salsa dances performed by 18 dancers spanning beginner, intermediate, and professional skill levels. For the first time, we provide fine-grained salsa expert annotations, covering over 2,800 move segments, including move types, combinations, execution errors and stylistic elements. We draw analogies between partner dance communication and natural language, evaluating CoMPAS3D on two benchmark tasks for synthetic humans that parallel key problems in spoken language and dialogue processing: leader or follower generation with proficiency levels (speaker or listener synthesis), and duet (conversation) generation. Towards a long-term goal of partner dance with humans, we release the dataset, annotations, and code, along with a multitask SalsaAgent model capable of performing all benchmark tasks, alongside additional baselines to encourage research in socially interactive embodied AI and creative, expressive humanoid motion generation.
Updated: 2025-07-25 21:33:48
Fields: cs.LG,cs.AI,cs.CL,cs.CV
DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning
Conventional multimodal data integration methods provide a comprehensive assessment of the shared or unique structure within each individual data type but suffer from several limitations such as the inability to handle high-dimensional data and identify nonlinear structures. In this paper, we introduce DeepJIVE, a deep-learning approach to performing Joint and Individual Variance Explained (JIVE). We perform mathematical derivation and experimental validations using both synthetic and real-world 1D, 2D, and 3D datasets. Different strategies of achieving the identity and orthogonality constraints for DeepJIVE were explored, resulting in three viable loss functions. We found that DeepJIVE can successfully uncover joint and individual variations of multimodal datasets. Our application of DeepJIVE to the Alzheimer's Disease Neuroimaging Initiative (ADNI) also identified biologically plausible covariation patterns between the amyloid positron emission tomography (PET) and magnetic resonance (MR) images. In conclusion, the proposed DeepJIVE can be a useful tool for multimodal data analysis.
Updated: 2025-07-25 21:23:31
Fields: cs.CV,cs.AI
Feature learning is decoupled from generalization in high capacity neural networks
Neural networks outperform kernel methods, sometimes by orders of magnitude, e.g. on staircase functions. This advantage stems from the ability of neural networks to learn features, adapting their hidden representations to better capture the data. We introduce a concept we call feature quality to measure this performance improvement. We examine existing theories of feature learning and demonstrate empirically that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves. Consequently, current theories of feature learning do not provide a sufficient foundation for developing theories of neural network generalization.
Updated: 2025-07-25 21:19:37
Fields: cs.LG,stat.ML
Efficient Learning for Product Attributes with Compact Multimodal Models
Image-based product attribute prediction in e-commerce is a crucial task with numerous applications. The supervised fine-tuning of Vision Language Models (VLMs) faces significant scale challenges due to the cost of manual or API based annotation. In this paper, we investigate label-efficient semi-supervised fine-tuning strategies for compact VLMs (2B-3B parameters) that leverage unlabeled product listings through Direct Preference Optimization (DPO). Beginning with a small, API-based, annotated, and labeled set, we first employ PEFT to train low-rank adapter modules. To update the adapter weights with unlabeled data, we generate multiple reasoning-and-answer chains per unlabeled sample and segregate these chains into preferred and dispreferred based on self-consistency. We then fine-tune the model with DPO loss and use the updated model for the next iteration. By using PEFT fine-tuning with DPO, our method achieves efficient convergence with minimal compute overhead. On a dataset spanning twelve e-commerce verticals, DPO-based fine-tuning, which utilizes only unlabeled data, demonstrates a significant improvement over the supervised model. Moreover, experiments demonstrate that accuracy with DPO training improves with more unlabeled data, indicating that a large pool of unlabeled samples can be effectively leveraged to improve performance.
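The core update is the standard DPO objective applied to self-consistency preference pairs; a minimal sketch follows (PyTorch). The argument convention, summed token log-probabilities of full responses under the current policy and the frozen reference, and the value of `beta` are assumptions.

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """`_w`: chains whose answer agrees with the majority vote (preferred);
    `_l`: the remaining chains (dispreferred). All inputs are 1-D tensors of
    summed token log-probs; `ref_*` come from the frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```

Because the preference signal comes from self-consistency rather than human labels, each iteration can mint new pairs from unlabeled listings before refreshing the adapter weights.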
Updated: 2025-07-25 21:12:11
Fields: cs.CV,cs.AI
Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era
The advancement of powerful yet opaque large language models (LLMs) necessitates a fundamental revision of the philosophical criteria used to evaluate artificial moral agents (AMAs). Pre-LLM frameworks often relied on the assumption of transparent architectures, which LLMs defy due to their stochastic outputs and opaque internal states. This paper argues that traditional ethical criteria are pragmatically obsolete for LLMs due to this mismatch. Engaging with core themes in the philosophy of technology, this paper proffers a revised set of ten functional criteria to evaluate LLM-based artificial moral agents: moral concordance, context sensitivity, normative integrity, metaethical awareness, system resilience, trustworthiness, corrigibility, partial transparency, functional autonomy, and moral imagination. These guideposts, applied to what we term "SMA-LLS" (Simulating Moral Agency through Large Language Systems), aim to steer AMAs toward greater alignment and beneficial societal integration in the coming years. We illustrate these criteria using hypothetical scenarios involving an autonomous public bus (APB) to demonstrate their practical applicability in morally salient contexts.
Updated: 2025-07-25 21:09:11
Fields: cs.AI, 68T27, 03B42, I.2.0; I.2.9; K.4.1
Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs
Current Large Language Models (LLMs) are predominantly designed with English as the primary language, and even the few that are multilingual tend to exhibit strong English-centric biases. Much like speakers who might produce awkward expressions when learning a second language, LLMs often generate unnatural outputs in non-English languages, reflecting English-centric patterns in both vocabulary and grammar. Despite the importance of this issue, the naturalness of multilingual LLM outputs has received limited attention. In this paper, we address this gap by introducing novel automatic corpus-level metrics to assess the lexical and syntactic naturalness of LLM outputs in a multilingual context. Using our new metrics, we evaluate state-of-the-art LLMs on a curated benchmark in French and Chinese, revealing a tendency towards English-influenced patterns. To mitigate this issue, we also propose a simple and effective alignment method to improve the naturalness of an LLM in a target language and domain, achieving consistent improvements in naturalness without compromising the performance on general-purpose benchmarks. Our work highlights the importance of developing multilingual metrics, resources and methods for the new wave of multilingual LLMs.
Updated: 2025-07-25 21:08:21
Fields: cs.CL,cs.AI
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Text-to-video (T2V) generative models have rapidly advanced and found widespread applications across fields like entertainment, education, and marketing. However, the adversarial vulnerabilities of these models remain rarely explored. We observe that in T2V generation tasks, the generated videos often contain substantial redundant information not explicitly specified in the text prompts, such as environmental elements, secondary objects, and additional details, providing opportunities for malicious attackers to embed hidden harmful content. Exploiting this inherent redundancy, we introduce BadVideo, the first backdoor attack framework tailored for T2V generation. Our attack focuses on designing target adversarial outputs through two key strategies: (1) Spatio-Temporal Composition, which combines different spatiotemporal features to encode malicious information; (2) Dynamic Element Transformation, which introduces transformations in redundant elements over time to convey malicious information. Based on these strategies, the attacker's malicious target seamlessly integrates with the user's textual instructions, providing high stealthiness. Moreover, by exploiting the temporal dimension of videos, our attack successfully evades traditional content moderation systems that primarily analyze spatial information within individual frames. Extensive experiments demonstrate that BadVideo achieves high attack success rates while preserving original semantics and maintaining excellent performance on clean inputs. Overall, our work reveals the adversarial vulnerability of T2V models, calling attention to potential risks and misuse. Our project page is at https://wrt2000.github.io/BadVideo2025/.
Updated: 2025-07-25 21:03:17
Fields: cs.CV,cs.AI
Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges
Due to the remarkable capabilities and growing impact of large language models (LLMs), they have been deeply integrated into many aspects of society. Thus, ensuring their alignment with human values and intentions has emerged as a critical challenge. This survey provides a comprehensive overview of practical alignment techniques, training protocols, and empirical findings in LLM alignment. We analyze the development of alignment methods across diverse paradigms, characterizing the fundamental trade-offs between core alignment objectives. Our analysis shows that while supervised fine-tuning enables basic instruction-following, preference-based methods offer more flexibility for aligning with nuanced human intent. We discuss state-of-the-art techniques, including Direct Preference Optimization (DPO), Constitutional AI, brain-inspired methods, and alignment uncertainty quantification (AUQ), highlighting their approaches to balancing quality and efficiency. We review existing evaluation frameworks and benchmarking datasets, emphasizing limitations such as reward misspecification, distributional robustness, and scalable oversight. We summarize strategies adopted by leading AI labs to illustrate the current state of practice. We conclude by outlining open problems in oversight, value pluralism, robustness, and continuous alignment. This survey aims to inform both researchers and practitioners navigating the evolving landscape of LLM alignment.
Updated: 2025-07-25 20:52:58
Fields: cs.AI,cs.LG,stat.ML
Adaptive Bayesian Data-Driven Design of Reliable Solder Joints for Micro-electronic Devices
Solder joint reliability related to failures due to thermomechanical loading is a critically important yet physically complex engineering problem. As a result, simulated behavior is oftentimes computationally expensive. In an increasingly data-driven world, the usage of efficient data-driven design schemes is a popular choice. Among them, Bayesian optimization (BO) with Gaussian process regression is one of the most important representatives. The authors argue that computational savings can be obtained from exploiting thorough surrogate modeling and selecting a design candidate based on multiple acquisition functions. This is feasible due to the relatively low computational cost of the surrogate, compared to the expensive simulation objective. This paper addresses the shortcomings in the adjacent literature by providing and implementing a novel heuristic framework to perform BO with adaptive hyperparameters across the various optimization iterations. Adaptive BO is subsequently compared to regular BO when faced with synthetic objective minimization problems. The results show the efficiency of adaptive BO when compared to the worst-performing regular Bayesian schemes. As an engineering use case, the solder joint reliability problem is tackled by minimizing the accumulated non-linear creep strain under a cyclic thermal load. Results show that adaptive BO outperforms regular BO by 3% on average at any given computational budget threshold, critically saving half of the computational expense budget. This practical result underlines the methodological potential of the adaptive Bayesian data-driven methodology to achieve better results and cut optimization-related expenses. Lastly, in order to promote the reproducibility of the results, the data-driven implementations are made available on an open-source basis.
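The multiple-acquisition idea can be sketched with a scikit-learn GP surrogate as follows (minimization). The paper's adaptive hyperparameter schedule is its contribution and is not reproduced here, so `kappa` and `xi` are fixed placeholders.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def propose_candidates(gp: GaussianProcessRegressor, X_cand, y_best,
                       kappa=2.0, xi=0.01):
    """Score a pool of designs X_cand under two acquisition functions and
    return the best point of each (the surrogate is cheap to evaluate)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    lcb = mu - kappa * sigma                              # exploration-leaning
    z = (y_best - mu - xi) / sigma
    ei = (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)  # improvement-leaning
    return X_cand[[int(np.argmin(lcb)), int(np.argmax(ei))]]
```

Evaluating several acquisitions per iteration costs almost nothing next to a thermomechanical creep simulation, which is why candidate selection from multiple acquisition functions is affordable.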
Updated: 2025-07-25 20:34:03
Fields: stat.ML,cs.LG,physics.comp-ph
Survey on Hand Gesture Recognition from Visual Input
Hand gesture recognition has become an important research area, driven by the growing demand for human-computer interaction in fields such as sign language recognition, virtual and augmented reality, and robotics. Despite the rapid growth of the field, there are few surveys that comprehensively cover recent research developments, available solutions, and benchmark datasets. This survey addresses this gap by examining the latest advancements in hand gesture and 3D hand pose recognition from various types of camera input data including RGB images, depth images, and videos from monocular or multiview cameras, examining the differing methodological requirements of each approach. Furthermore, an overview of widely used datasets is provided, detailing their main characteristics and application domains. Finally, open challenges such as achieving robust recognition in real-world environments, handling occlusions, ensuring generalization across diverse users, and addressing computational efficiency for real-time applications are highlighted to guide future research directions. By synthesizing the objectives, methodologies, and applications of recent studies, this survey offers valuable insights into current trends, challenges, and opportunities for future research in human hand gesture recognition.
Updated: 2025-07-25 20:26:11
Fields: cs.CV,cs.AI
Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground-truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank $r$ of the ground-truth matrix. In this paper, we close this gap by showing that the non-convex approaches can be as efficient as nuclear norm minimization in terms of sample complexity. Namely, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We show that factorized gradient descent with spectral initialization converges to the ground truth at a linear rate as soon as the number of samples scales with $ \Omega (rd\kappa^2)$, where $d$ is the dimension, and $\kappa$ is the condition number of the ground truth matrix. This improves the previous rank-dependence in the sample complexity of non-convex matrix factorization from quadratic to linear. Furthermore, we extend our theory to the noisy setting, where we show that with noisy measurements, factorized gradient descent with spectral initialization converges to the minimax optimal error up to a factor linear in $\kappa$. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. We expect that our proof technique is of independent interest for other non-convex problems.
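For intuition, a small numpy sketch of the analyzed procedure is given below: Gaussian measurements of a PSD ground truth, spectral initialization from the top-r eigenpairs of the backprojection, then gradient descent on the factorized objective. Problem sizes and the step size are illustrative choices, not values from the theory.

```python
import numpy as np

d, r, m, eta, steps = 30, 2, 600, 0.1, 500
rng = np.random.default_rng(0)
B = rng.standard_normal((d, r)) / np.sqrt(d)
M_star = B @ B.T                                # PSD ground truth of rank r
A = rng.standard_normal((m, d, d))              # Gaussian measurement matrices
y = np.einsum('kij,ij->k', A, M_star)           # y_k = <A_k, M*>

# Spectral initialization: top-r eigenpairs of (1/m) sum_k y_k A_k (symmetrized)
S = np.einsum('k,kij->ij', y, A) / m
S = (S + S.T) / 2
vals, vecs = np.linalg.eigh(S)
U = vecs[:, -r:] * np.sqrt(np.clip(vals[-r:], 0, None))

# Factorized gradient descent on f(U) = (1/2m) sum_k (<A_k, U U^T> - y_k)^2
for _ in range(steps):
    resid = np.einsum('kij,ij->k', A, U @ U.T) - y
    U -= eta * (np.einsum('k,kij->ij', resid, A + A.transpose(0, 2, 1)) @ U) / m

print(np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))
```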
Updated: 2025-07-25 20:22:38
Fields: stat.ML,cs.IT,cs.LG,math.IT,math.OC,math.PR,math.ST,stat.TH
Growing Neural Networks: Dynamic Evolution through Gradient Descent
In contrast to conventional artificial neural networks, which are structurally static, we present two approaches for evolving small networks into larger ones during training. The first method employs an auxiliary weight that directly controls network size, while the second uses a controller-generated mask to modulate neuron participation. Both approaches optimize network size through the same gradient-descent algorithm that updates the network's weights and biases. We evaluate these growing networks on nonlinear regression and classification tasks, where they consistently outperform static networks of equivalent final size. We then explore the hyperparameter space of these networks to find associated scaling relations relative to their static counterparts. Our results suggest that starting small and growing naturally may be preferable to simply starting large, particularly as neural networks continue to grow in size and energy consumption.
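As a hedged sketch of the mask-based variant, the layer below modulates neuron participation with a differentiable gate updated by the same gradient descent as the weights. The gate parameterization and its negative initialization (so training starts with an effectively small network) are our assumptions, not the paper's controller design.

```python
import torch
import torch.nn as nn

class GrowingLayer(nn.Module):
    """Hidden layer whose effective width is learned jointly with its weights."""
    def __init__(self, d_in, d_max):
        super().__init__()
        self.linear = nn.Linear(d_in, d_max)
        # strongly negative gates -> sigmoid near 0 -> network starts small
        self.gate = nn.Parameter(torch.full((d_max,), -3.0))

    def forward(self, x):
        mask = torch.sigmoid(self.gate)   # soft per-neuron participation
        return torch.relu(self.linear(x)) * mask

    def effective_width(self):
        return torch.sigmoid(self.gate).sum().item()
```

Gradients flowing through the mask open gates only where extra neurons reduce the loss, which is the "start small and grow" behavior the abstract argues for.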
Updated: 2025-07-25 20:18:00
Fields: cs.LG,cond-mat.dis-nn
"X of Information'' Continuum: A Survey on AI-Driven Multi-dimensional Metrics for Next-Generation Networked Systems
The development of next-generation networking systems has inherently shifted from throughput-based paradigms towards intelligent, information-aware designs that emphasize the quality, relevance, and utility of transmitted information, rather than sheer data volume. While classical network metrics, such as latency and packet loss, remain significant, they are insufficient to quantify the nuanced information quality requirements of modern intelligent applications, including autonomous vehicles, digital twins, and metaverse environments. In this survey, we present the first comprehensive study of the ``X of Information'' continuum by introducing a systematic four-dimensional taxonomic framework that structures information metrics along temporal, quality/utility, reliability/robustness, and network/communication dimensions. We uncover the increasing interdependencies among these dimensions, whereby temporal freshness triggers quality evaluation, which in turn helps with reliability appraisal, ultimately enabling effective network delivery. Our analysis reveals that artificial intelligence technologies, such as deep reinforcement learning, multi-agent systems, and neural optimization models, enable adaptive, context-aware optimization of competing information quality objectives. In our extensive study of six critical application domains, covering autonomous transportation, industrial IoT, healthcare digital twins, UAV communications, LLM ecosystems, and metaverse settings, we illustrate the revolutionary promise of multi-dimensional information metrics for meeting diverse operational needs. Our survey identifies prominent implementation challenges, including ...
Updated: 2025-07-25 20:03:38
Fields: cs.NI,cs.AI
On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments
We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user-equipments (UEs) and six nominal base-station (BS) sites. Using these fixed positions we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, carrier frequency, as well as antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via the Spearman correlation between measured and simulated powers, and by a fingerprint-based k-nearest-neighbor localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics. On the contrary, antenna locations and orientations prove decisive. By simple greedy optimization we improve the Spearman correlation by 5% to 130% for various base stations, while kNN-based localization error using only simulated data as reference points is reduced by one-third on real-world samples, though it remains twice as high as the error with purely real data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
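The two fidelity scores are easy to state in code; the sketch below assumes per-BS arrays of received powers and RSSI fingerprint matrices with the shapes noted in the docstrings (these conventions are ours, not Sionna's API).

```python
import numpy as np
from scipy.stats import spearmanr

def bs_fidelity(p_meas, p_sim):
    """Spearman rank correlation between measured and simulated powers
    for one base station; p_meas, p_sim: (n_ue,)."""
    return spearmanr(p_meas, p_sim).correlation

def knn_localize(fp_ref, pos_ref, fp_query, k=3):
    """Fingerprint kNN localization. fp_ref: (n_ref, n_bs) simulated RSSI
    fingerprints at known positions pos_ref (n_ref, 2); fp_query: (n_q, n_bs)
    measured fingerprints to localize."""
    dists = np.linalg.norm(fp_query[:, None, :] - fp_ref[None, :, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]
    return pos_ref[idx].mean(axis=1)   # centroid of the k nearest fingerprints
```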
Updated: 2025-07-25 19:58:44
Fields: cs.NI,cs.AI,cs.LG
Street network sub-patterns and travel mode
Urban morphology has long been recognized as a factor shaping human mobility, yet comparative and formal classifications of urban form across metropolitan areas remain limited. Building on theoretical principles of urban structure and advances in unsupervised learning, we systematically classified the built environment of nine U.S. metropolitan areas using structural indicators such as density, connectivity, and spatial configuration. The resulting morphological types were linked to mobility patterns through descriptive statistics, marginal effects estimation, and post hoc statistical testing. Here we show that distinct urban forms are systematically associated with different mobility behaviors, such as reticular morphologies being linked to significantly higher public transport use (marginal effect = 0.49) and reduced car dependence (-0.41), while organic forms are associated with increased car usage (0.44), and substantial declines in public transport (-0.47) and active mobility (-0.30). These effects are statistically robust (p < 1e-19), highlighting that the spatial configuration of urban areas plays a fundamental role in shaping transportation choices. Our findings extend previous work by offering a reproducible framework for classifying urban form and demonstrate the added value of morphological analysis in comparative urban research. These results suggest that urban form should be treated as a key variable in mobility planning and provide empirical support for incorporating spatial typologies into sustainable urban policy design.
Updated: 2025-07-25 19:49:51
Fields: physics.soc-ph,cs.CY,cs.LG
OneShield -- the Next Generation of LLM Guardrails
The rise of Large Language Models has created a general excitement about the great potential for a myriad of applications. While LLMs offer many possibilities, questions about safety, privacy, and ethics have emerged, and all the key actors are working to address these issues with protective measures for their own models and standalone solutions. The constantly evolving nature of LLMs makes the task of universally shielding users against their potential risks extremely challenging, and one-size-fits-all solutions unfeasible. In this work, we propose OneShield, our stand-alone, model-agnostic and customizable solution to safeguard LLMs. OneShield aims to provide facilities for defining risk factors, expressing and declaring contextual safety and compliance policies, and mitigating LLM risks, with a focus on each specific customer. We describe the implementation of the framework, the scalability considerations and provide usage statistics of OneShield since its first deployment.
Updated: 2025-07-25 19:44:38
Fields: cs.CR,cs.AI,cs.CL
Categorical Schrödinger Bridge Matching
The Schrödinger Bridge (SB) is a powerful framework for solving generative modeling tasks such as unpaired domain translation. Most SB-related research focuses on the continuous data space $\mathbb{R}^{D}$ and leaves open theoretical and algorithmic questions about applying SB methods to discrete data, e.g., on finite spaces $\mathbb{S}^{D}$. Notable examples of such sets $\mathbb{S}$ are codebooks of vector-quantized (VQ) representations of modern autoencoders, tokens in texts, categories of atoms in molecules, etc. In this paper, we provide a theoretical and algorithmic foundation for solving SB in discrete spaces using the recently introduced Iterative Markovian Fitting (IMF) procedure. Specifically, we theoretically justify the convergence of discrete-time IMF (D-IMF) to SB in discrete spaces. This enables us to develop a practical computational algorithm for SB, which we call Categorical Schrödinger Bridge Matching (CSBM). We show the performance of CSBM via a series of experiments with synthetic data and VQ representations of images. The code of CSBM is available at https://github.com/gregkseno/csbm.
Updated: 2025-07-25 19:43:53
领域: cs.LG
Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy
We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student's t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. We derive the coupled generalizations of the Fisher information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and cost functions, the reconstruction metric is derived to still be the mean-square average loss, with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster-decaying tails. The result is the ability to train a model robust against severe outliers, while assuring that the training process is stable. The Wasserstein-2 (Fr\'echet Inception) distance of the reconstructed CelebA images shows that the CVAE has a 3\% improvement over the VAE after 5 epochs of training.
Updated: 2025-07-25 19:39:30
标题: 使用耦合自由能曲线几何优化的变分推断
摘要: 我们介绍了基于耦合自由能的变分推断优化框架,将变分推断技术扩展到考虑耦合指数族的弯曲几何。这个族包括重要的重尾分布,如广义帕累托分布和学生t分布。通过利用耦合自由能(它等于倒置概率的耦合证据下界(ELBO)),我们提高了学习模型的准确性和鲁棒性。我们推导了Fisher信息度量和仿射连接的耦合推广。该方法被应用于耦合变分自动编码器(CVAE)的设计。通过同时对分布和成本函数进行耦合,重构度量仍被导出为带有修改常数的均方平均损失。新颖之处在于用其相关的耦合概率对重尾潜在分布进行采样,该耦合概率的尾部衰减更快。结果是能够训练一个对严重离群值稳健的模型,同时确保训练过程稳定。重构CelebA图像的Wasserstein-2(Fr\'echet Inception)距离显示,在训练5个时期后,CVAE比VAE有3\%的改进。
更新时间: 2025-07-25 19:39:30
领域: cs.LG,cs.IT,math.IT
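For orientation, the "coupled" (generalized) logarithm and exponential that typically underlie such coupled free-energy constructions are the Tsallis-style deformations sketched below. This is a sketch of the standard definitions only; the paper's exact parameterization may differ:

```python
import numpy as np

def coupled_log(x, kappa):
    """Generalized (coupled) logarithm: (x**kappa - 1) / kappa.
    Recovers ln(x) as kappa -> 0; this is the common Tsallis-style
    deformation, which may differ from the paper's exact convention."""
    if kappa == 0:
        return np.log(x)
    return (np.power(x, kappa) - 1.0) / kappa

def coupled_exp(y, kappa):
    """Inverse of coupled_log: (1 + kappa*y)**(1/kappa), -> exp(y) as kappa -> 0."""
    if kappa == 0:
        return np.exp(y)
    return np.power(np.maximum(1.0 + kappa * y, 0.0), 1.0 / kappa)

# The coupling parameter controls tail weight of the associated distributions.
x = np.linspace(0.1, 5.0, 5)
for k in (0.0, 0.2, 0.5):
    print(k, np.round(coupled_log(x, k), 3))
```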
Can You Share Your Story? Modeling Clients' Metacognition and Openness for LLM Therapist Evaluation
Understanding clients' thoughts and beliefs is fundamental in counseling, yet current evaluations of LLM therapists often fail to assess this ability. Existing evaluation methods rely on client simulators that clearly disclose internal states to the therapist, making it difficult to determine whether an LLM therapist can uncover unexpressed perspectives. To address this limitation, we introduce MindVoyager, a novel evaluation framework featuring a controllable and realistic client simulator which dynamically adapts itself based on the ongoing counseling session, offering a more realistic and challenging evaluation environment. We further introduce evaluation metrics that assess the exploration ability of LLM therapists by measuring their thorough understanding of client's beliefs and thoughts.
Updated: 2025-07-25 19:32:05
标题: 你能分享你的故事吗?对LLM治疗师评估建模客户的元认知和开放性
摘要: 理解客户的思想和信念在咨询中至关重要,然而目前对LLM治疗师的评估往往未能评估到这种能力。现有的评估方法依赖于客户模拟器,这些模拟器明确地向治疗师披露内部状态,这使得难以确定LLM治疗师是否能揭示未表达的观点。为了解决这一局限,我们引入了MindVoyager,这是一个新颖的评估框架,具有可控和逼真的客户模拟器,根据正在进行的咨询会话动态调整自己,提供更真实和具有挑战性的评估环境。我们进一步引入了评估指标,通过衡量LLM治疗师对客户信念和思想的深入理解来评估他们的探索能力。
更新时间: 2025-07-25 19:32:05
领域: cs.CY,cs.AI
Mask prior-guided denoising diffusion improves inverse protein folding
Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing strong potential and competitive performance. However, challenges remain, such as predicting elements with high structural uncertainty, including disordered regions. To tackle such low-confidence residue prediction, we propose a Mask-prior-guided denoising Diffusion (MapDiff) framework that accurately captures both structural information and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural information and residue interactions, we develop a graph-based denoising network with a mask-prior pre-training strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to reduce uncertainty. Evaluation on four challenging sequence design benchmarks shows that MapDiff substantially outperforms state-of-the-art methods. Furthermore, the in silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.
Updated: 2025-07-25 19:29:03
标题: 掩码先验引导的去噪扩散改进蛋白质反向折叠
摘要: 蛋白质反向折叠旨在生成能够折叠成所需蛋白质结构的有效氨基酸序列,最近深度学习的进展表现出强大潜力和竞争性表现。然而,挑战仍然存在,例如预测具有高结构不确定性的元素,包括无序区域。为了解决这种低置信度残基预测问题,我们提出了一种掩码先验引导的去噪扩散(MapDiff)框架,可以准确捕获蛋白质反向折叠所需的结构信息和残基相互作用。MapDiff是一个离散扩散概率模型,在给定蛋白质骨架的条件下,迭代地生成噪声逐步降低的氨基酸序列。为了整合结构信息和残基相互作用,我们开发了一个基于图的去噪网络,并采用掩码先验预训练策略。此外,在生成过程中,我们将去噪扩散隐式模型与蒙特卡洛dropout相结合,以减少不确定性。在四项具有挑战性的序列设计基准测试中的评估表明,MapDiff明显优于最先进的方法。此外,MapDiff通过计算机模拟(in silico)生成的序列在不同蛋白质家族和结构中与天然蛋白质的物理化学和结构特征非常相似。
更新时间: 2025-07-25 19:29:03
领域: q-bio.BM,cs.LG
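A schematic of the generative loop described above -- iterative categorical denoising with Monte-Carlo dropout averaging -- using a dummy stand-in for the graph-based denoiser. All names and hyperparameters are illustrative, not MapDiff's actual implementation:

```python
import torch
import torch.nn as nn

NUM_AA, SEQ_LEN, STEPS, MC_SAMPLES = 20, 64, 10, 8

# Stand-in denoiser: the real model is a graph network conditioned on the backbone.
denoiser = nn.Sequential(nn.Embedding(NUM_AA, 64),
                         nn.Dropout(0.1),
                         nn.Linear(64, NUM_AA))

def mc_dropout_logits(seq):
    """Average logits over several stochastic forward passes (dropout kept on),
    which is the uncertainty-reduction trick named in the abstract."""
    denoiser.train()  # keep dropout active at inference time
    return torch.stack([denoiser(seq) for _ in range(MC_SAMPLES)]).mean(0)

seq = torch.randint(0, NUM_AA, (SEQ_LEN,))  # fully "noised" start sequence
with torch.no_grad():
    for _ in range(STEPS):  # iterative denoising toward a designed sequence
        probs = torch.softmax(mc_dropout_logits(seq), dim=-1)
        seq = torch.multinomial(probs, 1).squeeze(-1)
print(seq[:10])
```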
Directly Learning Stock Trading Strategies Through Profit Guided Loss Functions
Stock trading has always been a challenging task due to the highly volatile nature of the stock market. Making sound trading decisions to generate profit is particularly difficult under such conditions. To address this, we propose four novel loss functions to drive decision-making for a portfolio of stocks. These functions account for the potential profits or losses with respect to buying or shorting the respective stocks, enabling potentially any artificial neural network to directly learn an effective trading strategy. Despite the high volatility of stock market fluctuations over time, training time-series models such as transformers on these loss functions resulted in trading strategies that generated significant profits on a portfolio of 50 different S&P 500 company stocks, as compared to benchmark reinforcement learning techniques and a baseline buy-and-hold method. As an example, using 2021, 2022 and 2023 as three test periods, the Crossformer model adapted with our best loss function was the most consistent, producing returns of 51.42%, 51.04% and 48.62%, respectively. In comparison, the best-performing state-of-the-art reinforcement learning methods, PPO and DDPG, only delivered maximum profits of around 41%, 2.81% and 41.58% for the same periods. The code is available at https://anonymous.4open.science/r/bandit-stock-trading-58C8/README.md.
Updated: 2025-07-25 19:22:05
标题: 通过利润导向的损失函数直接学习股票交易策略
摘要: 由于股市的高度波动性,股票交易一直是一项具有挑战性的任务。在这种情况下,做出明智的交易决策以获取利润尤为困难。为了解决这个问题,我们提出了四种新颖的损失函数,以驱动股票投资组合的决策制定。这些函数考虑了买入或卖空相应股票的潜在利润或损失,使得几乎任何人工神经网络都有可能直接学习有效的交易策略。尽管股市随时间波动剧烈,但在这些损失函数上训练时间序列模型(如Transformer)得到的交易策略,相比于基准强化学习技术和基准买入持有方法,在50家不同的标准普尔500公司股票组合上产生了显著的利润。例如,以2021年、2022年和2023年作为三个测试期,采用我们最佳损失函数的Crossformer模型表现最为稳定,分别获得了51.42%、51.04%和48.62%的回报。相比之下,表现最佳的最新强化学习方法PPO和DDPG在同一时期的最大利润分别仅为约41%、2.81%和41.58%。代码可在https://anonymous.4open.science/r/bandit-stock-trading-58C8/README.md 获取。
更新时间: 2025-07-25 19:22:05
领域: cs.LG,cs.NE
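One plausible shape for such a profit-guided loss, as a minimal PyTorch sketch: positions produced by any network head are multiplied by realized returns, and the negated mean portfolio profit is minimized. This is an assumption-laden illustration, not necessarily one of the paper's four functions:

```python
import torch

def profit_guided_loss(positions, returns):
    """positions: model outputs in [-1, 1] per stock (long/short sizing);
    returns: realized next-period returns per stock. The loss is the negative
    mean portfolio profit, so minimizing it maximizes profit."""
    profit = (positions * returns).sum(dim=-1)  # per-sample portfolio profit
    return -profit.mean()

# A tanh head over any backbone yields valid long/short positions.
logits = torch.randn(32, 50, requires_grad=True)  # batch of 32, 50 stocks
positions = torch.tanh(logits)
returns = 0.01 * torch.randn(32, 50)              # simulated daily returns
loss = profit_guided_loss(positions, returns)
loss.backward()                                    # gradients flow to the backbone
print(float(loss))
```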
Efficient and Scalable Agentic AI with Heterogeneous Systems
AI agents are emerging as a dominant workload in a wide range of applications, poised to be the vehicle that delivers the promised benefits of AI to enterprises and consumers. Unlike conventional software or static inference, agentic workloads are dynamic and structurally complex. Often these agents are directed graphs of compute and IO operations that span multi-modal data input and conversion, data processing and context gathering (e.g., vector DB lookups), multiple LLM inferences, tool calls, etc. To scale AI agent usage, we need efficient and scalable deployment and agent-serving infrastructure. To tackle this challenge, in this paper, we present a system design for dynamic orchestration of AI agent workloads on heterogeneous compute infrastructure spanning CPUs and accelerators, both from different vendors and across different performance tiers within a single vendor. The system delivers several building blocks: a framework for planning and optimizing agentic AI execution graphs using cost models that account for compute, memory, and bandwidth constraints of different HW; an MLIR-based representation and compilation system that can decompose AI agent execution graphs into granular operators and generate code for different HW options; and a dynamic orchestration system that can place the granular components across a heterogeneous compute infrastructure and stitch them together while meeting an end-to-end SLA. Our design performs a systems-level TCO optimization, and preliminary results show that leveraging a heterogeneous infrastructure can deliver significant TCO benefits. A preliminary and surprising finding is that for some workloads, a heterogeneous combination of older-generation GPUs with newer accelerators can deliver similar TCO to the latest-generation homogeneous GPU infrastructure design, potentially extending the life of deployed infrastructure.
Updated: 2025-07-25 19:02:42
标题: 高效且可扩展的具有异质系统的代理人人工智能
摘要: 人工智能代理正在成为广泛应用中的主要工作负载,有望成为将人工智能的承诺收益传递给企业和消费者的载体。与传统软件或静态推理不同,代理工作负载是动态且结构复杂的。这些代理通常是由计算和IO操作构成的有向图,涵盖多模态数据输入与转换、数据处理与上下文收集(例如矢量数据库查找)、多次LLM推理、工具调用等。为了扩展人工智能代理的使用,我们需要高效且可扩展的部署和代理服务基础设施。 为了应对这一挑战,本文提出了一种系统设计,用于在跨越不同供应商以及同一供应商内不同性能层级的CPU和加速器的异构计算基础设施上,对人工智能代理工作负载进行动态编排。该系统提供了几个构建块:一个用于规划和优化代理型人工智能执行图的框架,其成本模型考虑了不同硬件的计算、内存和带宽约束;一个基于MLIR的表示和编译系统,可以将人工智能代理执行图分解为细粒度算子,并为不同的硬件选项生成代码;以及一个动态编排系统,可以将这些细粒度组件布置到异构计算基础设施上,并在满足端到端SLA的同时将它们串联起来。我们的设计进行了系统级TCO优化,初步结果显示,利用异构基础设施可以带来显著的TCO优势。一个初步且令人惊讶的发现是,对于某些工作负载,旧一代GPU与新一代加速器的异构组合可以提供与最新一代同构GPU基础设施设计相近的TCO,从而可能延长已部署基础设施的寿命。
更新时间: 2025-07-25 19:02:42
领域: cs.LG,cs.AI,cs.DC
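A toy illustration of the cost-model-driven placement idea: given per-operator costs on each device class, choose the cheapest placement that meets an end-to-end latency SLA. The cost table and brute-force search below are hypothetical stand-ins for the paper's MLIR-based planner and optimizer:

```python
from itertools import product

# Hypothetical cost model: (latency_ms, dollar_cost) per operator per device class.
COST = {
    ("embed",  "cpu"):     (40, 0.001), ("embed",  "old_gpu"): (12, 0.004),
    ("llm",    "old_gpu"): (900, 0.050), ("llm",   "new_acc"): (350, 0.120),
    ("rerank", "cpu"):     (25, 0.001), ("rerank", "new_acc"): (6, 0.010),
}
OPS = ["embed", "llm", "rerank"]  # a linearized agent execution graph
SLA_MS = 1200                      # end-to-end latency budget

def cheapest_placement(ops, sla_ms):
    """Exhaustively search placements that meet the SLA, minimizing dollar cost.
    A real system would feed the cost model into an optimizer, not brute force."""
    devices = {d for (_, d) in COST}
    best = None
    for assign in product(devices, repeat=len(ops)):
        if not all((op, dev) in COST for op, dev in zip(ops, assign)):
            continue  # operator not supported on that device
        lat = sum(COST[(op, dev)][0] for op, dev in zip(ops, assign))
        usd = sum(COST[(op, dev)][1] for op, dev in zip(ops, assign))
        if lat <= sla_ms and (best is None or usd < best[0]):
            best = (usd, lat, assign)
    return best

print(cheapest_placement(OPS, SLA_MS))
```

Even this toy version exhibits the TCO effect the abstract describes: an older, cheaper device is chosen wherever it still fits inside the latency budget.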
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Recent advances in large language models have catalyzed the development of multimodal LLMs (MLLMs) that integrate text, speech, and vision within unified frameworks. As MLLMs evolve from narrow, monolingual, task-specific systems to general-purpose instruction-following models, a key frontier lies in evaluating their multilingual and multimodal capabilities over both long and short contexts. However, existing benchmarks fall short in evaluating these dimensions jointly: they are often limited to English, mostly focus on a single modality at a time, rely on short-form contexts, or lack human annotations -- hindering comprehensive assessment of model performance across languages, modalities, and task complexity. To address these gaps, we introduce MCIF (Multimodal Crosslingual Instruction Following), the first multilingual human-annotated benchmark based on scientific talks that is designed to evaluate instruction-following in crosslingual, multimodal settings over both short- and long-form inputs. MCIF spans three core modalities -- speech, vision, and text -- and four diverse languages (English, German, Italian, and Chinese), enabling a comprehensive evaluation of MLLMs' abilities to interpret instructions across languages and combine them with multimodal contextual information. MCIF is released under a CC-BY 4.0 license to encourage open research and progress in MLLM development.
Updated: 2025-07-25 19:00:51
标题: MCIF:来自科学讲座的多模态跨语言指令遵循基准
摘要: 最近大型语言模型的进展推动了多模态LLM(MLLMs)的发展,这些模型在统一框架内集成了文本、语音和视觉。随着MLLMs从狭窄的、单语种的、任务特定的系统发展为通用的指令跟随模型,一个关键的前沿在于评估它们在长期和短期上下文中的多语种和多模态能力。然而,现有的基准测试在共同评估这些维度方面存在不足:它们往往局限于英语,大多集中在一种单一模态上,依赖于短形式上下文,或者缺乏人工标注,阻碍了对模型在语言、模态和任务复杂性方面表现的全面评估。为了解决这些缺口,我们介绍了MCIF(多模态跨语言指令跟随),这是第一个基于科学演讲的多语种人工标注基准测试,旨在评估跨语言、多模态设置中的指令跟随,包括短文和长文输入。MCIF涵盖了三种核心模态——语音、视觉和文本——以及四种不同的语言(英语、德语、意大利语和中文),使得能够全面评估MLLMs解释跨语言指令以及与多模态上下文信息相结合的能力。MCIF发布在CC-BY 4.0许可下,以鼓励MLLMs开放研究和进展。
更新时间: 2025-07-25 19:00:51
领域: cs.CL,cs.AI,cs.CV,cs.SD
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning - even when the additional training data appears benign. In this paper, we empirically demonstrate that this vulnerability stems from the sensitivity of safety-critical low-rank subspaces in LLM parameters to fine-tuning. Building on this insight, we propose a novel training-free method, termed Low-Rank Extrapolation (LoX), to enhance safety robustness by extrapolating the safety subspace of an aligned LLM. Our experimental results confirm the effectiveness of LoX, demonstrating significant improvements in robustness against both benign and malicious fine-tuning attacks while preserving the model's adaptability to new tasks. For instance, LoX leads to 11% to 54% absolute reductions in attack success rates (ASR) facing benign or malicious fine-tuning attacks. By investigating the ASR landscape of parameters, we attribute the success of LoX to that the extrapolation moves LLM parameters to a flatter zone, thereby less sensitive to perturbations. The code is available at github.com/VITA-Group/LoX.
Updated: 2025-07-25 18:57:36
标题: LoX:低秩外推强化LLM安全性抵抗微调
摘要: 大型语言模型(LLMs)已在现实世界应用中变得不可或缺。然而,它们的广泛应用引发了重大的安全问题,特别是在回应社会有害问题方面。尽管通过对齐(alignment)在提高模型安全性方面做出了大量努力,但即使附加的训练数据看似无害,对齐后模型的安全保护仍可能被后续微调破坏。在本文中,我们通过实验证明,这种脆弱性源于LLM参数中安全关键的低秩子空间对微调的敏感性。基于这一洞察,我们提出了一种新颖的免训练方法,称为低秩外推(LoX),通过外推对齐LLM的安全子空间来增强安全鲁棒性。我们的实验结果证实了LoX的有效性,显示出对抗良性和恶意微调攻击的鲁棒性显著提高,同时保持了模型对新任务的适应性。例如,LoX可以使面对良性或恶意微调攻击的攻击成功率(ASR)绝对降低11%至54%。通过研究参数的ASR景观,我们将LoX的成功归因于外推将LLM参数移动到一个更平坦的区域,从而对扰动不太敏感。代码可在github.com/VITA-Group/LoX找到。
更新时间: 2025-07-25 18:57:36
领域: cs.LG,cs.AI,cs.CL
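A schematic reading of the recipe in PyTorch: take the weight delta induced by alignment, isolate its top singular directions (the presumed safety subspace), and extrapolate along them. The exact update rule and rank/strength choices are in the paper and repository; this only illustrates the idea:

```python
import torch

def lox_extrapolate(w_base, w_aligned, rank=8, alpha=0.5):
    """Schematic low-rank extrapolation: amplify the top singular directions of
    the alignment-induced weight change, which the paper identifies as
    safety-critical. Illustrative only; see the LoX repo for the real update."""
    delta = w_aligned - w_base  # change introduced by safety alignment
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    low_rank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]
    return w_aligned + alpha * low_rank  # push further along the safety subspace

w_base = torch.randn(256, 256)                      # pre-alignment weight
w_aligned = w_base + 0.1 * torch.randn(256, 256)    # post-alignment weight
w_robust = lox_extrapolate(w_base, w_aligned)
print(w_robust.shape)
```

Being training-free, the whole procedure amounts to one SVD per targeted weight matrix, which matches the abstract's emphasis on ease of application.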
Quantum Reinforcement Learning by Adaptive Non-local Observables
Hybrid quantum-classical frameworks leverage quantum computing for machine learning; however, variational quantum circuits (VQCs) are limited by the need for local measurements. We introduce an adaptive non-local observable (ANO) paradigm within VQCs for quantum reinforcement learning (QRL), jointly optimizing circuit parameters and multi-qubit measurements. The ANO-VQC architecture serves as the function approximator in Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms. On multiple benchmark tasks, ANO-VQC agents outperform baseline VQCs. Ablation studies reveal that adaptive measurements enhance the function space without increasing circuit depth. Our results demonstrate that adaptive multi-qubit observables can enable practical quantum advantages in reinforcement learning.
Updated: 2025-07-25 18:57:16
标题: 通过自适应非局部可观测量的量子强化学习
摘要: 混合量子-经典框架利用量子计算进行机器学习;然而,变分量子电路(VQC)受到局部测量需求的限制。我们在VQC中引入了一种用于量子强化学习(QRL)的自适应非局部可观测量(ANO)范式,联合优化电路参数和多量子比特测量。ANO-VQC架构作为Deep Q-Network(DQN)和Asynchronous Advantage Actor-Critic(A3C)算法中的函数逼近器。在多个基准任务上,ANO-VQC智能体优于基线VQC。消融研究表明,自适应测量在不增加电路深度的情况下扩展了函数空间。我们的结果表明,自适应多量子比特可观测量可以在强化学习中实现实际的量子优势。
更新时间: 2025-07-25 18:57:16
领域: quant-ph,cs.AI,cs.LG
Federated Calculation of the Free-Support Transportation Barycenter by Single-Loop Dual Decomposition
We propose an efficient federated dual decomposition algorithm for calculating the Wasserstein barycenter of several distributions, including choosing the support of the solution. The algorithm does not access local data and uses only highly aggregated information. It also does not require repeated solutions to mass transportation problems. Because of the absence of any matrix-vector operations, the algorithm exhibits a very low complexity of each iteration and significant scalability. We illustrate its virtues and compare it to the state-of-the-art methods on several examples of mixture models.
Updated: 2025-07-25 18:54:25
标题: 通过单循环对偶分解联邦计算自由支撑的运输重心
摘要: 我们提出了一种高效的联邦对偶分解算法,用于计算多个分布的Wasserstein重心,包括选择解的支撑集。该算法不访问本地数据,仅使用高度聚合的信息。它也不需要重复求解质量运输问题。由于不涉及任何矩阵-向量运算,该算法每次迭代的复杂度非常低,并具有显著的可扩展性。我们通过若干混合模型的例子展示了其优点,并与最新方法进行了比较。
更新时间: 2025-07-25 18:54:25
领域: cs.LG,math.OC,stat.ML
Studying number theory with deep learning: a case study with the Möbius and squarefree indicator functions
Building on work of Charton, we train small transformer models to calculate the M\"{o}bius function $\mu(n)$ and the squarefree indicator function $\mu^2(n)$. The models attain nontrivial predictive power. We apply a mixture of additional models and feature scoring to give a theoretical explanation.
Updated: 2025-07-25 18:48:48
标题: 使用深度学习研究数论:以莫比乌斯和无平方因子指示函数为例研究
摘要: 借鉴Charton的工作,我们训练小型Transformer模型来计算Möbius函数$\mu(n)$和无平方因子指示函数$\mu^2(n)$。这些模型具有非平凡的预测能力。我们应用一种混合模型和特征评分方法来给出理论解释。
更新时间: 2025-07-25 18:48:48
领域: math.NT,cs.LG
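Both target functions are cheap to compute exactly, which is what makes this a clean test bed for transformer training. A minimal generator for training pairs (n, mu(n), mu(n)^2):

```python
def mobius(n: int) -> int:
    """mu(n): 0 if n has a squared prime factor, otherwise
    (-1) raised to the number of distinct prime factors of n."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # squared prime factor -> not squarefree
                return 0
            result = -result
        p += 1
    if n > 1:                   # one leftover prime factor
        result = -result
    return result

# Training pairs for the two target functions studied in the paper.
data = [(n, mobius(n), mobius(n) ** 2) for n in range(1, 20)]
for n, mu, sq in data[:10]:
    print(f"n={n:2d}  mu(n)={mu:2d}  squarefree={sq}")
```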
State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees
We develop a toolbox for exact analysis of iterative algorithms on a class of high-dimensional nonconvex optimization problems with random data. While prior work has shown that low-dimensional statistics of (generalized) first-order methods can be predicted by a deterministic recursion known as state evolution, our focus is on developing such a prediction for a more general class of algorithms. We provide a state evolution for any method whose iterations are given by (possibly interleaved) first-order and saddle point updates, showing two main results. First, we establish a rigorous state evolution prediction that holds even when the updates are not coordinate-wise separable. Second, we establish finite-sample guarantees bounding the deviation of the empirical updates from the established state evolution. In the process, we develop a technical toolkit that may prove useful in related problems. One component of this toolkit is a general Hilbert space lifting technique to prove existence and uniqueness of a convenient parameterization of the state evolution. Another component of the toolkit combines a generic application of Bolthausen's conditioning method with a sequential variant of Gordon's Gaussian comparison inequality, and provides additional ingredients that enable a general finite-sample analysis.
Updated: 2025-07-25 18:28:09
标题: 超越一阶方法的状态演变 I:严格预测和有限样本保证
摘要: 我们开发了一个工具箱,用于对具有随机数据的一类高维非凸优化问题上的迭代算法进行精确分析。尽管先前的工作已经表明,(广义)一阶方法的低维统计量可以通过一种称为状态演化的确定性递归来预测,但我们的重点是为更一般的一类算法建立这样的预测。对于任何迭代由(可能交错的)一阶更新和鞍点更新给出的方法,我们提供了相应的状态演化,并展示了两个主要结果。首先,我们建立了一个严格的状态演化预测,即使更新不是按坐标可分的也成立。其次,我们建立了有限样本保证,界定了经验更新与所建立状态演化之间的偏差。在此过程中,我们开发了一个可能对相关问题有用的技术工具包。该工具包的一个组成部分是一种通用的希尔伯特空间提升技术,用于证明状态演化的一个方便参数化的存在性和唯一性。工具包的另一个组成部分将Bolthausen条件化方法的一般应用与Gordon高斯比较不等式的序贯变体相结合,并提供了使一般有限样本分析成为可能的额外要素。
更新时间: 2025-07-25 18:28:09
领域: math.ST,cs.LG,math.OC,stat.ML,stat.TH
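For orientation, the canonical first-order instance of a state evolution prediction is the AMP recursion for the linear model y = A x_0 + w with aspect ratio delta = m/n, sketched below; the paper's contribution is a rigorous, finite-sample version of such predictions for the much broader class of interleaved first-order and saddle-point updates:

```latex
% Canonical first-order example only (AMP with denoiser \eta_t and Onsager term);
% the paper's recursion covers interleaved first-order and saddle-point updates.
x^{t+1} = \eta_t\!\left(x^t + A^{\top} z^t\right), \qquad
z^{t} = y - A x^{t} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta_{t-1}' \right\rangle,
```

whose low-dimensional statistics are tracked by the deterministic recursion

```latex
\tau_{t+1}^{2} \;=\; \sigma_w^{2} + \frac{1}{\delta}\,
\mathbb{E}\!\left[\left(\eta_t\!\left(X_0 + \tau_t Z\right) - X_0\right)^{2}\right],
\qquad Z \sim \mathcal{N}(0,1)\ \text{independent of } X_0 .
```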
Securing the Internet of Medical Things (IoMT): Real-World Attack Taxonomy and Practical Security Measures
The Internet of Medical Things (IoMT) has the potential to radically improve healthcare by enabling real-time monitoring, remote diagnostics, and AI-driven decision making. However, the connectivity, embedded intelligence, and inclusion of a wide variety of novel sensors expose medical devices to severe cybersecurity threats, compromising patient safety and data privacy. In addition, many devices also have direct capacity - individually or in conjunction with other IoMT devices - to perform actions on the patient, such as delivering an electrical stimulus, administering a drug, or activating a motor, which can potentially be life-threatening. We provide a taxonomy of potential attacks targeting IoMT, presenting attack surfaces, vulnerabilities, and mitigation strategies across all layers of the IoMT architecture. It answers key questions such as: What makes IoMT security different from traditional IT security? What are the cybersecurity threats to medical devices? How can engineers design secure IoMT systems and protect hospital networks from cyberattacks? By analyzing historical cyber incidents, we highlight critical security gaps and propose practical security guidelines for medical device engineers and security professionals. This work bridges the gap between research and implementation, equipping healthcare stakeholders with actionable insights to build resilient and privacy-preserving IoMT ecosystems. Finally, we present the latest standardization and compliance frameworks, that IoMT security designers should be aware of.
Updated: 2025-07-25 18:24:45
标题: 保障医疗物联网(IoMT)安全:现实世界攻击分类与实用安全措施
摘要: 医疗物联网(IoMT)有潜力通过实时监测、远程诊断和基于人工智能的决策,从根本上改善医疗保健。然而,连接性、嵌入式智能以及各种新型传感器的引入使医疗设备面临严重的网络安全威胁,可能危及患者安全和数据隐私。此外,许多设备还能够单独或与其他IoMT设备协同,直接对患者执行操作,如施加电刺激、给药或启动电机,这可能危及生命。我们为针对IoMT的潜在攻击提供了一个分类法,呈现IoMT架构各层的攻击面、漏洞和缓解策略。它回答了以下关键问题:IoMT安全与传统IT安全有何不同?医疗设备面临哪些网络安全威胁?工程师如何设计安全的IoMT系统并保护医院网络免受网络攻击?通过分析历史网络安全事件,我们突出了关键的安全缺口,并为医疗设备工程师和安全专业人员提出了实用的安全准则。这项工作弥合了研究与实施之间的差距,为医疗保健利益相关者提供可操作的见解,以构建具有韧性并保护隐私的IoMT生态系统。最后,我们介绍了IoMT安全设计者应当了解的最新标准化和合规框架。
更新时间: 2025-07-25 18:24:45
领域: cs.CR
Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark
Unsupervised Domain Adaptation (UDA) aims to harness labeled source data to train models for unlabeled target data. Despite extensive research in domains like computer vision and natural language processing, UDA remains underexplored for time series data, which has widespread real-world applications ranging from medicine and manufacturing to earth observation and human activity recognition. Our paper addresses this gap by introducing a comprehensive benchmark for evaluating UDA techniques for time series classification, with a focus on deep learning methods. We provide seven new benchmark datasets covering various domain shifts and temporal dynamics, facilitating fair and standardized UDA method assessments with state of the art neural network backbones (e.g. Inception) for time series data. This benchmark offers insights into the strengths and limitations of the evaluated approaches while preserving the unsupervised nature of domain adaptation, making it directly applicable to practical problems. Our paper serves as a vital resource for researchers and practitioners, advancing domain adaptation solutions for time series data and fostering innovation in this critical field. The implementation code of this benchmark is available at https://github.com/EricssonResearch/UDA-4-TSC.
Updated: 2025-07-25 18:24:22
标题: 用于时间序列分类的深度无监督领域适应:一个基准
摘要: 无监督域适应(UDA)旨在利用标记的源数据来训练未标记的目标数据的模型。尽管在诸如计算机视觉和自然语言处理等领域进行了广泛的研究,但对于时间序列数据,UDA仍然是一个未被充分探索的领域,时间序列数据在医学、制造业、地球观测和人类活动识别等领域具有广泛的实际应用。我们的论文通过引入一个全面的基准来评估时间序列分类的UDA技术,重点放在深度学习方法上,来填补这一空白。我们提供了七个新的基准数据集,涵盖各种领域转移和时间动态,有助于使用最先进的神经网络骨干(例如Inception)进行公平和标准化的UDA方法评估。这个基准提供了对评估方法的优势和限制的见解,同时保持了领域适应的无监督性质,使其直接适用于实际问题。我们的论文为研究人员和从业者提供了重要资源,推动了时间序列数据领域适应解决方案的发展,并促进了这一关键领域的创新。这个基准的实现代码可以在https://github.com/EricssonResearch/UDA-4-TSC找到。
更新时间: 2025-07-25 18:24:22
领域: cs.LG,cs.AI,stat.ML
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference
Deploying Large Language Models (LLMs) on edge devices remains challenging because their computation grows quadratically with sequence length. Existing studies for dynamic attention pruning are designed for hardware with massively parallel computation capabilities, such as GPUs or TPUs, and aim at long context lengths (e.g., 64K), making them unsuitable for edge scenarios. We present DeltaLLM, a training-free framework that exploits temporal sparsity in attention patterns to enable efficient LLM inference across both the prefilling and decoding stages, on resource-constrained edge devices. DeltaLLM introduces an accuracy- and memory-aware delta matrix construction strategy that introduces temporal sparsity, and a context-aware hybrid attention mechanism that combines full attention in a local context window with delta approximation outside it to increase accuracy. We evaluate our framework on the edge-device-friendly BitNet-b1.58-2B-4T model and Llama3.2-1B-Instruct model across diverse language tasks. The results show that on BitNet, our framework increases the attention sparsity from 0% to 60% during the prefilling stage with slight accuracy improvement on the WG task, and 0% to 57% across both the prefilling and decoding stages, with even higher F1 score from 29.63 to 30.97 on SQuAD-v2 task. On the Llama model, it can also achieve up to 60% sparsity during the prefilling stage and around 57% across both stages with negligible accuracy drop. These results demonstrate that DeltaLLM offers a promising solution for efficient edge deployment, requiring no fine-tuning and seamlessly integrating with existing inference pipelines.
Updated: 2025-07-25 18:23:18
标题: DeltaLLM:一种利用时间稀疏性实现高效边缘LLM推理的免训练框架
摘要: 在边缘设备上部署大型语言模型(LLMs)仍然具有挑战性,因为其计算量随序列长度呈二次增长。现有的动态注意力剪枝研究是为具有大规模并行计算能力的硬件(如GPU或TPU)设计的,并以长上下文(例如64K)为目标,因而不适用于边缘场景。我们提出了DeltaLLM,这是一个免训练框架,利用注意力模式中的时间稀疏性,在资源受限的边缘设备上实现涵盖预填充和解码两个阶段的高效LLM推理。DeltaLLM引入了一种兼顾准确性和内存的增量矩阵构建策略以引入时间稀疏性,并提出了一种上下文感知的混合注意力机制,在局部上下文窗口内使用全注意力、在窗口之外使用增量近似,以提高准确性。我们在适用于边缘设备的BitNet-b1.58-2B-4T模型和Llama3.2-1B-Instruct模型上,针对多种语言任务评估了我们的框架。结果显示,在BitNet模型上,我们的框架在预填充阶段将注意力稀疏度从0%提高到60%,并在WG任务上略微提高了准确性;在预填充和解码两个阶段将稀疏度从0%提高到57%,在SQuAD-v2任务上的F1分数甚至从29.63提高到30.97。在Llama模型上,它同样能在预填充阶段实现高达60%的稀疏度,并在两个阶段实现约57%的稀疏度,而准确性几乎没有下降。这些结果表明,DeltaLLM为高效的边缘部署提供了一个有前途的解决方案,无需微调,并可与现有推理流水线无缝集成。
更新时间: 2025-07-25 18:23:18
领域: cs.AI,eess.SP
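A schematic of the temporal-sparsity idea in NumPy: cache the previous step's attention scores and recompute only the query rows whose representations drifted beyond a threshold. This illustrates delta-based reuse only, not DeltaLLM's actual delta-matrix construction or its local-window hybrid attention:

```python
import numpy as np

def delta_sparse_scores(q, k, prev_scores, prev_q, threshold=0.5):
    """Recompute attention scores only for query rows that changed enough since
    the previous step; reuse cached rows otherwise. Schematic illustration."""
    scores = prev_scores.copy()
    drift = np.linalg.norm(q - prev_q, axis=-1)             # per-row change
    stale = drift > threshold
    scores[stale] = q[stale] @ k.T / np.sqrt(q.shape[-1])   # refresh stale rows
    return scores, stale.mean()                              # fraction recomputed

rng = np.random.default_rng(0)
q0 = rng.normal(size=(128, 64))
k = rng.normal(size=(128, 64))
s0 = q0 @ k.T / np.sqrt(64)                                  # cached scores
q1 = q0.copy()
q1[rng.choice(128, size=32, replace=False)] += rng.normal(size=(32, 64))
s1, frac = delta_sparse_scores(q1, k, s0, q0)
print(f"recomputed {frac:.0%} of score rows")                # here: 25%
```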
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
Recent advancements in Large Language Models (LLMs) have significantly enhanced their code generation capabilities. However, their robustness against adversarial misuse, particularly through multi-turn malicious coding prompts, remains underexplored. In this work, we introduce code decomposition attacks, where a malicious coding task is broken down into a series of seemingly benign subtasks across multiple conversational turns to evade safety filters. To facilitate systematic evaluation, we introduce MOCHA, a large-scale benchmark designed to evaluate the robustness of code LLMs against both single-turn and multi-turn malicious prompts. Empirical results across open- and closed-source models reveal persistent vulnerabilities, especially under multi-turn scenarios. Fine-tuning on MOCHA improves rejection rates while preserving coding ability, and importantly, enhances robustness on external adversarial datasets, with up to a 32.4% increase in rejection rates without any additional supervision.
Updated: 2025-07-25 18:11:10
标题: MOCHA:代码语言模型对多轮恶意编码提示具有鲁棒性吗?
摘要: 最近,大型语言模型(LLMs)的进展显著增强了它们的代码生成能力。然而,它们对对抗性滥用的鲁棒性,特别是面对多轮恶意编码提示时,仍然未被充分探索。在这项工作中,我们介绍了代码分解攻击,其中恶意编码任务被分解为一系列看似良性的子任务,跨多个对话轮次以规避安全过滤器。为了促进系统评估,我们引入了MOCHA,一个大规模基准,旨在评估代码LLMs对单轮和多轮恶意提示的鲁棒性。开源和闭源模型的实证结果显示出持续存在的漏洞,特别是在多轮场景下。在MOCHA上微调可以提高拒绝率,同时保留编码能力,并且重要的是,在没有任何额外监督的情况下,在外部对抗性数据集上将拒绝率提高了最多32.4%。
更新时间: 2025-07-25 18:11:10
领域: cs.CL,cs.AI,cs.CR,cs.LG
Affordance-Guided Reinforcement Learning via Visual Prompting
Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, there is also a growing adoption of large multi-modal foundation models for robotics that can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks. Motivated by this range of capability, in this work, we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive zero-shot reasoning about affordances through keypoints, and we use these to define dense rewards that guide autonomous robotic learning. On diverse real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching similar performance in 45K online fine-tuning steps. Project website: https://sites.google.com/view/affordance-guided-rl
Updated: 2025-07-25 18:09:11
标题: 基于视觉提示的可供性引导强化学习
摘要: 配备强化学习(RL)的机器人有潜力仅通过奖励信号学习广泛的技能。然而,为一般操纵任务获取稳健且密集的奖励信号仍然是一个挑战。现有的基于学习的方法需要大量数据,如人类成功和失败的演示,来学习任务特定的奖励函数。最近,用于机器人技术的大型多模态基础模型也得到越来越多的采用,它们可以在物理环境中进行视觉推理,并为操纵任务生成粗略的机器人动作。受这一能力范围的启发,本文提出了基于关键点的可供性引导改进(KAGI)方法,利用视觉语言模型(VLMs)塑造的奖励进行自主RL。最先进的VLMs已经展示了通过关键点对可供性进行零样本推理的能力,我们利用这些关键点定义密集奖励来指导自主机器人学习。在由自然语言描述指定的多样化真实世界操纵任务中,KAGI提高了自主RL的样本效率,并在30K在线微调步骤内实现了任务的成功完成。此外,我们展示了KAGI对减少预训练所用领域内演示数量的稳健性,在45K在线微调步骤内达到类似的性能。项目网站:https://sites.google.com/view/affordance-guided-rl
更新时间: 2025-07-25 18:09:11
领域: cs.RO,cs.AI,cs.LG
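A generic stand-in for the kind of keypoint-shaped dense reward described above (KAGI's exact formulation is not given in the abstract): the reward is the negative distance from the end effector to the current affordance keypoint proposed by a VLM.

```python
import numpy as np

def keypoint_reward(end_effector_xyz, keypoints, stage):
    """Dense shaping reward: negative distance from the end effector to the
    current affordance keypoint (e.g. a grasp point emitted by a VLM).
    Illustrative stand-in, not KAGI's exact reward."""
    target = keypoints[stage]
    return -float(np.linalg.norm(end_effector_xyz - target))

# Keypoints as a VLM might emit them for "put the cup on the shelf".
keypoints = [np.array([0.40, 0.10, 0.05]),   # grasp point on the cup
             np.array([0.10, 0.55, 0.30])]   # placement point on the shelf
print(keypoint_reward(np.array([0.42, 0.12, 0.06]), keypoints, stage=0))
```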
Efficient Attention Mechanisms for Large Language Models: A Survey
Transformer-based architectures have become the prevailing backbone of large language models. However, the quadratic time and memory complexity of self-attention remains a fundamental obstacle to efficient long-context modeling. To address this limitation, recent research has introduced two principal categories of efficient attention mechanisms. Linear attention methods achieve linear complexity through kernel approximations, recurrent formulations, or fast-weight dynamics, thereby enabling scalable inference with reduced computational overhead. Sparse attention techniques, in contrast, limit attention computation to selected subsets of tokens based on fixed patterns, block-wise routing, or clustering strategies, enhancing efficiency while preserving contextual coverage. This survey provides a systematic and comprehensive overview of these developments, integrating both algorithmic innovations and hardware-level considerations. In addition, we analyze the incorporation of efficient attention into large-scale pre-trained language models, including both architectures built entirely on efficient attention and hybrid designs that combine local and global components. By aligning theoretical foundations with practical deployment strategies, this work aims to serve as a foundational reference for advancing the design of scalable and efficient language models.
Updated: 2025-07-25 18:08:10
标题: 大型语言模型的高效注意机制:一项调查
摘要: 基于Transformer的架构已成为大型语言模型的主要支柱。然而,自注意力的二次时间和内存复杂度仍然是实现高效长上下文建模的一个根本障碍。为了解决这一限制,最近的研究引入了两类主要的高效注意力机制。线性注意力方法通过核逼近、循环公式或快速权重动态实现线性复杂度,从而实现可扩展的推理,并减少计算开销。相比之下,稀疏注意力技术将注意力计算限制在基于固定模式、块状路由或聚类策略的选定子集上,提高效率同时保留上下文覆盖。本调查提供了这些发展的系统和全面概述,整合了算法创新和硬件层面的考虑。此外,我们分析了将高效注意力整合到大规模预训练语言模型中,包括完全基于高效注意力构建的架构和将局部和全局组件结合的混合设计。通过将理论基础与实际部署策略对齐,本工作旨在成为推进可扩展和高效语言模型设计的基础参考。
更新时间: 2025-07-25 18:08:10
领域: cs.CL,cs.AI
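The core contrast the survey draws can be made concrete in a few lines: replacing the softmax kernel with a feature map phi lets the phi(K)^T V summary be built once, turning O(n^2) attention into O(n) in sequence length. A minimal NumPy sketch using a simple ReLU feature map (the elu+1 map of Katharopoulos et al., 2020 is a common alternative):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n x n score matrix makes this O(n^2)."""
    A = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (A / A.sum(-1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: softmax(qk') is approximated by phi(q) phi(k)',
    so the d x d summary phi(K)'V is formed once, giving O(n) cost."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                # d x d summary, built once
    Z = Qp @ Kp.sum(0, keepdims=True).T          # per-query normalizer
    return (Qp @ KV) / Z

rng = np.random.default_rng(0)
n, d = 512, 32
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```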
Hypergames: Modeling Misaligned Perceptions and Nested Beliefs for Multi-agent Systems
Classical game-theoretic models typically assume rational agents, complete information, and common knowledge of payoffs - assumptions that are often violated in real-world MAS characterized by uncertainty, misaligned perceptions, and nested beliefs. To overcome these limitations, researchers have proposed extensions that incorporate models of cognitive constraints, subjective beliefs, and heterogeneous reasoning. Among these, hypergame theory extends the classical paradigm by explicitly modeling agents' subjective perceptions of the strategic scenario, known as perceptual games, in which agents may hold divergent beliefs about the structure, payoffs, or available actions. We present a systematic review of agent-compatible applications of hypergame theory, examining how its descriptive capabilities have been adapted to dynamic and interactive MAS contexts. We analyze 44 selected studies from cybersecurity, robotics, social simulation, communications, and general game-theoretic modeling. Building on a formal introduction to hypergame theory and its two major extensions - hierarchical hypergames and HNF - we develop agent-compatibility criteria and an agent-based classification framework to assess integration patterns and practical applicability. Our analysis reveals prevailing tendencies, including the prevalence of hierarchical and graph-based models in deceptive reasoning and the simplification of extensive theoretical frameworks in practical applications. We identify structural gaps, including the limited adoption of HNF-based models, the lack of formal hypergame languages, and unexplored opportunities for modeling human-agent and agent-agent misalignment. By synthesizing trends, challenges, and open research directions, this review provides a new roadmap for applying hypergame theory to enhance the realism and effectiveness of strategic modeling in dynamic multi-agent environments.
Updated: 2025-07-25 18:06:41
标题: 超博弈:为多智能体系统建模错位感知与嵌套信念
摘要: 经典博弈论模型通常假设有理性的代理人、完整的信息和支付的共同知识 - 这些假设在现实世界的多主体系统中往往被违反,这些系统以不确定性、错位的感知和嵌套信念为特征。为了克服这些限制,研究人员提出了包含认知约束模型、主观信念和异质推理的扩展。其中,超博弈理论通过显式建模代理人对战略情景的主观感知,即所谓的感知博弈,扩展了经典范式,在这种情况下,代理人可能对结构、支付或可用行动持有不同的信念。我们对超博弈理论的代理兼容应用进行了系统性审查,考察了其描述性能力如何适应动态和互动的多主体系统环境。我们分析了来自网络安全、机器人技术、社会模拟、通信和一般博弈论建模的44项选定研究。基于对超博弈理论及其两个主要扩展 - 分层超博弈和HNF的正式介绍,我们制定了代理兼容性标准和一个基于代理的分类框架,以评估整合模式和实际适用性。我们的分析揭示了主导趋势,包括在欺骗性推理中分层和基于图形的模型的普遍性,以及在实际应用中对广泛理论框架的简化。我们确定了结构性差距,包括对基于HNF的模型采用有限,缺乏正式的超博弈语言,以及未经探索的建模人类代理人和代理人之间错位的机会。通过综合趋势、挑战和开放研究方向,本审查为将超博弈理论应用于增强动态多主体环境中战略建模的现实性和效果提供了新的路线图。
更新时间: 2025-07-25 18:06:41
领域: cs.AI,cs.MA
Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning
Large language models (LLMs) possess extensive world knowledge, including geospatial knowledge, which has been successfully applied to various geospatial tasks such as mobility prediction and social indicator prediction. However, LLMs often generate inaccurate geospatial knowledge, leading to geospatial hallucinations (incorrect or inconsistent representations of geospatial information) that compromise their reliability. While the phenomenon of general knowledge hallucination in LLMs has been widely studied, the systematic evaluation and mitigation of geospatial hallucinations remain largely unexplored. To address this gap, we propose a comprehensive evaluation framework for geospatial hallucinations, leveraging structured geospatial knowledge graphs for controlled assessment. Through extensive evaluation across 20 advanced LLMs, we uncover the hallucinations in their geospatial knowledge. Building on these insights, we introduce a dynamic factuality aligning method based on Kahneman-Tversky Optimization (KTO) to mitigate geospatial hallucinations in LLMs, leading to a performance improvement of over 29.6% on the proposed benchmark. Extensive experimental results demonstrate the effectiveness of our benchmark and learning algorithm in enhancing the trustworthiness of LLMs in geospatial knowledge and reasoning tasks.
Updated: 2025-07-25 18:00:21
标题: 减轻大型语言模型中的地理空间知识幻觉:基准测试和动态事实对齐
摘要: 大型语言模型(LLMs)拥有广泛的世界知识,包括地理空间知识,已成功应用于各种地理空间任务,如移动性预测和社会指标预测。然而,LLMs经常生成不准确的地理空间知识,导致地理空间幻觉(地理空间信息的不正确或不一致表示),从而损害它们的可靠性。虽然LLMs中的一般知识幻觉现象已被广泛研究,但对地理空间幻觉的系统评估和缓解仍然很少被探讨。为了填补这一空白,我们提出了一个全面的地理空间幻觉评估框架,利用结构化的地理空间知识图进行受控评估。通过对20个先进的LLMs进行广泛评估,我们揭示了它们地理空间知识中的幻觉。基于这些见解,我们引入了一种基于卡内曼-特维斯基优化(KTO)的动态事实对齐方法,以减轻LLMs中的地理空间幻觉,从而在提出的基准测试中实现了超过29.6%的性能改进。广泛的实验结果证明了我们的基准测试和学习算法在增强LLMs在地理空间知识和推理任务中的可信度方面的有效性。
更新时间: 2025-07-25 18:00:21
领域: cs.CL,cs.AI,cs.LG
Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
Many recent papers have studied the development of superforecaster-level event forecasting LLMs. While methodological problems with early studies cast doubt on the use of LLMs for event forecasting, recent studies with improved evaluation methods have shown that state-of-the-art LLMs are gradually reaching superforecaster-level performance, and reinforcement learning has also been reported to improve future forecasting. Additionally, the unprecedented success of recent reasoning models and Deep Research-style models suggests that technology capable of greatly improving forecasting performance has been developed. Therefore, based on these positive recent trends, we argue that the time is ripe for research on large-scale training of superforecaster-level event forecasting LLMs. We discuss two key research directions: training methods and data acquisition. For training, we first introduce three difficulties of LLM-based event forecasting training: noisiness-sparsity, knowledge cut-off, and simple reward structure problems. Then, we present related ideas to mitigate these problems: hypothetical event Bayesian networks, utilizing poorly-recalled and counterfactual events, and auxiliary reward signals. For data, we propose aggressive use of market, public, and crawling datasets to enable large-scale training and evaluation. Finally, we explain how these technical advances could enable AI to provide predictive intelligence to society in broader areas. This position paper presents promising specific paths and considerations for getting closer to superforecaster-level AI technology, aiming to call for researchers' interest in these directions.
Updated: 2025-07-25 17:59:13
标题: 通过大规模训练大型语言模型推进事件预测:挑战、解决方案和更广泛影响
摘要: 许多最近的论文研究了超预测者级别事件预测LLMs的发展。尽管早期研究中的方法问题对LLMs用于事件预测提出了质疑,但最近的研究通过改进的评估方法表明,最先进的LLMs逐渐达到了超预测者级别的性能,同时强化学习也被报道可以改进未来的预测。此外,最近推理模型和深度研究风格模型取得了空前的成功,这表明能够大幅提高预测性能的技术已经发展出来。因此,基于这些积极的近期趋势,我们认为现在是研究大规模训练超预测者级别事件预测LLMs的时机。我们讨论了两个关键的研究方向:训练方法和数据获取。对于训练,我们首先介绍了基于LLM的事件预测训练的三个困难:嘈杂稀疏、知识截断和简单奖励结构问题。然后,我们提出了相关的想法来缓解这些问题:假设事件贝叶斯网络,利用记忆不佳和反事实事件,以及辅助奖励信号。对于数据,我们提出积极利用市场、公共和爬取数据集来实现大规模训练和评估。最后,我们解释了这些技术进步如何使人工智能能够在更广泛的领域为社会提供预测智能。这篇立场论文提出了朝着超预测者级别人工智能技术更近一步的具体路径和考虑,旨在呼吁研究人员对这些方向产生兴趣。
更新时间: 2025-07-25 17:59:13
领域: cs.LG,cs.AI,cs.CL
Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization
Many sequential recommender systems suffer from the cold start problem, where items with few or no interactions cannot be effectively used by the model due to the absence of a trained embedding. Content-based approaches, which leverage item metadata, are commonly used in such scenarios. One possible way is to use embeddings derived from content features such as textual descriptions as initialization for the model embeddings. However, directly using frozen content embeddings often results in suboptimal performance, as they may not fully adapt to the recommendation task. On the other hand, fine-tuning these embeddings can degrade performance for cold-start items, as item representations may drift far from their original structure after training. We propose a novel approach to address this limitation. Instead of entirely freezing the content embeddings or fine-tuning them extensively, we introduce a small trainable delta to frozen embeddings that enables the model to adapt item representations without letting them go too far from their original semantic structure. This approach demonstrates consistent improvements across multiple datasets and modalities, including e-commerce datasets with textual descriptions and a music dataset with audio-based representation.
Updated: 2025-07-25 17:57:31
标题: 放手?还差一点:使用基于内容的初始化解决顺序推荐中的物品冷启动问题
摘要: 许多顺序推荐系统都受冷启动问题的困扰,即由于缺乏经过训练的嵌入,与用户互动较少或没有互动的物品无法被模型有效地使用。在这种情况下,通常会使用基于内容的方法,利用物品元数据。一种可能的方式是使用从文本描述等内容特征派生的嵌入作为模型嵌入的初始化。然而,直接使用冻结的内容嵌入通常会导致次优性能,因为它们可能无法完全适应推荐任务。另一方面,微调这些嵌入可能会降低冷启动物品的性能,因为物品表示在训练后可能会偏离其原始结构。我们提出了一种新颖的方法来解决这个限制。与其完全冻结内容嵌入或对其进行大量微调,我们引入了一个小的可训练增量到冻结嵌入中,使模型能够调整物品表示而不让它们偏离其原始语义结构太远。这种方法在多个数据集和模态上展示了持续的改进,包括具有文本描述的电子商务数据集和具有基于音频表示的音乐数据集。
更新时间: 2025-07-25 17:57:31
领域: cs.IR,cs.AI,cs.LG
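A minimal PyTorch sketch of the frozen-embedding-plus-trainable-delta idea; the norm clamp is one illustrative way to keep item representations near their content-based origin, and the paper's exact constraint may differ:

```python
import torch
import torch.nn as nn

class DeltaContentEmbedding(nn.Module):
    """Frozen content embeddings plus a small trainable delta, so item vectors
    can adapt to the recommendation task without drifting far from their
    original semantics. Sketch only; hyperparameters are illustrative."""
    def __init__(self, content_vectors: torch.Tensor, max_delta_norm: float = 0.1):
        super().__init__()
        self.frozen = nn.Embedding.from_pretrained(content_vectors, freeze=True)
        self.delta = nn.Embedding(*content_vectors.shape)
        nn.init.zeros_(self.delta.weight)   # start exactly at the content init
        self.max_delta_norm = max_delta_norm

    def forward(self, item_ids):
        delta = self.delta(item_ids)
        # Rescale any delta whose norm exceeds the cap, leaving small deltas intact.
        scale = (self.max_delta_norm /
                 delta.norm(dim=-1, keepdim=True).clamp(min=self.max_delta_norm))
        return self.frozen(item_ids) + delta * scale

content = torch.randn(1000, 64)             # e.g. text-encoder item vectors
emb = DeltaContentEmbedding(content)
print(emb(torch.tensor([0, 5, 999])).shape) # torch.Size([3, 64])
```

Cold-start items with zero interactions simply keep a near-zero delta, so they fall back to their content representation, which is the behavior the abstract motivates.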
Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?
Data scarcity is a major challenge in medical imaging, particularly for deep learning models. While data pooling (combining datasets from multiple sources) and data addition (adding more data from a new dataset) have been shown to enhance model performance, they are not without complications. Specifically, increasing the size of the training dataset through pooling or addition can induce distributional shifts, negatively affecting downstream model performance, a phenomenon known as the "Data Addition Dilemma". While the traditional i.i.d. assumption may not hold in multi-source contexts, assuming exchangeability across datasets provides a more practical framework for data pooling. In this work, we investigate medical image segmentation under these conditions, drawing insights from causal frameworks to propose a method for controlling foreground-background feature discrepancies across all layers of deep networks. This approach improves feature representations, which are crucial in data-addition scenarios. Our method achieves state-of-the-art segmentation performance on histopathology and ultrasound images across five datasets, including a novel ultrasound dataset that we have curated and contributed. Qualitative results demonstrate more refined and accurate segmentation maps compared to prominent baselines across three model architectures. The code will be available on Github.
Updated: 2025-07-25 17:55:06
标题: 交换性是否比I.I.D更好地处理数据分布转移,同时在数据稀缺的医学图像分割中汇总数据?
摘要: 数据稀缺是医学影像领域的一大挑战,尤其对于深度学习模型而言。虽然数据汇集(将来自多个来源的数据集合并)和数据增加(从新数据集中添加更多数据)已被证明可以提升模型性能,但它们并非没有复杂性。具体而言,通过汇集或增加训练数据集的大小可能引发分布转移,从而对下游模型性能产生负面影响,这一现象被称为“数据增加困境”。虽然传统的独立同分布假设在多来源情境下可能不成立,但假设数据集之间具有可交换性为数据汇集提供了更实用的框架。在本研究中,我们探讨了在这些条件下的医学图像分割,借鉴因果框架的见解,提出了一种控制深度网络所有层中前景-背景特征差异的方法。这种方法改善了特征表示,在数据增加场景中至关重要。我们的方法在组织病理学和超声波图像的五个数据集上实现了最先进的分割性能,其中包括我们精心策划并贡献的一个新型超声波数据集。定性结果显示,与三种主流基线相比,我们的方法实现了更精细和准确的分割地图。代码将在Github上提供。
更新时间: 2025-07-25 17:55:06
领域: cs.CV,cs.LG
ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation
Semantics-driven 3D spatial constraints align high-level semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time closed-loop planning, and (3) compromised robustness in semantically diverse environments. To address these challenges, we propose ReSem3D, a unified manipulation framework for semantically diverse environments, leveraging the synergy between VFMs and MLLMs to achieve fine-grained visual grounding and dynamically construct hierarchical 3D spatial constraints for real-time manipulation. Specifically, the framework is driven by hierarchical recursive reasoning in MLLMs, which interact with VFMs to automatically construct 3D spatial constraints from natural language instructions and RGB-D observations in two stages: part-level extraction and region-level refinement. Subsequently, these constraints are encoded as real-time optimization objectives in joint space, enabling reactive behavior to dynamic disturbances. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSem3D performs diverse manipulation tasks under zero-shot conditions, exhibiting strong adaptability and generalization. Code and videos are available at https://github.com/scy-v/ReSem3D and https://resem3d.github.io.
Updated: 2025-07-25 17:54:43
标题: ReSem3D:通过细粒度语义基础的可细化3D空间约束,实现通用机器人操作
摘要: 基于语义的3D空间约束可以将高级语义表示与低级动作空间对齐,从而促进机器人操作中任务理解和执行的统一。多模态大型语言模型(MLLMs)和视觉基础模型(VFMs)的协同推理使跨模态3D空间约束的构建成为可能。然而,现有方法存在三个关键限制:(1)在约束建模中的语义粒度较粗,(2)缺乏实时闭环规划,(3)在语义多样化环境中的鲁棒性受损。为了解决这些挑战,我们提出了ReSem3D,一个针对语义多样化环境的统一操作框架,利用VFMs和MLLMs之间的协同作用实现细粒度的视觉定位,并动态构建分层3D空间约束以进行实时操作。具体而言,该框架由MLLMs中的层次递归推理驱动,这些推理与VFMs交互,以自动从自然语言指令和RGB-D观察中构建3D空间约束,分为两个阶段:部分级抽取和区域级细化。随后,这些约束被编码为联合空间中的实时优化目标,使其能够对动态干扰做出反应。在语义丰富的家庭和稀疏的化学实验室环境中进行了大量模拟和实际实验。结果表明,ReSem3D在零样本条件下执行多样的操作任务,表现出很强的适应性和泛化性。代码和视频可在https://github.com/scy-v/ReSem3D和https://resem3d.github.io上找到。
更新时间: 2025-07-25 17:54:43
领域: cs.RO,cs.AI,cs.CV,cs.HC,cs.LG
Gemini 2.5 Pro Capable of Winning Gold at IMO 2025
The International Mathematical Olympiad (IMO) poses uniquely challenging problems requiring deep insight, creativity, and formal reasoning. While Large Language Models (LLMs) perform well on mathematical benchmarks like AIME, they struggle with Olympiad-level tasks. We use Google's Gemini 2.5 Pro on the newly released IMO 2025 problems, avoiding data contamination. Using a self-verification pipeline with careful prompt design, 5 (out of 6) problems are solved correctly. This result underscores the importance of developing optimal strategies to harness the full potential of powerful LLMs for complex reasoning tasks.
Updated: 2025-07-25 17:53:11
标题: Gemini 2.5 Pro有能力在2025年IMO比赛中获得金牌
摘要: 国际数学奥林匹克竞赛(IMO)提出独特挑战性问题,需要深刻的洞察力、创造力和形式推理。虽然大型语言模型(LLMs)在数学基准测试中表现良好,如AIME,但在奥林匹克级别的任务上表现不佳。我们在新发布的IMO 2025问题上使用谷歌的Gemini 2.5 Pro,避免数据污染。通过仔细设计提示的自验证流水线,解决了6个问题中的5个。这一结果强调了开发优化策略以充分利用强大LLMs在复杂推理任务中的潜力的重要性。
更新时间: 2025-07-25 17:53:11
领域: cs.AI
Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces
We develop efficient algorithms for optimizing piecewise smooth (PWS) functions where the underlying partition of the domain into smooth pieces is \emph{unknown}. For PWS functions satisfying a quadratic growth (QG) condition, we propose a bundle-level (BL) type method that achieves global linear convergence -- to our knowledge, the first such result for any algorithm for this problem class. We extend this method to handle approximately PWS functions and to solve weakly-convex PWS problems, improving the state-of-the-art complexity to match the benchmark for smooth non-convex optimization. Furthermore, we introduce the first verifiable and accurate termination criterion for PWS optimization. Similar to the gradient norm in smooth optimization, this certificate tightly characterizes the optimality gap under the QG condition, and can moreover be evaluated without knowledge of any problem parameters. We develop a search subroutine for this certificate and embed it within a guess-and-check framework, resulting in an almost parameter-free algorithm for both the convex QG and weakly-convex settings.
Updated: 2025-07-25 17:50:43
标题: 具有未知光滑片段的非光滑问题的线性收敛算法
摘要: 我们开发了用于优化分段光滑(PWS)函数的高效算法,其中定义域划分为光滑片段的方式是未知的。对于满足二次增长(QG)条件的PWS函数,我们提出了一种束水平(BL)类型的方法,实现了全局线性收敛;据我们所知,这是针对该问题类别的算法首次取得此类结果。我们将该方法扩展到处理近似PWS函数以及求解弱凸PWS问题,将最新的复杂度结果改进至与光滑非凸优化的基准相匹配。此外,我们引入了第一个可验证且准确的PWS优化终止准则。类似于光滑优化中的梯度范数,该证书在QG条件下紧密刻画了最优性间隙,并且可以在不了解任何问题参数的情况下进行评估。我们为该证书开发了一个搜索子程序,并将其嵌入到一个猜测-检验框架中,从而在凸QG和弱凸两种设置下得到一个几乎无参数的算法。
更新时间: 2025-07-25 17:50:43
领域: math.OC,cs.LG
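For reference, the quadratic growth condition assumed above is usually stated as follows, where X* denotes the solution set and mu > 0; this is the standard form, which the paper may parameterize slightly differently:

```latex
f(x) - f^{\star} \;\ge\; \frac{\mu}{2}\,
\operatorname{dist}^{2}\!\left(x,\, \mathcal{X}^{\star}\right)
\qquad \text{for all } x \in \operatorname{dom} f .
```

Unlike strong convexity, this condition permits flat nonconvex pieces away from the solution set, which is why it is the natural regularity assumption for piecewise smooth problems.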
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
We present Rapid Attention Distillation to Linear Attention Decoders at Scale (RADLADS), a protocol for rapidly converting softmax attention transformers into linear attention decoder models, along with two new RWKV-variant architectures, and models converted from popular Qwen2.5 open source models in 7B, 32B, and 72B sizes. Our conversion process requires only 350-700M tokens, less than 0.005% of the token count used to train the original teacher models. Converting to our 72B linear attention model costs less than \$2,000 USD at today's prices, yet quality at inference remains close to the original transformer. These models achieve state-of-the-art downstream performance across a set of standard benchmarks for linear attention models of their size. We release all our models on HuggingFace under the Apache 2.0 license, with the exception of our 72B models which are also governed by the Qwen License Agreement. Models at https://huggingface.co/collections/recursal/radlads-6818ee69e99e729ba8a87102 Training Code at https://github.com/recursal/RADLADS-paper
Updated: 2025-07-25 17:46:09
标题: RADLADS:大规模快速注意力提炼到线性注意力解码器
摘要: 我们提出了大规模快速注意力蒸馏到线性注意力解码器(RADLADS)的协议,用于将softmax注意力变换器快速转换为线性注意力解码器模型,同时提出了两种新的RWKV变体架构,以及从流行的Qwen2.5开源模型转换而来的7B、32B和72B大小的模型。我们的转换过程仅需350-700M个标记,不到原始教师模型训练所用标记数量的0.005%。按当今价格,转换得到我们的72B线性注意力模型的成本不到2000美元,而推理质量仍接近原始变换器。这些模型在一组针对同等规模线性注意力模型的标准基准测试中取得了最先进的下游性能。我们在HuggingFace上以Apache 2.0许可证发布了所有模型,但我们的72B模型还受Qwen许可协议约束。 模型网址:https://huggingface.co/collections/recursal/radlads-6818ee69e99e729ba8a87102 训练代码网址:https://github.com/recursal/RADLADS-paper
更新时间: 2025-07-25 17:46:09
领域: cs.CL,cs.AI,cs.LG,I.2.7
Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization
The advent of novel view synthesis techniques such as NeRF and 3D Gaussian Splatting (3DGS) has enabled learning precise 3D models only from posed monocular images. Although these methods are attractive, they hold two major limitations that prevent their use in space applications: they require poses during training, and have high computational cost at training and inference. To address these limitations, this work contributes: (1) a Convolutional Neural Network (CNN) based primitive initializer for 3DGS using monocular images; (2) a pipeline capable of training with noisy or implicit pose estimates; and (3) an analysis of initialization variants that reduce the training cost of precise 3D models. A CNN takes a single image as input and outputs a coarse 3D model represented as an assembly of primitives, along with the target's pose relative to the camera. This assembly of primitives is then used to initialize 3DGS, significantly reducing the number of training iterations and input images needed -- by at least an order of magnitude. For additional flexibility, the CNN component has multiple variants with different pose estimation techniques. This work performs a comparison between these variants, evaluating their effectiveness for downstream 3DGS training under noisy or implicit pose estimates. The results demonstrate that even with imperfect pose supervision, the pipeline is able to learn high-fidelity 3D representations, opening the door for the use of novel view synthesis in space applications.
Updated: 2025-07-25 17:43:29
标题: 通过基元初始化快速学习非合作空间飞行器的3D模型
摘要: 新视图合成技术的出现,如NeRF和3D高斯点云(3DGS),已经使得可以仅从单目姿态图像中学习精确的3D模型成为可能。虽然这些方法很有吸引力,但它们存在两个主要限制,阻碍了它们在空间应用中的使用:它们在训练过程中需要姿态,并且在训练和推断阶段具有较高的计算成本。为了解决这些限制,本研究提出了:(1)基于卷积神经网络(CNN)的单目图像的3DGS基元初始化器;(2)一个能够使用嘈杂或隐式姿态估计进行训练的流程;以及(3)分析初始化变体,减少精确3D模型的训练成本。CNN以单个图像作为输入,并输出表示为基元组装的粗糙3D模型,以及目标相对于摄像机的姿态。然后使用这些基元组装来初始化3DGS,显著降低了所需的训练迭代次数和输入图像数量,至少降低一个数量级。为了增加灵活性,CNN组件具有多个变体,采用不同的姿态估计技术。本研究对这些变体进行比较,评估它们在嘈杂或隐式姿态估计条件下对下游3DGS训练的有效性。结果表明,即使在姿态监督不完善的情况下,该流程也能够学习高保真度的3D表示,为在空间应用中使用新视图合成技术打开了大门。
更新时间: 2025-07-25 17:43:29
领域: cs.CV,cs.LG,cs.RO
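To make the initialization idea above concrete, here is a minimal sketch of turning a coarse primitive assembly into initial Gaussian parameters; the axis-aligned box parameterization and all names are assumptions for illustration, not the paper's exact representation:

import numpy as np

def gaussians_from_primitives(centers, sizes, pts_per_prim=100, seed=0):
    """Seed 3DGS training from a coarse assembly of box primitives.

    centers, sizes: (P, 3) arrays describing axis-aligned box primitives.
    Returns initial Gaussian means (P*pts_per_prim, 3) and isotropic scales.
    """
    rng = np.random.default_rng(seed)
    means, scales = [], []
    for c, s in zip(centers, sizes):
        means.append(c + (rng.random((pts_per_prim, 3)) - 0.5) * s)  # fill the box
        scales.append(np.full(pts_per_prim, s.min() / pts_per_prim ** (1 / 3)))
    return np.concatenate(means), np.concatenate(scales)

Starting from Gaussians that already occupy the target's coarse shape, rather than random points, is what makes the reported order-of-magnitude reduction in training iterations and input images plausible.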
Hierarchical Deep Reinforcement Learning Framework for Multi-Year Asset Management Under Budget Constraints
Budget planning and maintenance optimization are crucial for infrastructure asset management, ensuring cost-effectiveness and sustainability. However, the complexity arising from combinatorial action spaces, diverse asset deterioration, stringent budget constraints, and environmental uncertainty significantly limits existing methods' scalability. This paper proposes a Hierarchical Deep Reinforcement Learning methodology specifically tailored to multi-year infrastructure planning. Our approach decomposes the problem into two hierarchical levels: a high-level Budget Planner allocating annual budgets within explicit feasibility bounds, and a low-level Maintenance Planner prioritizing assets within the allocated budget. By structurally separating macro-budget decisions from asset-level prioritization and integrating linear programming projection within a hierarchical Soft Actor-Critic framework, the method efficiently addresses exponential growth in the action space and ensures rigorous budget compliance. A case study evaluating sewer networks of varying sizes (10, 15, and 20 sewersheds) illustrates the effectiveness of the proposed approach. Compared to conventional Deep Q-Learning and enhanced genetic algorithms, our methodology converges more rapidly, scales effectively, and consistently delivers near-optimal solutions even as network size grows.
Updated: 2025-07-25 17:42:34
标题: 层次化深度强化学习框架:在预算约束条件下进行多年资产管理
摘要: 预算规划和维护优化对于基础设施资产管理至关重要,确保成本效益和可持续性。然而,由于组合行动空间的复杂性、多样化资产退化、严格的预算限制和环境不确定性,现有方法的可扩展性受到显著限制。本文提出了一种专门针对多年基础设施规划的分层深度强化学习方法。我们的方法将问题分解为两个层次:高层预算规划者在明确的可行性范围内分配年度预算,低层维护规划者在分配的预算内对资产进行优先排序。通过在分层软演员-评论家框架中集成线性规划投影,将宏观预算决策与资产级别优先排序结构上分离,该方法有效地解决了行动空间的指数增长,并确保严格的预算合规性。一个评估不同大小下水道网络(10、15和20个下水道区)的案例研究展示了所提出方法的有效性。与传统的深度Q学习和增强遗传算法相比,我们的方法收敛更快,有效扩展,并在网络规模增长时始终提供接近最优解。
更新时间: 2025-07-25 17:42:34
领域: cs.AI,cs.LG,cs.SY,eess.SY,math.OC
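The budget compliance described above ultimately requires mapping the low-level policy's raw outputs into the feasible set. A minimal feasibility repair in Python (a simple stand-in for the paper's linear-programming projection, whose exact form the abstract does not give):

import numpy as np

def repair_budget(proposed, budget):
    """Clip negative spends and rescale so total spend respects the annual budget.

    proposed: (n_assets,) raw maintenance spends from the low-level planner.
    budget:   scalar allocation chosen by the high-level Budget Planner.
    """
    x = np.maximum(proposed, 0.0)      # no negative maintenance spend
    total = x.sum()
    if total > budget:                 # scale down uniformly when infeasible
        x *= budget / total
    return x

An LP projection additionally lets the repair respect per-asset bounds and priority weights, which uniform rescaling cannot.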
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language can often provide a much richer learning medium for LLMs, compared with policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. Given any AI system containing one or more LLM prompts, GEPA samples system-level trajectories (e.g., reasoning, tool calls, and tool outputs) and reflects on them in natural language to diagnose problems, propose and test prompt updates, and combine complementary lessons from the Pareto frontier of its own attempts. As a result of GEPA's design, it can often turn even just a few rollouts into a large quality gain. Across four tasks, GEPA outperforms GRPO by 10% on average and by up to 20%, while using up to 35x fewer rollouts. GEPA also outperforms the leading prompt optimizer, MIPROv2, by over 10% across two LLMs, and demonstrates promising results as an inference-time search strategy for code optimization.
Updated: 2025-07-25 17:42:32
标题: GEPA:反思提示演变可以胜过强化学习
摘要: 大型语言模型(LLMs)越来越多地通过强化学习(RL)方法(如Group Relative Policy Optimization(GRPO))用于下游任务,这通常需要成千上万次的测试来学习新任务。我们认为,与从稀疏、标量奖励导出的策略梯度相比,语言的可解释性往往可以为LLMs提供更丰富的学习媒介。为了验证这一点,我们引入了GEPA(Genetic-Pareto),这是一个完全整合自然语言反思的提示优化器,用于通过试错学习高级规则。给定任何包含一个或多个LLM提示的AI系统,GEPA会对系统级轨迹(例如推理、工具调用和工具输出)进行抽样,并用自然语言反思这些轨迹,以诊断问题、提出和测试提示更新,并结合自己尝试的帕累托前沿的互补教训。由于GEPA的设计,它通常可以将甚至只有几次试验转化为大幅度的质量提升。在四个任务中,GEPA的表现平均优于GRPO 10%,最高可达20%,同时使用的试验次数最多比GRPO少35倍。GEPA还在两个LLMs上优于领先的提示优化器MIPROv2超过10%,并且作为代码优化的推理时间搜索策略展示了令人满意的结果。
更新时间: 2025-07-25 17:42:32
领域: cs.CL,cs.AI,cs.LG,cs.SE,I.2.7; I.2.6; I.2.4; I.2.8
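GEPA's combination step operates on the Pareto frontier of its own attempts. A minimal sketch of extracting that frontier over per-task scores (the candidate representation is an assumption):

def pareto_frontier(candidates):
    """Keep prompt candidates that no other candidate dominates.

    candidates: list of (prompt, scores) pairs, where scores is a tuple of
    per-task metrics (higher is better).
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [(p, s) for p, s in candidates
            if not any(dominates(s2, s) for _, s2 in candidates)]

Keeping the whole frontier, rather than the single best-average candidate, preserves complementary lessons: one prompt may excel on one task while another covers a different failure mode.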
Forest-Guided Clustering -- Shedding Light into the Random Forest Black Box
As machine learning models are increasingly deployed in sensitive application areas, the demand for interpretable and trustworthy decision-making has increased. Random Forests (RF), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model's internal logic and computes cluster-specific and global feature importance scores to derive decision rules underlying RF predictions. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC uncovered biologically coherent subpopulations, disentangled disease-relevant signals from confounders, and recovered known and novel gene expression patterns. FGC bridges the gap between performance and interpretability by providing structure-aware insights that go beyond feature-level attribution.
Updated: 2025-07-25 17:41:39
标题: 森林引导聚类——揭开随机森林黑盒的神秘面纱
摘要: 随着机器学习模型越来越多地部署在敏感应用领域,人们对可解释和可信赖的决策需求也在增加。尽管随机森林(RF)在表格数据上表现强劲且被广泛使用,但由于其集成性质,仍然难以解释。我们提出了一种名为Forest-Guided Clustering(FGC)的模型特定可解释性方法,通过根据共享决策路径对实例进行分组来揭示RF中的局部和全局结构。FGC生成与模型内部逻辑一致的人可解释的聚类,并计算特定于聚类和全局的特征重要性分数,以推导RF预测背后的决策规则。FGC在基准数据集上准确恢复了潜在的子类结构,并优于经典聚类和事后解释方法。应用于一个AML转录组数据集,FGC揭示了生物上连贯的亚种群,从混淆因素中解开了与疾病相关的信号,并恢复了已知和新的基因表达模式。FGC通过提供结构感知的见解,超越了特征级别的归因,弥合了性能和可解释性之间的鸿沟。
更新时间: 2025-07-25 17:41:39
领域: cs.LG
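FGC groups instances by shared decision paths. A common proxy for that idea, sketched below, is the classical forest proximity: the fraction of trees in which two samples fall in the same leaf (scikit-learn's apply exposes the leaf indices). This is a simplified stand-in for illustration, not the authors' exact procedure:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from scipy.cluster.hierarchy import linkage, fcluster

def forest_guided_clusters(X, y, n_clusters=3):
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    leaves = rf.apply(X)                    # (n_samples, n_trees) leaf indices
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(-1)  # co-leaf rate
    dist = 1.0 - prox                       # proximity -> distance
    condensed = dist[np.triu_indices_from(dist, k=1)]
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

Because the proximity is derived from the fitted trees themselves, the resulting clusters reflect the model's internal logic rather than raw feature-space distances.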
GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences
Aviation's climate impact includes not only CO2 emissions but also significant non-CO2 effects, especially from contrails. These ice clouds can alter Earth's radiative balance, potentially rivaling the warming effect of aviation CO2. Physics-based models provide useful estimates of contrail formation and climate impact, but their accuracy depends heavily on the quality of atmospheric input data and on assumptions used to represent complex processes like ice particle formation and humidity-driven persistence. Observational data from remote sensors, such as satellites and ground cameras, could be used to validate and calibrate these models. However, existing datasets do not explore all aspects of contrail dynamics and formation: they typically lack temporal tracking, and do not attribute contrails to their source flights. To address these limitations, we present the Ground Visible Camera Contrail Sequences (GVCCS), a new open data set of contrails recorded with a ground-based all-sky camera in the visible range. Each contrail is individually labeled and tracked over time, allowing a detailed analysis of its lifecycle. The dataset contains 122 video sequences (24,228 frames) and includes flight identifiers for contrails that form above the camera. As reference, we also propose a unified deep learning framework for contrail analysis using a panoptic segmentation model that performs semantic segmentation (contrail pixel identification), instance segmentation (individual contrail separation), and temporal tracking in a single architecture. By providing high-quality, temporally resolved annotations and a benchmark for model evaluation, our work supports improved contrail monitoring and will facilitate better calibration of physical models. This sets the groundwork for more accurate climate impact understanding and assessments.
Updated: 2025-07-25 17:32:47
标题: GVCCS:用于可见光全天空相机序列上凝结尾迹识别与跟踪的数据集
摘要: 航空业的气候影响不仅包括二氧化碳排放,还包括显著的非二氧化碳效应,尤其是来自凝结尾迹。这些冰云可以改变地球的辐射平衡,潜在地与航空二氧化碳的升温效应相媲美。基于物理的模型提供了凝结尾迹形成和气候影响的有用估计,但其准确性严重依赖大气输入数据的质量和用来表示冰粒子形成和受湿度驱动的持续性等复杂过程的假设。来自遥感器(如卫星和地面摄像机)的观测数据可以用来验证和校准这些模型。然而,现有数据集并未探索凝结尾迹动态和形成的所有方面:它们通常缺乏时间跟踪,并且未将凝结尾迹归因于其来源航班。为了解决这些限制,我们提出了地面可见摄像机凝结尾迹序列(GVCCS),这是一个新的开放数据集,记录了在可见范围内使用地面全天空相机拍摄的凝结尾迹。每个凝结尾迹都经过单独标记并随时间跟踪,允许对其生命周期进行详细分析。该数据集包含122个视频序列(24,228帧),并包括形成在摄像机上方的凝结尾迹的飞行标识。作为参考,我们还提出了一种统一的深度学习框架,用于凝结尾迹分析,该框架使用全景分割模型在单一架构中执行语义分割(凝结尾迹像素识别)、实例分割(单个凝结尾迹分离)和时间跟踪。通过提供高质量、时间分辨的注释以及模型评估基准,我们的工作支持改进凝结尾迹监测,并将促进物理模型更好的校准。这为更准确地理解和评估气候影响奠定了基础。
更新时间: 2025-07-25 17:32:47
领域: cs.CV,cs.LG
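The temporal tracking labels described above support frame-to-frame association. A minimal greedy IoU matcher over binary contrail masks (threshold and names are assumptions, not the dataset's annotation procedure):

import numpy as np

def iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def match_tracks(prev_masks, cur_masks, thresh=0.3):
    """Greedily link current contrail masks to previous-frame masks by IoU."""
    links, used = {}, set()
    for j, cur in enumerate(cur_masks):
        best, best_iou = None, thresh
        for i, prev in enumerate(prev_masks):
            score = iou(prev, cur)
            if i not in used and score > best_iou:
                best, best_iou = i, score
        if best is not None:
            links[j] = best            # detection j continues track best
            used.add(best)
    return links                       # unmatched detections start new tracks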
VENENA: A Deceptive Visual Encryption Framework for Wireless Semantic Secrecy
Eavesdropping has been a long-standing threat to the security and privacy of wireless communications, since it is difficult to detect and costly to prevent. As networks evolve towards Sixth Generation (6G) and semantic communication becomes increasingly central to next-generation wireless systems, securing semantic information transmission emerges as a critical challenge. While classical physical layer security (PLS) focuses on passive security, the recently proposed concept of physical layer deception (PLD) offers a semantic encryption measure to actively deceive eavesdroppers. Yet existing studies of PLD have been predominantly information-theoretic and link-level oriented, lacking considerations of system-level design and practical implementation. In this work we propose a novel artificial intelligence (AI)-enabled framework called Visual ENcryption for Eavesdropping NegAtion (VENENA), which combines the techniques of PLD, visual encryption, and image poisoning, into a comprehensive mechanism for deceptive secure semantic transmission in future wireless networks. By leveraging advanced vision transformers and semantic codecs, VENENA demonstrates how semantic security can be enhanced through the synergy of physical layer techniques and artificial intelligence, paving the way for secure semantic communication in 6G networks.
Updated: 2025-07-25 17:27:11
标题: VENENA:用于无线语义保密的欺骗性视觉加密框架
摘要: 窃听长期以来一直是无线通信安全和隐私的威胁,因为很难检测并且防范成本高昂。随着网络向第六代(6G)演进,语义通信在下一代无线系统中变得越来越重要,保护语义信息传输成为一个关键挑战。虽然经典的物理层安全(PLS)侧重于被动安全,但最近提出的物理层欺骗(PLD)概念提供了一种主动欺骗窃听者的语义加密措施。然而,现有的PLD研究主要是信息论和链路级导向的,缺乏对系统级设计和实际实施的考虑。 在这项工作中,我们提出了一个新颖的人工智能(AI)赋能框架,名为Visual ENcryption for Eavesdropping NegAtion(VENENA),将PLD、视觉加密和图像毒化技术结合起来,形成一个用于未来无线网络中欺骗性安全语义传输的综合机制。通过利用先进的视觉Transformer和语义编解码器,VENENA展示了如何通过物理层技术和人工智能的协同作用来增强语义安全性,为6G网络中的安全语义通信铺平道路。
更新时间: 2025-07-25 17:27:11
领域: cs.CR
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Satellite remote sensing from repeated observations and multiple sensors enables a wide range of downstream applications, including climate modeling, carbon accounting, and strategies for conservation and sustainable land use. However, satellite time series are voluminous, often corrupted by sensor noise, clouds, and atmospheric conditions, and unevenly spaced in time, making them challenging to use. We present TESSERA, an open, global, land-oriented remote sensing foundation model that uses self-supervised learning to generate `ready-to-use' embeddings at 10~m scale from pixel-level satellite time series data. TESSERA uses two parallel Transformer-based encoders to combine optical data from ten Sentinel-2 spectral bands at 10-60~m spatial resolution and two Sentinel-1 synthetic aperture radar backscatter coefficients at 10~m resolution to create embeddings that are subsequently fused with a multilayer perceptron to create annual global embedding maps. We compare our work with state-of-the-art task-specific models and other foundation models in five diverse downstream tasks and find that TESSERA closely matches or outperforms these baselines. We believe that TESSERA's ease of use, openness, computation-, label-, and data-efficiency, and high performance will prove transformative in a wide range of vegetation-oriented ecological and agricultural applications.
Updated: 2025-07-25 17:22:48
标题: TESSERA:用于地球表征与分析的地表光谱时间嵌入
摘要: 卫星遥感通过重复观测和多个传感器,实现了广泛的下游应用,包括气候建模、碳核算以及保护和可持续土地利用的策略。然而,卫星时间序列庞大,经常受到传感器噪声、云层和大气条件的干扰,并且在时间上不均匀分布,这使得它们很难使用。我们提出了TESSERA,这是一个开放的、全球的、以土地为导向的遥感基础模型,利用自监督学习从像素级卫星时间序列数据中生成10m尺度的'即用型'嵌入。TESSERA使用两个并行的基于Transformer的编码器,将10-60m空间分辨率的十个Sentinel-2光谱波段的光学数据和两个Sentinel-1合成孔径雷达回波系数的10m分辨率结合起来,以创建嵌入,随后与多层感知器融合,创建年度全球嵌入地图。我们将我们的工作与最先进的特定任务模型和其他基础模型在五个不同的下游任务中进行比较,发现TESSERA与这些基线模型紧密匹配或优于它们。我们相信,TESSERA的易用性、开放性、计算效率、标签效率和数据效率以及高性能将在广泛的以植被为导向的生态和农业应用中发挥转变性作用。
更新时间: 2025-07-25 17:22:48
领域: cs.LG
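Structurally, the TESSERA encoder described above is two parallel Transformers fused by an MLP. A schematic in PyTorch, with all widths, depths, and pooling choices assumed for illustration:

import torch
import torch.nn as nn

class DualSensorEncoder(nn.Module):
    """Two parallel Transformer encoders over per-pixel time series, MLP-fused."""

    def __init__(self, d_model=128, emb_dim=128):
        super().__init__()
        self.s2_proj = nn.Linear(10, d_model)   # 10 Sentinel-2 spectral bands
        self.s1_proj = nn.Linear(2, d_model)    # 2 Sentinel-1 backscatter coefficients
        make_enc = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=4)
        self.s2_enc, self.s1_enc = make_enc(), make_enc()
        self.fuse = nn.Sequential(nn.Linear(2 * d_model, 256), nn.ReLU(),
                                  nn.Linear(256, emb_dim))

    def forward(self, s2, s1):                   # (B, T, 10) and (B, T, 2)
        h2 = self.s2_enc(self.s2_proj(s2)).mean(dim=1)   # pool over time
        h1 = self.s1_enc(self.s1_proj(s1)).mean(dim=1)
        return self.fuse(torch.cat([h2, h1], dim=-1))    # per-pixel embedding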
Bounded KRnet and its applications to density estimation and approximation
In this paper, we develop an invertible mapping, called B-KRnet, on a bounded domain and apply it to density estimation/approximation for data or the solutions of PDEs such as the Fokker-Planck equation and the Keller-Segel equation. Similar to KRnet, B-KRnet consists of a series of coupling layers with progressively fewer active transformation dimensions, inspired by the triangular structure of the Knothe-Rosenblatt (KR) rearrangement. The main difference between B-KRnet and KRnet is that B-KRnet is defined on a hypercube while KRnet is defined on the whole space, in other words, a new mechanism is introduced in B-KRnet to maintain the exact invertibility. Using B-KRnet as a transport map, we obtain an explicit probability density function (PDF) model that corresponds to the pushforward of a base (uniform) distribution on the hypercube. It can be directly applied to density estimation when only data are available. By coupling KRnet and B-KRnet, we define a deep generative model on a high-dimensional domain where some dimensions are bounded and other dimensions are unbounded. A typical case is the solution of the stationary kinetic Fokker-Planck equation, which is a PDF of position and momentum. Based on B-KRnet, we develop an adaptive learning approach to approximate partial differential equations whose solutions are PDFs or can be treated as PDFs. A variety of numerical experiments is presented to demonstrate the effectiveness of B-KRnet.
Updated: 2025-07-25 17:22:26
标题: 有界KRnet及其在密度估计和逼近中的应用
摘要: 在这篇论文中,我们在有界域上开发了一种可逆映射,称为B-KRnet,并将其应用于数据的密度估计/逼近或PDE的解,例如Fokker-Planck方程和Keller-Segel方程。与KRnet类似,B-KRnet由一系列耦合层组成,逐渐减少活动转换维度,受Knothe-Rosenblatt(KR)重新排列的三角形结构启发。B-KRnet和KRnet之间的主要区别在于B-KRnet是在超立方体上定义的,而KRnet是在整个空间上定义的,换句话说,B-KRnet引入了一种新机制来保持精确的可逆性。使用B-KRnet作为传输映射,我们获得了一个明确的概率密度函数(PDF)模型,对应于超立方体上基本(均匀)分布的前向推进。当只有数据可用时,它可以直接应用于密度估计。通过耦合KRnet和B-KRnet,我们在高维域上定义了一个深度生成模型,其中一些维度是有界的,而其他维度是无界的。一个典型案例是稳态动力学Fokker-Planck方程的解,它是位置和动量的PDF。基于B-KRnet,我们开发了一种自适应学习方法,用于逼近其解为PDF或可视为PDF的偏微分方程。展示了各种数值实验以证明B-KRnet的有效性。
更新时间: 2025-07-25 17:22:26
领域: cs.LG
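The mechanism that keeps B-KRnet on the hypercube can be illustrated with a generic construction (not necessarily the paper's layer): conjugating an affine coupling with the logit/sigmoid pair so that each coordinate stays in (0, 1) and the map remains exactly invertible.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(u, eps=1e-7):
    u = np.clip(u, eps, 1.0 - eps)
    return np.log(u) - np.log1p(-u)

def coupling_forward(x1, x2, scale_net, shift_net):
    """Invertible map of (0,1)^2 onto itself; x1 conditions the update of x2."""
    z2 = logit(x2)                                    # lift to the real line
    z2 = z2 * np.exp(scale_net(x1)) + shift_net(x1)   # affine coupling
    return x1, sigmoid(z2)                            # squash back into (0,1)

def coupling_inverse(y1, y2, scale_net, shift_net):
    z2 = (logit(y2) - shift_net(y1)) * np.exp(-scale_net(y1))
    return y1, sigmoid(z2)

Here scale_net and shift_net are arbitrary (e.g., neural) functions of the passive coordinate; invertibility holds regardless of how expressive they are, which is the property the coupling structure buys.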
Gradient-based grand canonical optimization enabled by graph neural networks with fractional atomic existence
Machine learning interatomic potentials have become an indispensable tool for materials science, enabling the study of larger systems and longer timescales. State-of-the-art models are generally graph neural networks that employ message passing to iteratively update atomic embeddings that are ultimately used for predicting properties. In this work we extend the message passing formalism with the inclusion of a continuous variable that accounts for fractional atomic existence. This allows us to calculate the gradient of the Gibbs free energy with respect to both the Cartesian coordinates of atoms and their existence. Using this we propose a gradient-based grand canonical optimization method and document its capabilities for a Cu(110) surface oxide.
Updated: 2025-07-25 17:13:41
标题: 由具有分数原子存在的图神经网络实现的基于梯度的巨正则优化
摘要: 机器学习原子间势已成为材料科学中不可或缺的工具,使研究更大的系统和更长的时间尺度成为可能。当前最先进的模型通常是图神经网络,利用消息传递迭代更新原子嵌入,最终用于预测性质。在这项工作中,我们扩展了消息传递形式,引入一个连续变量来表示分数原子存在。这使我们能够计算吉布斯自由能相对于原子的笛卡尔坐标及其存在性的梯度。利用这一点,我们提出了一种基于梯度的巨正则优化方法,并记录了其在Cu(110)表面氧化物上的能力。
更新时间: 2025-07-25 17:13:41
领域: cond-mat.mtrl-sci,cs.LG
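The key point above is that a continuous existence variable lets gradients flow to both atomic positions and occupancies. A toy autograd illustration with a placeholder pair potential standing in for the paper's GNN energy model:

import torch

def pair_energy(pos, exist):
    """Toy energy: pairwise potential weighted by fractional existence."""
    diff = pos[:, None, :] - pos[None, :, :]
    r = diff.norm(dim=-1) + torch.eye(len(pos))   # avoid zero self-distance
    w = exist[:, None] * exist[None, :]           # fractional occupancy weights
    e = w * (1.0 / r**12 - 1.0 / r**6)            # Lennard-Jones-like term
    return e.triu(1).sum()

pos = torch.randn(8, 3, requires_grad=True)       # Cartesian coordinates
exist = torch.rand(8, requires_grad=True)         # fractional existence in [0, 1)
E = pair_energy(pos, exist)
dE_dpos, dE_dexist = torch.autograd.grad(E, (pos, exist))
# both gradients are available for a joint grand canonical update step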
ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Bridging the gap between ego-centric and exo-centric views has been a long-standing question in computer vision. In this paper, we focus on the emerging Ego-Exo object correspondence task, which aims to understand object relations across ego-exo perspectives through segmentation. While numerous segmentation models have been proposed, most operate on a single image (view), making them impractical for cross-view scenarios. PSALM, a recently proposed segmentation method, stands out as a notable exception with its demonstrated zero-shot ability on this task. However, due to the drastic viewpoint change between ego and exo, PSALM fails to accurately locate and segment objects, especially in complex backgrounds or when object appearances change significantly. To address these issues, we propose ObjectRelator, a novel approach featuring two key modules: Multimodal Condition Fusion (MCFuse) and SSL-based Cross-View Object Alignment (XObjAlign). MCFuse introduces language as an additional cue, integrating both visual masks and textual descriptions to improve object localization and prevent incorrect associations. XObjAlign enforces cross-view consistency through self-supervised alignment, enhancing robustness to object appearance variations. Extensive experiments demonstrate ObjectRelator's effectiveness on the large-scale Ego-Exo4D benchmark and HANDAL-X (an adapted dataset for cross-view segmentation) with state-of-the-art performance. Code is made available at: http://yuqianfu.com/ObjectRelator.
Updated: 2025-07-25 17:11:59
标题: ObjectRelator:实现自我中心与外部中心视角之间的跨视图对象关系理解
摘要: 将自我中心和外部中心观点之间的差距缩小一直是计算机视觉中一个长期存在的问题。在本文中,我们专注于新兴的自我-外部物体对应任务,旨在通过分割理解自我-外部视角下的物体关系。虽然已经提出了许多分割模型,但大多数只在单个图像(视图)上运行,使它们在跨视图场景下不切实际。最近提出的分割方法PSALM以其在这一任务上表现出的零样本能力脱颖而出。然而,由于自我和外部之间的视点变化剧烈,PSALM无法准确定位和分割物体,尤其是在复杂背景或物体外观发生显著变化时。为了解决这些问题,我们提出了ObjectRelator,这是一种新颖的方法,具有两个关键模块:多模态条件融合(MCFuse)和基于SSL的跨视图物体对齐(XObjAlign)。MCFuse引入语言作为额外线索,将视觉掩模和文本描述整合在一起,以改善物体定位并防止不正确的关联。XObjAlign通过自监督对齐强化跨视图一致性,提高对物体外观变化的鲁棒性。大量实验证明了ObjectRelator在大规模自我-外部4D基准和HANDAL-X(一个适用于跨视图分割的调整数据集)上的有效性,表现出最先进的性能。代码可在以下网址获取:http://yuqianfu.com/ObjectRelator。
更新时间: 2025-07-25 17:11:59
领域: cs.CV,cs.AI
Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization
Capturing latent variations ("contexts") is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy: observation sufficiency (preserving all predictive information) and control sufficiency (retaining decision-making relevant information). Exploiting this dichotomy, we derive a contextual evidence lower bound (ELBO)-style objective that cleanly separates representation learning from policy learning and optimizes it with Bottlenecked Contextual Policy Optimization (BCPO), an algorithm that places a variational information-bottleneck encoder in front of any off-policy policy learner. On standard continuous-control benchmarks with shifting physical parameters, BCPO matches or surpasses other baselines while using fewer samples and retaining performance far outside the training regime. The framework unifies theory, diagnostics, and practice for context-based RL.
Updated: 2025-07-25 17:08:16
标题: 观察与行动相遇:学习控制充分表示以实现稳健的策略泛化
摘要: 捕捉潜在变化(“上下文”)对于部署强化学习(RL)代理程序超越其训练范围至关重要。我们将基于上下文的RL重新构建为一个双推理控制问题,并正式表征了两个特性及其层次结构:观察充分性(保留所有预测信息)和控制充分性(保留决策相关信息)。利用这种二分法,我们推导出一种上下文证据下界(ELBO)风格的目标函数,清晰地将表示学习与策略学习分离,并使用瓶颈上下文策略优化(BCPO)进行优化,该算法在任何离线策略学习者前放置变分信息瓶颈编码器。在具有不断变化的物理参数的标准连续控制基准测试中,BCPO与其他基线相匹配或超越,同时使用更少的样本,并保持远超出训练范围的性能。该框架统一了基于上下文的RL的理论、诊断和实践。
更新时间: 2025-07-25 17:08:16
领域: cs.LG
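The bottleneck piece of the objective above can be made concrete with a standard Gaussian-encoder KL term; beta, the shapes, and the prediction target are assumptions, not BCPO's exact loss:

import torch
import torch.nn.functional as F

def bottleneck_loss(mu, logvar, pred, target, beta=1e-3):
    """Predictive term plus KL(q(z | history) || N(0, I)).

    mu, logvar: encoder outputs parameterizing the context posterior q(z | .).
    pred, target: predictive head output vs. observed transition signal.
    """
    recon = F.mse_loss(pred, target)   # pushes toward observation sufficiency
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + beta * kl           # compression: keep only decision-relevant bits

Annealing beta trades off the two sufficiency notions: beta near zero recovers a purely predictive (observation-sufficient) encoder, while larger beta squeezes the representation toward the decision-relevant information that control sufficiency requires.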
TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models
Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for quantifying the contribution of individual features. Building on the Taylor expansion framework introduced by Deng et al. (2024) to unify existing local attribution methods, we propose a rigorous set of postulates -- "precision", "federation", and "zero-discrepancy" -- to govern Taylor term-specific attribution. Guided by these postulates, we introduce TaylorPODA (Taylor expansion-derived imPortance-Order aDapted Attribution), which incorporates an additional "adaptation" property. This property enables alignment with task-specific goals, especially in post-hoc settings lacking ground-truth explanations. Empirical evaluations demonstrate that TaylorPODA achieves competitive results against baseline methods, providing principled and visualization-friendly explanations. This work represents a step toward the trustworthy deployment of opaque models by offering explanations with stronger theoretical grounding.
Updated: 2025-07-25 17:02:54
标题: TaylorPODA:基于泰勒展开的方法,用于改进不透明模型的事后归因
摘要: 现有的事后模型无关方法通过将模型输出局部归因于其输入特征,主要生成不透明模型的外部解释。然而,它们通常缺乏明确和系统的框架来量化各个特征的贡献。在邓等人(2024年)引入的泰勒展开框架的基础上,我们提出了一套严格的假设 -- “精度”、“联邦”和“零差异” -- 来管理泰勒项特定的归因。在这些假设的指导下,我们引入了TaylorPODA(泰勒展开派生的重要性顺序适应归因),其中包含了一个额外的“适应性”属性。这个属性使其能够与任务特定目标对齐,特别是在缺乏地面真相解释的事后设置中。实证评估表明,TaylorPODA在基线方法面前取得了竞争性结果,提供了基于原则和友好可视化的解释。这项工作代表着向着提供更坚实理论基础的解释,朝着可信赖地部署不透明模型迈出了一步。
更新时间: 2025-07-25 17:02:54
领域: stat.ML,cs.AI,cs.LG
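As background for the Taylor framing above, already the first-order expansion suggests a per-feature attribution (an illustrative special case; the paper's postulates govern how higher-order and interaction terms are assigned):

\[ f(x) \approx f(x^0) + \sum_{i=1}^{d} \frac{\partial f}{\partial x_i}(x^0)\,(x_i - x^0_i), \qquad \phi_i := \frac{\partial f}{\partial x_i}(x^0)\,(x_i - x^0_i), \]

so the attributions $\phi_i$ sum to the first-order change of the output relative to the baseline $x^0$.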
ASR-Guided Speaker-Role Diarization and Diarization-Guided ASR Decoding
From an application standpoint, speaker-role diarization (RD), such as doctor vs. patient, host vs. guest, etc. is often more useful than traditional speaker diarization (SD), which assigns generic labels like speaker-1, speaker-2 etc. In the context of joint automatic speech recognition (ASR) + SD (who spoke what?), recent end-to-end models employ an auxiliary SD transducer, synchronized with the ASR transducer, to predict speakers per word. In this paper, we extend this framework to RD with three key contributions: (1) we simplify the training via forced alignment and cross-entropy loss instead of RNNT loss, (2) we show that word prediction and role prediction require different amounts of predictor's context, leading to separate task-specific predictors, unlike existing shared-predictor models, and (3) we propose a way to leverage RD posterior activity to influence ASR decoding and reduce small-word deletion errors.
Updated: 2025-07-25 17:02:11
标题: ASR引导的说话人角色分离与分离引导的ASR解码
摘要: 从应用的角度来看,说话人角色分离(RD),例如医生与病人、主持人与嘉宾等,通常比传统的说话人分离(SD)更有用,后者只分配说话人-1、说话人-2等通用标签。在联合自动语音识别(ASR)+SD(谁说了什么?)的背景下,最近的端到端模型采用与ASR转换器同步的辅助SD转换器来预测每个单词的说话人。在本文中,我们将这一框架扩展到RD,做出了三个关键贡献:(1)我们通过强制对齐和交叉熵损失(而非RNNT损失)简化了训练;(2)我们表明单词预测和角色预测需要不同数量的预测器上下文,因此采用各自独立的任务特定预测器,而不同于现有的共享预测器模型;(3)我们提出了一种利用RD后验活动影响ASR解码并减少短词删除错误的方法。
更新时间: 2025-07-25 17:02:11
领域: eess.AS,cs.AI,cs.LG
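Contribution (1) above replaces the RNNT loss with forced alignment plus cross-entropy. A sketch of the resulting role-training step, with word-aligned targets assumed to be precomputed offline:

import torch
import torch.nn.functional as F

def role_loss(word_encodings, role_head, role_targets):
    """Cross-entropy over per-word role labels obtained via forced alignment.

    word_encodings: (B, W, d) encoder states pooled at word boundaries.
    role_targets:   (B, W) integer roles, e.g. 0 = doctor, 1 = patient.
    """
    logits = role_head(word_encodings)                  # (B, W, n_roles)
    return F.cross_entropy(logits.flatten(0, 1), role_targets.flatten())

Because the alignment fixes which frames belong to each word, the role predictor trains with plain cross-entropy and can be given a wider context window than the word predictor, in line with the abstract's contribution (2).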
Distillation Scaling Laws
We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks associated with large-scale distillation by enabling compute-optimal allocation for both the teacher and student to maximize student performance. We provide compute-optimal distillation recipes for two key scenarios: when a teacher already exists, and when a teacher needs training. In settings involving many students or an existing teacher, distillation outperforms supervised learning up to a compute level that scales predictably with student size. Conversely, if only one student is to be distilled and a teacher also requires training, supervised learning is generally preferable. Additionally, our large-scale study of distillation increases our understanding of the process and helps inform experimental design.
Updated: 2025-07-25 16:55:43
标题: 蒸馏缩放定律
摘要: 我们提出了一种蒸馏缩放定律,根据计算预算及其在学生和教师之间的分配来估计蒸馏模型的性能。我们的发现通过为教师和学生实现计算最优的分配以最大化学生性能,降低了与大规模蒸馏相关的风险。我们为两种关键情景提供了计算最优的蒸馏配方:教师已经存在时,以及教师需要训练时。在涉及许多学生或已有教师的情况下,蒸馏在一个随学生规模可预测扩展的计算水平之内优于监督学习。相反,如果只需蒸馏一个学生且教师也需要训练,则通常更宜采用监督学习。此外,我们对蒸馏的大规模研究加深了我们对该过程的理解,并有助于指导实验设计。
更新时间: 2025-07-25 16:55:43
领域: cs.LG,cs.AI,cs.CL,stat.ML
Integrating Physics and Topology in Neural Networks for Learning Rigid Body Dynamics
Rigid body interactions are fundamental to numerous scientific disciplines, but remain challenging to simulate due to their abrupt nonlinear nature and sensitivity to complex, often unknown environmental factors. These challenges call for adaptable learning-based methods capable of capturing complex interactions beyond explicit physical models and simulations. While graph neural networks can handle simple scenarios, they struggle with complex scenes and long-term predictions. We introduce a novel framework for modeling rigid body dynamics and learning collision interactions, addressing key limitations of existing graph-based methods. Our approach extends the traditional representation of meshes by incorporating higher-order topology complexes, offering a physically consistent representation. Additionally, we propose a physics-informed message-passing neural architecture, embedding physical laws directly in the model. Our method demonstrates superior accuracy, even during long rollouts, and exhibits strong generalization to unseen scenarios. Importantly, this work addresses the challenge of multi-entity dynamic interactions, with applications spanning diverse scientific and engineering domains.
Updated: 2025-07-25 16:54:47
标题: 整合物理学和拓扑学于神经网络中,用于学习刚体动力学
摘要: 刚体相互作用对许多科学学科至关重要,但由于其突发的非线性特性和对复杂、常常未知的环境因素的敏感性,模拟仍然具有挑战性。这些挑战要求适应性学习方法,能够捕捉超越明确物理模型和模拟的复杂相互作用。虽然图神经网络可以处理简单场景,但在复杂场景和长期预测方面仍然有困难。我们提出了一个新颖的框架,用于建模刚体动力学并学习碰撞相互作用,解决了现有基于图的方法的关键限制。我们的方法通过结合高阶拓扑复合体扩展了传统的网格表示,提供了一个物理上一致的表示。此外,我们提出了一种物理信息传递神经架构,将物理定律直接嵌入模型中。我们的方法在长时间预测中表现出优越的准确性,并展现了对未知场景的强大泛化能力。重要的是,这项工作解决了多实体动态相互作用的挑战,应用领域涵盖多个科学和工程领域。
更新时间: 2025-07-25 16:54:47
领域: cs.LG
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
Large language models (LLMs) face low hardware efficiency during decoding, especially for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) A novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache size and computation while maintaining high attention expressiveness, and (2) Attention-FFN Disaggregation (AFD), a distributed inference system that decouples attention and Feed-Forward Network (FFN) layers into specialized subsystems. This co-design achieves unprecedented cost efficiency: Step-3 significantly reduces theoretical decoding costs compared with models like DeepSeek-V3 and Qwen3 MoE 235B, with the gains widening at longer context. Step-3 achieves low cost while activating 38B parameters per token (more than DeepSeek-V3 and Qwen3 MoE 235B), demonstrating that hardware-aligned attention arithmetic intensity, MoE sparsity, and AFD are critical to cost-effectiveness. We perform a head-to-head comparison with DeepSeek-V3 in its favorable scenarios. Our implementation on Hopper GPUs achieves a decoding throughput of up to 4,039 tokens per second per GPU under 50ms TPOT SLA (4K context, FP8, no MTP). It is higher than DeepSeek-V3's 2,324 in the same setup and sets a new Pareto frontier for LLM decoding.
Updated: 2025-07-25 16:53:13
标题: Step-3:庞大而实惠:面向高性价比解码的模型-系统协同设计
摘要: 大型语言模型(LLMs)在解码过程中面临硬件效率低下的问题,特别是对于长上下文推理任务。本文介绍了Step-3,一个拥有321B参数的VLM,采用硬件感知的模型-系统协同设计,旨在最大程度地降低解码成本。Step-3在两个关键维度上进行了创新:(1)一种新颖的多矩阵因式分解注意力(MFA)机制,显著减少了KV缓存大小和计算量,同时保持高注意力表达能力;(2)注意力-前馈网络解耦(AFD),一种将注意力层与前馈网络(FFN)层解耦为专门子系统的分布式推理系统。这种协同设计实现了前所未有的成本效率:与DeepSeek-V3和Qwen3 MoE 235B等模型相比,Step-3显著降低了理论解码成本,且随着上下文长度的增加,这种收益进一步扩大。Step-3在每个标记激活38B参数(多于DeepSeek-V3和Qwen3 MoE 235B)的情况下仍实现了低成本,证明了与硬件对齐的注意力算术强度、MoE稀疏性和AFD对成本效益至关重要。我们在对DeepSeek-V3有利的场景下与其进行了直接比较。我们在Hopper GPU上的实现在50ms TPOT SLA条件下实现了每GPU每秒最多4,039个标记的解码吞吐量(4K上下文,FP8,无MTP)。这高于DeepSeek-V3在相同设置下的2,324个标记,并为LLM解码设定了新的帕累托前沿。
更新时间: 2025-07-25 16:53:13
领域: cs.LG,cs.AI
Perfect Clustering in Very Sparse Diverse Multiplex Networks
The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned into groups such that the layers in the same group are embedded in the same ambient subspace but otherwise matrices of connection probabilities can be all different. This setting includes the majority of multilayer network models as its particular cases. The key task in this model is to recover the groups of layers with unique subspace structures, since the case where all layers of the network are embedded in the same subspace has been fairly well studied. Until now, clustering of layers in such networks was based on layer-per-layer analysis, which required the multilayer network to be sufficiently dense. Nevertheless, in this paper we succeeded in pooling information in all layers together and providing a tensor-based methodology that ensures perfect clustering for a much sparser network. Our theoretical results, established under intuitive non-restrictive assumptions, assert that the new technique achieves perfect clustering under sparsity conditions that, up to logarithmic factors, coincide with the computational lower bound derived for a much simpler model.
Updated: 2025-07-25 16:43:42
标题: 非常稀疏多样化多重网络中的完美聚类
摘要: 本文研究了DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG)网络模型(Pensky(2024)),其中网络的所有层具有相同的节点集合。此外,所有层可以被分成组,使得同一组中的层嵌入在相同的环境子空间中,但连接概率矩阵可能完全不同。这种设置包括大多数多层网络模型作为其特殊情况。该模型中的关键任务是恢复具有唯一子空间结构的层组,因为网络的所有层都嵌入在相同子空间的情况已经得到了很好的研究。直到现在,在这种网络中层的聚类是基于逐层分析的,这要求多层网络足够密集。然而,在本文中,我们成功地将所有层的信息汇集在一起,并提供了一种基于张量的方法,确保对更稀疏的网络进行完美聚类。我们在直观的非限制性假设下建立的理论结果表明,这种新技术在稀疏条件下实现了完美的聚类,这些条件与为一个更简单的模型推导出的计算下限在对数因子上一致。
更新时间: 2025-07-25 16:43:42
领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH
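The methodological shift above is from layer-per-layer analysis to pooling all layers in a tensor. A generic sketch of that pooling idea (mode unfolding, SVD, k-means; a common recipe used here for illustration, not the authors' exact algorithm):

import numpy as np
from sklearn.cluster import KMeans

def cluster_layers(A, n_groups, rank=10):
    """Cluster the layers of a multiplex network by shared subspace structure.

    A: (L, n, n) stacked adjacency matrices of an L-layer network.
    """
    L, n, _ = A.shape
    unfolded = A.reshape(L, n * n)              # mode-1 unfolding of the tensor
    U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
    r = min(rank, len(s))
    emb = U[:, :r] * s[:r]                      # low-rank layer embeddings
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(emb)

Pooling in this way averages noise across all L layers at once, which is why clustering can succeed at sparsity levels where any single layer is uninformative.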
Programmable Virtual Humans Toward Human Physiologically-Based Drug Discovery
Artificial intelligence (AI) has sparked immense interest in drug discovery, but most current approaches only digitize existing high-throughput experiments. They remain constrained by conventional pipelines. As a result, they do not address the fundamental challenges of predicting drug effects in humans. Similarly, biomedical digital twins, largely grounded in real-world data and mechanistic models, are tailored for late-phase drug development and lack the resolution to model molecular interactions or their systemic consequences, limiting their impact in early-stage discovery. This disconnect between early discovery and late development is one of the main drivers of high failure rates in drug discovery. The true promise of AI lies not in augmenting current experiments but in enabling virtual experiments that are impossible in the real world: testing novel compounds directly in silico in the human body. Recent advances in AI, high-throughput perturbation assays, and single-cell and spatial omics across species now make it possible to construct programmable virtual humans: dynamic, multiscale models that simulate drug actions from molecular to phenotypic levels. By bridging the translational gap, programmable virtual humans offer a transformative path to optimize therapeutic efficacy and safety earlier than ever before. This perspective introduces the concept of programmable virtual humans, explores their roles in a new paradigm of drug discovery centered on human physiology, and outlines key opportunities, challenges, and roadmaps for their realization.
Updated: 2025-07-25 16:40:57
标题: 可编程虚拟人类朝着基于人体生理的药物发现方向发展
摘要: 人工智能(AI)在药物发现领域引起了巨大兴趣,但目前大多数方法仅是将现有的高通量实验数字化。它们仍受限于常规流程。因此,它们无法解决预测药物在人体中的效果的根本挑战。同样,主要基于现实世界数据和机械模型的生物医学数字孪生体,专门用于后期药物开发,缺乏模拟分子相互作用或其系统后果的分辨率,限制了它们在早期发现中的影响。早期发现与后期开发之间的脱节是导致药物发现高失败率的主要驱动因素之一。AI的真正承诺不在于增强当前实验,而在于实现在现实世界中不可能进行的虚拟实验:在体内直接对新型化合物进行计算机模拟测试。最近在AI、高通量干扰测定、单细胞和空间组学跨物种方面的进展,现在使得构建可编程虚拟人类成为可能:动态的、多尺度的模型,从分子到表型水平模拟药物作用。通过弥合翻译差距,可编程虚拟人类提供了一个转变路径,可以比以往任何时候更早地优化治疗效果和安全性。本文介绍了可编程虚拟人类的概念,探讨了它们在以人体生理学为中心的新药物发现范式中的作用,并概述了实现它们的关键机会、挑战和路线图。
更新时间: 2025-07-25 16:40:57
领域: cs.CY,cs.AI,cs.CE,cs.LG
Differentiating hype from practical applications of large language models in medicine - a primer for healthcare professionals
The medical ecosystem consists of the training of new clinicians and researchers, the practice of clinical medicine, and areas of adjacent research. There are many aspects of these domains that could benefit from the application of task automation and programmatic assistance. Machine learning and artificial intelligence techniques, including large language models (LLMs), have been promised to deliver on healthcare innovation, improving care speed and accuracy, and reducing the burden on staff for manual interventions. However, LLMs have no understanding of objective truth that is based in reality. They also represent real risks to the disclosure of protected information when used by clinicians and researchers. The use of AI in medicine in general, and the deployment of LLMs in particular, therefore requires careful consideration and thoughtful application to reap the benefits of these technologies while avoiding the dangers in each context.
Updated: 2025-07-25 16:40:17
标题: 区分医学领域中大型语言模型的炒作和实际应用——医疗专业人士的入门指南
摘要: 医疗生态系统包括新临床医生和研究人员的培训、临床医学实践以及相邻研究领域。这些领域的许多方面都可以从任务自动化和程序辅助应用中受益。机器学习和人工智能技术,包括大型语言模型(LLMs),被承诺能够实现医疗创新,提高护理速度和准确性,并减轻人工干预的负担。然而,LLMs并没有对现实中基于客观真相的理解。当临床医生和研究人员使用时,它们还具有泄露受保护信息的真实风险。因此,在医学领域普遍使用人工智能,特别是部署LLMs,需要仔细考虑和深思熟虑,以在每种情况下获得这些技术的益处同时避免危险。
更新时间: 2025-07-25 16:40:17
领域: cs.CY,cs.AI
CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing
The processing mechanisms underlying language and image understanding in large vision-language models (LVLMs) have been extensively studied. However, the internal reasoning mechanisms of LVLMs for spatiotemporal understanding remain poorly understood. In this work, we introduce a systematic, circuit-based framework designed to investigate how spatiotemporal visual semantics are represented and processed within these LVLMs. Specifically, our framework comprises three circuits: visual auditing circuit, semantic tracing circuit, and attention flow circuit. Through the lens of these circuits, we discover that visual semantics are highly localized to specific object tokens--removing these tokens can degrade model performance by up to 92.6%. Furthermore, we identify that interpretable concepts of objects and actions emerge and become progressively refined in the middle-to-late layers of LVLMs. In contrast to current works that focus solely on objects in a single image, we reveal that the middle-to-late layers of LVLMs exhibit specialized functional localization for spatiotemporal semantics. Our findings offer significant mechanistic insights into spatiotemporal semantics analysis of LVLMs, laying a foundation for designing more robust and interpretable models.
Updated: 2025-07-25 16:38:18
标题: CircuitProbe:利用电路追踪解剖时空视觉语义
摘要: 大视觉语言模型(LVLMs)中语言和图像理解背后的处理机制已经得到广泛研究。然而,LVLMs对时空理解的内部推理机制仍然知之甚少。在这项工作中,我们介绍了一个系统化的、基于电路的框架,旨在探讨这些LVLMs中时空视觉语义是如何表示和处理的。具体而言,我们的框架包括三个电路:视觉审计电路、语义追踪电路和注意力流电路。通过这些电路的视角,我们发现视觉语义高度局限于特定的物体标记--删除这些标记可能导致模型性能下降高达92.6%。此外,我们发现LVLMs中的物体和动作的可解释概念在中至后层逐渐出现并逐渐精细化。与当前仅关注一幅图像中的物体的工作相反,我们揭示了LVLMs中中至后层对时空语义具有专门的功能定位。我们的发现为LVLMs的时空语义分析提供了重要的机制洞见,为设计更强大和可解释的模型奠定了基础。
更新时间: 2025-07-25 16:38:18
领域: cs.CV,cs.LG
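The localization finding above rests on ablating object tokens and measuring the performance drop. A schematic of that measurement, with the model interface and masking convention assumed (token_embs is a torch tensor of input token embeddings):

def ablation_drop(model, token_embs, object_positions, metric):
    """Performance degradation when visual object tokens are zeroed out.

    token_embs: (B, T, d) input token embeddings; object_positions indexes T.
    metric: maps model outputs to a scalar task score (higher is better).
    """
    base = metric(model(token_embs))           # unmodified forward pass
    ablated = token_embs.clone()
    ablated[:, object_positions, :] = 0.0      # remove the object tokens
    return base - metric(model(ablated))       # drop attributable to those tokens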
SILS: Strategic Influence on Liquidity Stability and Whale Detection in Concentrated-Liquidity DEXs
Traditional methods for identifying impactful liquidity providers (LPs) in Concentrated Liquidity Market Makers (CLMMs) rely on broad measures, such as nominal capital size or surface-level activity, which often lead to inaccurate risk analysis. The SILS framework offers a significantly more detailed approach, characterizing LPs not just as capital holders but as dynamic systemic agents whose actions directly impact market stability. This represents a fundamental paradigm shift from static, volume-based analysis to a dynamic, impact-focused understanding. This advanced approach uses on-chain event logs and smart contract execution traces to compute Exponential Time-Weighted Liquidity (ETWL) profiles and apply unsupervised anomaly detection. Most importantly, it defines an LP's functional importance through the Liquidity Stability Impact Score (LSIS), a counterfactual metric that measures the potential degradation of the market if the LP withdraws. This combined approach provides a more detailed and realistic characterization of an LP's impact, moving beyond the binary and often misleading classifications used by existing methods. This impact-focused and comprehensive approach enables SILS to accurately identify high-impact LPs, including those missed by traditional methods, and supports essential applications like a protective oracle layer and actionable trader signals, thereby significantly enhancing the DeFi ecosystem. The framework provides unprecedented transparency into the underlying liquidity structure and associated risks, effectively reducing the common false positives and uncovering critical false negatives found in traditional models. Therefore, SILS provides an effective mechanism for proactive risk management, transforming how DeFi protocols safeguard their ecosystems against asymmetric liquidity behavior.
Updated: 2025-07-25 16:21:18
标题: SILS:在集中流动性DEX中对流动性稳定性和大户检测的战略影响
摘要: 传统方法用于确定在集中流动性市场制造商(CLMMs)中具有影响力的流动性提供者(LPs)依赖于广泛的措施,如名义资本规模或表面级活动,这经常导致不准确的风险分析。SILS框架提供了一种更加详细的方法,将LPs描述为动态系统代理,其行动直接影响市场稳定性,而不仅仅是资本持有者。这代表了从静态、基于交易量的分析向动态、关注影响的理解的根本范式转变。这种先进的方法利用链上事件日志和智能合约执行跟踪来计算指数时间加权流动性(ETWL)配置文件并应用无监督的异常检测。最重要的是,它通过流动性稳定性影响评分(LSIS)定义了LP的功能重要性,这是一个反事实度量,衡量了LP退出市场可能导致的市场恶化程度。这种综合方法提供了对LP影响的更详细和现实的描述,超越了现有方法使用的二元和常常误导性的分类。这种关注影响的综合方法使得SILS能够准确识别高影响LPs,包括那些传统方法所忽视的LPs,并支持关键应用,如保护性Oracle层和可操作的交易信号,从而显著增强DeFi生态系统。该框架为底层流动性结构和相关风险提供了前所未有的透明度,有效减少了传统模型中常见的假阳性,并揭示了传统模型中存在的关键假阴性。因此,SILS提供了一种有效的机制,用于积极的风险管理,改变了DeFi协议如何保护其生态系统免受不对称流动性行为的影响。
更新时间: 2025-07-25 16:21:18
领域: cs.LG,cs.CR,cs.ET
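Two of the quantities above admit compact definitions. A minimal sketch of an exponential time-weighted liquidity profile and the counterfactual LSIS idea; the half-life, the depth function, and all names are assumptions for illustration:

import math

def etwl(events, now, half_life=86400.0):
    """Exponential time-weighted liquidity from on-chain events.

    events: iterable of (timestamp, liquidity_delta) for one LP position.
    """
    lam = math.log(2) / half_life
    return sum(delta * math.exp(-lam * (now - t)) for t, delta in events)

def lsis(pool_depth_fn, positions, lp_id):
    """Counterfactual stability impact: share of depth lost if the LP withdraws."""
    full = pool_depth_fn(positions)
    without = pool_depth_fn([p for p in positions if p["lp"] != lp_id])
    return (full - without) / full

The counterfactual form is what separates LSIS from nominal-capital rankings: an LP with modest capital concentrated at critical price ranges can support a disproportionate share of usable depth.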
On Arbitrary Predictions from Equally Valid Models
Model multiplicity refers to the existence of multiple machine learning models that describe the data equally well but may produce different predictions on individual samples. In medicine, these models can admit conflicting predictions for the same patient -- a risk that is poorly understood and insufficiently addressed. In this study, we empirically analyze the extent, drivers, and ramifications of predictive multiplicity across diverse medical tasks and model architectures, and show that even small ensembles can mitigate/eliminate predictive multiplicity in practice. Our analysis reveals that (1) standard validation metrics fail to identify a uniquely optimal model and (2) a substantial amount of predictions hinges on arbitrary choices made during model development. Using multiple models instead of a single model reveals instances where predictions differ across equally plausible models -- highlighting patients that would receive arbitrary diagnoses if any single model were used. In contrast, (3) a small ensemble paired with an abstention strategy can effectively mitigate measurable predictive multiplicity in practice; predictions with high inter-model consensus may thus be amenable to automated classification. While accuracy is not a principled antidote to predictive multiplicity, we find that (4) higher accuracy achieved through increased model capacity reduces predictive multiplicity. Our findings underscore the clinical importance of accounting for model multiplicity and advocate for ensemble-based strategies to improve diagnostic reliability. In cases where models fail to reach sufficient consensus, we recommend deferring decisions to expert review.
Updated: 2025-07-25 16:15:59
标题: 关于同等有效模型的任意预测
摘要: 模型多样性指的是存在多个机器学习模型,它们同样有效地描述数据,但可能在个体样本上产生不同的预测。在医学领域,这些模型可能为同一患者提供相互冲突的预测 - 这是一个尚未得到充分理解和处理的风险。 在本研究中,我们实证分析了跨不同医学任务和模型架构的预测多样性的程度、驱动因素和后果,并展示即使小型集成模型也可以在实践中缓解/消除预测多样性。我们的分析揭示了(1)标准验证指标无法识别出唯一最佳模型,以及(2)大量预测取决于模型开发过程中做出的任意选择。使用多个模型而不是单一模型可以揭示在同样合理的模型之间预测不同的情况 - 强调如果使用任何单一模型,患者将会得到任意诊断。 相反,(3)一个小型集成模型配合弃权策略可以有效地在实践中减少可测量的预测多样性;具有高度模型间一致性的预测因此可能适合自动化分类。虽然准确性并非预测多样性的根本解药,但我们发现通过增加模型容量实现更高准确性可以减少预测多样性。 我们的发现强调了考虑模型多样性的临床重要性,并倡导基于集成的策略来改善诊断可靠性。在模型未能达成足够一致意见的情况下,我们建议将决策推迟至专家审查。
更新时间: 2025-07-25 16:15:59
领域: cs.LG,cs.AI
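The mitigation proposed above, a small ensemble with abstention, is simple to state in code; the consensus threshold and interface are assumptions:

import numpy as np

def predict_or_abstain(models, x, consensus=1.0):
    """Return the ensemble label, or None (defer to expert review) on disagreement.

    models: trained classifiers with a scikit-learn style .predict.
    consensus: required fraction of models agreeing on the majority label.
    """
    votes = np.array([m.predict(x.reshape(1, -1))[0] for m in models])
    labels, counts = np.unique(votes, return_counts=True)
    top = counts.argmax()
    if counts[top] / len(models) >= consensus:
        return labels[top]             # high inter-model agreement: automate
    return None                        # measurable multiplicity: defer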
Review of Deep Learning Applications to Structural Proteomics Enabled by Cryogenic Electron Microscopy and Tomography
The past decade's "cryoEM revolution" has produced exponential growth in high-resolution structural data through advances in cryogenic electron microscopy (cryoEM) and tomography (cryoET). Deep learning integration into structural proteomics workflows addresses longstanding challenges including low signal-to-noise ratios, preferred orientation artifacts, and missing-wedge problems that historically limited efficiency and scalability. This review examines AI applications across the entire cryoEM pipeline, from automated particle picking using convolutional neural networks (Topaz, crYOLO, CryoSegNet) to computational solutions for preferred orientation bias (spIsoNet, cryoPROS) and advanced denoising algorithms (Topaz-Denoise). In cryoET, tools like IsoNet employ U-Net architectures for simultaneous missing-wedge correction and noise reduction, while TomoNet streamlines subtomogram averaging through AI-driven particle detection. The workflow culminates with automated atomic model building using sophisticated tools like ModelAngelo, DeepTracer, and CryoREAD that translate density maps into interpretable biological structures. These AI-enhanced approaches have achieved near-atomic resolution reconstructions with minimal manual intervention, resolved previously intractable datasets suffering from severe orientation bias, and enabled successful application to diverse biological systems from HIV virus-like particles to in situ ribosomal complexes. As deep learning evolves, particularly with large language models and vision transformers, the future promises sophisticated automation and accessibility in structural biology, potentially revolutionizing our understanding of macromolecular architecture and function.
Updated: 2025-07-25 16:15:09
标题: 冷冻电子显微镜和层析法促进的深度学习在结构蛋白质组学中的应用综述
摘要: 过去十年的“冷冻电镜革命”通过冷冻电镜(cryoEM)和层析(cryoET)的进步,产生了高分辨率结构数据的指数增长。将深度学习整合到结构蛋白质组学工作流程中,解决了长期存在的低信噪比、优选取向伪影和缺失楔问题,这些问题在历史上限制了效率和可伸缩性。本综述检查了AI在整个cryoEM流程中的应用,从使用卷积神经网络(Topaz、crYOLO、CryoSegNet)进行自动粒子拾取,到解决优选取向偏差(spIsoNet、cryoPROS)和高级降噪算法(Topaz-Denoise)。在cryoET中,像IsoNet这样的工具采用U-Net架构,实现了同时缺失楔校正和降噪,而TomoNet通过AI驱动的粒子检测简化了子体积平均。工作流程以使用ModelAngelo、DeepTracer和CryoREAD等复杂工具自动构建原子模型,将密度图转化为可解释的生物结构。这些经过AI增强的方法已经实现了近原子分辨率的重建,几乎没有人工干预,解决了以前无法处理的受到严重取向偏差影响的数据集,并成功应用于从HIV病毒样颗粒到原位核糖体复合物等不同的生物系统。随着深度学习的发展,特别是大型语言模型和视觉转换器的出现,未来将带来结构生物学领域复杂的自动化和易用性,可能彻底改变我们对大分子结构和功能的理解。
更新时间: 2025-07-25 16:15:09
领域: q-bio.QM,cs.CV,cs.LG
SDVDiag: A Modular Platform for the Diagnosis of Connected Vehicle Functions
Connected and software-defined vehicles promise to offer a broad range of services and advanced functions to customers, aiming to increase passenger comfort and support autonomous driving capabilities. Due to the high reliability and availability requirements of connected vehicles, it is crucial to resolve any occurring failures quickly. To achieve this, however, a complex cloud/edge architecture with a mesh of dependencies must be navigated to diagnose the responsible root cause. As such, manual analyses become infeasible, since they would significantly delay troubleshooting. To address this challenge, this paper presents SDVDiag, an extensible platform for the automated diagnosis of connected vehicle functions. The platform enables the creation of pipelines that cover all steps from initial data collection to the tracing of potential root causes. In addition, SDVDiag supports self-adaptive behavior through the ability to exchange modules at runtime. Dependencies between functions are detected and continuously updated, resulting in a dynamic graph view of the system. In addition, vital system metrics are monitored for anomalies. Whenever an incident is investigated, a snapshot of the graph is taken and augmented with relevant anomalies. Finally, the analysis is performed by traversing the graph and creating a ranking of the most likely causes. To evaluate the platform, it is deployed inside a 5G test fleet environment for connected vehicle functions. The results show that injected faults can be detected reliably. As such, the platform offers the potential to gain new insights and reduce downtime by identifying problems and their causes at an early stage.
Updated: 2025-07-25 16:09:27
标题: SDVDiag:连接车辆功能诊断的模块化平台
摘要: 连接和软件定义车辆承诺为客户提供广泛的服务和先进功能,旨在提高乘客舒适度并支持自动驾驶功能。由于连接车辆具有很高的可靠性和可用性要求,快速解决任何发生的故障至关重要。然而,要实现这一目标,必须导航一个具有依赖关系网格的复杂云/边缘架构,以诊断负责的根本原因。因此,手动分析变得不可行,因为它会显著延迟故障排除。 为了解决这一挑战,本文介绍了SDVDiag,一种用于自动诊断连接车辆功能的可扩展平台。该平台使得可以创建涵盖从初始数据收集到追踪潜在根本原因的流水线。此外,SDVDiag通过在运行时交换模块的能力来支持自适应行为。检测到并持续更新功能之间的依赖关系,从而产生系统的动态图视图。此外,重要的系统指标将被监视以检测异常。每当调查事故时,都会对系统图进行快照并附加相关异常。最后,通过遍历图并创建最有可能原因的排名来执行分析。 为了评估该平台,它被部署在一个用于连接车辆功能的5G测试车队环境中。结果表明,注入的故障可以被可靠地检测到。因此,该平台提供了在早期识别问题及其原因的情况下获得新见解并减少停机时间的潜力。
更新时间: 2025-07-25 16:09:27
领域: cs.SE,cs.AI,cs.DC,B.8.2; C.2.4
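A minimal sketch of the graph-based cause ranking the abstract describes, under the assumption that anomaly counts have been attached to a snapshot of the dependency graph; the service names and the scoring rule are hypothetical, not SDVDiag's.

```python
# Hypothetical sketch: walk the dependencies of a failing vehicle function and
# rank candidate root causes by anomaly count, discounted by graph distance.
from collections import deque

deps = {                      # edge A -> B means "A depends on B"
    "nav_service": ["map_tiles", "gps_gateway"],
    "map_tiles": ["edge_cache"],
    "gps_gateway": ["cell_link"],
    "edge_cache": [],
    "cell_link": [],
}
anomalies = {"edge_cache": 3, "cell_link": 1}   # anomaly count per node

def rank_root_causes(failing, deps, anomalies):
    scores, queue, seen = {}, deque([(failing, 0)]), {failing}
    while queue:
        node, depth = queue.popleft()
        if node in anomalies:
            scores[node] = anomalies[node] / (1 + depth)  # closer = stronger
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, depth + 1))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_root_causes("nav_service", deps, anomalies))
# [('edge_cache', 1.0), ('cell_link', 0.333...)]
```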
FD4QC: Application of Classical and Quantum-Hybrid Machine Learning for Financial Fraud Detection A Technical Report
The increasing complexity and volume of financial transactions pose significant challenges to traditional fraud detection systems. This technical report investigates and compares the efficacy of classical, quantum, and quantum-hybrid machine learning models for the binary classification of fraudulent financial activities. As for our methodology: first, we develop a comprehensive behavioural feature engineering framework to transform raw transactional data into a rich, descriptive feature set. Second, we implement and evaluate a range of models on the IBM Anti-Money Laundering (AML) dataset. The classical baseline models include Logistic Regression, Decision Tree, Random Forest, and XGBoost. These are compared against three hybrid classical-quantum architectures: a Quantum Support Vector Machine (QSVM), a Variational Quantum Classifier (VQC), and a Hybrid Quantum Neural Network (HQNN). Furthermore, we propose Fraud Detection for Quantum Computing (FD4QC), a practical, API-driven system architecture designed for real-world deployment, featuring a classical-first, quantum-enhanced philosophy with robust fallback mechanisms. Our results demonstrate that classical tree-based models, particularly Random Forest, significantly outperform the quantum counterparts in the current setup, achieving high accuracy (97.34%) and F-measure (86.95%). Among the quantum models, QSVM shows the most promise, delivering high precision (77.15%) and a low false-positive rate (1.36%), albeit with lower recall and significant computational overhead. This report provides a benchmark for a real-world financial application, highlights the current limitations of quantum machine learning in this domain, and outlines promising directions for future research.
Updated: 2025-07-25 16:08:22
标题: FD4QC:经典和量子混合机器学习在金融欺诈检测中的应用 技术报告
摘要: 随着金融交易的复杂性和数量的增加,传统的欺诈检测系统面临着重大挑战。这份技术报告调查并比较了经典、量子和量子混合机器学习模型在欺诈金融活动的二元分类中的功效。 在我们的方法论中,首先,我们开发了一个全面的行为特征工程框架,将原始交易数据转化为丰富的描述性特征集。其次,我们在IBM反洗钱(AML)数据集上实施和评估了一系列模型。经典基准模型包括逻辑回归、决策树、随机森林和XGBoost。这些模型与三种混合经典-量子架构进行比较:量子支持向量机(QSVM)、变分量子分类器(VQC)和混合量子神经网络(HQNN)。 此外,我们提出了量子计算欺诈检测(FD4QC),这是一个实用的、API驱动的系统架构,旨在实现实际部署,具有以经典为先、量子增强的理念和强大的回退机制。 我们的结果表明,在当前设置中,经典基于树的模型,特别是随机森林,明显优于量子对应模型,实现了高准确率(97.34%)和F-度量(86.95%)。在量子模型中,QSVM显示出最大的潜力,提供高精确率(77.15%)和低假阳性率(1.36%),尽管召回率较低且计算开销较大。 这份报告为实际的金融应用提供了一个基准,突出了量子机器学习在该领域的当前限制,并概述了未来研究的有希望的方向。
更新时间: 2025-07-25 16:08:22
领域: cs.LG,cs.CE
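For orientation, a sketch of the classical-baseline side of such a comparison, using synthetic imbalanced data in place of the IBM AML dataset; in a classical-first architecture the quantum models (QSVM/VQC/HQNN) would sit behind the same fit/predict interface.

```python
# Hypothetical sketch: behavioural features in, Random Forest out, evaluated
# with the same accuracy and F-measure metrics the report uses. Data is
# synthetic; the real work uses engineered features from IBM AML data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=12,
                           weights=[0.95, 0.05], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=7).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"accuracy : {accuracy_score(y_te, pred):.2%}")
print(f"F-measure: {f1_score(y_te, pred):.2%}")
```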
Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security
As large language models (LLMs) increasingly integrate native code interpreters, they enable powerful real-time execution capabilities, substantially expanding their utility. However, such integrations introduce potential system-level cybersecurity threats, fundamentally different from prompt-based vulnerabilities. To systematically evaluate these interpreter-specific risks, we propose CIRCLE (Code-Interpreter Resilience Check for LLM Exploits), a simple benchmark comprising 1,260 prompts targeting CPU, memory, and disk resource exhaustion. Each risk category includes explicitly malicious ("direct") and plausibly benign ("indirect") prompt variants. Our automated evaluation framework assesses not only whether LLMs refuse or generate risky code, but also executes the generated code within the interpreter environment to evaluate code correctness, simplifications made by the LLM to make the code safe, or execution timeouts. Evaluating 7 commercially available models from OpenAI and Google, we uncover significant and inconsistent vulnerabilities. For instance, evaluations show substantial disparities even within providers: OpenAI's o4-mini correctly refuses risky requests at 7.1%, a notably higher rate than GPT-4.1's 0.5%. Results particularly underscore that indirect, socially-engineered prompts substantially weaken model defenses. This highlights an urgent need for interpreter-specific cybersecurity benchmarks, dedicated mitigation tools (e.g., guardrails), and clear industry standards to guide safe and responsible deployment of LLM interpreter integrations. The benchmark dataset and evaluation code are publicly released to foster further research.
Updated: 2025-07-25 16:06:16
标题: 在CIRCLE中奔跑?一种LLM代码解释器安全性的简单基准测试
摘要: 随着大型语言模型(LLMs)越来越多地集成本地代码解释器,它们实现了强大的实时执行能力,大大扩展了它们的实用性。然而,这种集成引入了潜在的系统级网络安全威胁,与基于提示的漏洞根本不同。为了系统评估这些解释器特定的风险,我们提出了CIRCLE(用于LLM利用的代码解释器弹性检查),这是一个简单的基准测试,包含1,260个针对CPU、内存和磁盘资源耗尽的提示。每个风险类别包括明确恶意(“直接”)和可能良性(“间接”)的提示变种。我们的自动化评估框架不仅评估LLMs是否拒绝或生成危险代码,还在解释器环境中执行生成的代码,以评估代码的正确性、LLMs为使代码安全所做的简化,或执行超时。通过评估OpenAI和Google的7个商业可用模型,我们发现了显著且不一致的漏洞。例如,评估显示即使在供应商内部也存在显著差异-OpenAI的o4-mini在7.1%的比率下正确拒绝危险请求,明显高于GPT-4.1的0.5%。结果特别强调间接的社会工程提示大大削弱了模型的防御能力。这突显了对解释器特定网络安全基准测试、专用缓解工具(例如护栏)以及明确的行业标准以指导LLM解释器集成的安全和负责任部署的迫切需求。基准数据集和评估代码已公开发布以促进进一步研究。
更新时间: 2025-07-25 16:06:16
领域: cs.CR,cs.AI
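A sketch of the execution half of such a benchmark, assuming a Unix host: candidate code runs in a subprocess under CPU/memory caps, and a wall-clock timeout is treated as a potential resource-exhaustion signal. Limits and outcome labels are illustrative, not CIRCLE's exact harness.

```python
# Hypothetical sketch: execute LLM-generated code with resource caps and a
# timeout, then classify the outcome. Unix-only (uses the `resource` module).
import resource
import subprocess
import sys

def run_capped(code: str, timeout_s: int = 5):
    def cap_limits():                       # applied inside the child process
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s + 1,) * 2)
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)  # 512 MiB
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True,
                              timeout=timeout_s, preexec_fn=cap_limits)
        return ("error" if proc.returncode else "completed",
                proc.stderr[:200])
    except subprocess.TimeoutExpired:
        return "timeout", ""                # resource-exhaustion signal

print(run_capped("print(sum(range(10**6)))"))   # ('completed', '')
print(run_capped("while True: pass"))           # ('timeout', '')
```

A full harness would additionally diff the generated code against the prompt's intent to detect safety-motivated simplifications, as the abstract describes.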
CXR-CML: Improved zero-shot classification of long-tailed multi-label diseases in Chest X-Rays
Chest radiography (CXR) plays a crucial role in the diagnosis of various diseases. However, the inherent class imbalance in the distribution of clinical findings presents a significant challenge for current self-supervised deep learning models. These models often fail to accurately classify long-tailed classes. Current vision-language models such as Contrastive Language Image Pre-training (CLIP) models effectively model the manifold distribution of the latent space, enabling high zero-shot classification accuracies. Although CLIP performs well on most of the primary classes in the dataset, our work reveals that its effectiveness decreases significantly for classes with a long-tailed distribution. Our approach employs a class-weighting mechanism that directly aligns with the distribution of classes within the latent space. This method ensures a substantial improvement in overall classification performance, with particular emphasis on enhancing the recognition and accuracy of rarely observed classes. We accomplish this by applying Gaussian Mixture Model (GMM) clustering to the latent space. The resulting clusters are further refined by a Student t-distribution, followed by a metric loss that utilizes the altered embeddings. Our approach facilitates stable and adaptive clustering of the features. This results in a notable average improvement of 7 percentage points in zero-shot AUC scores across 40 classes in the MIMIC-CXR-JPG dataset over previous SOTA models.
Updated: 2025-07-25 16:05:47
标题: CXR-CML:改进的胸部X光长尾多标签疾病的零样本分类
摘要: 胸部X射线(CXR)在各种疾病的诊断中起着至关重要的作用。然而,在临床结果的分布中存在的固有类别不平衡对当前的自监督深度学习模型构成了重大挑战。这些模型通常无法准确分类长尾类别。目前的视觉-语言模型,如对比语言图像预训练(CLIP)模型有效地建模了潜在空间的流形分布,实现了高的零样本分类准确性。尽管CLIP在数据集中大多数主要类别上表现良好,但我们的研究揭示了其对长尾分布类别的有效性显著下降。我们的方法采用了一个与潜在空间内类别分布直接对齐的类别加权机制。这种方法确保了整体分类性能的显著提高,特别强调增强罕见类别的识别和准确性。我们通过在潜在空间应用高斯混合模型(GMM)聚类来实现这一目标。随后,这些聚类通过学生t分布进一步细化,然后使用改变的嵌入来利用度量损失。我们的方法促进了特征的稳定和自适应聚类。这导致在来自先前SOTA模型的MIMIC-CXR-JPG数据集的40个类别中,零样本AUC分数平均提高了7个百分点。
更新时间: 2025-07-25 16:05:47
领域: cs.CV,cs.AI
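To illustrate the clustering step, a sketch that fits a GMM to stand-in embeddings and converts per-class latent mass into inverse-frequency class weights; the paper's Student t refinement and metric loss are omitted, and all dimensions are made up.

```python
# Hypothetical sketch: GMM on latent embeddings, then class weights inversely
# proportional to the latent mass a class occupies, so tail classes weigh more.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 32))            # stand-in image embeddings
labels = rng.integers(0, 5, size=1000)     # stand-in labels (long-tailed IRL)

gmm = GaussianMixture(n_components=8, random_state=0).fit(z)
resp = gmm.predict_proba(z)                # (n_samples, n_components)

# Latent mass per class: responsibilities summed over that class's samples.
mass = np.array([resp[labels == c].sum() for c in range(5)])
class_weight = mass.sum() / (len(mass) * np.maximum(mass, 1e-8))
print(np.round(class_weight, 3))           # rare/tail classes get weight > 1
```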
Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data
Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.
Updated: 2025-07-25 16:03:47
标题: 从精神病学纵向数据中学习因果可预测的结果
摘要: 纵向生物医学数据中的因果推断仍然是一个中心挑战,特别是在精神病学领域,其中症状异质性和潜在混杂经常破坏传统的估计器。大多数现有的治疗效果估计方法假定一个固定的结果变量,并通过观察到的协变量调整来解决混杂问题。然而,在实践中,固定结果的无混杂性假设可能不成立。为了解决这个基础性的限制,我们直接优化结果定义,以最大化因果可识别性。我们的DEBIAS(持久效应与后门不变的聚合症状)算法学习非负的、临床可解释的权重用于结果聚合,最大化持久的治疗效果,并通过利用精神病学纵向数据中先前治疗的有限直接效应来实证地最小化观察到的和潜在的混杂。该算法还提供了一个可实证验证的结果无混杂性检验。在抑郁症和精神分裂症的全面实验中,DEBIAS在恢复临床可解释的复合结果的因果效应方面始终优于现有最先进的方法。
更新时间: 2025-07-25 16:03:47
领域: cs.LG,q-bio.QM,stat.ML
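A toy sketch of the outcome-weighting idea, assuming a simplex constraint (non-negative, normalized weights over symptom items) and using a variance penalty as a crude stand-in for the paper's durability and confounding criteria.

```python
# Hypothetical sketch: learn non-negative, normalized weights over symptom
# items so the aggregated outcome shows a strong treated-vs-control contrast
# with low variance. Projected gradient ascent on the probability simplex.
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w >= 0, sum w = 1} (Duchi et al. style)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0)

rng = np.random.default_rng(1)
Y_t = rng.normal(0.6, 1.0, size=(200, 6))   # treated symptom scores (toy)
Y_c = rng.normal(0.0, 1.0, size=(220, 6))   # control symptom scores (toy)
delta = Y_t.mean(0) - Y_c.mean(0)           # per-item treatment contrast
cov = np.cov(np.vstack([Y_t, Y_c]).T)

w, lam, lr = np.full(6, 1 / 6), 0.5, 0.1
for _ in range(500):                        # maximize w.delta - lam * w'Cw
    grad = delta - lam * 2 * cov @ w
    w = project_simplex(w + lr * grad)
print(np.round(w, 3), "effect:", round(float(w @ delta), 3))
```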
Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning
Auxiliary tasks facilitate learning in situations where data is scarce or the principal task of interest is extremely complex. This idea is primarily inspired by the improved generalization capability induced by solving multiple tasks simultaneously, which leads to a more robust shared representation. Nevertheless, finding optimal auxiliary tasks is a crucial problem that often requires hand-crafted solutions or expensive meta-learning approaches. In this paper, we propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover a new unrelated auxiliary classification task, which allows us to go from a Single-Task Learning (STL) to a Multi-Task Learning (MTL) problem. The disentanglement procedure works at the representation level, isolating the variation related to the principal task into an isolated subspace and additionally producing an arbitrary number of orthogonal subspaces, each of which encourages high separability among projections. We generate the auxiliary classification task through a clustering procedure on the most disentangled subspace, obtaining a discrete set of labels. Subsequently, the original data, the labels associated with the principal task, and the newly discovered ones can be fed into any MTL framework. Experimental validation on both synthetic and real data, along with various ablation studies, demonstrates promising results, revealing the potential in what has been, so far, an unexplored connection between learning disentangled representations and MTL. The source code is available at https://github.com/intelligolabs/Detaux.
Updated: 2025-07-25 16:01:55
标题: 解缠缠绕的潜在空间促进数据驱动的辅助学习
摘要: 辅助任务有助于在数据稀缺或主要兴趣任务极其复杂的情况下学习。这个想法主要受到同时解决多个任务引起的改进泛化能力的启发,这导致了更稳健的共享表示。然而,找到最佳的辅助任务是一个关键问题,通常需要手工设计的解决方案或昂贵的元学习方法。在本文中,我们提出了一个新的框架,名为Detaux,其中使用弱监督的解缠程序来发现一个新的不相关的辅助分类任务,这使我们从单任务学习(STL)转变为多任务学习(MTL)问题。解缠程度在表示级别起作用,将与主要任务相关的变化隔离到一个孤立的子空间,并额外生成任意数量的正交子空间,每个子空间鼓励在投影之间具有高可分性。我们通过对最解缠的子空间进行聚类过程生成辅助分类任务,获得一组离散标签。随后,原始数据、与主要任务相关的标签以及新发现的标签可以输入任何MTL框架。对合成和真实数据的实验验证,以及各种消融研究显示了有希望的结果,揭示了学习解缠表示和MTL之间迄今为止未被探索的联系的潜力。源代码可在https://github.com/intelligolabs/Detaux 上找到。
更新时间: 2025-07-25 16:01:55
领域: cs.LG
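A sketch of the label-discovery step, under the assumption that the disentangled subspaces are already available as slices of an embedding; clustering the most disentangled subspace yields the discrete auxiliary labels that turn the STL problem into an MTL one.

```python
# Hypothetical sketch: cluster the latent subspace orthogonal to the principal
# task and use cluster ids as a new, unrelated auxiliary classification task.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
z = rng.normal(size=(1500, 64))          # stand-in disentangled representation
principal = z[:, :32]                    # subspace tied to the principal task
auxiliary_space = z[:, 32:]              # orthogonal, most disentangled slice

kmeans = KMeans(n_clusters=4, n_init=10, random_state=2).fit(auxiliary_space)
aux_labels = kmeans.labels_              # discrete labels for the new task

# Any MTL framework can now consume (inputs, principal_labels, aux_labels).
print(np.bincount(aux_labels))
```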
Transcript Franking for Encrypted Messaging
Message franking is an indispensable abuse mitigation tool for end-to-end encrypted (E2EE) messaging platforms. With it, users who receive harmful content can securely report that content to platform moderators. However, while real-world deployments of reporting require the disclosure of multiple messages, existing treatments of message franking only consider the report of a single message. As a result, there is a gap between the security goals achieved by constructions and those needed in practice. Our work introduces transcript franking, a new type of protocol that allows reporting subsets of conversations such that moderators can cryptographically verify message causality and contents. We define syntax, semantics, and security for transcript franking in two-party and group messaging. We then present efficient constructions for transcript franking and prove their security. Looking toward deployment considerations, we provide detailed discussion of how real-world messaging systems can incorporate our protocols.
Updated: 2025-07-25 15:50:42
标题: 加密消息的转录签章
摘要: 消息签章是端到端加密(E2EE)消息平台不可或缺的滥用缓解工具。借助消息签章,接收到有害内容的用户可以安全地将该内容报告给平台管理员。然而,在现实世界的报告部署中,需要披露多条消息,而现有的消息签章处理仅考虑报告单条消息。结果是,构造实现的安全目标与实践中所需的安全目标之间存在差距。我们的工作引入了转录签章,这是一种新类型的协议,允许报告对话的子集,以便管理员可以加密验证消息的因果关系和内容。我们在两方和群体消息传递中定义了转录签章的语法、语义和安全性。然后,我们提出了转录签章的高效构造,并证明了它们的安全性。在考虑部署方面,我们提供了详细讨论,说明现实世界的消息系统如何整合我们的协议。
更新时间: 2025-07-25 15:50:42
领域: cs.CR
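To convey the causality requirement, a toy sketch (not the paper's construction): each message commitment chains to its predecessor, so a reported suffix of a conversation can be checked for both contents and order. Real schemes would use committing AEAD and platform signatures on top.

```python
# Hypothetical sketch: hash-chained per-message commitments, so a moderator
# given a reported subset plus opening keys can verify content and causality.
import hashlib
import hmac
import os

def commit(key: bytes, prev_tag: bytes, plaintext: bytes) -> bytes:
    return hmac.new(key, prev_tag + plaintext, hashlib.sha256).digest()

# Sender side: fresh franking key per message, tags chained over the chat.
transcript, prev = [], b"\x00" * 32
for text in [b"hi", b"want to buy?", b"abusive message"]:
    k = os.urandom(32)
    tag = commit(k, prev, text)
    transcript.append({"text": text, "key": k, "tag": tag, "prev": prev})
    prev = tag

# Moderator side: verify a reported suffix of the conversation.
def verify(report) -> bool:
    prev = report[0]["prev"]
    for m in report:
        if m["prev"] != prev or commit(m["key"], prev, m["text"]) != m["tag"]:
            return False
        prev = m["tag"]
    return True

print(verify(transcript[1:]))   # True: contents and causal order check out
```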
ReCatcher: Towards LLMs Regression Testing for Code Generation
Large Language Models (LLMs) for code generation evolve rapidly through fine-tuning, merging, or new model releases. However, such updates can introduce regressions, not only in correctness but also in code quality and performance. To address this, we present ReCatcher, a regression testing framework for Python code generation. ReCatcher systematically compares two LLMs, typically a current model and a candidate update, across three dimensions: logical correctness, static code quality, and execution performance. We apply ReCatcher to assess regressions across three update scenarios, fine-tuning, merging, and model release, using CodeLlama, DeepSeek-Coder, and GPT-4o. Our evaluation shows that fine-tuning with cross-language datasets increases syntax errors by up to 12%. Merging with general-purpose models like Llama2 leads to regressions in correctness by up to 18%. GPT-4o introduces regressions of up to 50% in handling missing imports compared to GPT-3.5-turbo, while GPT-4o-mini suffers up to 80% performance degradation in execution time versus GPT-4o. Overall, logical correctness, performance, and error handling (e.g., syntax errors and missing imports) are the most regression-prone areas. Comparing ReCatcher with baseline solutions, it presents better and consistent accuracy across logical and performance aspects. ReCatcher highlights the importance of systematic regression evaluation before adopting new models, while assisting researchers and practitioners in making more informed update decisions.
Updated: 2025-07-25 15:45:55
标题: ReCatcher:面向LLMs代码生成的回归测试
摘要: 大型语言模型(LLMs)用于代码生成通过微调、合并或新模型发布快速演变。然而,这些更新可能会引入回归,不仅在正确性方面,还在代码质量和性能方面。为了解决这个问题,我们提出了ReCatcher,一个用于Python代码生成的回归测试框架。ReCatcher系统地比较两个LLMs,通常是当前模型和候选更新,跨三个维度:逻辑正确性、静态代码质量和执行性能。我们应用ReCatcher来评估三种更新方案下的回归,包括微调、合并和模型发布,使用CodeLlama、DeepSeek-Coder和GPT-4o。我们的评估结果显示,使用跨语言数据集进行微调会导致语法错误增加高达12%。与Llama2等通用模型合并会导致正确性回归高达18%。GPT-4o相对于GPT-3.5-turbo在处理缺失导入方面引入了高达50%的回归,而GPT-4o-mini在执行时间方面相对于GPT-4o会出现高达80%的性能下降。总体而言,逻辑正确性、性能和错误处理(例如语法错误和缺失导入)是最容易出现回归的领域。与基准解决方案相比,ReCatcher在逻辑和性能方面表现出更好和更一致的准确性。ReCatcher强调了在采用新模型之前进行系统回归评估的重要性,同时帮助研究人员和从业者做出更明智的更新决策。
更新时间: 2025-07-25 15:45:55
领域: cs.SE,cs.AI
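A sketch of the correctness-and-performance comparison at the core of such a harness, with two hard-coded strings standing in for code generated by the current and candidate models; static-quality checks (e.g., linting) would slot in alongside.

```python
# Hypothetical sketch: run each model's generated solution against shared
# unit tests and report correctness plus execution time deltas.
import time

def passes(candidate_src: str, tests) -> bool:
    env: dict = {}
    try:
        exec(candidate_src, env)                 # trusted/demo use only
        return all(env["solve"](x) == y for x, y in tests)
    except Exception:
        return False                             # syntax/runtime regression

tests = [((2, 3), 5), ((0, 0), 0), ((-1, 4), 3)]
current = "def solve(args):\n    a, b = args\n    return a + b"
candidate = "def solve(args):\n    a, b = args\n    return a + b + 0"

for name, src in [("current", current), ("candidate", candidate)]:
    t0 = time.perf_counter()
    ok = passes(src, tests)
    print(name, "passed" if ok else "REGRESSION",
          f"({(time.perf_counter() - t0) * 1e6:.0f} us)")
```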
Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks -- essential for applications in science, medicine, and law -- remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study uses prompt engineering techniques on the Graduate-Level Google-Proof Q&A (GPQA) dataset to assess the scientific reasoning of GPT-4o. Five popular prompt engineering techniques and two tailored promptings were tested: baseline direct answer (zero-shot), chain-of-thought (CoT), zero-shot CoT, self-ask, self-consistency, decomposition, and multipath prompting. Our findings indicate that while LLMs exhibit emergent reasoning abilities, they often rely on pattern recognition rather than true logical inference, leading to inconsistencies in complex problem-solving. The results indicated that self-consistency outperformed the other prompt engineering techniques with an accuracy of 52.99%, followed by direct answer (52.23%). Zero-shot CoT (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and CoT (43.75%). Self-consistency performed the second worst in explaining the answers. Simple techniques such as direct answer, CoT, and zero-shot CoT have the best scientific reasoning. We propose a research agenda aimed at bridging these gaps by integrating structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop methodologies. By critically evaluating the reasoning mechanisms of LLMs, this paper contributes to the ongoing discourse on the future of artificial general intelligence and the development of more robust, trustworthy AI systems.
Updated: 2025-07-25 15:43:40
标题: 通过提示和模型对答案的解释理解LLM的科学推理
摘要: 大型语言模型(LLMs)在各个领域展示了在自然语言理解、推理和问题解决方面的显著能力。然而,它们在进行复杂的、多步骤推理任务的能力——这对于科学、医学和法律领域的应用至关重要——仍然是一个活跃研究的领域。本文研究了当代LLMs的推理能力,分析了它们的优势、局限性和改进潜力。该研究使用了Graduate-Level GoogleProof Q&A(GPQA)数据集上的提示工程技术来评估GPT-4o的科学推理能力。测试了五种流行的提示工程技术和两种定制提示:基准直接答案(零-shot)、思维链(CoT)、零-shot CoT、自问、自一致性、分解和多路径提示。我们的研究结果表明,虽然LLMs展现出新兴的推理能力,但它们通常依赖于模式识别而不是真正的逻辑推理,导致复杂问题解决中的不一致性。结果显示,自一致性的准确率为52.99%,胜过直接答案(52.23%)。零-shot CoT(50%)胜过多路径(48.44%)、分解(47.77%)、自问(46.88%)和CoT(43.75%)。自一致性在解释答案方面表现第二差。简单的技术,如直接答案、CoT和零-shot CoT,具有最佳的科学推理能力。我们提出了一个研究议程,旨在通过整合结构化推理框架、混合AI方法和人机协作方法来弥合这些差距。通过对LLMs的推理机制进行批判性评估,本文为关于人工通用智能的未来和更健壮、可信赖的AI系统的发展的持续讨论做出了贡献。
更新时间: 2025-07-25 15:43:40
领域: cs.AI
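Since self-consistency was the strongest technique in this study, here is a sketch of it with a stubbed model call: sample several chain-of-thought completions at non-zero temperature and majority-vote the extracted answers. The answer format and stub are assumptions, not the paper's harness.

```python
# Hypothetical sketch of self-consistency prompting: sample k chain-of-thought
# completions, extract each final answer, and return the majority vote.
import random
from collections import Counter

def sample_llm(question: str, temperature: float) -> str:
    # Stub standing in for a real API call that returns "...reasoning...
    # ANSWER: X". The temperature argument is unused by the stub.
    return f"step 1 ... step n. ANSWER: {random.choice('BBBAC')}"

def self_consistency(question: str, n_samples: int = 9) -> str:
    answers = [
        sample_llm(question, temperature=0.7).rsplit("ANSWER:", 1)[-1].strip()
        for _ in range(n_samples)
    ]
    return Counter(answers).most_common(1)[0][0]

random.seed(0)
print(self_consistency("A GPQA-style multiple-choice question ..."))
```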
Agreement-Based Cascading for Efficient Inference
Adaptive inference schemes reduce the cost of machine learning inference by assigning smaller models to easier examples, attempting to avoid invocation of larger models when possible. In this work we explore a simple, effective adaptive inference technique we term Agreement-Based Cascading (ABC). ABC builds a cascade of models of increasing size/complexity, and uses agreement between ensembles of models at each level of the cascade as a basis for data-dependent routing. Although ensemble execution introduces additional expense, we show that these costs can be easily offset in practice due to large expected differences in model sizes, parallel inference execution capabilities, and accuracy benefits of ensembling. We examine ABC theoretically and empirically in terms of these parameters, showing that the approach can reliably act as a drop-in replacement for existing models and surpass the best single model it aims to replace in terms of both efficiency and accuracy. Additionally, we explore the performance of ABC relative to existing cascading methods in three common scenarios: (1) edge-to-cloud inference, where ABC reduces communication costs by up to 14x; (2) cloud-based model serving, where it achieves a 3x reduction in rental costs; and (3) inference via model API services, where ABC achieves a 2-25x reduction in average price per token/request relative to state-of-the-art LLM cascades.
Updated: 2025-07-25 15:38:39
标题: 基于协议的级联用于高效推理
摘要: 自适应推理方案通过将较小的模型分配给更容易的示例来降低机器学习推理的成本,试图在可能的情况下避免调用较大的模型。在这项工作中,我们探索了一种简单而有效的自适应推理技术,我们称之为基于协议的级联(ABC)。ABC构建了一个逐渐增大/复杂的模型级联,并利用每个级联级别的模型集之间的协议作为数据相关的路由基础。尽管集成执行引入了额外的开销,但我们表明这些成本可以在实践中很容易地抵消,因为模型大小、并行推理执行能力以及集成带来的准确性优势之间存在很大的预期差异。我们从理论和经验的角度检验了ABC在这些参数方面,表明该方法可以可靠地作为现有模型的即插即用替代品,并在效率和准确性方面超越其旨在替代的最佳单一模型。此外,我们探索了ABC相对于现有级联方法在三种常见场景中的性能表现:(1)边缘到云端推理,ABC可以将通信成本降低多达14倍;(2)基于云的模型服务,在租金成本上实现了3倍的降低;以及(3)通过模型API服务进行推理,ABC相对于最先进的LLM级联,将每个标记/请求的平均价格降低了2-25倍。
更新时间: 2025-07-25 15:38:39
领域: cs.LG
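A two-level sketch of the idea, with a bootstrap ensemble of logistic regressions as the cheap tier and a random forest as the expensive tier; the model choices and the unanimity routing rule are illustrative, not the paper's configuration.

```python
# Hypothetical sketch of Agreement-Based Cascading: the cheap ensemble answers
# when its members agree; disagreements escalate to the larger model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rng = np.random.default_rng(3)
small = []                                   # cheap first-level ensemble
for _ in range(3):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    small.append(LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx]))
large = RandomForestClassifier(n_estimators=300, random_state=3).fit(X_tr, y_tr)

votes = np.stack([m.predict(X_te) for m in small])   # (3, n_samples)
agree = (votes == votes[0]).all(axis=0)              # unanimous cheap models
pred = votes[0].copy()
if (~agree).any():
    pred[~agree] = large.predict(X_te[~agree])       # escalate disagreements
print(f"resolved cheaply: {agree.mean():.1%}, "
      f"overall accuracy: {(pred == y_te).mean():.1%}")
```

The fraction resolved at the first level is what drives the communication and cost savings the abstract reports: the large model is only invoked on the disagreement set.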
Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)
Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained video, audio, and language embeddings to fMRI time series recorded while four subjects watched almost 80 hours of movies provided by the Algonauts 2025 challenge. Modality-specific bidirectional RNNs encode temporal dynamics; their hidden states are fused and passed to a second recurrent layer, and lightweight subject-specific heads output responses for 1000 cortical parcels. Training relies on a composite MSE-correlation loss and a curriculum that gradually shifts emphasis from early sensory to late association regions. Averaging 100 model variants further boosts robustness. The resulting system ranked third on the competition leaderboard, achieving an overall Pearson r = 0.2094 and the highest single-parcel peak score (mean r = 0.63) among all participants, with particularly strong gains for the most challenging subject (Subject 5). The approach establishes a simple, extensible baseline for future multimodal brain-encoding benchmarks.
Updated: 2025-07-25 15:38:12
标题: 用于预测大脑对自然电影反应的多模态递归集成(Algonauts 2025)
摘要: 准确预测自然刺激下的分布式皮层响应需要整合视觉、听觉和语义信息的模型。我们提出了一种分层多模态递归集成模型,将预训练的视频、音频和语言嵌入映射到四名受试者在观看由Algonauts 2025挑战提供的近80小时电影时记录的fMRI时间序列。每种模态具体的双向RNN编码时间动态;它们的隐藏状态被融合并传递到第二个递归层,并且轻量级的特定于受试者的头部输出了1000个皮层区域的响应。训练依赖于复合的MSE-相关性损失和一个课程,逐渐将重点从早期感觉区转移到后期联想区。对100个模型变体进行平均进一步增强了鲁棒性。结果系统在竞赛排行榜上排名第三,实现了总体皮尔逊r = 0.2094,并且在所有参与者中取得了最高的单个区域峰值分数(平均r = 0.63),尤其是对于最具挑战性的受试者(受试者5)获得了特别显著的增益。该方法为未来多模态大脑编码基准提供了一个简单且可扩展的基线。
更新时间: 2025-07-25 15:38:12
领域: q-bio.NC,cs.CV,cs.LG
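A sketch of the architecture's shape in PyTorch, with made-up feature dimensions: per-modality bidirectional GRUs, a fusing recurrent layer, and subject-specific linear heads over 1000 parcels. The submission uses its own dimensions, loss, and curriculum, none of which are reproduced here.

```python
# Hypothetical sketch of a hierarchical multimodal recurrent encoding model.
import torch
import torch.nn as nn

DIMS = {"video": 768, "audio": 512, "text": 1024}   # assumed feature sizes

class MultimodalEncodingModel(nn.Module):
    def __init__(self, dims=DIMS, hidden=256, n_subjects=4, n_parcels=1000):
        super().__init__()
        self.rnns = nn.ModuleDict({
            m: nn.GRU(d, hidden, batch_first=True, bidirectional=True)
            for m, d in dims.items()})
        self.fusion = nn.GRU(2 * hidden * len(dims), hidden, batch_first=True)
        self.heads = nn.ModuleList(
            nn.Linear(hidden, n_parcels) for _ in range(n_subjects))

    def forward(self, feats, subject):
        encoded = [self.rnns[m](x)[0] for m, x in feats.items()]  # (B, T, 2H)
        fused, _ = self.fusion(torch.cat(encoded, dim=-1))
        return self.heads[subject](fused)                         # (B, T, P)

model = MultimodalEncodingModel()
feats = {m: torch.randn(2, 30, d) for m, d in DIMS.items()}
print(model(feats, subject=0).shape)    # torch.Size([2, 30, 1000])
```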
Data Augmentation for Spoken Grammatical Error Correction
While there exist strong benchmark datasets for grammatical error correction (GEC), high-quality annotated spoken datasets for Spoken GEC (SGEC) are still under-resourced. In this paper, we propose a fully automated method to generate audio-text pairs with grammatical errors and disfluencies. Moreover, we propose a series of objective metrics that can be used to evaluate the generated data and choose the more suitable dataset for SGEC. The goal is to generate an augmented dataset that maintains the textual and acoustic characteristics of the original data while providing new types of errors. This augmented dataset should augment and enrich the original corpus without altering the language assessment scores of the second language (L2) learners. We evaluate the use of the augmented corpus both for written GEC (the text part) and for SGEC (the audio-text pairs). Our experiments are conducted on the S&I Corpus, the first publicly available speech dataset with grammar error annotations.
Updated: 2025-07-25 15:25:17
标题: 口语语法错误修正的数据增强
摘要: 尽管存在着用于语法错误校正(GEC)的强有力基准数据集,但用于口语GEC(SGEC)的高质量标注口语数据集仍然资源匮乏。在本文中,我们提出了一种完全自动化的方法,用于生成带有语法错误和语言不流畅性的音频-文本对。此外,我们提出了一系列客观指标,可用于评估生成的数据并选择更适合用于SGEC的数据集。我们的目标是生成一个扩充数据集,保持原始数据的文本和声学特征,同时提供新类型的错误。这个扩充数据集应该增加和丰富原始语料库,而不改变第二语言(L2)学习者的语言评估分数。我们评估了扩充语料库在书面GEC(文本部分)和SGEC(音频-文本对)方面的应用。我们的实验是在S&I语料库上进行的,这是第一个具有语法错误标注的公开可用语音数据集。
更新时间: 2025-07-25 15:25:17
领域: cs.CL,cs.AI
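A sketch of the text half of such augmentation; the substitution rules and probabilities below are toy values, and a full pipeline would regenerate the paired audio (e.g., via TTS) and filter with the objective metrics the abstract proposes.

```python
# Hypothetical sketch: inject simple grammatical errors and disfluencies into
# a clean transcript to create an augmented text sample.
import random

DISFLUENCIES = ["um", "uh", "you know"]
ERROR_RULES = [
    ("went", "goed"),               # over-regularized past tense
    ("an", "a"),                    # article error
    ("were", "was"),                # agreement error
]

def corrupt(sentence: str, p_err=0.7, p_disfl=0.5, seed=0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    out = []
    for w in words:
        if rng.random() < p_disfl / len(words):
            out.append(rng.choice(DISFLUENCIES))   # occasional disfluency
        for clean, bad in ERROR_RULES:
            if w == clean and rng.random() < p_err:
                w = bad
                break
        out.append(w)
    return " ".join(out)

print(corrupt("Yesterday we went to an old castle and the rooms were huge"))
```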
Learning neuro-symbolic convergent term rewriting systems
Building neural systems that can learn to execute symbolic algorithms is a challenging open problem in artificial intelligence, especially when aiming for strong generalization and out-of-distribution performance. In this work, we introduce a general framework for learning convergent term rewriting systems using a neuro-symbolic architecture inspired by the rewriting algorithm itself. We present two modular implementations of such architecture: the Neural Rewriting System (NRS) and the Fast Neural Rewriting System (FastNRS). As a result of algorithmic-inspired design and key architectural elements, both models can generalize to out-of-distribution instances, with FastNRS offering significant improvements in terms of memory efficiency, training speed, and inference time. We evaluate both architectures on four tasks involving the simplification of mathematical formulas and further demonstrate their versatility in a multi-domain learning scenario, where a single model is trained to solve multiple types of problems simultaneously. The proposed system significantly outperforms two strong neural baselines: the Neural Data Router, a recent transformer variant specifically designed to solve algorithmic problems, and GPT-4o, one of the most powerful general-purpose large language models. Moreover, our system matches or outperforms OpenAI's latest o1-preview model, which excels in reasoning benchmarks.
Updated: 2025-07-25 15:24:56
标题: 学习神经符号收敛项重写系统
摘要: 建立能够学习执行符号算法的神经系统是人工智能中一个具有挑战性的开放问题,特别是在追求强大的泛化能力和超出分布性能时。在这项工作中,我们引入了一个通用框架,用于学习收敛的项重写系统,该系统使用了受重写算法本身启发的神经符号结构。我们提出了两种模块化实现这种结构的方法:神经重写系统(NRS)和快速神经重写系统(FastNRS)。由于算法启发式设计和关键的架构元素,这两个模型都能泛化到分布外实例,其中FastNRS在内存效率、训练速度和推理时间方面提供了显著的改进。我们在涉及简化数学公式的四项任务上评估了这两种架构,并进一步展示了它们在多领域学习场景中的多功能性,其中一个单一模型被训练用于同时解决多种类型的问题。所提出的系统在性能上明显优于两种强大的神经基线模型:神经数据路由器(Neural Data Router),一个最近专门为解决算法问题设计的Transformer变体,以及GPT-4o,最强大的通用大型语言模型之一。此外,我们的系统与OpenAI最新的、在推理基准测试中表现出色的o1-preview模型相当或更优。
更新时间: 2025-07-25 15:24:56
领域: cs.AI,cs.LG
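For context, here is a tiny convergent rewriting loop of the kind such systems learn to imitate; the rule set is illustrative and operates on strings rather than proper terms, which the neural models would instead process as structured inputs.

```python
# Hypothetical sketch: apply simplification rules until a normal form is
# reached. Each rule strictly shrinks the term, so the loop terminates.
import re

RULES = [
    (re.compile(r"\(0\+(\w+)\)"), r"\1"),       # (0+x) -> x
    (re.compile(r"\(1\*(\w+)\)"), r"\1"),       # (1*x) -> x
    (re.compile(r"\(0\*(\w+)\)"), "0"),         # (0*x) -> 0
]

def rewrite_to_normal_form(term: str) -> str:
    changed = True
    while changed:
        changed = False
        for pattern, repl in RULES:
            new = pattern.sub(repl, term, count=1)   # one rewrite step
            if new != term:
                term, changed = new, True
                break
    return term

print(rewrite_to_normal_form("(0+(1*(0+y)))"))   # -> y
```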
Deep Learning for Double Auction
Auctions are important mechanisms extensively implemented in various markets, e.g., search engines' keyword auctions, antique auctions, etc. Finding an optimal auction mechanism is extremely difficult due to the constraints of imperfect information, incentive compatibility (IC), and individual rationality (IR). In addition to traditional economic methods, some recent works have attempted to find the optimal (single) auction using deep learning methods. Unlike those attempts focusing on single auctions, we develop deep learning methods for double auctions, where imperfect information exists on both the demand and supply sides. The previous attempts on single auctions cannot directly apply to our context, and they additionally suffer from limited generalizability, inefficiency in ensuring the constraints, and learning fluctuations. We innovate in designing deep learning models that solve this more complex problem while addressing the previous models' three limitations. Specifically, we achieve generalizability by leveraging a transformer-based architecture to model market participants as sequences for varying market sizes; we utilize the numerical features of the constraints and pre-treat them for higher learning efficiency; and we develop a gradient-conflict-elimination scheme to address learning fluctuations. Extensive experimental evaluations demonstrate the superiority of our approach over classical and machine learning baselines.
Updated: 2025-07-25 15:21:48
标题: 深度学习用于双向拍卖
摘要: 拍卖是广泛实施在各种市场中的重要机制,例如搜索引擎的关键词拍卖,古董拍卖等。由于存在信息不完全、激励兼容性(IC)和个体合理性(IR)的约束,找到最优的拍卖机制非常困难。除了传统的经济方法外,一些最近尝试利用深度学习方法找到最优(单一)拍卖。与那些专注于单一拍卖的尝试不同,我们开发了用于双重拍卖的深度学习方法,在这种情况下,需求和供应方都存在信息不完整。先前对单一拍卖的尝试不能直接应用于我们的情境,并且这些尝试受限于泛化能力有限、确保约束的效率低下和学习波动。我们在设计深度学习模型以解决更复杂问题的同时,还解决了先前模型的三个限制。具体来说,我们通过利用基于transformer的架构将市场参与者建模为不同市场规模的序列来实现泛化性;我们利用约束的数值特征对其进行预处理以提高学习效率;我们开发了一个梯度冲突消除方案来解决学习波动的问题。大量实验评估表明,我们的方法优于传统的和机器学习基线。
更新时间: 2025-07-25 15:21:48
领域: cs.LG,cs.GT,econ.TH
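One standard way to realize gradient-conflict elimination is a PCGrad-style projection; the paper's exact scheme may differ, so treat this as a generic sketch with made-up objective names.

```python
# Hypothetical sketch: when two objective gradients conflict (negative inner
# product), project away the conflicting component before combining them.
import numpy as np

def deconflict(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    dot = g1 @ g2
    if dot < 0:                              # objectives conflict
        g1 = g1 - dot / (g2 @ g2) * g2       # drop the component along g2
    return g1

g_ic = np.array([1.0, -2.0])    # e.g., incentive-compatibility penalty grad
g_rev = np.array([1.0, 1.0])    # e.g., revenue objective grad
update = deconflict(g_ic, g_rev) + g_rev
print(update)                   # [ 2.5 -0.5]: combined step, conflict removed
```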
Counterfactual Explanations in Medical Imaging: Exploring SPN-Guided Latent Space Manipulation
Artificial intelligence is increasingly leveraged across various domains to automate decision-making processes that significantly impact human lives. In medical image analysis, deep learning models have demonstrated remarkable performance. However, their inherent complexity makes them black box systems, raising concerns about reliability and interpretability. Counterfactual explanations provide comprehensible insights into decision processes by presenting hypothetical "what-if" scenarios that alter model classifications. By examining input alterations, counterfactual explanations provide patterns that influence the decision-making process. Despite their potential, generating plausible counterfactuals that adhere to similarity constraints while providing human-interpretable explanations remains a challenge. In this paper, we investigate this challenge through a model-specific optimization approach. While deep generative models such as variational autoencoders (VAEs) exhibit significant generative power, probabilistic models like sum-product networks (SPNs) efficiently represent complex joint probability distributions. By modeling the likelihood of a semi-supervised VAE's latent space with an SPN, we leverage its dual role as both a latent space descriptor and a classifier for a given discrimination task. This formulation enables the optimization of latent space counterfactuals that are both close to the original data distribution and aligned with the target class distribution. We conduct experimental evaluation on the CheXpert dataset. To evaluate the effectiveness of the integration of SPNs, our SPN-guided latent space manipulation is compared against a neural network baseline. Additionally, the trade-off between latent variable regularization and counterfactual quality is analyzed.
Updated: 2025-07-25 15:19:32
标题: 医学影像中的反事实解释:探索SPN引导的潜在空间操作
摘要: 人工智能在各个领域越来越被利用来自动化影响人类生活的决策过程。在医学图像分析中,深度学习模型表现出了显著的性能。然而,它们固有的复杂性使它们成为黑匣子系统,引发了对可靠性和可解释性的担忧。反事实解释通过呈现改变模型分类的假设性“如果”场景,提供了对决策过程的可理解洞见。通过检查输入的改变,反事实解释提供了影响决策过程的模式。尽管具有潜力,生成符合相似性约束并提供人类可解释解释的合理反事实仍然是一个挑战。在本文中,我们通过一种模型特定的优化方法来研究这一挑战。虽然深度生成模型如变分自动编码器(VAEs)展现出了显著的生成能力,概率模型如和积网络(SPNs)有效地表示复杂的联合概率分布。通过用SPN对半监督VAE的潜在空间建模,我们利用其作为潜在空间描述符和给定判别任务的分类器的双重角色。这种形式使得潜在空间反事实的优化既接近原始数据分布又与目标类分布一致。我们在CheXpert数据集上进行了实验评估。为了评估SPNs整合的有效性,我们将我们的SPN引导的潜在空间操纵与神经网络基线进行比较。另外,分析了潜变量规范化和反事实质量之间的权衡。
更新时间: 2025-07-25 15:19:32
领域: cs.LG,cs.AI
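A toy sketch of the latent counterfactual optimization, with a linear classifier and a Gaussian penalty standing in for the SPN-modeled likelihood of the VAE latent space; all dimensions and coefficients are illustrative.

```python
# Hypothetical sketch: nudge a latent code until the classifier flips to the
# target class, while penalties keep it near the original and the prior.
import torch

torch.manual_seed(0)
classifier = torch.nn.Linear(16, 2)         # stand-in latent-space classifier
z0 = torch.randn(16)                        # latent code of the query image
z = z0.clone().requires_grad_(True)
target = torch.tensor([1])
opt = torch.optim.Adam([z], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    loss = (torch.nn.functional.cross_entropy(classifier(z)[None], target)
            + 0.1 * (z - z0).pow(2).sum()    # similarity to the original
            + 0.01 * z.pow(2).sum())         # crude prior (SPN likelihood IRL)
    loss.backward()
    opt.step()

print("target class probability:", torch.softmax(classifier(z), -1)[1].item())
```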
Integrating IP Broadcasting with Audio Tags: Workflow and Challenges
The broadcasting industry has adopted IP technologies, revolutionising both live and pre-recorded content production, from news gathering to live music events. IP broadcasting allows for the transport of audio and video signals in an easily configurable way, aligning with modern networking techniques. This shift towards an IP workflow allows for much greater flexibility, not only in routing signals but with the integration of tools using standard web development techniques. One possible tool could include the use of live audio tagging, which has a number of uses in the production of content. These could include adding sound effects to automated closed captioning or identifying unwanted sound events within a scene. In this paper, we describe the process of containerising an audio tagging model into a microservice, a small segregated code module that can be integrated into a multitude of different network setups. The goal is to develop a modular, accessible, and flexible tool capable of seamless deployment into broadcasting workflows of all sizes, from small productions to large corporations. Challenges surrounding latency of the selected audio tagging model and its effect on the usefulness of the end product are discussed.
Updated: 2025-07-25 15:18:16
标题: 将IP广播与音频标签集成:工作流程和挑战
摘要: 广播行业已经采用IP技术,彻底改变了现场和预录内容的制作方式,从新闻采集到现场音乐活动。IP广播允许以一种易于配置的方式传输音频和视频信号,与现代网络技术相一致。这种向IP工作流的转变不仅允许信号路由更加灵活,还可以通过使用标准的网络开发技术集成工具。一种可能的工具是使用实时音频标记,它在内容制作中有多种用途。这可能包括将声音效果添加到自动封闭字幕中,或者识别场景中不想要的声音事件。在本文中,我们描述了将音频标记模型容器化为微服务的过程,这是一个可以集成到多种不同网络设置中的小型隔离代码模块。目标是开发一个模块化、可访问和灵活的工具,能够无缝地部署到各种规模的广播工作流中,从小型制作到大型公司。讨论了围绕选择的音频标记模型的延迟以及其对最终产品有用性的影响的挑战。
更新时间: 2025-07-25 15:18:16
领域: eess.AS,cs.AI,cs.MM,cs.SD
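A minimal sketch of the microservice wrapper using Flask; `tag_audio` is a stub where a pretrained audio-tagging model would be invoked, and the route and payload format are assumptions rather than the paper's API.

```python
# Hypothetical sketch: a containerisable HTTP endpoint that accepts an audio
# clip and returns tag predictions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def tag_audio(raw: bytes) -> list[dict]:
    # Stub: a real deployment would decode `raw` and run a pretrained
    # audio-tagging network here.
    return [{"label": "speech", "score": 0.91},
            {"label": "music", "score": 0.12}]

@app.post("/tags")
def tags():
    clip = request.files["audio"].read()   # multipart form field "audio"
    return jsonify({"tags": tag_audio(clip), "bytes": len(clip)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)     # containerised entry point
```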
Empowering IoT Firmware Secure Update with Customization Rights
Firmware updates remain the primary line of defense for IoT devices; however, the update channel itself has become a well-established attack vector. Existing defenses mainly focus on securing monolithic firmware images, leaving module-level customization -- a growing user demand -- largely unprotected and insufficiently explored. To address this gap, we conduct a pilot study on the update workflows of 200 Linux-based IoT devices across 23 vendors, uncovering five previously undocumented vulnerabilities caused by customization practices. A broader analysis of update-related CVEs from 2020 to 2024 reveals that over half originate from customization-induced issues. These findings highlight a critical yet underexamined reality: as customization increases, so does the attack surface, while current defenses fail to keep pace. We propose IMUP (Integrity-Centric Modular Update Platform), the first framework to address two key challenges: constructing a trustworthy cross-module integrity chain and scaling update performance under mass customization. IMUP combines three techniques: per-module chameleon hashing for integrity, server-side proof-of-work offloading to reduce device overhead, and server-side caching to reuse module combinations, minimizing rebuild costs. Security analysis shows that even when 95 percent of secret keys are exposed, forging a valid image incurs over 300 times the cost of the legitimate server. Experiments on heterogeneous IoT devices demonstrate that IMUP reduces server-side generation time by 2.9 times and device downtime by 5.9 times compared to a package-manager baseline.
Updated: 2025-07-25 15:17:29
Domains: cs.CR
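To illustrate the per-module chameleon hashing ingredient, here is a toy Krawczyk-Rabin-style chameleon hash in Python. It is a textbook stand-in, not IMUP's actual construction, and the parameters are demo-sized, not cryptographically secure:

    # Toy chameleon hash: a trapdoor holder can re-randomise a customised
    # module's hash so the published digest stays valid. NOT secure sizes.
    import hashlib
    import secrets

    p, q, g = 2039, 1019, 4           # p = 2q + 1; g generates the order-q subgroup
    x = secrets.randbelow(q - 1) + 1  # trapdoor (vendor secret)
    h = pow(g, x, p)                  # public key

    def msg_int(module_bytes):
        return int.from_bytes(hashlib.sha256(module_bytes).digest(), "big") % q

    def ch_hash(m, r):
        return (pow(g, m, p) * pow(h, r, p)) % p

    m1, r1 = msg_int(b"module v1"), secrets.randbelow(q)
    digest = ch_hash(m1, r1)

    # Trapdoor collision: the same digest verifies a customised module v2.
    m2 = msg_int(b"module v2 (customised)")
    r2 = (r1 + (m1 - m2) * pow(x, -1, q)) % q
    assert ch_hash(m2, r2) == digest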
A Data-Driven Approach to Estimate LEO Orbit Capacity Models
Utilizing the Sparse Identification of Nonlinear Dynamics algorithm (SINDy) and Long Short-Term Memory Recurrent Neural Networks (LSTM), the population of resident space objects in LEO, divided into Active, Derelict, and Debris classes, can be accurately modeled to predict future satellite and debris propagation. The proposed approach uses a data set generated by a computationally expensive high-fidelity model, the MOCAT-MC, to build a lightweight, low-fidelity counterpart that provides accurate forecasting in a shorter time frame.
Updated: 2025-07-25 15:16:54
Domains: cs.LG
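A brief sketch of the SINDy half of this pipeline using the pysindy package; random data stands in for the MOCAT-MC time series, and the feature names are illustrative:

    # Fit a sparse dynamical model to the three population time series.
    import numpy as np
    import pysindy as ps

    t = np.linspace(0, 50, 500)          # time grid (e.g. years)
    X = np.random.rand(500, 3)           # placeholder for [Active, Derelict, Debris]

    model = ps.SINDy(feature_names=["active", "derelict", "debris"])
    model.fit(X, t=t)
    model.print()                        # learned sparse symbolic dynamics
    X_future = model.simulate(X[-1], np.linspace(50, 100, 500))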
Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
This position paper examines the use of Large Language Models (LLMs) in social simulation, analyzing both their potential and their limitations from a computational social science perspective. The first part reviews recent findings on the ability of LLMs to replicate key aspects of human cognition, including Theory of Mind reasoning and social inference, while also highlighting significant limitations such as cognitive biases, lack of true understanding, and inconsistencies in behavior. The second part surveys emerging applications of LLMs in multi-agent simulation frameworks, focusing on system architectures, scale, and validation strategies. Notable projects such as Generative Agents (Smallville) and AgentSociety are discussed in terms of their design choices, empirical grounding, and methodological innovations. Particular attention is given to the challenges of behavioral fidelity, calibration, and reproducibility in large-scale LLM-driven simulations. The final section distinguishes between contexts where LLMs, like other black-box systems, offer direct value, such as interactive simulations and serious games, and those where their use is more problematic, notably in explanatory or predictive modeling. The paper concludes by advocating for hybrid approaches that integrate LLMs into traditional agent-based modeling platforms (GAMA, NetLogo, etc.), enabling modelers to combine the expressive flexibility of language-based reasoning with the transparency and analytical rigor of classical rule-based systems.
Updated: 2025-07-25 15:15:35
Domains: cs.AI,cs.MA
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
Large Vision-Language Models (LVLMs) have transformed image captioning, shifting from concise captions to detailed descriptions. We introduce LOTUS, a leaderboard for evaluating detailed captions, addressing three main gaps in existing evaluations: lack of standardized criteria, bias-aware assessments, and user preference considerations. LOTUS comprehensively evaluates various aspects, including caption quality (e.g., alignment, descriptiveness), risks (e.g., hallucination), and societal biases (e.g., gender bias) while enabling preference-oriented evaluations by tailoring criteria to diverse user preferences. Our analysis of recent LVLMs reveals no single model excels across all criteria, while correlations emerge between caption detail and bias risks. Preference-oriented evaluations demonstrate that optimal model selection depends on user priorities.
Updated: 2025-07-25 15:12:42
Domains: cs.CV,cs.AI,cs.CL,cs.CY,cs.LG
SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models
We introduce the Speech-based Intelligence Quotient (SIQ) as a new form of human cognition-inspired evaluation pipeline for voice-understanding large language models (LLM Voice), designed to assess their voice understanding ability. Moving beyond popular voice understanding metrics such as word error rate (WER), SIQ examines LLM Voice across three cognitive levels motivated by Bloom's Taxonomy: (1) Remembering (i.e., WER for verbatim accuracy); (2) Understanding (i.e., similarity of LLM's interpretations); and (3) Application (i.e., QA accuracy for simulating downstream tasks). We demonstrate that SIQ not only quantifies voice understanding abilities but also provides unified comparisons between cascaded methods (e.g., ASR + LLM) and end-to-end models, identifies annotation errors in existing benchmarks, and detects hallucinations in LLM Voice. Our framework represents a first-of-its-kind intelligence examination that bridges cognitive principles with voice-oriented benchmarks, while exposing overlooked challenges in multi-modal training.
Updated: 2025-07-25 15:12:06
Domains: cs.CL,cs.AI,cs.SC,cs.SD,eess.AS
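A toy scoring of the first two SIQ levels with stand-in metrics: jiwer's WER for Remembering, and a difflib ratio as a crude proxy for the Understanding-level similarity (the paper's actual similarity measure is not specified here):

    # Illustrative two-level scoring; strings are made-up examples.
    from difflib import SequenceMatcher

    from jiwer import wer

    reference = "turn the volume down in the living room"
    hypothesis = "turn the volume down in the livingroom"

    remembering = 1.0 - wer(reference, hypothesis)   # verbatim accuracy
    understanding = SequenceMatcher(None, reference, hypothesis).ratio()
    print(f"Remembering={remembering:.2f} Understanding={understanding:.2f}")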
EffiComm: Bandwidth Efficient Multi Agent Communication
Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots. Yet transmitting raw point clouds or full feature maps overwhelms Vehicle-to-Vehicle (V2V) communications, causing latency and scalability problems. We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy. EffiComm operates on Bird's-Eye-View (BEV) feature maps from any modality and applies a two-stage reduction pipeline: (1) Selective Transmission (ST) prunes low-utility regions with a confidence mask; (2) Adaptive Grid Reduction (AGR) uses a Graph Neural Network (GNN) to assign vehicle-specific keep ratios according to role and network load. The remaining features are fused with a soft-gated Mixture-of-Experts (MoE) attention layer, offering greater capacity and specialization for effective feature integration. On the OPV2V benchmark, EffiComm reaches 0.84 mAP@0.7 while sending only an average of approximately 1.5 MB per frame, outperforming previous methods on the accuracy-per-bit curve. These results highlight the value of adaptive, learned communication for scalable Vehicle-to-Everything (V2X) perception.
Updated: 2025-07-25 15:03:26
Domains: cs.CV,cs.LG,cs.RO
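A PyTorch sketch of the Selective Transmission stage: prune low-utility BEV cells with a confidence mask before transmission. Tensor shapes and the threshold are illustrative, not the paper's values:

    import torch

    feat = torch.randn(64, 100, 100)         # C x H x W BEV feature map
    conf = torch.rand(100, 100)              # per-cell utility/confidence

    keep = conf > 0.7                         # ST: confidence mask
    idx = keep.nonzero(as_tuple=False)        # (N, 2) kept cell coordinates
    payload = feat[:, keep].T                 # N x C sparse features to transmit
    print(f"transmitting {keep.float().mean().item():.0%} of cells")

    # Receiver scatters the payload back into a dense map for fusion.
    recon = torch.zeros_like(feat)
    recon[:, idx[:, 0], idx[:, 1]] = payload.T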
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks
Recently, recurrent large language models (Recurrent LLMs) with linear computational complexity have re-emerged as efficient alternatives to self-attention-based LLMs (Self-Attention LLMs), which have quadratic complexity. However, Recurrent LLMs often underperform on long-context tasks due to their limited fixed-size memory. Previous research has primarily focused on enhancing the memory capacity of Recurrent LLMs through architectural innovations, but these approaches have not yet enabled Recurrent LLMs to match the performance of Self-Attention LLMs on long-context tasks. We argue that this limitation arises because processing the entire context at once is not well-suited for Recurrent LLMs. In this paper, we propose Smooth Reading, a chunk-wise inference method inspired by human reading strategies. Smooth Reading processes context in chunks and iteratively summarizes the contextual information, thereby reducing memory demands and making the approach more compatible with Recurrent LLMs. Our experimental results show that this method substantially narrows the performance gap between Recurrent and Self-Attention LLMs on long-context tasks, while preserving the efficiency advantages of Recurrent LLMs. Our Smooth Reading boosts SWA-3B-4k (a Recurrent LLM) from 5.68% lower to 3.61% higher performance than Self-Attention LLMs on LongBench. Moreover, our method maintains high efficiency, training 3x faster and inferring 2x faster at 64k context compared to Self-Attention LLMs. To our knowledge, this is the first work to achieve comparable performance using Recurrent LLMs compared with Self-Attention LLMs on long-context tasks. We hope our method will inspire future research in this area. To facilitate further progress, we will release code and dataset.
Updated: 2025-07-25 15:02:45
Domains: cs.CL,cs.AI
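The chunk-wise loop is easy to sketch. Below, llm is a hypothetical text-in/text-out callable and the prompts are illustrative, not the paper's:

    # Smooth-Reading-style chunk-wise inference: carry a running summary
    # instead of the full history, bounding the model's memory demands.
    def smooth_read(llm, context, question, chunk_size=4096):
        chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
        summary = ""
        for chunk in chunks:
            summary = llm(
                f"Summary so far: {summary}\nNew text: {chunk}\n"
                f"Update the summary, keeping facts relevant to: {question}"
            )
        return llm(f"Summary: {summary}\nAnswer the question: {question}")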
Reconstruction of Sparse Urban Wireless Signals via Group Equivariant Non-Expansive Operators
In emerging communication systems such as sixth generation (6G) wireless networks, efficient resource management and service delivery rely on accurate knowledge of spatially-varying quantities like signal-to-interference-plus-noise ratio (SINR) maps, which are costly to acquire at high resolution. This work explores the reconstruction of such spatial signals from sparse measurements using Group Equivariant Non-Expansive Operators (GENEOs), offering a low-complexity alternative to traditional neural networks. The concept of GENEO, which originated in topological data analysis (TDA), is a mathematical tool used in machine learning to represent agents modelled as functional operators acting on data while incorporating application-specific invariances. Leveraging these invariances reduces the number of parameters relative to traditional neural networks and mitigates data scarcity by enforcing known algebraic and geometric constraints that reflect symmetries in the agents' actions. In this paper, we introduce a novel GENEO-based approach for SINR map reconstruction in urban wireless communication networks using extremely sparse sampling. We demonstrate that this mathematical framework achieves competitive performance compared to established methods. Our evaluation, conducted using both statistical and TDA metrics, highlights the advantages of our approach in accurately reconstructing spatial signals under severe data limitations on the number of samples.
Updated: 2025-07-25 14:59:44
Domains: cs.LG,cs.NI
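A toy GENEO in NumPy illustrating the two defining properties: convolution with a kernel whose absolute weights sum to at most 1 is non-expansive in the sup norm and equivariant to grid translations. The paper's operators and group actions are richer; this is only a minimal instance:

    import numpy as np
    from scipy.ndimage import convolve

    kernel = np.ones((3, 3)) / 9.0            # |weights| sum to 1 -> non-expansive

    def geneo(signal_map):
        return convolve(signal_map, kernel, mode="wrap")

    phi1 = np.random.rand(32, 32)             # e.g. a sampled SINR map
    phi2 = np.random.rand(32, 32)
    # Non-expansiveness: output sup-distance never exceeds input sup-distance.
    assert np.abs(geneo(phi1) - geneo(phi2)).max() <= np.abs(phi1 - phi2).max() + 1e-12
    # Equivariance: translating the input translates the output identically.
    assert np.allclose(geneo(np.roll(phi1, 5, axis=0)), np.roll(geneo(phi1), 5, axis=0))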
Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges
In recent years, social media users have spent significant amounts of time on short-form video platforms. As a result, established platforms in other domains, such as e-commerce, have begun introducing short-form video content to engage users and increase their time spent on the platform. The success of these experiences is due not only to the content itself but also to a unique UI innovation: instead of offering users a list of choices to click, platforms actively recommend content for users to watch one at a time. This creates new challenges for recommender systems, especially when launching a new video experience. Beyond the limited interaction data, immersive feed experiences introduce stronger position bias due to the UI and duration bias when optimizing for watch-time, as models tend to favor shorter videos. These issues, together with the feedback loop inherent in recommender systems, make it difficult to build effective solutions. In this paper, we highlight the challenges faced when introducing a new short-form video experience and present our experience showing that, even with sufficient video interaction data, it can be more beneficial to leverage a video retrieval system using a fine-tuned multimodal vision-language model to overcome these challenges. This approach demonstrated greater effectiveness compared to conventional supervised learning methods in online experiments conducted on our e-commerce platform.
Updated: 2025-07-25 14:57:04
Domains: cs.LG
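The retrieval core reduces to nearest-neighbor search over multimodal embeddings; a minimal sketch with random stand-ins for the fine-tuned encoder's outputs:

    import numpy as np

    rng = np.random.default_rng(0)
    video_emb = rng.normal(size=(10_000, 512))              # catalog embeddings
    video_emb /= np.linalg.norm(video_emb, axis=1, keepdims=True)

    query = rng.normal(size=512)                            # user/context embedding
    query /= np.linalg.norm(query)

    scores = video_emb @ query                              # cosine similarity
    top_k = np.argsort(-scores)[:20]                        # candidates to serve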
Pulse-Level Simulation of Crosstalk Attacks on Superconducting Quantum Hardware
Hardware crosstalk in multi-tenant superconducting quantum computers poses a severe security threat, allowing adversaries to induce targeted errors across tenant boundaries by injecting carefully engineered pulses. We present a simulation-based study of active crosstalk attacks at the pulse level, analyzing how adversarial control of pulse timing, shape, amplitude, and coupling can disrupt a victim's computation. Our framework models the time-dependent dynamics of a three-qubit system in the rotating frame, capturing both always-on couplings and injected drive pulses. We examine two attack strategies: attacker-first (pulse before victim operation) and victim-first (pulse after), and systematically identify the pulse and coupling configurations that cause the largest logical errors. Protocol-level experiments on quantum coin flip and XOR classification circuits show that some protocols are highly vulnerable to these attacks, while others remain robust. Based on these findings, we discuss practical methods for detection and mitigation to improve security in quantum cloud platforms.
Updated: 2025-07-25 14:49:58
Domains: quant-ph,cs.CR
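A minimal pulse-level sketch using QuTiP's v4-style time-dependent Hamiltonian API: three qubits with always-on ZZ couplings plus an attacker-injected Gaussian drive on qubit 0, observing <Z> on a victim qubit. All numeric values are illustrative, not from the paper:

    import numpy as np
    from qutip import basis, mesolve, qeye, sigmax, sigmaz, tensor

    def on(op, i):                          # operator op acting on qubit i of 3
        ops = [qeye(2)] * 3
        ops[i] = op
        return tensor(ops)

    J = 2 * np.pi * 0.005                   # always-on coupling strength
    H0 = J * (on(sigmaz(), 0) * on(sigmaz(), 1) + on(sigmaz(), 1) * on(sigmaz(), 2))

    def attack_pulse(t, args):              # adversary-shaped Gaussian pulse
        return args["amp"] * np.exp(-((t - args["t0"]) ** 2) / (2 * args["sig"] ** 2))

    psi0 = tensor([basis(2, 0)] * 3)
    tlist = np.linspace(0, 100, 400)
    res = mesolve([H0, [on(sigmax(), 0), attack_pulse]], psi0, tlist,
                  e_ops=[on(sigmaz(), 2)],
                  args={"amp": 2 * np.pi * 0.02, "t0": 50.0, "sig": 10.0})
    # res.expect[0] traces how the injected pulse perturbs the victim qubit.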
Lower Bounds on the Size of Markov Equivalence Classes
Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned about the underlying causal graph from purely observational data. Under the assumptions of acyclicity, causal sufficiency, and a uniform model prior, Markov equivalence classes are known to be small on average. In this paper, we show that this is no longer the case when any of these assumptions is relaxed. Specifically, we prove exponentially large lower bounds for the expected size of Markov equivalence classes in three settings: sparse random directed acyclic graphs, uniformly random acyclic directed mixed graphs, and uniformly random directed cyclic graphs.
Updated: 2025-07-25 14:48:30
Domains: stat.ML,cs.LG,math.ST,stat.TH
Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
Tabular data is critical across diverse domains, yet high-quality datasets remain scarce due to privacy concerns and the cost of collection. Contemporary approaches adopt large language models (LLMs) for tabular augmentation, but exhibit two major limitations: (1) dense dependency modeling among tabular features that can introduce bias, and (2) high computational overhead in sampling. To address these issues, we propose SPADA for SPArse Dependency-driven Augmentation, a lightweight generative framework that explicitly captures sparse dependencies via an LLM-induced graph. We treat each feature as a node and synthesize values by traversing the graph, conditioning each feature solely on its parent nodes. We explore two synthesis strategies: a non-parametric method using Gaussian kernel density estimation, and a conditional normalizing flow model that learns invertible mappings for conditional density estimation. Experiments on four datasets show that SPADA reduces constraint violations by 4% compared to diffusion-based methods and accelerates generation by nearly 9,500 times over LLM-based baselines.
Updated: 2025-07-25 14:43:50
Domains: cs.LG,cs.AI
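A sketch of the dependency-driven traversal, with a simple k-nearest-neighbor conditional sampler standing in for the paper's Gaussian KDE and normalizing-flow synthesizers:

    # Traverse a sparse feature DAG in topological order, sampling each
    # feature conditioned only on its parents' already-sampled values.
    import numpy as np

    def synthesize(real, parents, topo_order, n, k=25, seed=0):
        """real: (N, d) training matrix; parents: {col: [parent cols]}."""
        rng = np.random.default_rng(seed)
        synth = np.zeros((n, real.shape[1]))
        for j in topo_order:
            pa = parents.get(j, [])
            if not pa:                               # root: bootstrap its marginal
                synth[:, j] = rng.choice(real[:, j], size=n)
                continue
            for row in range(n):                     # condition on parent values
                d = np.linalg.norm(real[:, pa] - synth[row, pa], axis=1)
                neighbors = np.argsort(d)[:k]        # rows with similar parents
                synth[row, j] = real[rng.choice(neighbors), j]
        return synth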
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
The pursuit of a generalizable stereo matching model, capable of performing across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. On the other hand, global matching architectures, while theoretically more robust, have been historically rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with $S^2M^2$: a global matching architecture that achieves both state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. $S^2M^2$ establishes a new state of the art on the Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods across most metrics while reconstructing high-quality details with competitive efficiency.
Updated: 2025-07-25 14:42:59
Domains: cs.CV,cs.AI,cs.RO
SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence
Understanding the decisions made by deep neural networks is essential in high-stakes domains such as medical imaging and autonomous driving. Yet, these models often lack transparency, particularly in computer vision. Prototypical-parts-based neural networks have emerged as a promising solution by offering concept-level explanations. However, most are limited to fine-grained classification tasks, with few exceptions such as InfoDisent. InfoDisent extends prototypical models to large-scale datasets like ImageNet, but produces complex explanations. We introduce Sparse Information Disentanglement for Explainability (SIDE), a novel method that improves the interpretability of prototypical parts through a dedicated training and pruning scheme that enforces sparsity. Combined with sigmoid activations in place of softmax, this approach allows SIDE to associate each class with only a small set of relevant prototypes. Extensive experiments show that SIDE matches the accuracy of existing methods while reducing explanation size by over $90\%$, substantially enhancing the understandability of prototype-based explanations.
Updated: 2025-07-25 14:34:15
Domains: cs.CV,cs.AI,cs.LG
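A toy head illustrating the SIDE recipe: sigmoid scores in place of softmax plus an L1 penalty that drives class-prototype links toward sparsity. This is a schematic stand-in, not the paper's training scheme:

    import torch
    import torch.nn as nn

    n_prototypes, n_classes = 200, 10
    proto_acts = torch.rand(32, n_prototypes)        # batch of prototype activations
    W = nn.Parameter(torch.randn(n_prototypes, n_classes) * 0.01)

    logits = proto_acts @ W
    probs = torch.sigmoid(logits)                    # sigmoid instead of softmax
    targets = torch.randint(0, 2, (32, n_classes)).float()

    loss = nn.functional.binary_cross_entropy(probs, targets) + 1e-3 * W.abs().sum()
    loss.backward()          # sparsity emerges; small |W| links can then be pruned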
Human-AI Synergy in Adaptive Active Learning for Continuous Lithium Carbonate Crystallization Optimization
As demand for high-purity lithium surges with the growth of the electric vehicle (EV) industry, cost-effective extraction from lower-grade North American sources like the Smackover Formation is critical. These resources, unlike high-purity South American brines, require innovative purification techniques to be economically viable. Continuous crystallization is a promising method for producing battery-grade lithium carbonate, but its optimization is challenged by a complex parameter space and limited data. This study introduces a Human-in-the-Loop (HITL) assisted active learning framework to optimize the continuous crystallization of lithium carbonate. By integrating human expertise with data-driven insights, our approach accelerates the optimization of lithium extraction from challenging sources. Our results demonstrate the framework's ability to rapidly adapt to new data, significantly improving the process's tolerance to critical impurities like magnesium from the industry standard of a few hundred ppm to as high as 6000 ppm. This breakthrough makes the exploitation of low-grade, impurity-rich lithium resources feasible, potentially reducing the need for extensive pre-refinement processes. By leveraging artificial intelligence, we have refined operational parameters and demonstrated that lower-grade materials can be used without sacrificing product quality. This advancement is a significant step towards economically harnessing North America's vast lithium reserves, such as those in the Smackover Formation, and enhancing the sustainability of the global lithium supply chain.
Updated: 2025-07-25 14:30:37
Domains: cond-mat.mtrl-sci,cond-mat.other,cs.HC,cs.LG,physics.data-an
Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer
Generating realistic synthetic electronic health records (EHRs) holds tremendous promise for accelerating healthcare research, facilitating AI model development and enhancing patient privacy. However, existing generative methods typically treat EHRs as flat sequences of discrete medical codes. This approach overlooks two critical aspects: the inherent hierarchical organization of clinical coding systems and the rich semantic context provided by code descriptions. Consequently, synthetic patient sequences often lack high clinical fidelity and have limited utility in downstream clinical tasks. In this paper, we propose the Hierarchy- and Semantics-Guided Transformer (HiSGT), a novel framework that leverages both hierarchical and semantic information for the generative process. HiSGT constructs a hierarchical graph to encode parent-child and sibling relationships among clinical codes and employs a graph neural network to derive hierarchy-aware embeddings. These are then fused with semantic embeddings extracted from a pre-trained clinical language model (e.g., ClinicalBERT), enabling the Transformer-based generator to more accurately model the nuanced clinical patterns inherent in real EHRs. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that HiSGT significantly improves the statistical alignment of synthetic data with real patient records, as well as supports robust downstream applications such as chronic disease classification. By addressing the limitations of conventional raw code-based generative models, HiSGT represents a significant step toward clinically high-fidelity synthetic data generation and a general paradigm suitable for interpretable medical code representation, offering valuable applications in data augmentation and privacy-preserving healthcare analytics.
Updated: 2025-07-25 14:26:39
Domains: cs.LG,cs.AI
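A schematic of the fusion step: a one-hop parent aggregation stands in for the paper's GNN over the code hierarchy, concatenated with clinical-language-model description embeddings (random stand-ins here):

    import torch

    n_codes, d = 1000, 128
    base = torch.randn(n_codes, d)                   # learnable code embeddings
    parent = torch.randint(0, n_codes, (n_codes,))   # toy parent pointer per code
    semantic = torch.randn(n_codes, d)               # e.g. ClinicalBERT description vectors

    hierarchy_aware = 0.5 * (base + base[parent])    # mix each code with its parent
    fused = torch.cat([hierarchy_aware, semantic], dim=-1)  # input to the generator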
Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions
Physical activity energy expenditure (PAEE) can be measured from breath-by-breath respiratory data, which can serve as a reference. Alternatively, PAEE can be predicted from the body movements, which can be measured and estimated with accelerometers. The body center of mass (COM) acceleration reflects the movements of the whole body and thus serves as a good predictor for PAEE. However, the wrist has also become a popular location due to recent advancements in wrist-worn devices. Therefore, in this work, using the respiratory data measured by COSMED K5 as the reference, we evaluated and compared the performances of COM-based settings and wrist-based settings. The COM-based settings include two different accelerometer compositions, using only the pelvis accelerometer (pelvis-acc) and the pelvis accelerometer with two accelerometers from two thighs (3-acc). The wrist-based settings include using only the left wrist accelerometer (l-wrist-acc) and only the right wrist accelerometer (r-wrist-acc). We implemented two existing PAEE estimation methods on our collected dataset, where 9 participants performed activities of daily living while wearing 5 accelerometers (i.e., pelvis, two thighs, and two wrists). These two methods include a linear regression (LR) model and a CNN-LSTM model. Both models yielded the best results with the COM-based 3-acc setting (LR: $R^2$ = 0.41, CNN-LSTM: $R^2$ = 0.53). No significant difference was found between the 3-acc and pelvis-acc settings (p-value = 0.278). For both models, neither the l-wrist-acc nor the r-wrist-acc settings demonstrated predictive power on PAEE with $R^2$ values close to 0, significantly outperformed by the two COM-based settings (p-values $<$ 0.05). No significant difference was found between the two wrists (p-value = 0.329).
Updated: 2025-07-25 14:23:24
Domains: cs.LG
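The LR baseline is straightforward to sketch with scikit-learn; features and targets below are random stand-ins for the COSMED-referenced dataset:

    # Predict PAEE from simple accelerometer features (e.g. per-axis
    # statistics from the pelvis sensor); data here are synthetic.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    features = rng.normal(size=(500, 6))     # e.g. mean/std per pelvis axis
    paee = features @ rng.normal(size=6) + rng.normal(scale=0.5, size=500)

    model = LinearRegression().fit(features[:400], paee[:400])
    print("R^2:", r2_score(paee[400:], model.predict(features[400:])))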
Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes
Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy. To address real-world challenges in outdoor 3D object detection, fusion of LiDAR and RGB input has started gaining traction. However, effective integration of these modalities for the precise object detection task still remains a largely open problem. To address that, we propose a MultiStream Detection (MuStD) network, that meticulously extracts task-relevant information from both data modalities. The network follows a three-stream structure. Its LiDAR-PillarNet stream extracts sparse 2D pillar features from the LiDAR input while the LiDAR-Height Compression stream computes Bird's-Eye View features. An additional 3D Multimodal stream combines RGB and LiDAR features using UV mapping and polar coordinate indexing. Eventually, the features containing comprehensive spatial, textural and geometric information are carefully fused and fed to a detection head for 3D object detection. Our extensive evaluation on the challenging KITTI Object Detection Benchmark, using the public testing server at https://www.cvlibs.net/datasets/kitti/eval_object_detail.php?&result=d162ec699d6992040e34314d19ab7f5c217075e0, establishes the efficacy of our method by achieving new state-of-the-art or highly competitive results in different categories while remaining among the most efficient methods. Our code will be released through the MuStD GitHub repository at https://github.com/IbrahimUWA/MuStD.git
Updated: 2025-07-25 14:20:16
Domains: cs.CV,cs.AI
Negative news posts are less prevalent and generate lower user engagement than non-negative news posts across six countries
Although news negativity is often studied, missing is comparative evidence on the prevalence of and engagement with negative political and non-political news posts on social media. We use 6,081,134 Facebook posts published between January 1, 2020, and April 1, 2024, by 97 media organizations in six countries (U.S., UK, Ireland, Poland, France, Spain) and develop two multilingual classifiers for labeling posts as (non-)political and (non-)negative. We show that: (1) negative news posts constitute a relatively small fraction (12.6%); (2) political news posts are neither more nor less negative than non-political news posts; (3) U.S. political news posts are less negative relative to the other countries on average (40% lower odds); (4) Negative news posts get 15% fewer likes and 13% fewer comments than non-negative news posts. Lastly, (5) we provide estimates of the proportion of the total volume of user engagement with negative news posts and show that only between 10.2% to 13.1% of engagement is linked to negative posts by the analyzed news organizations.
Updated: 2025-07-25 14:14:19
Domains: cs.SI,cs.LG
Controlling Topological Defects in Polar Fluids via Reinforcement Learning
Topological defects in active polar fluids exhibit complex dynamics driven by internally generated stresses, reflecting the deep interplay between topology, flow, and non-equilibrium hydrodynamics. Feedback control offers a powerful means to guide such systems, enabling transitions between dynamic states. We investigated closed-loop steering of integer-charged defects in a confined active fluid by modulating the spatial profile of activity. Using a continuum hydrodynamic model, we show that localized control of active stress induces flow fields that can reposition and direct defects along prescribed trajectories by exploiting non-linear couplings in the system. A reinforcement learning framework is used to discover effective control strategies that produce robust defect transport across both trained and novel trajectories. The results highlight how AI agents can learn the underlying dynamics and spatially structure activity to manipulate topological excitations, offering insights into the controllability of active matter and the design of adaptive, self-organized materials.
Updated: 2025-07-25 14:12:11
Domains: cond-mat.soft,cs.AI,cs.LG
On the Security of a Code-Based PIR Scheme
Private Information Retrieval (PIR) schemes allow clients to retrieve files from a database without disclosing the requested file's identity to the server. In the pursuit of post-quantum security, most recent PIR schemes rely on hard lattice problems. In contrast, the so called CB-cPIR scheme stands out as a pioneering effort to base PIR schemes on hard problems in coding theory, thereby contributing significantly to the diversification of security foundations. However, our research reveals a critical vulnerability in CB-cPIR, substantially diminishing its security levels. Moreover, a comparative analysis with state-of-the-art PIR schemes shows that CB-cPIR's advantages are reduced, making it less competitive in terms of the communication cost. Nevertheless, our findings highlight the importance of continued research into code-based PIR schemes, as they have the potential to provide a valuable alternative to lattice-based approaches.
Updated: 2025-07-25 14:12:00
Domains: cs.CR,cs.IR
Interpretable Cross-Sphere Multiscale Deep Learning Predicts ENSO Skilfully Beyond 2 Years
El Niño-Southern Oscillation (ENSO) exerts global climate and societal impacts, but real-time prediction with lead times beyond one year remains challenging. Dynamical models suffer from large biases and uncertainties, while deep learning struggles with interpretability and multi-scale dynamics. Here, we introduce PTSTnet, an interpretable model that unifies dynamical processes and cross-scale spatiotemporal learning in an innovative neural-network framework with physics-encoding learning. PTSTnet produces interpretable predictions significantly outperforming state-of-the-art benchmarks with lead times beyond 24 months, providing physical insights into error propagation in ocean-atmosphere interactions. PTSTnet learns feature representations with physical consistency from sparse data to tackle inherent multi-scale and multi-physics challenges underlying ocean-atmosphere processes, thereby inherently enhancing long-term prediction skill. Our successful realizations mark substantial steps forward in interpretable insights into innovative neural ocean modelling.
Updated: 2025-07-25 14:11:49
Categories: physics.ao-ph,cs.LG
Query Efficient Structured Matrix Learning
We study the problem of learning a structured approximation (low-rank, sparse, banded, etc.) to an unknown matrix $A$ given access to matrix-vector product (matvec) queries of the form $x \rightarrow Ax$ and $x \rightarrow A^Tx$. This problem is of central importance to algorithms across scientific computing and machine learning, with applications to fast multiplication and inversion for structured matrices, building preconditioners for first-order optimization, and as a model for differential operator learning. Prior work focuses on obtaining query complexity upper and lower bounds for learning specific structured matrix families that commonly arise in applications. We initiate the study of the problem in greater generality, aiming to understand the query complexity of learning approximations from general matrix families. Our main result focuses on finding a near-optimal approximation to $A$ from any finite-sized family of matrices, $\mathcal{F}$. Standard results from matrix sketching show that $O(\log|\mathcal{F}|)$ matvec queries suffice in this setting. This bound can also be achieved, and is optimal, for vector-matrix-vector queries of the form $x,y\rightarrow x^TAy$, which have been widely studied in work on rank-$1$ matrix sensing. Surprisingly, we show that, in the matvec model, it is possible to obtain a nearly quadratic improvement in complexity, to $\tilde{O}(\sqrt{\log|\mathcal{F}|})$. Further, we prove that this bound is tight up to log-log factors. Via covering number arguments, our result extends to well-studied infinite families. As an example, we establish that a near-optimal approximation from any \emph{linear matrix family} of dimension $q$ can be learned with $\tilde{O}(\sqrt{q})$ matvec queries, improving on an $O(q)$ bound achievable via sketching techniques and vector-matrix-vector queries.
Updated: 2025-07-25 14:04:20
Categories: cs.DS,cs.LG,cs.NA,math.NA
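To make the baseline concrete: below is a minimal sketch of the standard sketching approach behind the $O(\log|\mathcal{F}|)$ bound, which picks the best member of a finite family using random matvec queries alone. The paper's improved $\tilde{O}(\sqrt{\log|\mathcal{F}|})$ algorithm is adaptive and is not reproduced here; all names and sizes below are illustrative.

```python
import numpy as np

def best_from_family(matvec, family, n, m, seed=0):
    """Return the member of a finite matrix family that best explains an
    unknown matrix A, observed only through m matvec queries x -> A x."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, m))                           # random query directions
    Y = np.column_stack([matvec(G[:, i]) for i in range(m)])  # observed A @ G
    errs = [np.linalg.norm(B @ G - Y) for B in family]        # residual on the sketch
    return family[int(np.argmin(errs))]

# Toy usage: recover a tridiagonal "truth" hidden among random candidates.
n = 32
A = np.triu(np.tril(np.random.randn(n, n), 1), -1)            # tridiagonal matrix
family = [np.random.randn(n, n) for _ in range(63)] + [A]
A_hat = best_from_family(lambda x: A @ x, family, n, m=8)
assert np.allclose(A_hat, A)
```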
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
Unified segmentation of 3D point clouds is crucial for scene understanding, but is hindered by its sparse structure, limited annotations, and the challenge of distinguishing fine-grained object classes in complex environments. Existing methods often struggle to capture rich semantic and contextual information due to limited supervision and a lack of diverse multimodal cues, leading to suboptimal differentiation of classes and instances. To address these challenges, we propose VDG-Uni3DSeg, a novel framework that integrates pre-trained vision-language models (e.g., CLIP) and large language models (LLMs) to enhance 3D segmentation. By leveraging LLM-generated textual descriptions and reference images from the internet, our method incorporates rich multimodal cues, facilitating fine-grained class and instance separation. We further design a Semantic-Visual Contrastive Loss to align point features with multimodal queries and a Spatial Enhanced Module to model scene-wide relationships efficiently. Operating within a closed-set paradigm that utilizes multimodal knowledge generated offline, VDG-Uni3DSeg achieves state-of-the-art results in semantic, instance, and panoptic segmentation, offering a scalable and practical solution for 3D understanding. Our code is available at https://github.com/Hanzy1996/VDG-Uni3DSeg.
Updated: 2025-07-25 14:03:22
Categories: cs.CV,cs.AI
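A minimal sketch of how a semantic-visual contrastive objective of this kind can be set up, assuming per-class multimodal query embeddings (e.g., pooled CLIP features of LLM-generated descriptions and internet reference images) are precomputed offline; the paper's exact loss and feature dimensions may differ.

```python
import torch
import torch.nn.functional as F

def semantic_visual_contrastive_loss(point_feats, query_feats, labels, tau=0.07):
    """Pull each point's feature toward its class's multimodal query embedding
    and away from the other classes' embeddings (InfoNCE over classes).

    point_feats: (N, D) per-point features
    query_feats: (C, D) one embedding per class (e.g., pooled text/image cues)
    labels:      (N,) class index of each point
    """
    p = F.normalize(point_feats, dim=-1)
    q = F.normalize(query_feats, dim=-1)
    logits = p @ q.t() / tau                # (N, C) scaled cosine similarities
    return F.cross_entropy(logits, labels)

pts = torch.randn(1024, 256, requires_grad=True)   # stand-in point features
queries = torch.randn(20, 256)                     # 20 classes, frozen queries
loss = semantic_visual_contrastive_loss(pts, queries, torch.randint(0, 20, (1024,)))
loss.backward()
```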
Towards LLM-Enhanced Group Recommender Systems
In contrast to single-user recommender systems, group recommender systems are designed to generate and explain recommendations for groups. This group-oriented setting introduces additional complexities, as several factors - absent in individual contexts - must be addressed. These include understanding group dynamics (e.g., social dependencies within the group), defining effective decision-making processes, ensuring that recommendations are suitable for all group members, and providing group-level explanations as well as explanations for individual users. In this paper, we analyze how large language models (LLMs) can support these aspects and help to increase the overall decision support quality and applicability of group recommender systems.
Updated: 2025-07-25 13:59:54
Categories: cs.IR,cs.AI
Latent Space Analysis for Melanoma Prevention
Melanoma represents a critical health risk due to its aggressive progression and high mortality, underscoring the need for early, interpretable diagnostic tools. While deep learning has advanced in skin lesion classification, most existing models provide only binary outputs, offering limited clinical insight. This work introduces a novel approach that extends beyond classification, enabling interpretable risk modelling through a Conditional Variational Autoencoder. The proposed method learns a structured latent space that captures semantic relationships among lesions, allowing for a nuanced, continuous assessment of morphological differences. An SVM is also trained on this representation, effectively differentiating between benign nevi and melanomas and demonstrating strong and consistent performance. More importantly, the learned latent space supports visual and geometric interpretation of malignancy, with the spatial proximity of a lesion to known melanomas serving as a meaningful indicator of risk. This approach bridges predictive performance with clinical applicability, fostering early detection, highlighting ambiguous cases, and enhancing trust in AI-assisted diagnosis through transparent and interpretable decision-making.
Updated: 2025-07-25 13:54:49
Categories: cs.CV,cs.AI
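A compact sketch of the two downstream uses the abstract describes: an SVM on latent codes and a proximity-based risk score. The Gaussian latents below are synthetic stand-ins for CVAE embeddings, and `proximity_risk` is a hypothetical mapping from nearest-melanoma distance to risk.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for CVAE latent codes (y = 1 melanoma, 0 benign nevus).
rng = np.random.default_rng(0)
z_benign   = rng.normal(-1.0, 0.7, size=(200, 16))
z_melanoma = rng.normal(+1.0, 0.7, size=(200, 16))
z = np.vstack([z_benign, z_melanoma])
y = np.array([0] * 200 + [1] * 200)

clf = SVC(kernel="rbf", probability=True).fit(z, y)   # classifier on the latent space

def proximity_risk(z_new, melanoma_codes, k=10):
    """Hypothetical risk proxy: closer to known melanomas -> higher risk."""
    d = np.linalg.norm(melanoma_codes - z_new, axis=1)
    return 1.0 / (1.0 + np.sort(d)[:k].mean())

z_query = rng.normal(0.5, 0.7, size=16)
print(clf.predict_proba(z_query[None])[0, 1], proximity_risk(z_query, z_melanoma))
```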
Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects
Code review is essential for maintaining software quality but often time-consuming and cognitively demanding, especially in industrial environments. Recent advancements in language models (LMs) have opened new avenues for automating core review tasks. This study presents an empirical evaluation of the effect of monolingual fine-tuning on the performance of open-source LMs across three key automated code review tasks: Code Change Quality Estimation, Review Comment Generation, and Code Refinement. We fine-tuned three distinct models, CodeReviewer, CodeLlama-7B, and DeepSeek-R1-Distill, on a C#-specific dataset combining public benchmarks with industrial repositories. Our study investigates how different configurations of programming languages and natural languages in the training data affect LM performance, particularly in comment generation. Additionally, we benchmark the fine-tuned models against an automated software analysis tool (ASAT) and human reviewers to evaluate their practical utility in real-world settings. Our results show that monolingual fine-tuning improves model accuracy and relevance compared to multilingual baselines. While LMs can effectively support code review workflows, especially for routine or repetitive tasks, human reviewers remain superior in handling semantically complex or context-sensitive changes. Our findings highlight the importance of language alignment and task-specific adaptation in optimizing LMs for automated code review.
Updated: 2025-07-25 13:49:24
Categories: cs.SE,cs.AI,cs.PL
Modeling Uncertainty: Constraint-Based Belief States in Imperfect-Information Games
In imperfect-information games, agents must make decisions based on partial knowledge of the game state. The Belief Stochastic Game model addresses this challenge by delegating state estimation to the game model itself. This allows agents to operate on externally provided belief states, thereby reducing the need for game-specific inference logic. This paper investigates two approaches to represent beliefs in games with hidden piece identities: a constraint-based model using Constraint Satisfaction Problems and a probabilistic extension using Belief Propagation to estimate marginal probabilities. We evaluated the impact of both representations using general-purpose agents across two different games. Our findings indicate that constraint-based beliefs yield results comparable to those of probabilistic inference, with minimal differences in agent performance. This suggests that constraint-based belief states alone may suffice for effective decision-making in many settings.
Updated: 2025-07-25 13:38:44
Categories: cs.AI
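To illustrate the constraint-based representation, here is a toy enumeration of belief states over hidden piece identities under unit-count and observation constraints; the paper presumably uses a proper CSP solver with propagation rather than brute-force enumeration, and the Stratego-like example is hypothetical.

```python
from itertools import permutations

def belief_states(pieces, domains, counts, observations):
    """Yield identity assignments consistent with all constraints.

    pieces:       piece ids, e.g. ['a', 'b', 'c']
    domains:      {piece: set of identities it could still be}
    counts:       {identity: how many of it exist in total}
    observations: callables assignment -> bool encoding revealed facts
    """
    pool = [ident for ident, n in counts.items() for _ in range(n)]
    seen = set()
    for perm in permutations(pool, len(pieces)):
        if perm in seen:
            continue
        seen.add(perm)
        asg = dict(zip(pieces, perm))
        if all(asg[p] in domains[p] for p in pieces) and \
           all(obs(asg) for obs in observations):
            yield asg

# Toy Stratego-like example: piece 'b' was observed acting unlike a scout.
domains = {'a': {'scout', 'miner', 'bomb'},
           'b': {'scout', 'miner', 'bomb'},
           'c': {'scout', 'miner', 'bomb'}}
counts = {'scout': 1, 'miner': 1, 'bomb': 1}
obs = [lambda asg: asg['b'] != 'scout']
for belief in belief_states(['a', 'b', 'c'], domains, counts, obs):
    print(belief)
```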
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
The increasing adoption of Artificial Intelligence (AI) has led to larger, more complex models with numerous parameters that require substantial computing power -- resources often unavailable in many real-world application scenarios. Our paper addresses this challenge by introducing knowledge grafting, a novel mechanism that optimizes AI models for resource-constrained environments by transferring selected features (the scion) from a large donor model to a smaller rootstock model. The approach achieves an 88.54% reduction in model size (from 64.39 MB to 7.38 MB), while improving the generalization capability of the model. Our new rootstock model achieves 89.97% validation accuracy (vs. the donor's 87.47%), maintains lower validation loss (0.2976 vs. 0.5068), and performs exceptionally well on unseen test data with 90.45% accuracy. It addresses the typical size-versus-performance trade-off and enables deployment of AI frameworks on resource-constrained devices with enhanced performance. We have tested our approach on an agricultural weed detection scenario; however, it can be extended across various edge computing scenarios, potentially accelerating AI adoption in areas with limited hardware/software support -- much as horticultural grafting enables productive cultivation in challenging agricultural environments.
Updated: 2025-07-25 13:37:45
Categories: cs.AI,cs.LG,cs.PF
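A hedged PyTorch sketch of the weight-transfer step suggested by the grafting metaphor: copying selected donor parameters (the scion) into a smaller rootstock model wherever names and shapes line up. How the paper actually selects and adapts features may differ; the prefix-matching rule here is an assumption.

```python
import torch.nn as nn

def graft(donor: nn.Module, rootstock: nn.Module, scion_prefixes):
    """Copy selected feature-extraction weights (the 'scion') from a large
    donor model into a smaller rootstock model, where shapes match."""
    donor_sd, root_sd = donor.state_dict(), rootstock.state_dict()
    for name, tensor in donor_sd.items():
        if any(name.startswith(p) for p in scion_prefixes) \
           and name in root_sd and root_sd[name].shape == tensor.shape:
            root_sd[name] = tensor.clone()          # transfer the scion weight
    rootstock.load_state_dict(root_sd)

# Toy usage: share the early conv block, keep the small later layers.
donor = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 64, 3, padding=1))
rootstock = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 8, 3, padding=1))
graft(donor, rootstock, scion_prefixes=['0.'])
```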
Reactivation: Empirical NTK Dynamics Under Task Shifts
The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel regime, the NTK remains static during training and the network function is linear in the static neural tangent feature space. The evolution of the NTK during training is necessary for feature learning, a key driver of deep learning success. The study of NTK dynamics has led to several critical discoveries in recent years, in generalization and scaling behaviours. However, this body of work has been limited to the single-task setting, where the data distribution is assumed constant over time. In this work, we present a comprehensive empirical analysis of NTK dynamics in continual learning, where the data distribution shifts over time. Our findings highlight continual learning as a rich and underutilized testbed for probing the dynamics of neural training. At the same time, they challenge the validity of static-kernel approximations in theoretical treatments of continual learning, even at large scale.
Updated: 2025-07-25 13:33:48
Categories: cs.LG,cs.AI
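The kind of measurement this study relies on can be sketched directly: the empirical NTK Gram matrix computed from per-sample parameter gradients, recomputed before and after a task shift to track kernel drift. This is the textbook definition, not the authors' code.

```python
import torch

def empirical_ntk(model, x):
    """Empirical NTK Gram matrix K[i, j] = <df(x_i)/dtheta, df(x_j)/dtheta>
    for a scalar-output model, via explicit per-sample gradients."""
    grads = []
    for xi in x:
        out = model(xi.unsqueeze(0)).squeeze()            # scalar output f(x_i)
        g = torch.autograd.grad(out, tuple(model.parameters()))
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    J = torch.stack(grads)                                # (N, P) parameter Jacobian
    return J @ J.t()                                      # (N, N) kernel matrix

model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
x = torch.randn(8, 10)
K_before = empirical_ntk(model, x)
# ...train on task A, switch to task B, then recompute and compare:
# drift = torch.norm(empirical_ntk(model, x) - K_before) / torch.norm(K_before)
```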
SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models
Recent advances in vision-language navigation (VLN) were mainly attributed to emerging large language models (LLMs). These methods exhibited excellent generalization capabilities in instruction understanding and task reasoning. However, they were constrained by the fixed knowledge bases and reasoning abilities of LLMs, preventing them from fully incorporating experiential knowledge and thus resulting in a lack of efficient evolutionary capacity. To address this, we drew inspiration from the evolution capabilities of natural agents, and proposed a self-evolving VLN framework (SE-VLN) to endow VLN agents with the ability to continuously evolve during testing. To the best of our knowledge, this is the first multimodal-LLM-powered self-evolving VLN framework. Specifically, SE-VLN comprised three core modules, i.e., a hierarchical memory module to transfer successful and failure cases into reusable knowledge, a retrieval-augmented thought-based reasoning module to retrieve experience and enable multi-step decision-making, and a reflection module to realize continual evolution. Comprehensive tests illustrated that the SE-VLN achieved navigation success rates of 57% and 35.2% in unseen environments, representing absolute performance improvements of 23.9% and 15.0% over current state-of-the-art methods on the R2R and REVERIE datasets, respectively. Moreover, the SE-VLN showed performance improvement with an increasing experience repository, elucidating its great potential as a self-evolving agent framework for VLN.
Updated: 2025-07-25 13:28:55
Categories: cs.CV,cs.AI,cs.RO
Delphos: A reinforcement learning framework for assisting discrete choice model specification
We introduce Delphos, a deep reinforcement learning framework for assisting the discrete choice model specification process. Unlike traditional approaches that treat model specification as a static optimisation problem, Delphos represents a paradigm shift: it frames this specification challenge as a sequential decision-making problem, formalised as a Markov Decision Process. In this setting, an agent learns to specify well-performing model candidates by choosing a sequence of modelling actions - such as selecting variables, accommodating both generic and alternative-specific taste parameters, applying non-linear transformations, and including interactions with covariates - and interacting with a modelling environment that estimates each candidate and returns a reward signal. Specifically, Delphos uses a Deep Q-Network that receives delayed rewards based on modelling outcomes (e.g., log-likelihood) and behavioural expectations (e.g., parameter signs), and distributes rewards across the sequence of actions to learn which modelling decisions lead to well-performing candidates. We evaluate Delphos on both simulated and empirical datasets, varying the size of the modelling space and the reward function. To assess the agent's performance in navigating the model space, we analyse the learning curve, the distribution of Q-values, occupancy metrics, and Pareto fronts. Our results show that the agent learns to adaptively explore strategies to identify well-performing models across search spaces, even without prior domain knowledge. It efficiently explores large modelling spaces, concentrates its search in high-reward regions, and suggests candidates that define Pareto frontiers balancing model fit and behavioural plausibility. These findings highlight the potential of this novel adaptive, learning-based framework to assist in the model specification process.
Updated: 2025-07-25 13:23:22
Categories: econ.GN,cs.LG,q-fin.EC
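A simplified sketch of the sequential-specification idea: actions edit the utility specification, the reward arrives only after estimation, and credit is spread back over the action sequence. Tabular Q-values stand in for the paper's Deep Q-Network, and `estimate` is a toy stand-in for the estimation environment.

```python
import random

ACTIONS = ["add_var:cost", "add_var:time", "alt_specific:time",
           "log_transform:cost", "interact:cost*income", "stop"]

def estimate(spec):
    """Toy stand-in for the estimation environment: returns a log-likelihood
    and whether estimated parameter signs match behavioural expectations."""
    ll = -1000.0 + 40.0 * len(spec) - 5.0 * len(spec) ** 2   # peaks at mid-size specs
    return ll, True

def episode(q_values, eps=0.1, lr=0.1):
    """Build one specification action by action (epsilon-greedy), then spread
    the delayed reward back over the whole action sequence."""
    spec, trajectory = [], []
    while True:
        if random.random() < eps:
            act = random.choice(ACTIONS)
        else:
            act = max(ACTIONS, key=lambda a: q_values.get((tuple(spec), a), 0.0))
        trajectory.append((tuple(spec), act))
        if act == "stop" or len(spec) >= len(ACTIONS):
            break
        spec.append(act)
    ll, signs_ok = estimate(spec)                 # reward only known after estimation
    reward = ll / 100.0 + (1.0 if signs_ok else -1.0)
    for key in trajectory:                        # credit every decision in the sequence
        q_values[key] = q_values.get(key, 0.0) + lr * (reward - q_values.get(key, 0.0))
    return spec, reward

q = {}
for _ in range(500):
    best_spec, r = episode(q)
```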
Enhancing Generalization of Spiking Neural Networks Through Temporal Regularization
Spiking Neural Networks (SNNs) have received widespread attention due to their event-driven and low-power characteristics, making them particularly effective for processing event-based neuromorphic data. Recent studies have shown that directly trained SNNs suffer from severe overfitting issues due to the limited scale of neuromorphic datasets and the gradient mismatching problem, which fundamentally constrain their generalization performance. In this paper, we propose a temporal regularization training (TRT) method by introducing a time-dependent regularization mechanism to enforce stronger constraints on early timesteps. We compare the performance of TRT with that of other state-of-the-art methods on datasets including CIFAR10/100, ImageNet100, DVS-CIFAR10, and N-Caltech101. To validate the effectiveness of TRT, we conducted ablation studies and analyses including loss landscape visualization and learning curve analysis, demonstrating that TRT can effectively mitigate overfitting and flatten the training loss landscape, thereby enhancing generalizability. Furthermore, we establish a theoretical interpretation of TRT's temporal regularization mechanism based on the results of Fisher information analysis. We analyze the temporal information dynamics inside SNNs by tracking Fisher information during the TRT training process, revealing the Temporal Information Concentration (TIC) phenomenon, where Fisher information progressively concentrates in early timesteps. The time-decaying regularization mechanism implemented in TRT effectively guides the network to learn robust features in early timesteps with rich information, thereby leading to significant improvements in model generalization. Code is available at https://anonymous.4open.science/r/TRT-7FBFUYT4E.
Updated: 2025-07-25 13:18:36
Categories: cs.NE,cs.AI
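One plausible form of the time-dependent mechanism, as a sketch: a regularization weight that decays with the timestep index, so early timesteps are constrained most strongly. The specific per-step penalty (here, agreement with the final-step output) is an assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def trt_style_loss(outputs, target, lam=0.05, decay=0.9):
    """Time-decaying regularization for an SNN unrolled over T steps.

    outputs: list of per-timestep logits [(B, C), ...]
    The weight decay ** t is largest at t = 0, so early timesteps
    carry the strongest constraint.
    """
    T = len(outputs)
    task = sum(F.cross_entropy(o, target) for o in outputs) / T
    reg = sum((decay ** t) * F.mse_loss(outputs[t], outputs[-1].detach())
              for t in range(T)) / T                 # assumed per-step penalty
    return task + lam * reg

logits = [torch.randn(32, 10, requires_grad=True) for _ in range(4)]
loss = trt_style_loss(logits, torch.randint(0, 10, (32,)))
loss.backward()
```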
A Markov Categorical Framework for Language Modeling
Auto-regressive language models factorize sequence probabilities and are trained by minimizing the negative log-likelihood (NLL) objective. While empirically powerful, a deep theoretical understanding of why this simple objective yields such versatile representations remains elusive. This work introduces a unifying analytical framework using Markov Categories (MCs) to deconstruct the AR generation process and the NLL objective. We model the single-step generation map as a composition of Markov kernels in the category Stoch. This compositional view, when enriched with statistical divergences, allows us to dissect information flow and learned geometry. Our framework makes three main contributions. First, we provide a formal, information-theoretic rationale for the success of modern speculative decoding methods like EAGLE, quantifying the information surplus in hidden states that these methods exploit. Second, we formalize how NLL minimization forces the model to learn not just the next token, but the data's intrinsic conditional stochasticity, a process we analyze using categorical entropy. Third, and most centrally, we prove that NLL training acts as an implicit form of spectral contrastive learning. By analyzing the information geometry of the model's prediction head, we show that NLL implicitly forces the learned representation space to align with the eigenspectrum of a predictive similarity operator, thereby learning a geometrically structured space without explicit contrastive pairs. This compositional and information-geometric perspective reveals the deep structural principles underlying the effectiveness of modern LMs. Project Page: https://github.com/asiresearch/lm-theory
Updated: 2025-07-25 13:14:03
Categories: cs.LG,cs.AI,cs.CL
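In standard notation, the two objects the framework deconstructs are the autoregressive factorization and the NLL objective; the per-step expansion below makes explicit the entropy floor (the data's intrinsic conditional stochasticity) and the KL gap that training actually reduces.

```latex
% Autoregressive factorization and the NLL objective; each conditional is a
% Markov kernel, and one generation step is their composition in Stoch.
p_\theta(x_{1:T}) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
\mathcal{L}_{\mathrm{NLL}}(\theta)
  = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})\right].
% Per-step expansion: an irreducible entropy floor plus the KL gap minimized
% during training.
\mathcal{L}_{\mathrm{NLL}}(\theta)
  = \sum_{t=1}^{T} \mathbb{E}_{x_{<t}}\!\left[
      H\!\left(p_{\mathrm{data}}(\cdot \mid x_{<t})\right)
      + \mathrm{KL}\!\left(p_{\mathrm{data}}(\cdot \mid x_{<t}) \,\middle\|\, p_\theta(\cdot \mid x_{<t})\right)
    \right].
```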
Transfinite Fixed Points in Alpay Algebra as Ordinal Game Equilibria in Dependent Type Theory
This paper contributes to the Alpay Algebra by demonstrating that the stable outcome of a self-referential process, obtained by iterating a transformation through all ordinal stages, is identical to the unique equilibrium of an unbounded revision dialogue between a system and its environment. The analysis initially elucidates how classical fixed point theorems guarantee such convergence in finite settings and subsequently extends the argument to the transfinite domain, relying upon well-founded induction and principles of order-theoretic continuity. Furthermore, the resulting transordinal fixed point operator is embedded into dependent type theory, a formalization which permits every step of the transfinite iteration and its limit to be verified within a modern proof assistant. This procedure yields a machine-checked proof that the iterative dialogue necessarily stabilizes and that its limit is unique. The result provides a foundation for Alpay's philosophical claim of semantic convergence within the framework of constructive logic. By unifying concepts from fixed point theory, game semantics, ordinal analysis, and type theory, this research establishes a broadly accessible yet formally rigorous foundation for reasoning about infinite self-referential systems and offers practical tools for certifying their convergence within computational environments.
Updated: 2025-07-25 13:12:55
Categories: cs.LO,cs.AI,68T27, 03B70, 68Q55
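For reference, the standard transfinite iteration scheme consistent with the construction the abstract describes, stated for a monotone map on a complete partial order (an assumption; the paper's setting may be more general):

```latex
% Transfinite iteration of f: successor stages apply the map, limit stages
% take suprema.
x_0 = \bot, \qquad
x_{\alpha+1} = f(x_\alpha), \qquad
x_\lambda = \sup_{\alpha < \lambda} x_\alpha \quad \text{for limit ordinals } \lambda.
% By well-founded induction the sequence stabilizes at some ordinal \alpha^*,
% where f(x_{\alpha^*}) = x_{\alpha^*} is the least fixed point.
```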
Virne: A Comprehensive Benchmark for Deep RL-based Network Resource Allocation in NFV
Resource allocation (RA) is critical to efficient service deployment in Network Function Virtualization (NFV), a transformative networking paradigm. Recently, deep Reinforcement Learning (RL)-based methods have been showing promising potential to address this complexity. However, the lack of a systematic benchmarking framework and thorough analysis hinders the exploration of emerging networks and the development of more robust algorithms while causing inconsistent evaluation. In this paper, we introduce Virne, a comprehensive benchmarking framework for the NFV-RA problem, with a focus on supporting deep RL-based methods. Virne provides customizable simulations for diverse network scenarios, including cloud, edge, and 5G environments. It also features a modular and extensible implementation pipeline that supports over 30 methods of various types, and includes practical evaluation perspectives beyond effectiveness, such as scalability and generalization. Furthermore, we conduct in-depth analysis through extensive experiments to provide valuable insights into performance trade-offs for efficient implementation and offer actionable guidance for future research directions. Overall, with its diverse simulations, rich implementations, and extensive evaluation capabilities, Virne could serve as a comprehensive benchmark for advancing NFV-RA methods and deep RL applications. The code is publicly available at https://github.com/GeminiLight/virne.
Updated: 2025-07-25 12:58:32
Categories: cs.NI,cs.AI
Component-Based Machine Learning for Indoor Flow and Temperature Fields Prediction: Latent Feature Aggregation and Flow Interaction
Accurate and efficient prediction of indoor airflow and temperature distributions is essential for building energy optimization and occupant comfort control. However, traditional CFD simulations are computationally intensive, limiting their integration into real-time or design-iterative workflows. This study proposes a component-based machine learning (CBML) surrogate modeling approach to replace conventional CFD simulation for fast prediction of indoor velocity and temperature fields. The model consists of three neural networks: a convolutional autoencoder with residual connections (CAER) to extract and compress flow features, a multilayer perceptron (MLP) to map inlet velocities to latent representations, and a convolutional neural network (CNN) as an aggregator to combine single-inlet features into dual-inlet scenarios. A two-dimensional room with varying left and right air inlet velocities is used as a benchmark case, with CFD simulations providing training and testing data. Results show that the CBML model accurately and rapidly predicts two-component aggregated velocity and temperature fields across both training and testing datasets.
Updated: 2025-07-25 12:57:30
Categories: cs.LG,physics.flu-dyn
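A skeletal PyTorch version of the three-network pipeline, under assumed latent and field sizes (8-channel 16x16 latents, 64x64 output fields with channels for u, v, and temperature); the residual connections and exact layer counts of CAER are omitted.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Latent grid -> 3-channel (u, v, T) fields; stands in for the decoder
    half of the convolutional autoencoder (CAER)."""
    def __init__(self, latent_ch=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))
    def forward(self, z):
        return self.net(z)

class InletToLatent(nn.Module):
    """MLP: scalar inlet velocity -> latent feature map."""
    def __init__(self, latent_ch=8, h=16, w=16):
        super().__init__()
        self.shape = (latent_ch, h, w)
        self.net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(),
                                 nn.Linear(256, latent_ch * h * w))
    def forward(self, v):
        return self.net(v).view(-1, *self.shape)

class Aggregator(nn.Module):
    """CNN that fuses two single-inlet latents into a dual-inlet latent."""
    def __init__(self, latent_ch=8):
        super().__init__()
        self.net = nn.Conv2d(2 * latent_ch, latent_ch, 3, padding=1)
    def forward(self, z_left, z_right):
        return self.net(torch.cat([z_left, z_right], dim=1))

to_latent, agg, dec = InletToLatent(), Aggregator(), Decoder()
v_left, v_right = torch.tensor([[1.2]]), torch.tensor([[0.4]])     # inlet speeds, m/s
fields = dec(agg(to_latent(v_left), to_latent(v_right)))           # (1, 3, 64, 64)
```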
Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning
In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool's lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner's Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to guide policy learning toward lifespan-guided behavior. To handle the fact that RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01x in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool use strategies.
Updated: 2025-07-25 12:48:57
Categories: cs.RO,cs.LG
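The RUL estimate the reward builds on can be sketched with Miner's rule alone: damage is the sum of cycle ratios against an S-N curve, and failure is predicted at damage 1. The S-N values and the simple reward blend below are illustrative, and the paper's Adaptive Reward Normalization is omitted.

```python
def remaining_useful_life(stress_cycles, sn_curve):
    """Miner's-rule damage accumulation for a tool.

    stress_cycles: {stress_level: cycles experienced so far}
    sn_curve:      {stress_level: cycles-to-failure at that level}
    Returns (damage, rul_fraction); failure is predicted when damage >= 1.
    """
    damage = sum(n / sn_curve[s] for s, n in stress_cycles.items())
    return damage, max(0.0, 1.0 - damage)

def lifespan_shaped_reward(task_reward, rul_fraction, beta=0.5):
    """Blend task success with remaining life so the policy is steered
    toward gentler tool use (adaptive normalization omitted here)."""
    return task_reward + beta * rul_fraction

sn = {50e6: 2_000_000, 100e6: 50_000, 200e6: 1_000}   # Pa -> cycles (toy S-N curve)
history = {50e6: 400_000, 100e6: 10_000}
damage, rul = remaining_useful_life(history, sn)      # damage 0.4, RUL fraction 0.6
print(damage, rul, lifespan_shaped_reward(1.0, rul))
```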
Pilot Contamination-Aware Graph Attention Network for Power Control in CFmMIMO
Optimization-based power control algorithms are predominantly iterative with high computational complexity, making them impractical for real-time applications in cell-free massive multiple-input multiple-output (CFmMIMO) systems. Learning-based methods have emerged as a promising alternative, and among them, graph neural networks (GNNs) have demonstrated excellent performance in solving power control problems. However, all existing GNN-based approaches assume ideal orthogonality among pilot sequences for user equipments (UEs), which is unrealistic given that the number of UEs exceeds the available orthogonal pilot sequences in CFmMIMO schemes. Moreover, most learning-based methods assume a fixed number of UEs, whereas the number of active UEs varies over time in practice. Additionally, supervised training necessitates costly computational resources for computing the target power control solutions for a large volume of training samples. To address these issues, we propose a graph attention network for downlink power control in CFmMIMO systems that operates in a self-supervised manner while effectively handling pilot contamination and adapting to a dynamic number of UEs. Experimental results show its effectiveness, even in comparison to the optimal accelerated projected gradient method as a baseline.
Updated: 2025-07-25 12:42:35
Categories: cs.LG
How Much Do Large Language Models Cheat on Evaluation? Benchmarking Overestimation under the One-Time-Pad-Based Framework
Overestimation in evaluating large language models (LLMs) has become an increasing concern. Due to the contamination of public benchmarks or imbalanced model training, LLMs may achieve inflated evaluation results on public benchmarks, either intentionally or unintentionally, which leads to unfair comparisons among LLMs and undermines their realistic capability assessments. Existing benchmarks attempt to address these issues by keeping test cases permanently secret, mitigating contamination through human evaluation, or repeatedly collecting and constructing new samples. However, these approaches fail to ensure reproducibility, transparency, and high efficiency simultaneously. Moreover, the extent of overestimation in current LLMs remains unquantified. To address these issues, we propose ArxivRoll, a dynamic evaluation framework inspired by one-time pad encryption in cryptography. ArxivRoll comprises two key components: \emph{i) SCP (Sequencing, Cloze, and Prediction)}, an automated generator for private test cases, and \emph{ii) Rugged Scores (RS)}, metrics that measure the proportion of public benchmark contamination and training bias. Leveraging SCP, ArxivRoll constructs a new benchmark every six months using recent articles from ArXiv and employs them for one-time evaluations of LLM performance. Extensive experiments demonstrate the high quality of our benchmark, and we provide a systematic evaluation of current LLMs. The source code is available at https://github.com/liangzid/ArxivRoll/.
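A stripped-down illustration of what the three SCP generators could look like when applied to a fresh ArXiv paragraph; the exact item formats used by ArxivRoll are assumptions here:

```python
import random

def make_scp_items(paragraph, seed=0):
    """Build one sequencing, cloze, and prediction item from raw text."""
    rng = random.Random(seed)
    sents = [s.strip() for s in paragraph.split('.') if s.strip()]
    words = sents[0].split()

    # Sequencing: shuffle sentences; the model must recover the order.
    order = list(range(len(sents)))
    rng.shuffle(order)
    sequencing = {"shuffled": [sents[i] for i in order], "answer": order}

    # Cloze: mask a word; the model must fill it in.
    i = rng.randrange(len(words))
    cloze = {"masked": ' '.join('[MASK]' if j == i else w
                                for j, w in enumerate(words)),
             "answer": words[i]}

    # Prediction: given a prefix, the model must continue the text.
    prediction = {"prefix": sents[0], "answer": '. '.join(sents[1:])}
    return sequencing, cloze, prediction
```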
Updated: 2025-07-25 12:39:03
Categories: cs.CL,cs.CR
Dependency-aware synthetic tabular data generation
Synthetic tabular data is increasingly used in privacy-sensitive domains such as health care, but existing generative models often fail to preserve inter-attribute relationships. In particular, functional dependencies (FDs) and logical dependencies (LDs), which capture deterministic and rule-based associations between features, are rarely retained in synthetic datasets, and when they are, often only poorly. To address this research gap, we propose the Hierarchical Feature Generation Framework (HFGF) for synthetic tabular data generation. We created benchmark datasets with known dependencies to evaluate our proposed HFGF. The framework first generates independent features using any standard generative model, and then reconstructs dependent features based on predefined FD and LD rules. Our experiments on four benchmark datasets with varying sizes, feature imbalance, and dependency complexity demonstrate that HFGF improves the preservation of FDs and LDs across six generative models, including CTGAN, TVAE, and GReaT. Our findings demonstrate that HFGF can significantly enhance the structural fidelity and downstream utility of synthetic tabular data.
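A minimal sketch of the two-stage idea, with hypothetical rule tables standing in for the predefined FD/LD rules (any off-the-shelf generator can supply the independent columns):

```python
import pandas as pd

# Stage 1: independent features from any standard generative model (stubbed).
synthetic = pd.DataFrame({
    "country": ["DE", "FR", "DE"],
    "age":     [34, 61, 12],
})

# Stage 2: reconstruct dependent features from predefined rules.
FD_RULES = {"currency": lambda r: {"DE": "EUR", "FR": "EUR"}[r["country"]]}
LD_RULES = {"is_minor": lambda r: r["age"] < 18}

for col, rule in {**FD_RULES, **LD_RULES}.items():
    synthetic[col] = synthetic.apply(rule, axis=1)

print(synthetic)   # FDs and LDs now hold by construction
```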
Updated: 2025-07-25 12:29:58
Categories: cs.LG
Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.
Updated: 2025-07-25 12:28:15
Categories: cs.AI,cs.CR
Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems
Real-time particle transverse momentum ($p_T$) estimation in high-energy physics demands algorithms that are both efficient and accurate under strict hardware constraints. Static machine learning models degrade under high pileup and lack physics-aware optimization, while generic graph neural networks (GNNs) often neglect domain structure critical for robust $p_T$ regression. We propose a physics-informed GNN framework that systematically encodes detector geometry and physical observables through four distinct graph construction strategies: station-as-node, feature-as-node, bending angle-centric, and pseudorapidity ($\eta$)-centric representations. This framework integrates these tailored graph structures with a novel Message Passing Layer (MPL), featuring intra-message attention and gated updates, and domain-specific loss functions incorporating $p_{T}$-distribution priors. Our co-design methodology yields superior accuracy-efficiency trade-offs compared to existing baselines. Extensive experiments on the CMS Trigger Dataset validate the approach: a station-informed EdgeConv model achieves a state-of-the-art MAE of 0.8525 with $\ge55\%$ fewer parameters than deep learning baselines, most notably TabNet, while an $\eta$-centric MPL configuration also demonstrates improved accuracy with comparable efficiency. These results establish the promise of physics-guided GNNs for deployment in resource-constrained trigger systems.
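As a concrete, hypothetical illustration of the station-as-node strategy: each detector station becomes a node, and edges between consecutive stations carry $\Delta\phi$ features that correlate with $1/p_T$:

```python
import numpy as np

def station_as_node_graph(hits):
    """hits: iterable of (station_id, phi, bending_angle, eta) tuples.
    Each station becomes a node; consecutive stations are linked, and
    the per-edge delta-phi encodes the track-curvature information."""
    hits = sorted(hits)                                  # order by station
    nodes = np.array([[phi, bend, eta] for _, phi, bend, eta in hits])
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    edge_feats = np.array([nodes[j, 0] - nodes[i, 0] for i, j in edges])
    return nodes, edges, edge_feats
```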
Updated: 2025-07-25 12:19:57
Categories: cs.LG
Latent Granular Resynthesis using Neural Audio Codecs
We introduce a novel technique for creative audio resynthesis that operates by reworking the concept of granular synthesis at the latent vector level. Our approach creates a "granular codebook" by encoding a source audio corpus into latent vector segments, then matches each latent grain of a target audio signal to its closest counterpart in the codebook. The resulting hybrid sequence is decoded to produce audio that preserves the target's temporal structure while adopting the source's timbral characteristics. This technique requires no model training, works with diverse audio materials, and naturally avoids the discontinuities typical of traditional concatenative synthesis through the codec's implicit interpolation during decoding. We include supplementary material at https://github.com/naotokui/latentgranular/ , as well as a proof-of-concept implementation to allow users to experiment with their own sounds at https://huggingface.co/spaces/naotokui/latentgranular .
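The matching step is simple enough to sketch; `codec` below stands for any pretrained neural audio codec with encode/decode methods (a placeholder, not a specific library API):

```python
import numpy as np

def latent_granular_resynthesis(codec, source_audio, target_audio):
    """Granular synthesis in latent space: build a codebook of source
    grains, replace each target grain by its nearest codebook entry,
    then decode the hybrid sequence."""
    codebook = codec.encode(source_audio)    # (n_src, latent_dim) grains
    grains = codec.encode(target_audio)      # (n_tgt, latent_dim) grains

    # Nearest-neighbour match per grain; squared Euclidean distance.
    d2 = ((grains[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    hybrid = codebook[d2.argmin(axis=1)]     # source timbre, target timing

    return codec.decode(hybrid)              # decoding smooths grain joins
```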
Updated: 2025-07-25 12:14:12
Categories: cs.SD,cs.LG,eess.AS,eess.SP
Joint Holistic and Lesion Controllable Mammogram Synthesis via Gated Conditional Diffusion Model
Mammography is the most commonly used imaging modality for breast cancer screening, driving an increasing demand for deep-learning techniques to support large-scale analysis. However, the development of accurate and robust methods is often limited by insufficient data availability and a lack of diversity in lesion characteristics. While generative models offer a promising solution for data synthesis, current approaches often fail to adequately emphasize lesion-specific features and their relationships with surrounding tissues. In this paper, we propose Gated Conditional Diffusion Model (GCDM), a novel framework designed to jointly synthesize holistic mammogram images and localized lesions. GCDM is built upon a latent denoising diffusion framework, where the noised latent image is concatenated with a soft mask embedding that represents breast, lesion, and their transitional regions, ensuring anatomical coherence between them during the denoising process. To further emphasize lesion-specific features, GCDM incorporates a gated conditioning branch that guides the denoising process by dynamically selecting and fusing the most relevant radiomic and geometric properties of lesions, effectively capturing their interplay. Experimental results demonstrate that GCDM achieves precise control over small lesion areas while enhancing the realism and diversity of synthesized mammograms. These advancements position GCDM as a promising tool for clinical applications in mammogram synthesis. Our code is available at https://github.com/lixinHUST/Gated-Conditional-Diffusion-Model/
Updated: 2025-07-25 12:10:45
Categories: cs.CV,cs.AI
Enhancing Diabetic Retinopathy Classification Accuracy through Dual Attention Mechanism in Deep Learning
Automatic classification of Diabetic Retinopathy (DR) can assist ophthalmologists in devising personalized treatment plans, making it a critical component of clinical practice. However, imbalanced data distribution in the dataset becomes a bottleneck in the generalization of deep learning models trained for DR classification. In this work, we combine a global attention block (GAB) and a category attention block (CAB) into the deep learning model, thus effectively overcoming the imbalanced data distribution problem in DR classification. Our proposed approach is based on an attention mechanism-based deep learning model that employs three pre-trained networks, namely MobileNetV3-small, EfficientNet-b0, and DenseNet-169, as the backbone architecture. We evaluate the proposed method on two publicly available datasets of retinal fundoscopy images for DR. Experimental results show that on the APTOS dataset, DenseNet-169 yielded 83.20% mean accuracy, followed by MobileNetV3-small and EfficientNet-b0, which yielded 82% and 80% accuracies, respectively. On the EYEPACS dataset, EfficientNet-b0 yielded a mean accuracy of 80%, while DenseNet-169 and MobileNetV3-small yielded 75.43% and 76.68% accuracies, respectively. We also report an F1-score of 82.0%, precision of 82.1%, sensitivity of 83.0%, specificity of 95.5%, and a kappa score of 88.2% for the experiments. Moreover, our MobileNetV3-small model uses 1.6 million parameters on the APTOS dataset and 0.90 million on the EYEPACS dataset, fewer than competing methods. The proposed approach achieves competitive performance that is on par with recently reported works on DR classification.
Updated: 2025-07-25 12:09:27
Categories: eess.IV,cs.AI,cs.CV
WACA-UNet: Weakness-Aware Channel Attention for Static IR Drop Prediction in Integrated Circuit Design
Accurate spatial prediction of power integrity issues, such as IR drop, is critical for reliable VLSI design. However, traditional simulation-based solvers are computationally expensive and difficult to scale. We address this challenge by reformulating IR drop estimation as a pixel-wise regression task on heterogeneous multi-channel physical maps derived from circuit layouts. Prior learning-based methods treat all input layers (e.g., metal, via, and current maps) equally, ignoring their varying importance to prediction accuracy. To tackle this, we propose a novel Weakness-Aware Channel Attention (WACA) mechanism, which recursively enhances weak feature channels while suppressing over-dominant ones through a two-stage gating strategy. Integrated into a ConvNeXtV2-based attention U-Net, our approach enables adaptive and balanced feature representation. On the public ICCAD-2023 benchmark, our method outperforms the ICCAD-2023 contest winner by reducing mean absolute error by 61.1% and improving F1-score by 71.0%. These results demonstrate that channel-wise heterogeneity is a key inductive bias in physical layout analysis for VLSI.
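One plausible reading of the weakness-aware gate, sketched in isolation (the importance proxy and normalization are our assumptions; the paper's two-stage gating is learned, not hand-coded):

```python
import numpy as np

def weakness_aware_gate(x):
    """x: (C, H, W) feature map. Score channels by mean absolute activation,
    then gate so that weak channels are amplified and dominant ones damped."""
    energy = np.abs(x).mean(axis=(1, 2))               # (C,) importance proxy
    gate = np.exp(-energy) / np.exp(-energy).sum()     # softmax over *weakness*
    gate = gate * gate.size                            # rescale: mean gate ~ 1
    return x * gate[:, None, None]
```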
Updated: 2025-07-25 12:07:16
Categories: cs.LG,cs.AI,cs.CV,B.7.2; I.5.1; I.2.10; I.5.4
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Despite the ongoing improvements in the design of large language models (LLMs) to foster inclusion and balanced responses, these systems remain susceptible to encoding and amplifying social biases. This study examines how dialectal variation, specifically African American Vernacular English (AAVE) versus Standard American English (SAE), interacts with data poisoning to influence toxicity in outputs. Using both small- and medium-scale LLaMA models, we show that even minimal exposure to poisoned data significantly increases toxicity for AAVE inputs, while it remains comparatively unaffected for SAE. Larger models exhibit a more significant amplification effect which suggests heightened susceptibility with scale. To further assess these disparities, we employed GPT-4o as a fairness auditor, which identified harmful stereotypical patterns disproportionately tied to AAVE inputs, including portrayals of aggression, criminality, and intellectual inferiority. These findings underscore the compounding impact of data poisoning and dialectal bias and emphasize the need for dialect-aware evaluation, targeted debiasing interventions, and socially responsible training protocols during development.
Updated: 2025-07-25 12:05:47
Categories: cs.CL,cs.AI,cs.LG
PennyCoder: Efficient Domain-Specific LLMs for PennyLane-Based Quantum Code Generation
The growing demand for robust quantum programming frameworks has unveiled a critical limitation: current large language model (LLM) based quantum code assistants heavily rely on remote APIs, introducing challenges related to privacy, latency, and excessive usage costs. Addressing this gap, we propose PennyCoder, a novel lightweight framework for quantum code generation, explicitly designed for local and embedded deployment to enable on-device quantum programming assistance without external API dependence. PennyCoder leverages a fine-tuned version of the LLaMA 3.1-8B model, adapted through parameter-efficient Low-Rank Adaptation (LoRA) techniques combined with domain-specific instruction tuning optimized for the specialized syntax and computational logic of quantum programming in PennyLane, including tasks in quantum machine learning and quantum reinforcement learning. Unlike prior work focused on cloud-based quantum code generation, our approach emphasizes device-native operability while maintaining high model efficacy. We rigorously evaluated PennyCoder over a comprehensive quantum programming dataset, achieving 44.3% accuracy with our fine-tuned model (compared to 33.7% for the base LLaMA 3.1-8B and 40.1% for the RAG-augmented baseline), demonstrating a significant improvement in functional correctness.
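A minimal sketch of the parameter-efficient setup described above, using the Hugging Face peft API; the rank, alpha, and target modules are assumptions, not the paper's exact configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"          # assumed checkpoint name
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)       # only adapter weights are trainable
model.print_trainable_parameters()
# ...then run instruction tuning on PennyLane code/instruction pairs.
```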
Updated: 2025-07-25 12:02:49
Categories: quant-ph,cs.AI,68T50, 81P68, 68T07,I.2.7; I.2.2
Natural Language Processing for Tigrinya: Current State and Future Directions
Despite being spoken by millions of people, Tigrinya remains severely underrepresented in Natural Language Processing (NLP) research. This work presents a comprehensive survey of NLP research for Tigrinya, analyzing over 40 studies spanning more than a decade of work from 2011 to 2025. We systematically review the current state of computational resources, models, and applications across ten distinct downstream tasks, including morphological processing, machine translation, speech recognition, and question-answering. Our analysis reveals a clear trajectory from foundational, rule-based systems to modern neural architectures, with progress consistently unlocked by resource creation milestones. We identify key challenges rooted in Tigrinya's morphological complexity and resource scarcity, while highlighting promising research directions, including morphology-aware modeling, cross-lingual transfer, and community-centered resource development. This work serves as both a comprehensive reference for researchers and a roadmap for advancing Tigrinya NLP. A curated metadata of the surveyed studies and resources is made publicly available.
Updated: 2025-07-25 11:58:42
Categories: cs.CL,cs.AI,I.2.7
PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models
Static benchmarks fail to capture LLM vulnerabilities emerging through community experimentation in online forums. We present PrompTrend, a system that collects vulnerability data across platforms and evaluates them using multidimensional scoring, with an architecture designed for scalable monitoring. Cross-sectional analysis of 198 vulnerabilities collected from online communities over a five-month period (January-May 2025) and tested on nine commercial models reveals that advanced capabilities correlate with increased vulnerability in some architectures, psychological attacks significantly outperform technical exploits, and platform dynamics shape attack effectiveness with measurable model-specific patterns. The PrompTrend Vulnerability Assessment Framework achieves 78% classification accuracy while revealing limited cross-model transferability, demonstrating that effective LLM security requires comprehensive socio-technical monitoring beyond traditional periodic assessment. Our findings challenge the assumption that capability advancement improves security and establish community-driven psychological manipulation as the dominant threat vector for current language models.
Updated: 2025-07-25 11:52:46
Categories: cs.CR,cs.AI
Faster Lifting for Ordered Domains with Predecessor Relations
We investigate lifted inference on ordered domains with predecessor relations, where the elements of the domain respect a total (cyclic) order, and every element has a distinct (clockwise) predecessor. Previous work has explored this problem through weighted first-order model counting (WFOMC), which computes the weighted sum of models for a given first-order logic sentence over a finite domain. In WFOMC, the order constraint is typically encoded by the linear order axiom introducing a binary predicate in the sentence to impose a linear ordering on the domain elements. The immediate and second predecessor relations are then encoded by the linear order predicate. Although WFOMC with the linear order axiom is theoretically tractable, existing algorithms struggle with practical applications, particularly when the predecessor relations are involved. In this paper, we treat predecessor relations as a native part of the axiom and devise a novel algorithm that inherently supports these relations. The proposed algorithm not only provides an exponential speedup for the immediate and second predecessor relations, which are known to be tractable, but also handles the general k-th predecessor relations. The extensive experiments on lifted inference tasks and combinatorics math problems demonstrate the efficiency of our algorithm, achieving speedups of a full order of magnitude.
Updated: 2025-07-25 11:43:34
Categories: cs.AI,cs.LO
Bespoke multiresolution analysis of graph signals
We present a novel framework for discrete multiresolution analysis of graph signals. The main analytical tool is the samplet transform, originally defined in the Euclidean framework as a discrete wavelet-like construction, tailored to the analysis of scattered data. The first contribution of this work is defining samplets on graphs. To this end, we subdivide the graph into a fixed number of patches, embed each patch into a Euclidean space, where we construct samplets, and eventually pull the construction back to the graph. This ensures orthogonality, locality, and the vanishing moments property with respect to properly defined polynomial spaces on graphs. Compared to classical Haar wavelets, this framework broadens the class of graph signals that can efficiently be compressed and analyzed. Along this line, we provide a definition of a class of signals that can be compressed using our construction. We support our findings with different examples of signals defined on graphs whose vertices lie on smooth manifolds. For efficient numerical implementation, we combine heavy edge clustering, to partition the graph into meaningful patches, with landmark \texttt{Isomap}, which provides low-dimensional embeddings for each patch. Our results demonstrate the method's robustness, scalability, and ability to yield sparse representations with controllable approximation error, significantly outperforming traditional Haar wavelet approaches in terms of compression efficiency and multiresolution fidelity.
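A single-patch, single-level sketch of how vanishing moments can be enforced once a patch is embedded into Euclidean coordinates; the actual construction is hierarchical and multiscale, this only shows the orthogonalization idea:

```python
import numpy as np

def patch_samplet_basis(coords):
    """coords: (n, d) Euclidean embedding of one patch's vertices.
    Returns an orthonormal basis split into scaling functions and
    samplet-like vectors orthogonal to all polynomials of degree <= 1."""
    n = coords.shape[0]
    V = np.hstack([np.ones((n, 1)), coords])   # monomial basis: 1, x_1, ..., x_d
    Q, _ = np.linalg.qr(V, mode='complete')    # full orthonormal basis of R^n
    scaling = Q[:, :V.shape[1]]                # spans the polynomial space
    samplets = Q[:, V.shape[1]:]               # samplets.T @ V ~ 0: vanishing moments
    return scaling, samplets
```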
Updated: 2025-07-25 11:43:19
Categories: eess.SP,cs.DM,cs.IT,cs.LG,math.IT
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
Large language models (LLMs) have demonstrated impressive capabilities, but their enormous size poses significant challenges for deployment in real-world applications. To address this issue, researchers have sought to apply network pruning techniques to LLMs. A critical challenge in pruning is allocating the sparsity for each layer. Recent sparsity allocation methods are often based on heuristics or search, which can easily lead to suboptimal performance. In this paper, we conducted an extensive investigation into various LLMs and revealed three significant discoveries: (1) the layerwise pruning sensitivity (LPS) of LLMs is highly non-uniform, (2) the choice of pruning metric affects LPS, and (3) the performance of a sparse model is related to the uniformity of its layerwise redundancy level. Based on these observations, we propose that the layerwise sparsity of LLMs should adhere to three principles: \emph{non-uniformity}, \emph{pruning metric dependency}, and \emph{uniform layerwise redundancy level} in the pruned model. To this end, we propose Maximum Redundancy Pruning (MRP), an iterative pruning algorithm that prunes in the most redundant layers (\emph{i.e.}, those with the highest non-outlier ratio) at each iteration. The achieved layerwise sparsity aligns with the outlined principles. We conducted extensive experiments on publicly available LLMs, including LLaMA2 and OPT, across various benchmarks. Experimental results validate the effectiveness of MRP, demonstrating its superiority over previous methods.
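A toy rendition of one MRP iteration under stated assumptions (a z-score outlier criterion and magnitude-based pruning): score each layer by its non-outlier ratio and prune a further slice of the most redundant one.

```python
import numpy as np

def mrp_step(layers, step=0.05, outlier_z=3.0):
    """One iteration: pick the layer with the highest non-outlier ratio
    (most redundant) and zero out a further `step` fraction of its
    smallest-magnitude remaining weights. `layers`: list of 2-D arrays."""
    def non_outlier_ratio(W):
        w = np.abs(W[W != 0])
        if w.size == 0:
            return 0.0            # fully pruned layers are never selected
        z = (w - w.mean()) / (w.std() + 1e-8)
        return float((z < outlier_z).mean())

    target = max(range(len(layers)), key=lambda i: non_outlier_ratio(layers[i]))
    W = layers[target]
    magnitudes = np.abs(W[W != 0])
    W[np.abs(W) <= np.quantile(magnitudes, step)] = 0.0   # prune in place
    return target   # which layer was pruned this round
```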
Updated: 2025-07-25 11:32:02
Categories: cs.LG,cs.AI
Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection
Early detection of non-small cell lung cancer (NSCLC) is critical for improving patient outcomes, and novel approaches are needed to facilitate early diagnosis. In this study, we explore the use of automatic cough analysis as a pre-screening tool for distinguishing between NSCLC patients and healthy controls. Cough audio recordings were prospectively acquired from a total of 227 subjects, divided into NSCLC patients and healthy controls. The recordings were analyzed using machine learning techniques, such as support vector machine (SVM) and XGBoost, as well as deep learning approaches, specifically convolutional neural networks (CNN) and transfer learning with VGG16. To enhance the interpretability of the machine learning model, we utilized Shapley Additive Explanations (SHAP). The fairness of the models across demographic groups was assessed by comparing the performance of the best model across different age groups (less than or equal to 58y and higher than 58y) and gender using the equalized odds difference on the test set. The results demonstrate that CNN achieves the best performance, with an accuracy of 0.83 on the test set. Nevertheless, SVM achieves slightly lower performances (accuracy of 0.76 in validation and 0.78 in the test set), making it suitable in contexts with low computational power. The use of SHAP for SVM interpretation further enhances model transparency, making it more trustworthy for clinical applications. Fairness analysis shows slightly higher disparity across age (0.15) than gender (0.09) on the test set. Therefore, to strengthen our findings' reliability, a larger, more diverse, and unbiased dataset is needed -- particularly including individuals at risk of NSCLC and those in early disease stages.
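The fairness metric used above is easy to state in code; a plain-NumPy version for binary labels and predictions (each group is assumed to contain both classes):

```python
import numpy as np

def equalized_odds_difference(y_true, y_pred, group):
    """Largest gap across groups in TPR and FPR; 0 means perfectly
    equalized odds. Assumes binary y and both classes in every group."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs, fprs = [], []
    for g in np.unique(group):
        m = group == g
        tprs.append(y_pred[m & (y_true == 1)].mean())   # true positive rate
        fprs.append(y_pred[m & (y_true == 0)].mean())   # false positive rate
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# e.g. equalized_odds_difference(y, y_hat, age_years > 58)
```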
Updated: 2025-07-25 11:30:22
Categories: cs.LG
PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring
Robust and unobtrusive in-vehicle physiological monitoring is crucial for ensuring driving safety and user experience. While remote physiological measurement (RPM) offers a promising non-invasive solution, its translation to real-world driving scenarios is critically constrained by the scarcity of comprehensive datasets. Existing resources are often limited in scale, modality diversity, the breadth of biometric annotations, and the range of captured conditions, thereby omitting inherent real-world challenges in driving. Here, we present PhysDrive, the first large-scale multimodal dataset for contactless in-vehicle physiological sensing with dedicated consideration on various modality settings and driving factors. PhysDrive collects data from 48 drivers, including synchronized RGB, near-infrared camera, and raw mmWave radar data, accompanied with six synchronized ground truths (ECG, BVP, Respiration, HR, RR, and SpO2). It covers a wide spectrum of naturalistic driving conditions, including driver motions, dynamic natural light, vehicle types, and road conditions. We extensively evaluate both signal-processing and deep-learning methods on PhysDrive, establishing a comprehensive benchmark across all modalities, and release full open-source code with compatibility for mainstream public toolboxes. We envision PhysDrive will serve as a foundational resource and accelerate research on multimodal driver monitoring and smart-cockpit systems.
Updated: 2025-07-25 11:23:44
Categories: cs.AI,cs.CV
Doubly Regularized Entropic Wasserstein Barycenters
We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties. This barycenter is defined as the unique probability measure that minimizes the sum of entropic optimal transport (EOT) costs with respect to a family of given probability measures, plus an entropy term. We denote it $(\lambda,\tau)$-barycenter, where $\lambda$ is the inner regularization strength and $\tau$ the outer one. This formulation recovers several previously proposed EOT barycenters for various choices of $\lambda,\tau \geq 0$ and generalizes them. First, in spite of -- and in fact owing to -- being \emph{doubly} regularized, we show that our formulation is debiased for $\tau=\lambda/2$: the suboptimality in the (unregularized) Wasserstein barycenter objective is, for smooth densities, of the order of the strength $\lambda^2$ of entropic regularization, instead of $\max\{\lambda,\tau\}$ in general. We discuss this phenomenon for isotropic Gaussians where all $(\lambda,\tau)$-barycenters have closed form. Second, we show that for $\lambda,\tau>0$, this barycenter has a smooth density and is strongly stable under perturbation of the marginals. In particular, it can be estimated efficiently: given $n$ samples from each of the probability measures, it converges in relative entropy to the population barycenter at a rate $n^{-1/2}$. And finally, this formulation lends itself naturally to a grid-free optimization algorithm: we propose a simple \emph{noisy particle gradient descent} which, in the mean-field limit, converges globally at an exponential rate to the barycenter.
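In symbols, writing $\nu_1,\dots,\nu_K$ for the given measures and $w_k$ for positive weights summing to one (the weights are our notational assumption), the $(\lambda,\tau)$-barycenter described above is
$$\mu_{\lambda,\tau} \;=\; \operatorname*{arg\,min}_{\mu} \ \sum_{k=1}^{K} w_k\,\mathrm{OT}_{\lambda}(\mu,\nu_k) \;+\; \tau \int \mu \log \mu,$$
where $\mathrm{OT}_{\lambda}$ is the entropic optimal transport cost with inner regularization strength $\lambda$; the debiased case discussed above corresponds to $\tau = \lambda/2$.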
Updated: 2025-07-25 11:16:33
Categories: math.OC,cs.LG,stat.ML,49N99 (Primary) 62G05, 90C30 (Secondary)
Explainable AI guided unsupervised fault diagnostics for high-voltage circuit breakers
Commercial high-voltage circuit breaker (CB) condition monitoring systems rely on directly observable physical parameters such as gas filling pressure with pre-defined thresholds. While these parameters are crucial, they only cover a small subset of malfunctioning mechanisms and usually can be monitored only if the CB is disconnected from the grid. To facilitate online condition monitoring while CBs remain connected, non-intrusive measurement techniques such as vibration or acoustic signals are necessary. Currently, CB condition monitoring studies using these signals typically utilize supervised methods for fault diagnostics, where ground-truth fault types are known due to artificially introduced faults in laboratory settings. This supervised approach, however, is not feasible in real-world applications, where fault labels are unavailable. In this work, we propose a novel unsupervised fault detection and segmentation framework for CBs based on vibration and acoustic signals. This framework can detect deviations from the healthy state. The explainable artificial intelligence (XAI) approach is applied to the detected faults for fault diagnostics. The specific contributions are: (1) we propose an integrated unsupervised fault detection and segmentation framework that is capable of detecting faults and clustering different faults with only healthy data required during training; (2) we provide an unsupervised explainability-guided fault diagnostics approach using XAI to offer domain experts potential indications of the aged or faulty components, achieving fault diagnostics without the prerequisite of ground-truth fault labels. These contributions are validated using an experimental dataset from a high-voltage CB under healthy and artificially introduced fault conditions, contributing to more reliable CB system operation.
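A compact sketch of the detection stage under stated assumptions (`model` is any reconstruction-style anomaly detector with hypothetical fit/reconstruct methods): fit on healthy recordings only, then flag deviations.

```python
import numpy as np

def fit_healthy_threshold(healthy_feats, model, q=99.5):
    """Fit a reconstruction model on healthy data only; the alarm
    threshold is a high percentile of the healthy residuals."""
    model.fit(healthy_feats)
    residuals = np.linalg.norm(
        healthy_feats - model.reconstruct(healthy_feats), axis=1)
    return np.percentile(residuals, q)

def is_faulty(x, model, threshold):
    """Flag any recording whose residual exceeds the healthy envelope."""
    return np.linalg.norm(x - model.reconstruct(x)) > threshold

# Detected faults are then clustered, and SHAP attributions over the
# residual features point domain experts to candidate components.
```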
Updated: 2025-07-25 11:14:56
Subjects: cs.LG,eess.SP
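The abstract does not spell out the detector, but the healthy-data-only setup it describes can be illustrated with a generic reconstruction-based anomaly score; the sketch below uses PCA over synthetic vibration features purely as a stand-in:

    # Minimal sketch, assuming a reconstruction-error anomaly score trained
    # on healthy data only; the paper's actual detector is not specified here.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    healthy = rng.normal(size=(500, 64))                       # stand-in vibration features
    faulty = healthy[:50] + rng.normal(3, 1, size=(50, 64))    # shifted copies as "faults"

    pca = PCA(n_components=8).fit(healthy)                     # model of the healthy manifold

    def anomaly_score(x):
        recon = pca.inverse_transform(pca.transform(x))
        return np.mean((x - recon) ** 2, axis=1)               # reconstruction error

    threshold = np.percentile(anomaly_score(healthy), 99)      # healthy-only threshold
    print((anomaly_score(faulty) > threshold).mean())          # fraction flagged as deviating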
Addressing the Minor-Embedding Problem in Quantum Annealing and Evaluating State-of-the-Art Algorithm Performance
This study addresses the minor-embedding problem, which involves mapping the variables of an Ising model onto a quantum annealing processor. The primary motivation stems from the observed performance disparity of quantum annealers when solving problems suited to the processor's architecture versus those with non-hardware-native topologies. Our research has two main objectives: i) to analyze the impact of embedding quality on the performance of D-Wave Systems quantum annealers, and ii) to evaluate the quality of the embeddings generated by Minorminer, the standard minor-embedding technique in the quantum annealing literature, provided by D-Wave. Regarding the first objective, our experiments reveal a clear correlation between the average chain length of embeddings and the relative errors of the solutions sampled. This underscores the critical influence of embedding quality on quantum annealing performance. For the second objective, we evaluate Minorminer's embedding capabilities, the quality and robustness of its embeddings, and its execution-time performance. We also compare its performance with Clique Embedding, another algorithm developed by D-Wave, which is deterministic and designed to embed fully connected Ising models into quantum annealing processors, serving as a worst-case scenario. The results demonstrate that there is significant room for improvement for Minorminer, suggesting that more effective embedding strategies could lead to meaningful gains in quantum annealing performance.
Updated: 2025-07-25 11:11:03
Subjects: quant-ph,cs.AI,cs.ET
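The chain-length metric discussed above is easy to reproduce with the open-source Minorminer package; the sketch below embeds a small fully connected graph (a stand-in for a dense Ising model) into a Chimera-like target and reports the average chain length:

    # Sketch of measuring embedding quality; requires
    # `pip install minorminer dwave-networkx networkx`.
    import networkx as nx
    import minorminer
    import dwave_networkx as dnx

    source = nx.complete_graph(10)            # logical Ising variables
    target = dnx.chimera_graph(4)             # hardware-like topology

    embedding = minorminer.find_embedding(source.edges, target.edges, random_seed=7)
    chain_lengths = [len(chain) for chain in embedding.values()]
    print("average chain length:", sum(chain_lengths) / len(chain_lengths))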
Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them
Training large language models (LLMs) for reasoning via maths and code datasets has become a major new focus in LLM post-training. Two particularly popular approaches are reinforcement learning (RL) and supervised fine-tuning (SFT), but their training dynamics are poorly understood. We present a comparative analysis of RL and SFT on the same maths problems with the same model and similar hyperparameters. We find that RL yields minor in-domain gains on maths and slight degradation on knowledge-intensive benchmarks like MMLU, while both trends are more pronounced in SFT. We also analyse model parameters across checkpoints, observing that both algorithms modify query and key weights the most. Meanwhile, SFT exhibits greater updates and also affects mid-layer MLPs more, leading us to hypothesise that this may have caused the out-of-domain degradation. We therefore investigate whether freezing parts of the model during training can mitigate the reduced performance on knowledge-intensive benchmarks. However, our results are inconclusive, with benefits on GPQA:Diamond and degradation on other benchmarks. Taken together, our observations provide a preliminary indication for why RL amplifies existing capabilities, while SFT replaces old skills with new ones.
Updated: 2025-07-25 11:09:53
Subjects: cs.LG,cs.AI,cs.CL
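As a rough illustration of the freezing experiment mentioned above, one can disable gradients for mid-layer MLP parameters before fine-tuning; the layer band and the "mlp" name pattern below are assumptions for a GPT-2-style model, not the paper's exact setup:

    # Hypothetical sketch: freeze mid-layer MLPs of a Hugging Face causal LM
    # before SFT. Requires `pip install transformers torch`.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    n_layers = model.config.n_layer
    mid = range(n_layers // 3, 2 * n_layers // 3)     # "mid-layer" band, assumed

    for name, param in model.named_parameters():
        if "mlp" in name and any(f".{i}." in name for i in mid):
            param.requires_grad = False               # frozen during SFT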
AI PsyRoom: Artificial Intelligence Platform for Segmented Yearning and Reactive Outcome Optimization Method
Psychological counseling faces huge challenges due to the growing demand for mental health services and the shortage of trained professionals. Large language models (LLMs) have shown potential to assist psychological counseling, especially in empathy and emotional support. However, existing models lack a deep understanding of emotions and are unable to generate personalized treatment plans based on fine-grained emotions. To address these shortcomings, we present AI PsyRoom, a multi-agent simulation framework designed to enhance psychological counseling by generating empathetic and emotionally nuanced conversations. By leveraging fine-grained emotion classification and a multi-agent framework, we construct a multi-agent PsyRoom A for dialogue reconstruction, generating a high-quality dialogue dataset EmoPsy, which contains 35 sub-emotions, 423 specific emotion scenarios, and 12,350 dialogues. We also propose PsyRoom B for generating personalized treatment plans. Quantitative evaluations demonstrate that AI PsyRoom significantly outperforms state-of-the-art methods, achieving an 18% improvement in problem orientation, 23% in expression, 24% in empathy, and 16% in interactive communication quality. The datasets and models are publicly available, providing a foundation for advancing AI-assisted psychological counseling research.
Updated: 2025-07-25 11:08:54
Subjects: cs.AI
An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case
The increasing use of Large Language Models (LLMs) in a large variety of domains has sparked worries about how easily they can perpetuate stereotypes and contribute to the generation of biased content. With a focus on gender and professional bias, this work examines how LLMs shape responses to ungendered prompts and thereby contribute to biased outputs. The analysis uses a structured experimental method, issuing different prompts that involve three professional job combinations, each characterized by a hierarchical relationship. The study uses Italian, a language with extensive grammatical gender differences, to highlight potential limitations in current LLMs' ability to generate objective text in non-English languages. Two popular LLM-based chatbots are examined, namely OpenAI ChatGPT (gpt-4o-mini) and Google Gemini (gemini-1.5-flash). Through APIs, we collected a total of 3600 responses. The results highlight how content generated by LLMs can perpetuate stereotypes. For example, Gemini associated 100% (ChatGPT 97%) of 'she' pronouns with the 'assistant' rather than the 'manager'. The presence of bias in AI-generated text can have significant implications in many fields, such as workplaces or job selection, raising ethical concerns about its use. Understanding these risks is pivotal to developing mitigation strategies and assuring that AI-based systems do not increase social inequalities, but rather contribute to more equitable outcomes. Future research directions include expanding the study to additional chatbots or languages, refining prompt engineering methods, or further exploiting a larger experimental base.
Updated: 2025-07-25 10:57:29
Subjects: cs.CL,cs.AI,cs.CY,cs.HC
Harnessing intuitive local evolution rules for physical learning
Machine Learning, however popular and accessible, is computationally intensive and highly power-consuming, prompting interest in alternative physical implementations of learning tasks. We introduce a training scheme for physical systems that minimize power dissipation, in which only boundary parameters (i.e., inputs and outputs) are externally controlled. Using this scheme, these Boundary-Enabled Adaptive State Tuning Systems (BEASTS) learn by exploiting local physical rules. Our scheme, BEASTAL (BEAST-Adaline), is the closest analog of the Adaline algorithm for such systems. We demonstrate this autonomous learning in silico for regression and classification tasks. Our approach advances previous physical learning schemes by using intuitive, local evolution rules without requiring large-scale memory or complex internal architectures. BEASTAL can perform any linear task, achieving best performance when the local evolution rule is non-linear.
Updated: 2025-07-25 10:51:42
Subjects: cs.LG
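For readers unfamiliar with the reference point, here is a minimal Adaline (delta-rule) loop, the algorithm BEASTAL is described as the closest analog of; this is the textbook rule, not the BEASTS physical dynamics themselves:

    # Textbook Adaline: a local, error-driven weight update on a linear task.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

    w, eta = np.zeros(3), 0.01
    for _ in range(100):
        for x_i, y_i in zip(X, y):
            w += eta * (y_i - w @ x_i) * x_i   # local delta-rule update
    print(w)                                    # recovers the true weights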
Learnable cut flow for high energy physics
Neural networks have emerged as a powerful paradigm for tasks in high energy physics, yet their opaque training process renders them a black box. In contrast, the traditional cut flow method offers simplicity and interpretability but requires extensive manual tuning to identify optimal cut boundaries. To merge the strengths of both approaches, we propose the Learnable Cut Flow (LCF), a neural network that transforms the traditional cut selection into a fully differentiable, data-driven process. LCF implements two cut strategies: parallel, where observable distributions are treated independently, and sequential, where prior cuts shape subsequent ones, to flexibly determine optimal boundaries. Building on this strategy, we introduce the Learnable Importance, a metric that quantifies feature importance and adjusts each feature's contribution to the loss accordingly, offering model-driven insights unlike ad-hoc metrics. To ensure differentiability, a modified loss function replaces hard cuts with mask operations, preserving data shape throughout the training process. LCF is tested on six varied mock datasets and a realistic diboson vs. QCD dataset. Results demonstrate that LCF 1. accurately learns cut boundaries across typical feature distributions in both parallel and sequential strategies, 2. assigns higher importance to discriminative features with minimal overlap, 3. handles redundant or correlated features robustly, and 4. performs effectively in real-world scenarios. In the diboson dataset, LCF initially underperforms boosted decision trees and multilayer perceptrons when using all observables. However, pruning less critical features, guided by learned importance, boosts its performance to match or exceed these baselines. LCF bridges the gap between the traditional cut flow method and modern black-box neural networks, delivering actionable insights into the training process and feature importance.
Updated: 2025-07-25 10:49:47
Subjects: cs.LG,hep-ph
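The key trick, replacing a hard cut with a differentiable mask so that boundaries become learnable, can be sketched in a few lines; the sigmoid temperature and the toy objective below are assumptions, not the paper's loss:

    # Minimal sketch: a soft, learnable version of the hard cut x > t.
    import torch

    x = torch.randn(1000)                         # one observable
    t = torch.tensor(0.0, requires_grad=True)     # learnable cut boundary
    temperature = 0.1

    mask = torch.sigmoid((x - t) / temperature)   # differentiable stand-in for x > t
    signal_like = (x > 0.5).float()               # toy labels
    loss = torch.mean((mask - signal_like) ** 2)  # stand-in objective
    loss.backward()
    print(t.grad)                                 # gradient flows through the "cut"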
ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination
Constraint-based optimization is a cornerstone of robotics, enabling the design of controllers that reliably encode task and safety requirements such as collision avoidance or formation adherence. However, handcrafted constraints can fail in multi-agent settings that demand complex coordination. We introduce ReCoDe--Reinforcement-based Constraint Design--a decentralized, hybrid framework that merges the reliability of optimization-based controllers with the adaptability of multi-agent reinforcement learning. Rather than discarding expert controllers, ReCoDe improves them by learning additional, dynamic constraints that capture subtler behaviors, for example, by constraining agent movements to prevent congestion in cluttered scenarios. Through local communication, agents collectively constrain their allowed actions to coordinate more effectively under changing conditions. In this work, we focus on applications of ReCoDe to multi-agent navigation tasks requiring intricate, context-based movements and consensus, where we show that it outperforms purely handcrafted controllers, other hybrid approaches, and standard MARL baselines. We give empirical (real robot) and theoretical evidence that retaining a user-defined controller, even when it is imperfect, is more efficient than learning from scratch, especially because ReCoDe can dynamically change the degree to which it relies on this controller.
Updated: 2025-07-25 10:47:39
Subjects: cs.RO,cs.AI,cs.LG,cs.MA,I.2.9
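The hybrid idea above, keeping an expert optimization-based controller and adding learned constraints, can be sketched as a QP with one extra linear constraint; the constraint parameters below are fixed numbers standing in for a learned network's output (requires cvxpy):

    # Hypothetical sketch of an expert QP controller plus a learned constraint.
    import cvxpy as cp
    import numpy as np

    u = cp.Variable(2)                      # agent control (vx, vy)
    u_ref = np.array([1.0, 0.0])            # expert controller's preferred action

    a, b = np.array([1.0, 1.0]), 0.5        # stand-ins for learned constraint params
    objective = cp.Minimize(cp.sum_squares(u - u_ref))
    constraints = [cp.norm(u, 2) <= 1.0,    # handcrafted safety constraint
                   a @ u <= b]              # learned, dynamic constraint
    cp.Problem(objective, constraints).solve()
    print(u.value)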
Studying Cross-cluster Modularity in Neural Networks
An approach to improve neural network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering. We thus train models to be more modular using a "clusterability loss" function that encourages the formation of non-interacting clusters. We then investigate the emerging properties of these highly clustered models. We find our trained clustered models do not exhibit more task specialization, but do form smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and GPT-2 and Pythia on the Wiki dataset, and Gemma on a Chemistry dataset. This investigation shows what to expect from clustered models.
Updated: 2025-07-25 10:41:54
Subjects: cs.LG,cs.AI
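A plausible form of the "clusterability loss" named above is a penalty on weights that cross cluster boundaries; the fixed group assignment below is an illustrative assumption (the paper derives clusters via spectral graph clustering):

    # Hypothetical clusterability loss: penalize cross-cluster weight magnitude.
    import torch

    W = torch.randn(64, 64, requires_grad=True)         # one weight matrix
    k = 4
    groups = torch.arange(64) % k                       # fixed toy cluster assignment

    inter = groups.unsqueeze(0) != groups.unsqueeze(1)  # True off the block diagonal
    cluster_loss = W.abs()[inter].mean()                # encourages non-interacting clusters
    cluster_loss.backward()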
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
The excessive use of visual tokens in existing Multimodal Large Language Models (MLLMs) often exhibits obvious redundancy and incurs prohibitively expensive computation. To gain insights into this problem, we first conduct extensive empirical studies on the attention behaviors of MLLMs and summarize three main inference stages: (i) early fusion between tokens is accomplished quickly; (ii) intra-modality modeling then comes into play; (iii) multimodal reasoning resumes and lasts until the end of inference. In particular, we reveal that visual tokens stop contributing to reasoning once the text tokens have received enough image information, yielding obvious visual redundancy. Based on these generalized observations, we propose a simple yet effective method to improve the efficiency of MLLMs, termed dynamic visual-token exit (DyVTE). DyVTE uses lightweight hyper-networks to perceive the text token status and decide the removal of all visual tokens after a certain layer, thereby addressing the observed visual redundancy. To validate DyVTE, we apply it to a set of MLLMs, including LLaVA, VILA, Eagle and InternVL, and conduct extensive experiments on a range of benchmarks. The experiment results not only show the effectiveness of DyVTE in improving MLLMs' efficiency, but also reveal general modeling patterns of MLLMs, facilitating an in-depth understanding of MLLMs. Our code is released at https://github.com/DoubtedSteam/DyVTE.
Updated: 2025-07-25 10:41:09
Subjects: cs.CV,cs.CL,cs.LG,cs.MM
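A minimal sketch of the exit mechanism described above, assuming the hidden states place visual tokens first and using an invented two-layer gate over the mean text-token state; shapes, gate architecture, and the threshold are all illustrative:

    # Hypothetical DyVTE-style gate: read text-token states, decide whether
    # to drop all visual tokens from the layers that follow.
    import torch
    import torch.nn as nn

    d = 256
    gate = nn.Sequential(nn.Linear(d, d // 4), nn.ReLU(), nn.Linear(d // 4, 1))

    def maybe_exit_visual(hidden, n_visual):
        # hidden: (batch, seq, d) with the first n_visual tokens being visual
        text_state = hidden[:, n_visual:, :].mean(dim=1)   # summarize text tokens
        exit_prob = torch.sigmoid(gate(text_state))        # per-sample decision
        if (exit_prob > 0.5).all():                        # hard exit at inference
            return hidden[:, n_visual:, :]                 # keep only text tokens
        return hidden

    out = maybe_exit_visual(torch.randn(2, 100, d), n_visual=64)
    print(out.shape)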
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, allowing wonderful image generation with natural-language text prompt. However, the issue of lacking controllability of such models restricts their practical applicability for real-life content creation. Thus, attention has been focused on leveraging a reference image to control text-to-image synthesis, which is also regarded as manipulating (or editing) a reference image as per a text prompt, namely, text-driven image-to-image translation. This paper contributes a novel, concise, and efficient approach that adapts pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner, realizing high-quality and versatile text-driven I2I translation without any model training, model fine-tuning, or online optimization process. To guide T2I generation with a reference image, we propose to decompose diverse guiding factors with different frequency bands of diffusion features in the DCT spectral space, and accordingly devise a novel frequency band substitution layer which realizes dynamic control of the reference image to the T2I generation result in a plug-and-play manner. We demonstrate that our method allows flexible control over both guiding factor and guiding intensity of the reference image simply by tuning the type and bandwidth of the substituted frequency band, respectively. Extensive qualitative and quantitative experiments verify superiority of our approach over related methods in I2I translation visual quality, versatility, and controllability. The code is publicly available at: https://github.com/XiangGao1102/FBSDiff.
Updated: 2025-07-25 10:37:53
Subjects: cs.CV,cs.AI
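The frequency band substitution itself is compact; the sketch below swaps the low-frequency DCT band of a stand-in generated feature map with a reference's, with the cutoff playing the bandwidth role. Applying this to diffusion features inside a sampler, as the paper does, is omitted:

    # Toy frequency band substitution in DCT space.
    import numpy as np
    from scipy.fft import dctn, idctn

    gen = np.random.rand(64, 64)        # stand-in generated feature map
    ref = np.random.rand(64, 64)        # stand-in reference feature map

    G, R = dctn(gen, norm="ortho"), dctn(ref, norm="ortho")
    cutoff = 8                          # band "bandwidth": lower = coarser guidance
    G[:cutoff, :cutoff] = R[:cutoff, :cutoff]   # substitute the low-frequency band
    guided = idctn(G, norm="ortho")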
Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL
Autonomous driving faces challenges in navigating complex real-world traffic, requiring safe handling of both common and critical scenarios. Reinforcement learning (RL), a prominent method in end-to-end driving, enables agents to learn through trial and error in simulation. However, RL training often relies on rule-based traffic scenarios, limiting generalization. Additionally, current scenario generation methods focus heavily on critical scenarios, neglecting a balance with routine driving behaviors. Curriculum learning, which progressively trains agents on increasingly complex tasks, is a promising approach to improving the robustness and coverage of RL driving policies. However, existing research mainly emphasizes manually designed curricula, focusing on scenery and actor placement rather than traffic behavior dynamics. This work introduces a novel student-teacher framework for automatic curriculum learning. The teacher, a graph-based multi-agent RL component, adaptively generates traffic behaviors across diverse difficulty levels. An adaptive mechanism adjusts task difficulty based on student performance, ensuring exposure to behaviors ranging from common to critical. The student, though exchangeable, is realized as a deep RL agent with partial observability, reflecting real-world perception constraints. Results demonstrate the teacher's ability to generate diverse traffic behaviors. The student, trained with automatic curricula, outperformed agents trained on rule-based traffic, achieving higher rewards and exhibiting balanced, assertive driving.
Updated: 2025-07-25 10:35:30
Subjects: cs.RO,cs.LG
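The adaptive mechanism can be illustrated with a success-rate controller over a discrete difficulty level; window size, thresholds, and step size below are assumptions:

    # Hypothetical sketch of adaptive curriculum difficulty.
    from collections import deque

    class AdaptiveCurriculum:
        def __init__(self, levels=5):
            self.level, self.levels = 0, levels
            self.history = deque(maxlen=50)         # recent episode outcomes

        def update(self, success: bool) -> int:
            self.history.append(success)
            rate = sum(self.history) / len(self.history)
            if rate > 0.8 and self.level < self.levels - 1:
                self.level += 1                     # student ready for harder traffic
            elif rate < 0.3 and self.level > 0:
                self.level -= 1                     # back off to easier behaviors
            return self.level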
Solar Photovoltaic Assessment with Large Language Model
Accurate detection and localization of solar photovoltaic (PV) panels in satellite imagery is essential for optimizing microgrids and active distribution networks (ADNs), which are critical components of renewable energy systems. Existing methods lack transparency regarding their underlying algorithms or training datasets, rely on large, high-quality PV training data, and struggle to generalize to new geographic regions or varied environmental conditions without extensive re-training. These limitations lead to inconsistent detection outcomes, hindering large-scale deployment and data-driven grid optimization. In this paper, we investigate how large language models (LLMs) can be leveraged to overcome these challenges. Despite their promise, LLMs face several challenges in solar panel detection, including difficulties with multi-step logical processes, inconsistent output formatting, frequent misclassification of visually similar objects (e.g., shadows, parking lots), and low accuracy in complex tasks such as spatial localization and quantification. To overcome these issues, we propose the PV Assessment with LLMs (PVAL) framework, which incorporates task decomposition for more efficient workflows, output standardization for consistent and scalable formatting, few-shot prompting to enhance classification accuracy, and fine-tuning using curated PV datasets with detailed annotations. PVAL ensures transparency, scalability, and adaptability across heterogeneous datasets while minimizing computational overhead. By combining open-source accessibility with robust methodologies, PVAL establishes an automated and reproducible pipeline for solar panel detection, paving the way for large-scale renewable energy integration and optimized grid management.
Updated: 2025-07-25 10:26:29
Subjects: cs.LG,cs.AI
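Output standardization, one of the PVAL components listed above, can be sketched as a fixed JSON schema plus a strict parser; the prompt, schema, and the call_llm stand-in below are illustrative, not the paper's pipeline:

    # Hypothetical sketch of schema-constrained LLM output with validation.
    import json

    SCHEMA_HINT = (
        "Reply with JSON only: "
        '{"panels_present": bool, "count": int, "bounding_boxes": [[x1,y1,x2,y2], ...]}'
    )

    def parse_response(text: str) -> dict:
        record = json.loads(text)                   # fails loudly on format drift
        assert isinstance(record["panels_present"], bool)
        assert isinstance(record["count"], int)
        return record

    # reply = call_llm(image, SCHEMA_HINT)          # black-box API, assumed
    reply = '{"panels_present": true, "count": 3, "bounding_boxes": [[0,0,10,10]]}'
    print(parse_response(reply))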
Game-Theoretic Gradient Control for Robust Neural Network Training
Feed-forward neural networks (FFNNs) are vulnerable to input noise, reducing prediction performance. Existing regularization methods like dropout often alter network architecture or overlook neuron interactions. This study aims to enhance FFNN noise robustness by modifying backpropagation, interpreted as a multi-agent game, and exploring controlled target variable noising. Our "gradient dropout" selectively nullifies hidden layer neuron gradients with probability 1 - p during backpropagation, while keeping forward passes active. This is framed within compositional game theory. Additionally, target variables were perturbed with white noise or stable distributions. Experiments on ten diverse tabular datasets show varying impacts: improvement or diminishing of robustness and accuracy, depending on dataset and hyperparameters. Notably, on regression tasks, gradient dropout (p = 0.9) combined with stable distribution target noising significantly increased input noise robustness, evidenced by flatter MSE curves and more stable SMAPE values. These results highlight the method's potential, underscore the critical role of adaptive parameter tuning, and open new avenues for analyzing neural networks as complex adaptive systems exhibiting emergent behavior within a game-theoretic framework.
Updated: 2025-07-25 10:26:25
Subjects: cs.NE,cs.LG,68T07,I.2.6
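The mechanism translates directly into a custom autograd function: identity on the forward pass, per-neuron gradient masking on the backward pass. A minimal PyTorch sketch, with placement after a hidden layer assumed:

    # "Gradient dropout": forward pass stays active, each gradient entry is
    # zeroed with probability 1 - p during backpropagation.
    import torch

    class GradientDropout(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, p):
            ctx.p = p
            return x                                  # forward pass unchanged

        @staticmethod
        def backward(ctx, grad_out):
            keep = (torch.rand_like(grad_out) < ctx.p).float()
            return grad_out * keep, None              # gradient kept with prob p

    h = torch.randn(8, 16, requires_grad=True)
    GradientDropout.apply(h, 0.9).sum().backward()
    print((h.grad == 0).float().mean())               # ~10% of gradients nullified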
Assessment of Personality Dimensions Across Situations Using Conversational Speech
Prior research indicates that users prefer assistive technologies whose personalities align with their own. This has sparked interest in automatic personality perception (APP), which aims to predict an individual's perceived personality traits. Previous studies in APP have treated personalities as static traits, independent of context. However, perceived personalities can vary by context and situation as shown in psychological research. In this study, we investigate the relationship between conversational speech and perceived personality for participants engaged in two work situations (a neutral interview and a stressful client interaction). Our key findings are: 1) perceived personalities differ significantly across interactions, 2) loudness, sound level, and spectral flux features are indicative of perceived extraversion, agreeableness, conscientiousness, and openness in neutral interactions, while neuroticism correlates with these features in stressful contexts, 3) handcrafted acoustic features and non-verbal features outperform speaker embeddings in inference of perceived personality, and 4) stressful interactions are more predictive of neuroticism, aligning with existing psychological research.
Updated: 2025-07-25 10:18:28
Subjects: eess.AS,cs.AI,cs.SD
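Two of the features named above are standard to extract; the sketch below computes spectral flux from an STFT and RMS as a loudness proxy on a synthetic clip (definitions are the standard ones, not necessarily the paper's extraction pipeline):

    # Minimal sketch of spectral flux and an RMS loudness proxy with librosa.
    import numpy as np
    import librosa

    sr = 16000
    t = np.linspace(0, 1.0, sr, endpoint=False)
    y = np.sin(2 * np.pi * 220 * t) * np.linspace(0, 1, sr)   # synthetic stand-in clip

    S = np.abs(librosa.stft(y))                                # magnitude spectrogram
    flux = np.sqrt(((np.diff(S, axis=1)) ** 2).sum(axis=0))   # frame-to-frame change
    rms = librosa.feature.rms(y=y)[0]                          # loudness proxy
    print(flux.mean(), rms.mean())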
OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?
Computer-using agents have shown strong potential to boost human productivity and enable new application forms across platforms. While recent advances have led to usable applications, existing benchmarks fail to account for the internal task heterogeneity and the corresponding agent capabilities, as well as their alignment with actual user demands, hindering both targeted capability development and the reliable transition of research progress into practical deployment. To bridge the gap, we present OS-MAP, a benchmark for daily computer-using automation that organizes its 416 realistic tasks across 15 applications along two key dimensions: a five-level taxonomy of automation and a generalization scope derived from a real-world user demand hierarchy. Evaluating agents along these two dimensions enables fine-grained analysis of required capabilities and alignment with real-world scenarios, capturing varying levels of required agent autonomy and generalization and forming a performance-generalization evaluation matrix for structured and comprehensive assessment. Experiments show that even state-of-the-art agents with VLM backbones struggle with higher-level tasks involving perception, reasoning, and coordination, highlighting the need for a deeper understanding of current strengths and limitations to drive future progress in computer-using agents research and deployment. All code, environments, baselines, and data are publicly available at https://github.com/OS-Copilot/OS-Map.
Updated: 2025-07-25 10:14:53
Subjects: cs.AI,cs.CL,cs.CV,cs.HC
Reshaping MOFs text mining with a dynamic multi-agents framework of large language model
Accurately identifying synthesis conditions for metal-organic frameworks (MOFs) remains a critical bottleneck in materials research, as translating literature-derived knowledge into actionable insights is hindered by the unstructured and heterogeneous nature of scientific texts. Here we present MOFh6, a large language model (LLM)-based multi-agent system designed to extract, structure, and apply synthesis knowledge from diverse input formats, including raw literature and crystal codes. Built on gpt-4o-mini and fine-tuned with few-shot expert-annotated data, MOFh6 achieves 99% accuracy in synthesis data parsing and resolves 94.1% of complex co-reference abbreviations. It processes a single full-text document in 9.6 seconds and localizes structured synthesis descriptions within 36 seconds, with the cost per 100 papers reduced to USD 4.24, a 76% saving over existing systems. By addressing long-standing limitations in cross-paragraph semantic fusion and terminology standardization, MOFh6 reshapes the LLM-based paradigm for MOF synthesis research, transforming static retrieval into an integrated and dynamic knowledge acquisition process. This shift bridges the gap between scientific literature and actionable synthesis design, providing a scalable framework for accelerating materials discovery.
Updated: 2025-07-25 10:08:19
Subjects: cs.AI,cond-mat.mtrl-sci
Large Language Models as Attribution Regularizers for Efficient Model Training
Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains. However, effectively leveraging their vast knowledge for training smaller downstream models remains an open challenge, especially in domains like tabular data learning, where simpler models are often preferred due to interpretability and efficiency. In this paper, we introduce a novel yet straightforward method for incorporating LLM-generated global task feature attributions into the training process of smaller networks. Specifically, we propose an attribution-matching regularization term that aligns the training dynamics of the smaller model with the insights provided by the LLM. By doing so, our approach yields superior performance in few-shot learning scenarios. Notably, our method requires only black-box API access to the LLM, making it easy to integrate into existing training pipelines with minimal computational overhead. Furthermore, we demonstrate how this method can be used to address common issues in real-world datasets, such as skewness and bias. By integrating high-level knowledge from LLMs, our approach improves generalization, even when training data is limited or imbalanced. We validate its effectiveness through extensive experiments across multiple tasks, demonstrating improved learning efficiency and model robustness.
Updated: 2025-07-25 09:56:38
Fields: cs.LG,cs.AI,I.2.6
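To make the attribution-matching idea concrete, here is a minimal PyTorch sketch of such a regularized objective. The gradient-based attribution, the squared-error matching term, and the weight `lam` are illustrative assumptions; the paper only states that a regularization term aligns the smaller model with LLM-provided attributions.

```python
import torch
import torch.nn.functional as F

def attribution_matching_loss(model, x, y, llm_attr, lam=0.1):
    """Task loss plus a penalty aligning the model's feature attributions
    with an LLM-provided importance vector (one score per input feature)."""
    task_loss = F.cross_entropy(model(x), y)

    # Gradient-based global attribution: mean |d loss / d input| per feature
    # (an assumption; any differentiable attribution method could be used).
    x_req = x.detach().clone().requires_grad_(True)
    grads = torch.autograd.grad(F.cross_entropy(model(x_req), y),
                                x_req, create_graph=True)[0]
    model_attr = grads.abs().mean(dim=0)
    model_attr = model_attr / (model_attr.sum() + 1e-8)

    target = llm_attr / (llm_attr.sum() + 1e-8)   # normalize LLM scores
    return task_loss + lam * ((model_attr - target) ** 2).sum()
```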
PatchTraj: Dynamic Patch Representation Learning for Time-Frequency Trajectory Prediction
Pedestrian trajectory prediction is crucial for autonomous driving and robotics. However, existing point-based and grid-based methods expose two key limitations: they insufficiently model human motion dynamics, failing to balance local motion details with long-range spatiotemporal dependencies, and their time representation lacks interaction with the frequency domain when modeling trajectory sequences. To address these challenges, we propose PatchTraj, a dynamic patch-based trajectory prediction framework that unifies time-domain and frequency-domain representations. Specifically, we decompose the trajectory into raw time sequences and frequency components, employing dynamic patch partitioning for multi-scale trajectory segmentation to capture hierarchical motion patterns. Each patch is processed by an adaptive embedding layer with scale-aware feature extraction, followed by hierarchical feature aggregation to model both fine-grained and long-range dependencies. The outputs of the two branches interact via cross-modal attention, enabling complementary fusion of temporal and spectral cues. Finally, a Transformer encoder-decoder integrates both modalities to autoregressively predict future trajectories. Extensive experiments on the ETH-UCY, SDD, NBA, and JRDB datasets demonstrate that our method achieves state-of-the-art performance with high efficiency.
Updated: 2025-07-25 09:55:33
Fields: cs.CV,cs.AI
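The core decomposition can be pictured with a short sketch: a trajectory is cut into multi-scale time-domain patches and, in parallel, mapped to frequency components. The fixed `patch_sizes` below are an assumption standing in for the paper's dynamic partitioning.

```python
import torch

def decompose_trajectory(traj, patch_sizes=(2, 4, 8)):
    """traj: tensor of shape (T, 2) holding (x, y) positions over T frames."""
    T, C = traj.shape
    patches = []
    for p in patch_sizes:                      # multi-scale time-domain patches
        usable = T - T % p                     # trim so the length divides by p
        patches.append(traj[:usable].reshape(usable // p, p * C))
    freq = torch.fft.rfft(traj, dim=0)         # frequency components per coordinate
    freq = torch.view_as_real(freq).flatten(1) # (T//2 + 1, 2*C) real features
    return patches, freq
```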
Graph Structure Learning with Privacy Guarantees for Open Graph Data
Ensuring privacy in large-scale open datasets is increasingly challenging under regulations such as the General Data Protection Regulation (GDPR). While differential privacy (DP) provides strong theoretical guarantees, it primarily focuses on noise injection during model training, neglecting privacy preservation at the data publishing stage. Existing privacy-preserving data publishing (PPDP) approaches struggle to balance privacy and utility, particularly when data publishers and users are distinct entities. To address this gap, we focus on the graph recovery problem and propose a novel privacy-preserving estimation framework for open graph data, leveraging Gaussian DP (GDP) with a structured noise-injection mechanism. Unlike traditional methods that perturb gradients or model updates, our approach ensures unbiased graph structure recovery while enforcing DP at the data publishing stage. Moreover, we provide theoretical guarantees on estimation accuracy and extend our method to discrete-variable graphs, a setting often overlooked in DP research. Experimental results in graph learning demonstrate robust performance, offering a viable solution for privacy-conscious graph analysis.
Updated: 2025-07-25 09:51:12
Fields: cs.LG,cs.AI
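For intuition, a minimal sketch of DP noise injection at the publishing stage follows; under mu-Gaussian DP (mu-GDP), the Gaussian mechanism with sigma = sensitivity / mu suffices. The paper's structured noise-injection mechanism and unbiased-recovery analysis go beyond this basic version, so the function below is only an illustration.

```python
import numpy as np

def publish_adjacency_gdp(A, sensitivity=1.0, mu=1.0, seed=None):
    """Release a symmetric noisy adjacency matrix satisfying mu-GDP via the
    Gaussian mechanism with sigma = sensitivity / mu."""
    rng = np.random.default_rng(seed)
    sigma = sensitivity / mu
    noise = np.triu(rng.normal(0.0, sigma, size=A.shape), k=1)
    noise = noise + noise.T          # symmetric perturbation, zero diagonal
    return A + noise
```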
Automated Code Review Using Large Language Models at Ericsson: An Experience Report
Code review is one of the primary means of assuring the quality of released software along with testing and static analysis. However, code review requires experienced developers who may not always have the time to perform an in-depth review of code. Thus, automating code review can help alleviate the cognitive burden on experienced software developers allowing them to focus on their primary activities of writing code to add new features and fix bugs. In this paper, we describe our experience in using Large Language Models towards automating the code review process in Ericsson. We describe the development of a lightweight tool using LLMs and static program analysis. We then describe our preliminary experiments with experienced developers in evaluating our code review tool and the encouraging results.
Updated: 2025-07-25 09:50:48
Fields: cs.SE,cs.AI
Pareto-NRPA: A Novel Monte-Carlo Search Algorithm for Multi-Objective Optimization
We introduce Pareto-NRPA, a new Monte-Carlo algorithm designed for multi-objective optimization problems over discrete search spaces. Extending the Nested Rollout Policy Adaptation (NRPA) algorithm originally formulated for single-objective problems, Pareto-NRPA generalizes the nested search and policy update mechanism to multi-objective optimization. The algorithm uses a set of policies to concurrently explore different regions of the solution space and maintains non-dominated fronts at each level of search. Policy adaptation is performed with respect to the diversity and isolation of sequences within the Pareto front. We benchmark Pareto-NRPA on two classes of problems: a novel bi-objective variant of the Traveling Salesman Problem with Time Windows (MO-TSPTW), and a neural architecture search task on well-known benchmarks. Results demonstrate that Pareto-NRPA achieves competitive performance against state-of-the-art multi-objective algorithms, both in terms of convergence and diversity of solutions. In particular, Pareto-NRPA strongly outperforms state-of-the-art evolutionary multi-objective algorithms on constrained search spaces. To our knowledge, this work constitutes the first adaptation of NRPA to the multi-objective setting.
Updated: 2025-07-25 09:46:25
Fields: cs.AI
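The non-dominated fronts maintained at each level of the nested search reduce to a standard Pareto-front update, sketched below in plain Python (all objectives minimized); the nesting and policy-adaptation logic of NRPA is omitted.

```python
def dominates(a, b):
    """a Pareto-dominates b when a is no worse in every objective and
    strictly better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_front(front, candidate):
    """Insert a (sequence, objectives) pair into a non-dominated front."""
    _, obj = candidate
    if any(dominates(o, obj) for _, o in front):
        return front                                   # dominated: discard
    front = [(s, o) for s, o in front if not dominates(obj, o)]
    front.append(candidate)
    return front
```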
Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information. The standard retrieval process prioritizes relevance, focusing on topical alignment between queries and passages. In RAG, by contrast, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers. Despite empirical evidence showing the benefits of utility-based retrieval in RAG, the high computational cost of using LLMs for utility judgments limits the number of passages evaluated. This restriction is problematic for complex queries requiring extensive information. To address this, we propose a method to distill the utility judgment capabilities of LLMs into smaller, more efficient models. Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds. We train student models to learn pseudo-answer generation and utility judgments from teacher LLMs, using a sliding window method that dynamically selects useful passages. Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality. We present the distillation results using Qwen3-32B as the teacher model for both relevance ranking and utility-based selection, distilled into RankQwen1.7B and UtilityQwen1.7B. Our findings indicate that for complex questions, utility-based selection is more effective than relevance ranking in enhancing answer generation performance. We will release the relevance ranking and utility-based selection annotations for the MS MARCO dataset, supporting further research in this area.
Updated: 2025-07-25 09:32:29
Fields: cs.IR,cs.AI,cs.CL,cs.LG
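A rough sketch of the sliding-window selection loop follows. The `judge` callable stands in for a distilled utility model (e.g., UtilityQwen1.7B behind an API), and carrying previously selected passages into the next window is an assumption about the protocol rather than a detail stated in the abstract.

```python
def select_useful_passages(question, ranked_passages, judge, window=5):
    """Scan ranked passages window by window; `judge(question, passages)`
    returns whichever of the given passages it deems useful for answering,
    so the number selected adapts to the query with no fixed threshold."""
    selected = []
    for start in range(0, len(ranked_passages), window):
        chunk = ranked_passages[start:start + window]
        # Re-judge previously kept passages together with the new window so
        # earlier picks can be dropped as better evidence arrives.
        selected = judge(question, selected + chunk)
    return selected
```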
MedSymmFlow: Bridging Generative Modeling and Classification in Medical Imaging through Symmetrical Flow Matching
Reliable medical image classification requires accurate predictions and well-calibrated uncertainty estimates, especially in high-stakes clinical settings. This work presents MedSymmFlow, a generative-discriminative hybrid model built on Symmetrical Flow Matching, designed to unify classification, generation, and uncertainty quantification in medical imaging. MedSymmFlow leverages a latent-space formulation that scales to high-resolution inputs and introduces a semantic mask conditioning mechanism to enhance diagnostic relevance. Unlike standard discriminative models, it naturally estimates uncertainty through its generative sampling process. The model is evaluated on four MedMNIST datasets, covering a range of modalities and pathologies. The results show that MedSymmFlow matches or exceeds the performance of established baselines in classification accuracy and AUC, while also delivering reliable uncertainty estimates validated by performance improvements under selective prediction.
Updated: 2025-07-25 09:30:40
Fields: cs.CV,cs.AI
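Because the model classifies through its generative sampling process, uncertainty can be read off from repeated draws. The sketch below shows this generic Monte-Carlo estimate; `sample_label` is a placeholder for one sampling pass of a trained model, not MedSymmFlow's actual API. High-entropy cases can then be deferred, which is how selective-prediction evaluations use such estimates.

```python
import torch

def predictive_uncertainty(sample_label, image, num_classes, n=16):
    """Estimate class probabilities and predictive entropy from n generative
    draws; sample_label(image) is assumed to return one class index."""
    counts = torch.zeros(num_classes)
    for _ in range(n):
        counts[sample_label(image)] += 1
    probs = counts / n                                 # per-class frequency
    entropy = -(probs * (probs + 1e-8).log()).sum()
    return probs, entropy
```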
JCAPT: A Joint Modeling Approach for CAPT
Effective pronunciation feedback is critical in second language (L2) learning, for which computer-assisted pronunciation training (CAPT) systems often encompass two key tasks: automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD). Recent work has shown that joint modeling of these two tasks can yield mutual benefits. Our unified framework leverages Mamba, a selective state space model (SSM), while integrating phonological features and think token strategies to jointly enhance interpretability and fine-grained temporal reasoning in APA and MDD. To our knowledge, this is the first study to combine phonological attribution, SSM-based modeling, and prompting in CAPT. A series of experiments conducted on the speechocean762 benchmark demonstrate that our model consistently outperforms prior methods, particularly on the MDD task.
Updated: 2025-07-25 09:26:59
Fields: cs.CL,cs.AI,eess.AS
GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network
Attributed graph clustering holds significant importance in modern data analysis. However, due to the complexity of graph data and the heterogeneity of node attributes, leveraging graph information for clustering remains challenging. To address this, we propose a novel deep graph clustering model, GCL-GCN, specifically designed to address the limitations of existing models in capturing local dependencies and complex structures when dealing with sparse and heterogeneous graph data. GCL-GCN introduces an innovative Graphormer module that combines centrality encoding and spatial relationships, effectively capturing both global and local information between nodes, thereby enhancing the quality of node representations. Additionally, we propose a novel contrastive learning module that significantly enhances the discriminative power of feature representations. In the pre-training phase, this module increases feature distinction through contrastive learning on the original feature matrix, ensuring more identifiable initial representations for subsequent graph convolution and clustering tasks. Extensive experimental results on six datasets demonstrate that GCL-GCN outperforms 14 advanced methods in terms of clustering quality and robustness. Specifically, on the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, compared to the primary comparison method MBN.
Updated: 2025-07-25 09:25:55
Fields: cs.LG
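The pre-training idea, contrastive learning on the original feature matrix, can be sketched with a standard InfoNCE objective; the feature-dropout augmentation and temperature `tau` below are assumptions, since the abstract does not specify the exact contrastive formulation.

```python
import torch
import torch.nn.functional as F

def feature_contrastive_loss(encoder, X, drop=0.2, tau=0.5):
    """InfoNCE over two feature-dropout views of the raw attribute matrix X
    (n x d): each node's two views form the positive pair."""
    z1 = F.normalize(encoder(F.dropout(X, p=drop)), dim=1)
    z2 = F.normalize(encoder(F.dropout(X, p=drop)), dim=1)
    logits = z1 @ z2.t() / tau                    # pairwise cosine similarities
    labels = torch.arange(X.size(0), device=X.device)
    return F.cross_entropy(logits, labels)        # match each node to itself
```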
Graph Neural Network-Based Predictor for Optimal Quantum Hardware Selection
The growing variety of quantum hardware technologies, each with unique peculiarities such as connectivity and native gate sets, creates challenges when selecting the best platform for executing a specific quantum circuit. This selection process usually involves a brute-force approach: compiling the circuit on various devices and evaluating performance based on factors such as circuit depth and gate fidelity. However, this method is computationally expensive and does not scale well as the number of available quantum processors increases. In this work, we propose a Graph Neural Network (GNN)-based predictor that automates hardware selection by analyzing the Directed Acyclic Graph (DAG) representation of a quantum circuit. Our study evaluates 498 quantum circuits (up to 27 qubits) from the MQT Bench dataset, compiled using Qiskit on four devices: three superconducting quantum processors (IBM-Kyiv, IBM-Brisbane, IBM-Sherbrooke) and one trapped-ion processor (IONQ-Forte). Performance is estimated using a metric that integrates circuit depth and gate fidelity, resulting in a dataset where 93 circuits are optimally compiled on the trapped-ion device, while the remaining circuits prefer superconducting platforms. By exploiting graph-based machine learning, our approach avoids extracting circuit features for model evaluation and instead embeds the circuit directly as a graph, significantly accelerating the optimal-target decision process while retaining all the information. Experimental results show 94.4% accuracy and an 85.5% F1 score for the minority class, effectively predicting the best compilation target. The developed code is publicly available on GitHub (https://github.com/antotu/GNN-Model-Quantum-Predictor).
Updated: 2025-07-25 09:23:04
Fields: quant-ph,cs.LG
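For context, the brute-force baseline the predictor replaces looks roughly like the following: compile on every backend, score each result with a depth-and-fidelity metric, and keep the best. The specific score below (log of the gate-fidelity product, penalized by log depth) is an assumption, as the abstract does not publish the exact formula; the GNN learns to predict the winner directly from the circuit DAG instead.

```python
import math

def compiled_score(depth, gate_counts, gate_fidelities):
    """One plausible depth-and-fidelity score (higher is better)."""
    log_fidelity = sum(n * math.log(gate_fidelities[g])
                       for g, n in gate_counts.items())
    return log_fidelity - math.log(depth)

def best_backend(compilations):
    """compilations: {backend_name: (depth, gate_counts, gate_fidelities)}."""
    return max(compilations, key=lambda b: compiled_score(*compilations[b]))
```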
Mean flow data assimilation using physics-constrained Graph Neural Networks
Despite their widespread use, purely data-driven methods often suffer from overfitting, lack of physical consistency, and high data dependency, particularly when physical constraints are not incorporated. This study introduces a novel data assimilation approach that integrates Graph Neural Networks (GNNs) with optimisation techniques to enhance the accuracy of mean flow reconstruction, using Reynolds-Averaged Navier-Stokes (RANS) equations as a baseline. The method leverages the adjoint approach, incorporating RANS-derived gradients as optimisation terms during GNN training, ensuring that the learned model adheres to physical laws and maintains consistency. Additionally, the GNN framework is well-suited for handling unstructured data, which is common in the complex geometries encountered in Computational Fluid Dynamics (CFD). The GNN is interfaced with the Finite Element Method (FEM) for numerical simulations, enabling accurate modelling in unstructured domains. We consider the reconstruction of mean flow past bluff bodies at low Reynolds numbers as a test case, addressing tasks such as sparse data recovery, denoising, and inpainting of missing flow data. The key strengths of the approach lie in its integration of physical constraints into the GNN training process, leading to accurate predictions with limited data, making it particularly valuable when data are scarce or corrupted. Results demonstrate significant improvements in the accuracy of mean flow reconstructions, even with limited training data, compared to analogous purely data-driven models.
Updated: 2025-07-25 09:18:14
Fields: physics.flu-dyn,cs.LG
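The physics-constrained training objective can be summarized as data misfit plus a RANS-residual penalty, as in the sketch below; `rans_residual` is a placeholder for a differentiable (e.g., FEM-backed) residual operator so that gradients reach the GNN weights, mirroring the adjoint-based coupling described above. The weight `alpha` is an assumed hyperparameter.

```python
import torch

def assimilation_loss(u_pred, u_obs, mask, rans_residual, alpha=1.0):
    """Misfit on observed (possibly sparse or noisy) mean-flow values plus a
    penalty on the RANS residual evaluated at the predicted mean flow."""
    data_term = (((u_pred - u_obs) * mask) ** 2).sum() / mask.sum()
    physics_term = (rans_residual(u_pred) ** 2).mean()
    return data_term + alpha * physics_term
```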
Fine-Grained Traffic Inference from Road to Lane via Spatio-Temporal Graph Node Generation
Fine-grained traffic management and prediction are fundamental to key applications such as autonomous driving, lane change guidance, and traffic signal control. However, obtaining lane-level traffic data has become a critical bottleneck for data-driven models due to limitations in the types and number of sensors and issues with the accuracy of tracking algorithms. To address this, we propose the Fine-grained Road Traffic Inference (FRTI) task, which aims to generate more detailed lane-level traffic information using limited road data, providing a more energy-efficient and cost-effective solution for precise traffic management. This task is abstracted as the first scene of the spatio-temporal graph node generation problem. We designed a two-stage framework--RoadDiff--to solve the FRTI task. This framework leverages the Road-Lane Correlation Autoencoder-Decoder and the Lane Diffusion Module to fully utilize the limited spatio-temporal dependencies and distribution relationships of road data to accurately infer fine-grained lane traffic states. Based on existing research, we designed several baseline models with the potential to solve the FRTI task and conducted extensive experiments on six datasets representing different road conditions to validate the effectiveness of the RoadDiff model in addressing the FRTI task. The relevant datasets and code are available at https://github.com/ShuhaoLii/RoadDiff.
Updated: 2025-07-25 09:15:18
Fields: cs.AI,cs.CV
Clustering-Oriented Generative Attribute Graph Imputation
Attribute-missing graph clustering has emerged as a significant unsupervised task, where only attribute vectors of partial nodes are available and the graph structure is intact. The related models generally follow the two-step paradigm of imputation and refinement. However, most imputation approaches fail to capture class-relevant semantic information, leading to sub-optimal imputation for clustering. Moreover, existing refinement strategies optimize the learned embedding through graph reconstruction, while neglecting the fact that some attributes are uncorrelated with the graph. To remedy these problems, we establish the Clustering-oriented Generative Imputation with reliable Refinement (CGIR) model. Concretely, the subcluster distributions are estimated to reveal the class-specific characteristics precisely and to constrain the sampling space of the generative adversarial module, such that imputed nodes are driven to align with the correct clusters. Afterwards, multiple subclusters are merged to guide the proposed edge attention network, which identifies the edge-wise attributes for each class, so as to prevent redundant attributes in graph reconstruction from disturbing the refinement of the overall embedding. In summary, CGIR splits attribute-missing graph clustering into the search and merging of subclusters, guiding node imputation and refinement within a unified framework. Extensive experiments prove the advantages of CGIR over state-of-the-art competitors.
Updated: 2025-07-25 09:11:38
Fields: cs.LG
SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation Research
High-precision navigation and positioning systems are critical for applications in autonomous vehicles and mobile mapping, where robust and continuous localization is essential. To test and enhance the performance of algorithms, some research institutions and companies have constructed and publicly released datasets. However, existing datasets still suffer from limitations in sensor diversity and environmental coverage. To address these shortcomings and advance development in related fields, the SmartPNT Multisource Integrated Navigation, Positioning, and Attitude Dataset has been developed. This dataset integrates data from multiple sensors, including Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), optical cameras, and LiDAR, to provide a rich and versatile resource for research in multi-sensor fusion and high-precision navigation. The dataset construction process is thoroughly documented, encompassing sensor configurations, coordinate system definitions, and calibration procedures for both cameras and LiDAR. A standardized framework for data collection and processing ensures consistency and scalability, enabling large-scale analysis. Validation using state-of-the-art Simultaneous Localization and Mapping (SLAM) algorithms, such as VINS-Mono and LIO-SAM, demonstrates the dataset's applicability for advanced navigation research. Covering a wide range of real-world scenarios, including urban areas, campuses, tunnels, and suburban environments, the dataset offers a valuable tool for advancing navigation technologies and addressing challenges in complex environments. By providing a publicly accessible, high-quality dataset, this work aims to bridge gaps in sensor diversity, data accessibility, and environmental representation, fostering further innovation in the field.
Updated: 2025-07-25 09:06:11
Fields: cs.RO,cs.LG
Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs
Adversarial attacks aim to generate malicious inputs that mislead deep models, but beyond causing model failure, they cannot provide certain interpretable information, such as "What content in inputs makes models more likely to fail?" However, this information is crucial for researchers seeking to improve model robustness in a targeted way. Recent research suggests that models may be particularly sensitive to certain semantics in visual inputs (such as "wet" or "foggy"), making them prone to errors. Inspired by this, in this paper we conducted the first such exploration of large vision-language models (LVLMs) and found that LVLMs are indeed susceptible to hallucinations and various errors when facing specific semantic concepts in images. To efficiently search for these sensitive concepts, we integrated large language models (LLMs) and text-to-image (T2I) models to propose a novel semantic evolution framework. Randomly initialized semantic concepts undergo LLM-based crossover and mutation operations to form image descriptions, which are then converted by T2I models into visual inputs for LVLMs. The task-specific performance of LVLMs on each input is quantified as fitness scores for the involved semantics and serves as reward signals to further guide LLMs in exploring concepts that induce LVLM failures. Extensive experiments on seven mainstream LVLMs and two multimodal tasks demonstrate the effectiveness of our method. Additionally, we provide interesting findings about the sensitive semantics of LVLMs, aiming to inspire further in-depth research.
Updated: 2025-07-25 09:03:08
Fields: cs.CV,cs.AI,cs.CR
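The semantic evolution loop reduces to a small generic skeleton: score concepts by how badly the LVLM fails on their rendered images, keep the worst offenders, and let an LLM vary them. The callables `llm_vary`, `t2i`, and `lvlm_error` are placeholders, and the elitist scheme below is an assumed instantiation of the framework.

```python
def evolve_sensitive_concepts(seeds, llm_vary, t2i, lvlm_error,
                              generations=10, population=20, elite=5):
    """Search for semantics that induce LVLM failures. llm_vary(parents)
    stands in for LLM-based crossover/mutation of concept strings, t2i
    renders a concept to an image, lvlm_error scores failure (higher = worse)."""
    pop = list(seeds)
    best = pop[:elite]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: lvlm_error(t2i(c)), reverse=True)
        best = ranked[:elite]                 # fitness acts as the reward signal
        pop = best + llm_vary(best)[:population - elite]
    return best
```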
Ambient Noise Full Waveform Inversion with Neural Operators
Numerical simulations of seismic wave propagation are crucial for investigating velocity structures and improving seismic hazard assessment. However, standard methods such as finite difference or finite element are computationally expensive. Recent studies have shown that a new class of machine learning models, called neural operators, can solve the elastodynamic wave equation orders of magnitude faster than conventional methods. Full waveform inversion is a prime beneficiary of the accelerated simulations. Neural operators, as end-to-end differentiable operators, combined with automatic differentiation, provide an alternative approach to the adjoint-state method. State-of-the-art optimization techniques built into PyTorch provide neural operators with greater flexibility to improve the optimization dynamics of full waveform inversion, thereby mitigating cycle-skipping problems. In this study, we demonstrate the first application of neural operators for full waveform inversion on a real seismic dataset, which consists of several nodal transects collected across the San Gabriel, Chino, and San Bernardino basins in the Los Angeles metropolitan area.
Updated: 2025-07-25 08:43:53
Fields: physics.geo-ph,cs.LG
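The accelerated inversion amounts to ordinary gradient descent through a differentiable surrogate, sketched below; `forward_op` is a placeholder for a trained neural operator mapping a velocity model to synthetic waveforms, and automatic differentiation supplies the gradients that the adjoint-state method would otherwise compute. Adam and the step count are assumed settings.

```python
import torch

def invert_velocity(forward_op, d_obs, v_init, steps=200, lr=1e-2):
    """Gradient-based FWI through a differentiable surrogate forward model."""
    v = v_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([v], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        misfit = torch.mean((forward_op(v) - d_obs) ** 2)  # waveform misfit
        misfit.backward()                                  # autodiff "adjoint"
        opt.step()
    return v.detach()
```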
A self-supervised neural-analytic method to predict the evolution of COVID-19 in Romania
Analysing and understanding the transmission and evolution of the COVID-19 pandemic is mandatory to be able to design the best social and medical policies, foresee their outcomes and deal with all the subsequent socio-economic effects. We address this important problem from a computational and machine learning perspective. More specifically, we want to statistically estimate all the relevant parameters for the new coronavirus COVID-19, such as the reproduction number, fatality rate or length of infectiousness period, based on Romanian patients, as well as be able to predict future outcomes. This endeavor is important, since it is well known that these factors vary across the globe, and might be dependent on many causes, including social, medical, age and genetic factors. We use a recently published improved version of SEIR, the classic, established model for infectious diseases. We want to infer all the parameters of the model, which govern the evolution of the pandemic in Romania, based on the only reliable, true measurement, which is the number of deaths. Once the model parameters are estimated, we are able to predict all the other relevant measures, such as the number of exposed and infectious people. To this end, we propose a self-supervised approach to train a deep convolutional network to guess the correct set of Modified-SEIR model parameters, given the observed number of daily fatalities. Then, we refine the solution with a stochastic coordinate descent approach. We compare our deep learning optimization scheme with the classic grid search approach and show great improvement in both computational time and prediction accuracy. We find an optimistic result for the case fatality rate in Romania, which may be around 0.3%, and we also demonstrate that our model is able to correctly predict the number of daily fatalities for up to three weeks into the future.
Updated: 2025-07-25 08:32:46
Fields: q-bio.PE,cs.LG
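For reference, the classic SEIR forward model whose parameters the network is trained to guess can be integrated in a few lines. The paper uses an improved Modified-SEIR with additional parameters; the baseline below, with a simple case-fatality split of removals, is a simplification for illustration.

```python
import numpy as np

def seir_deaths(beta, sigma, gamma, cfr, N, E0, days):
    """Cumulative deaths from a classic SEIR model (Euler steps, dt = 1 day).
    beta: transmission rate; 1/sigma: incubation period; 1/gamma: infectious
    period; cfr: case fatality rate splitting removals into R and D."""
    S, E, I, R, D = N - E0, E0, 0.0, 0.0, 0.0
    deaths = np.zeros(days)
    for t in range(days):
        new_e = beta * S * I / N       # S -> E
        new_i = sigma * E              # E -> I
        new_r = gamma * I              # I -> removed
        S -= new_e
        E += new_e - new_i
        I += new_i - new_r
        R += new_r * (1.0 - cfr)
        D += new_r * cfr
        deaths[t] = D
    return deaths
```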
PBiLoss: Popularity-Aware Regularization to Improve Fairness in Graph-Based Recommender Systems
Recommender systems, especially those based on graph neural networks (GNNs), have achieved remarkable success in capturing user-item interaction patterns. However, they remain susceptible to popularity bias--the tendency to over-recommend popular items--resulting in reduced content diversity and compromised fairness. In this paper, we propose PBiLoss, a novel regularization-based loss function designed to counteract popularity bias in graph-based recommender models explicitly. PBiLoss augments traditional training objectives by penalizing the model's inclination toward popular items, thereby encouraging the recommendation of less popular but potentially more personalized content. We introduce two sampling strategies: Popular Positive (PopPos) and Popular Negative (PopNeg), which respectively modulate the contribution of the positive and negative popular items during training. We further explore two methods to distinguish popular items: one based on a fixed popularity threshold and another without any threshold, making the approach flexible and adaptive. Our proposed method is model-agnostic and can be seamlessly integrated into state-of-the-art graph-based frameworks such as LightGCN and its variants. Comprehensive experiments across multiple real-world datasets demonstrate that PBiLoss significantly improves fairness, as demonstrated by reductions in the Popularity-Rank Correlation for Users (PRU) and Popularity-Rank Correlation for Items (PRI), while maintaining or even enhancing standard recommendation accuracy and ranking metrics. These results highlight the effectiveness of directly embedding fairness objectives into the optimization process, providing a practical and scalable solution for balancing accuracy and equitable content exposure in modern recommender systems.
Updated: 2025-07-25 08:29:32
Fields: cs.IR,cs.AI,cs.NE
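One way the PopPos idea could look in code is a BPR ranking loss plus a popularity-weighted penalty on positive scores, as sketched below; the exact penalty form and the weight `lam` are assumptions, since the paper defines PBiLoss more generally with both PopPos and PopNeg sampling strategies and with or without a popularity threshold.

```python
import torch
import torch.nn.functional as F

def pbi_bpr_loss(pos_scores, neg_scores, pos_popularity, lam=0.1):
    """BPR ranking loss with a PopPos-style penalty that damps the predicted
    scores of popular positive items (pos_popularity normalized to [0, 1])."""
    bpr = -F.logsigmoid(pos_scores - neg_scores).mean()
    penalty = (pos_popularity * pos_scores).mean()   # discourage popular picks
    return bpr + lam * penalty
```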
ToolACE: Winning the Points of LLM Function Calling
Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.
Updated: 2025-07-25 08:26:54
标题: ToolACE:赢得LLM函数调用的关键点
摘要: 功能调用显著扩展了大型语言模型的应用边界,其中高质量和多样化的训练数据对于释放这一能力至关重要。然而,真实的功能调用数据收集和注释起来相当具有挑战性,而由现有流水线生成的合成数据往往缺乏覆盖范围和准确性。在本文中,我们介绍了ToolACE,一个旨在生成准确、复杂和多样化工具学习数据的自动代理管道。ToolACE利用一种新颖的自进化合成过程来策划一个包含26,507个多样化API的全面API池。对话进一步通过多个代理之间的互动生成,受到形式化思维过程的指导。为了确保数据准确性,我们实现了一个结合基于规则和基于模型的检查的双层验证系统。我们展示了在我们合成数据上训练的模型,即使只有8B参数,也在伯克利功能调用排行榜上实现了最先进的性能,与最新的GPT-4模型不相上下。我们的模型和部分数据公开可用于https://huggingface.co/Team-ACE。
更新时间: 2025-07-25 08:26:54
领域: cs.LG,cs.AI,cs.CL
Towards Sustainability Model Cards
The growth of machine learning (ML) models and associated datasets triggers a consequent dramatic increase in energy costs for the use and training of these models. In the current context of environmental awareness and global sustainability concerns involving ICT, Green AI is becoming an important research topic. Initiatives like the AI Energy Score Ratings are a good example. Nevertheless, these benchmarking attempts have yet to be integrated with existing work on Quality Models and Service-Level Agreements common in other, more mature ICT subfields. This limits the (automatic) analysis of these model energy descriptions and their use in (semi)automatic model comparison, selection, and certification processes. We aim to leverage the concept of quality models and merge it with existing ML model reporting initiatives and Green/Frugal AI proposals to formalize a Sustainable Quality Model for AI/ML models. As a first step, we propose a new Domain-Specific Language to precisely define the sustainability aspects of an ML model (including the energy costs of its different tasks). This information can then be exported as an extended version of the well-known Model Cards initiative while, at the same time, being formal enough to serve as input to any other automatic model-description process.
Updated: 2025-07-25 08:26:53
标题: 走向可持续性模型卡
摘要: 机器学习(ML)模型的增长以及相关数据集的增加导致了使用和训练这些模型的能源成本的急剧增加。在当前环境意识和全球可持续性关注ICT的背景下,绿色人工智能正成为一个重要的研究课题。像AI能源评分等倡议是一个很好的例子。然而,这些基准尝试仍需与其他更成熟的ICT子领域中常见的质量模型和服务级别协议的现有工作整合。这限制了对这些模型能源描述的(自动)分析及其在(半)自动模型比较、选择和认证过程中的使用。我们旨在利用质量模型的概念,并将其与现有的ML模型报告倡议和绿色/节约AI提案相结合,以形成一个针对AI/ML模型的可持续质量模型。作为第一步,我们提出了一种新的领域特定语言,以精确定义ML模型的可持续性方面(包括其不同任务的能源成本)。这些信息随后可以导出为众所周知的模型卡片倡议的扩展版本,同时也足够正式,可作为任何其他模型描述自动流程的输入。
更新时间: 2025-07-25 08:26:53
领域: cs.CY,cs.AI,cs.LG
Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization
We propose Decision by Supervised Learning (DSL), a practical framework for robust portfolio optimization. DSL reframes portfolio construction as a supervised learning problem: models are trained to predict optimal portfolio weights, using cross-entropy loss and portfolios constructed by maximizing the Sharpe or Sortino ratio. To further enhance stability and reliability, DSL employs Deep Ensemble methods, substantially reducing variance in portfolio allocations. Through comprehensive backtesting across diverse market universes and neural architectures, DSL shows superior performance compared to both traditional strategies and leading machine learning-based methods, including Prediction-Focused Learning and End-to-End Learning. We show that increasing the ensemble size leads to higher median returns and more stable risk-adjusted performance. The code is available at https://github.com/DSLwDE/DSLwDE.
Updated: 2025-07-25 08:25:59
标题: 用监督学习和深度集成进行决策:稳健投资组合优化的实用框架
摘要: 我们提出了一种名为DSL(Decision by Supervised Learning)的实用框架,用于稳健的投资组合优化。DSL将投资组合构建重新框定为一个监督学习问题:模型经过训练来预测最佳投资组合权重,使用交叉熵损失和通过最大化夏普比率或Sortino比率构建的投资组合。为了进一步增强稳定性和可靠性,DSL采用了深度集成方法,大幅减少了投资组合配置中的方差。通过在不同市场范围和神经架构上进行全面回测,相比传统策略和领先的基于机器学习的方法(包括基于预测的学习和端到端学习),显示出卓越的表现。我们表明增加集成规模会导致更高的中位回报和更稳定的风险调整表现。代码可在https://github.com/DSLwDE/DSLwDE 上找到。
更新时间: 2025-07-25 08:25:59
领域: cs.LG,q-fin.CP,q-fin.PM
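As a rough illustration of the reframing, the sketch below labels each training window with the index of the candidate portfolio that maximized the Sharpe ratio, trains a small network on those labels with cross-entropy, and averages class probabilities over a deep ensemble. The names, candidate-set construction, and architecture are assumptions for illustration, not the paper's implementation:

import numpy as np
import torch
import torch.nn as nn

def sharpe(returns):                       # returns: (T,) portfolio returns
    return returns.mean() / (returns.std() + 1e-8)

def best_candidate(window, candidates):
    # window: (T, N) asset returns; candidates: (K, N) candidate weight vectors.
    scores = [sharpe(window @ w) for w in candidates]
    return int(np.argmax(scores))          # class label for supervised training

class WeightClassifier(nn.Module):
    def __init__(self, n_features, k_candidates, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, k_candidates))
    def forward(self, x):
        return self.net(x)                 # train with nn.CrossEntropyLoss

def ensemble_probs(models, features):
    # Deep ensemble: average softmax outputs of independently trained models.
    with torch.no_grad():
        return torch.stack([m(features).softmax(-1) for m in models]).mean(0)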
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
Recent advances in end-to-end spoken language models (SLMs) have significantly improved the ability of AI systems to engage in natural spoken interactions. However, most existing models treat speech merely as a vehicle for linguistic content, often overlooking the rich paralinguistic and speaker characteristic cues embedded in human speech, such as dialect, age, emotion, and non-speech vocalizations. In this work, we introduce GOAT-SLM, a novel spoken language model with paralinguistic and speaker characteristic awareness, designed to extend spoken language modeling beyond text semantics. GOAT-SLM adopts a dual-modality head architecture that decouples linguistic modeling from acoustic realization, enabling robust language understanding while supporting expressive and adaptive speech generation. To enhance model efficiency and versatility, we propose a modular, staged training strategy that progressively aligns linguistic, paralinguistic, and speaker characteristic information using large-scale speech-text corpora. Experimental results on TELEVAL, a multi-dimensional evaluation benchmark, demonstrate that GOAT-SLM achieves well-balanced performance across both semantic and non-semantic tasks, and outperforms existing open-source models in handling emotion, dialectal variation, and age-sensitive interactions. This work highlights the importance of modeling beyond linguistic content and advances the development of more natural, adaptive, and socially aware spoken language systems.
Updated: 2025-07-25 08:25:27
标题: GOAT-SLM:一种具有副语言和说话者特征意识的口语语言模型
摘要: 最近在端到端口语语言模型(SLMs)方面取得的进展显著提高了人工智能系统进行自然口语交互的能力。然而,大多数现有模型仅将语音视为语言内容的载体,通常忽视了人类语音中嵌入的丰富的副语言和说话者特征线索,如方言、年龄、情绪和非语音发声。在这项工作中,我们介绍了GOAT-SLM,一种具有副语言和说话者特征意识的新型口语语言模型,旨在将口语语言建模扩展到文本语义之外。GOAT-SLM采用了双模态头部架构,将语言建模与声学实现分离,实现了强大的语言理解,同时支持富有表现力和适应性的语音生成。为了提高模型的效率和多功能性,我们提出了一种模块化、分阶段的训练策略,逐步利用大规模语音-文本语料库对语言、副语言和说话者特征信息进行对齐。在多维评估基准TELEVAL上的实验结果表明,GOAT-SLM在语义和非语义任务之间实现了良好平衡的性能,并在处理情绪、方言变化和年龄敏感交互方面优于现有的开源模型。这项工作强调了超越语言内容建模的重要性,推动了更自然、适应性更强、社会意识更强的口语语言系统的发展。
更新时间: 2025-07-25 08:25:27
领域: cs.CL,cs.AI,cs.SD,eess.AS
XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
Clinical decision support systems require models that are not only highly accurate but also equitable and sensitive to the implications of missed diagnoses. In this study, we introduce a knowledge-guided in-context learning (ICL) framework designed to enable large language models (LLMs) to effectively process structured clinical data. Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies. We systematically evaluate this method across seventy distinct ICL designs, spanning various prompt variations and two communication styles (natural-language narrative and numeric conversational), and compare its performance to robust classical machine learning (ML) benchmarks on heart disease and diabetes prediction tasks. Our findings indicate that while traditional ML models maintain superior performance in balanced precision-recall scenarios, LLMs employing narrative prompts with integrated domain knowledge achieve higher recall and significantly reduce gender bias, effectively narrowing fairness disparities by an order of magnitude. Despite the current limitation of increased inference latency, LLMs provide notable advantages, including the capacity for zero-shot deployment and enhanced equity. This research offers the first comprehensive analysis of ICL design considerations for applying LLMs to tabular clinical tasks and highlights distillation and multimodal extensions as promising directions for future research.
Updated: 2025-07-25 08:24:58
标题: XAI4LLM. 让机器学习模型和LLMs合作,以增强医疗保健领域的上下文学习
摘要: 临床决策支持系统需要的模型不仅要高度准确,还要公平,并对漏诊的影响保持敏感。在这项研究中,我们引入了一个知识引导的上下文学习(ICL)框架,旨在使大型语言模型(LLMs)能够有效处理结构化临床数据。我们的方法整合了领域特定的特征分组、精心平衡的少样本示例和任务特定的提示策略。我们通过七十种不同的ICL设计(涵盖多种提示变体和两种通信风格:自然语言叙述与数值对话)系统地评估了该方法,并在心脏病和糖尿病预测任务上将其性能与强大的经典机器学习(ML)基准进行比较。 我们的研究结果表明,传统的机器学习模型在平衡精确度-召回率场景中保持优越性能,而采用整合领域知识的叙述提示的LLMs实现了更高的召回率,并显著减少了性别偏见,有效地将公平差距缩小了一个数量级。尽管当前存在推理延迟增加的限制,LLMs仍提供了显著优势,包括零样本部署能力和更高的公平性。这项研究为将LLMs应用于表格化临床任务的ICL设计考虑提供了首次全面分析,并强调了蒸馏和多模态扩展是未来研究的有前途的方向。
更新时间: 2025-07-25 08:24:58
领域: cs.LG,cs.AI,cs.CL
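For intuition, the snippet below contrasts the two communication styles on a toy heart-disease record; the field names and wording are hypothetical, not the study's actual templates:

def narrative_prompt(record):
    # Natural-language narrative style: features woven into clinical prose.
    return (f"The patient is a {record['age']}-year-old {record['sex']} with "
            f"resting blood pressure {record['bp']} mmHg and serum cholesterol "
            f"{record['chol']} mg/dL. Is heart disease likely? Answer yes or no.")

def numeric_prompt(record):
    # Numeric conversational style: features listed as raw key=value pairs.
    pairs = ", ".join(f"{k}={v}" for k, v in record.items())
    return f"Features: {pairs}. Predict heart disease (yes/no)."

record = {"age": 61, "sex": "male", "bp": 148, "chol": 274}
print(narrative_prompt(record))
print(numeric_prompt(record))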
PurpCode: Reasoning for Safer Code Generation
We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.
Updated: 2025-07-25 08:23:00
标题: PurpCode:更安全代码生成的推理
摘要: 我们介绍了PurpCode,这是第一个用于训练安全代码推理模型生成安全代码并防御恶意网络活动的后训练配方。PurpCode在两个阶段训练推理模型:(i)规则学习,明确教导模型参考网络安全规则生成无漏洞代码并避免促进恶意网络活动;以及(ii)强化学习,通过多样化、多目标奖励机制优化模型安全性并保留模型效用。为了为训练管道提供全面的网络安全数据,我们开展了内部红队测试,基于真实任务合成全面而高覆盖的提示,以诱发模型中的不安全网络活动。基于PurpCode,我们开发了一种基于推理的编码模型,即PurpCode-32B,展示了最先进的网络安全性,优于各种前沿模型。同时,我们的对齐方法降低了模型在一般和网络安全特定场景中的过度拒绝率,同时在代码生成和常见安全知识方面保持了模型效用。
更新时间: 2025-07-25 08:23:00
领域: cs.CR,cs.CL,cs.LG,cs.SE
Generating Adversarial Point Clouds Using Diffusion Model
Adversarial attack methods for 3D point cloud classification reveal the vulnerabilities of point cloud recognition models. This vulnerability could lead to safety risks in critical applications that use deep learning models, such as autonomous vehicles. To uncover the deficiencies of these models, researchers can evaluate their security through adversarial attacks. However, most existing adversarial attack methods are based on white-box attacks. While these methods achieve high attack success rates and imperceptibility, their applicability in real-world scenarios is limited. Black-box attacks, which are more meaningful in real-world scenarios, often yield poor results. This paper proposes a novel black-box adversarial example generation method that utilizes a diffusion model to improve the attack success rate and imperceptibility in the black-box setting, without relying on the internal information of the point cloud classification model to generate adversarial samples. A 3D diffusion model takes the compressed features of the point cloud as prior knowledge to guide the reverse diffusion process, which adds adversarial points to clean examples. Its reverse process is then employed to transform the distribution of other categories into adversarial points, which are added to the point cloud.
Updated: 2025-07-25 08:20:41
标题: 使用扩散模型生成对抗性点云
摘要: 三维点云分类的对抗攻击方法揭示了点云识别模型的脆弱性。这种脆弱性可能导致使用深度学习模型的关键应用中存在安全风险,例如自动驾驶汽车。为了揭示这些模型的缺陷,研究人员可以通过对抗攻击来评估它们的安全性。然而,大多数现有的对抗攻击方法都是基于白盒攻击。虽然这些方法可以实现较高的攻击成功率和不可察觉性,但它们在现实场景中的适用性有限。黑盒攻击在现实场景中更有意义,但通常效果较差。本文提出了一种利用扩散模型提高黑盒设置中攻击成功率和不可察觉性的新型黑盒对抗样本生成方法,而无需依赖点云分类模型的内部信息来生成对抗样本。我们使用三维扩散模型利用点云的压缩特征作为先验知识,引导反向扩散过程向干净示例中添加对抗点。随后,利用其反向过程将其他类别的分布转化为对抗点,然后将其添加到点云中。
更新时间: 2025-07-25 08:20:41
领域: cs.CR,cs.AI,cs.LG
Exploring molecular assembly as a biosignature using mass spectrometry and machine learning
Molecular assembly offers a promising path to detect life beyond Earth, while minimizing assumptions based on terrestrial life. As mass spectrometers will be central to upcoming Solar System missions, predicting molecular assembly from their data without needing to elucidate unknown structures will be essential for unbiased life detection. An ideal agnostic biosignature must be interpretable and experimentally measurable. Here, we show that molecular assembly, a recently developed approach to measure objects that have been produced by evolution, satisfies both criteria. First, it is interpretable for life detection, as it reflects the assembly of molecules with their bonds as building blocks, in contrast to approaches that discount construction history. Second, it can be determined without structural elucidation, as it can be physically measured by mass spectrometry, a property that distinguishes it from other approaches that use structure-based information measures for molecular complexity. Whilst molecular assembly is directly measurable using mass spectrometry data, there are limits imposed by mission constraints. To address this, we developed a machine learning model that predicts molecular assembly with high accuracy, reducing error by three-fold compared to baseline models. Simulated data shows that even small instrumental inconsistencies can double model error, emphasizing the need for standardization. These results suggest that standardized mass spectrometry databases could enable accurate molecular assembly prediction, without structural elucidation, providing a proof-of-concept for future astrobiology missions.
Updated: 2025-07-25 08:19:15
标题: 利用质谱和机器学习探索分子组装作为生物标志的研究
摘要: 分子组装提供了一条有希望的途径来探测地球以外的生命,同时尽量减少基于地球生命的假设。由于质谱仪将成为即将到来的太阳系任务的核心,因此在不需要解析未知结构的情况下,从其数据中预测分子组装对无偏见的生命检测至关重要。一种理想的不可知生物标志必须是可解释且可实验测量的。在这里,我们展示了分子组装——一种最近开发的、用于度量由进化产生的物体的方法——同时满足这两个标准。首先,它对于生命检测是可解释的,因为它以化学键为构建单元反映分子的组装过程,与不考虑构建历史的方法形成对比。其次,它可以在不解析结构的情况下确定,因为它可以通过质谱进行物理测量,这使它区别于其他使用基于结构的信息度量来衡量分子复杂性的方法。虽然分子组装可以直接通过质谱数据测量,但会受到任务约束的限制。为了解决这个问题,我们开发了一个机器学习模型,可以高精度地预测分子组装,与基准模型相比将误差降低了三倍。模拟数据显示,即使是微小的仪器不一致也会使模型误差翻倍,凸显了标准化的必要性。这些结果表明,标准化的质谱数据库可以在无需解析结构的情况下实现准确的分子组装预测,为未来的天体生物学任务提供了概念验证。
更新时间: 2025-07-25 08:19:15
领域: cs.LG
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Text-to-image (T2I) models have rapidly advanced, enabling the generation of high-quality images from text prompts across various domains. However, these models present notable safety concerns, including the risk of generating harmful, biased, or private content. Current research on assessing T2I safety remains in its early stages. While some efforts have been made to evaluate models on specific safety dimensions, many critical risks remain unexplored. To address this gap, we introduce T2ISafety, a safety benchmark that evaluates T2I models across three key domains: toxicity, fairness, and bias. We build a detailed hierarchy of 12 tasks and 44 categories based on these three domains, and meticulously collect 70K corresponding prompts. Based on this taxonomy and prompt set, we build a large-scale T2I dataset with 68K manually annotated images and train an evaluator capable of detecting critical risks that previous work has failed to identify, including risks that even ultra-large proprietary models like GPTs cannot correctly detect. We evaluate 12 prominent diffusion models on T2ISafety and reveal several concerns including persistent issues with racial fairness, a tendency to generate toxic content, and significant variation in privacy protection across the models, even with defense methods like concept erasing. Data and evaluator are released under https://github.com/adwardlee/t2i_safety.
Updated: 2025-07-25 08:18:26
标题: T2ISafety:评估图像生成中公平性、毒性和隐私的基准
摘要: 文本到图像(T2I)模型已经迅速发展,可以通过各个领域的文本提示生成高质量的图像。然而,这些模型存在明显的安全问题,包括生成有害、偏见或私人内容的风险。目前关于评估T2I安全性的研究仍处于早期阶段。虽然已经有一些努力在特定安全维度上评估模型,但许多关键风险仍未被探索。为了填补这一空白,我们引入了T2ISafety,这是一个评估T2I模型在毒性、公平性和偏见三个关键领域的安全基准。我们基于这三个领域构建了一个包含12个任务和44个类别的详细层次结构,并精心收集了70,000个相应的提示。基于这个分类法和提示集,我们构建了一个大规模的T2I数据集,包括68,000张手动注释的图像,并训练了一个评估器,能够检测到以前的工作未能识别的关键风险,包括即使是像GPT这样的超大专有模型也无法正确检测的风险。我们在T2ISafety上评估了12个主要的扩散模型,并揭示了一些问题,包括在种族公平性方面持续存在的问题、生成有毒内容的倾向,以及在隐私保护方面的显著差异,即使使用了概念擦除等防御方法。数据和评估器可以在https://github.com/adwardlee/t2i_safety 上获取。
更新时间: 2025-07-25 08:18:26
领域: cs.CL,cs.CR
Virtual local area network over HTTP for launching an insider attack
Computers and computer networks have become integral to virtually every aspect of modern life, with the Internet playing an indispensable role. Organizations, businesses, and individuals now store vast amounts of proprietary, confidential, and personal data digitally. As such, ensuring the security of this data from unauthorized access is critical. Common security measures, such as firewalls, intrusion detection systems (IDS), intrusion prevention systems (IPS), and antivirus software, are constantly evolving to safeguard computer systems and networks. However, these tools primarily focus on defending against external threats, leaving systems vulnerable to insider attacks. Security solutions designed to mitigate risks originating from within the organization are relatively limited and often ineffective. This paper demonstrates how a Local Area Network (LAN) can be covertly exposed to the Internet via an insider attack. Specifically, it illustrates how an external machine can gain access to a LAN by exploiting an unused secondary IP address of the attacked LAN, effectively bypassing existing security mechanisms by also exploiting Hyper Text Transfer Protocol (HTTP). Despite the presence of robust external protections, such as firewalls and IDS, this form of insider attack reveals significant vulnerabilities in the way internal threats are addressed.
Updated: 2025-07-25 08:16:19
标题: 基于HTTP的虚拟局域网用于发动内部人攻击
摘要: 计算机和计算机网络已经成为现代生活几乎每个方面不可或缺的一部分,互联网发挥着不可或缺的作用。组织、企业和个人现在以数字方式存储大量专有、机密和个人数据。因此,确保这些数据免受未经授权的访问至关重要。常见的安全措施,如防火墙、入侵检测系统(IDS)、入侵防御系统(IPS)和防病毒软件,不断发展,以保护计算机系统和网络。然而,这些工具主要侧重于防御外部威胁,使系统容易受到内部攻击。旨在减轻组织内部风险的安全解决方案相对有限,且通常效果不佳。本文演示了如何通过内部攻击将局域网(LAN)隐蔽地暴露给互联网。具体来说,它说明了外部计算机如何利用被攻击LAN中未使用的次要IP地址来访问LAN,并借助超文本传输协议(HTTP)有效绕过现有的安全机制。尽管存在防火墙和IDS等强大的外部保护措施,这种形式的内部攻击揭示了内部威胁处理方式中存在的重大漏洞。
更新时间: 2025-07-25 08:16:19
领域: cs.CR,cs.NI
Closing the Modality Gap for Mixed Modality Search
Mixed modality search -- retrieving information across a heterogeneous corpus composed of images, texts, and multimodal documents -- is an important yet underexplored real-world application. In this work, we investigate how contrastive vision-language models, such as CLIP, perform on the mixed modality search task. Our analysis reveals a critical limitation: these models exhibit a pronounced modality gap in the embedding space, where image and text embeddings form distinct clusters, leading to intra-modal ranking bias and inter-modal fusion failure. To address this issue, we propose GR-CLIP, a lightweight post-hoc calibration method that removes the modality gap in CLIP's embedding space. Evaluated on MixBench -- the first benchmark specifically designed for mixed modality search -- GR-CLIP improves NDCG@10 by up to 26 percentage points over CLIP, surpasses recent vision-language generative embedding models by 4 percentage points, while using 75x less compute.
Updated: 2025-07-25 08:15:28
标题: 消除混合模态搜索的模态差距
摘要: 混合模态搜索——在由图像、文本和多模态文档组成的异构语料库中检索信息——是一个重要但尚未充分探索的现实应用。在这项工作中,我们研究了对比视觉-语言模型(如CLIP)在混合模态搜索任务中的表现。我们的分析揭示了一个关键限制:这些模型在嵌入空间中存在明显的模态差距,图像和文本嵌入形成不同的聚类,导致模态内排序偏差和模态间融合失败。为了解决这个问题,我们提出了GR-CLIP,这是一种轻量级的事后校准方法,可以消除CLIP嵌入空间中的模态差距。在MixBench(首个专为混合模态搜索设计的基准测试)上进行评估,GR-CLIP将NDCG@10较CLIP提高了高达26个百分点,超过最近的视觉-语言生成式嵌入模型4个百分点,同时所需计算量仅为其1/75。
更新时间: 2025-07-25 08:15:28
领域: cs.CV,cs.AI,cs.CL,cs.IR,cs.LG
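The abstract does not spell out the calibration itself, but a standard post-hoc way to close a modality gap is to center each modality's embeddings on its own mean and renormalize; the sketch below shows that baseline recipe, and treating it as GR-CLIP's exact procedure would be an assumption:

import numpy as np

def remove_modality_gap(image_emb, text_emb):
    # Center each modality on its own mean, then renormalize to the unit
    # sphere so image and text embeddings no longer form separate clusters.
    img = image_emb - image_emb.mean(axis=0, keepdims=True)
    txt = text_emb - text_emb.mean(axis=0, keepdims=True)
    img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt /= np.linalg.norm(txt, axis=1, keepdims=True)
    return img, txt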
Dynamics-Informed Reservoir Computing with Visibility Graphs
Accurate prediction of complex and nonlinear time series remains a challenging problem across engineering and scientific disciplines. Reservoir computing (RC) offers a computationally efficient alternative to traditional deep learning by training only the read-out layer while employing a randomly structured and fixed reservoir network. Despite its advantages, the largely random reservoir graph architecture often results in suboptimal and oversized networks with poorly understood dynamics. Addressing this issue, we propose a novel Dynamics-Informed Reservoir Computing (DyRC) framework that systematically infers the reservoir network structure directly from the input training sequence. This work proposes to employ the visibility graph (VG) technique, which converts time series data into networks by representing measurement points as nodes linked by mutual visibility. The reservoir network is constructed by directly adopting the VG network from a training data sequence, leveraging the parameter-free visibility graph approach to avoid expensive hyperparameter tuning. This process results in a reservoir that is directly informed by the specific dynamics of the prediction task under study. We assess the DyRC-VG method through prediction tasks involving the canonical nonlinear Duffing oscillator, evaluating prediction accuracy and consistency. Compared to an Erd\H{o}s-R\'enyi graph of the same size, spectral radius, and comparable density, we observe higher prediction quality and more consistent performance over repeated implementations in the DyRC-VG.
Updated: 2025-07-25 08:07:17
标题: 基于可见性图的动力学信息感知的水库计算
摘要: 准确预测复杂的非线性时间序列仍然是工程和科学领域面临的一个具有挑战性的问题。水库计算(RC)仅训练读出层,同时采用随机构造且固定的储层网络,为传统深度学习提供了一种计算高效的替代方案。尽管具有这些优势,但很大程度上随机的储层图结构通常会导致效果欠佳且规模过大的网络,其动力学行为难以理解。为解决这一问题,我们提出了一种新颖的动力学信息感知水库计算(DyRC)框架,该框架直接从输入训练序列中系统地推断储层网络结构。本研究提出采用可见性图(VG)技术,将时间序列数据转化为网络:测量点表示为节点,并通过相互可见性连接。储层网络通过直接采用训练数据序列的VG网络来构建,利用无参数的可见性图方法避免昂贵的超参数调整。这一过程产生的储层直接蕴含了所研究预测任务的具体动力学信息。我们通过涉及经典非线性Duffing振子的预测任务评估DyRC-VG方法,考察其预测准确性和一致性。与相同规模、相同谱半径和可比密度的Erdős-Rényi图相比,DyRC-VG在重复实验中表现出更高的预测质量和更一致的性能。
更新时间: 2025-07-25 08:07:17
领域: cs.LG
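Since the natural visibility graph is parameter-free, the construction is easy to state exactly; a direct (unoptimized) NumPy sketch follows. Only the use of the adjacency matrix as the fixed reservoir coupling, typically after rescaling to a target spectral radius, is specific to DyRC-VG:

import numpy as np

def visibility_adjacency(series):
    # Natural visibility graph: nodes are time points; i and j are linked if
    # the straight line between (i, y_i) and (j, y_j) clears every sample in
    # between. The symmetric adjacency can then serve as the reservoir
    # coupling (usually rescaled to a desired spectral radius).
    n = len(series)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                adj[i, j] = adj[j, i] = 1.0
    return adj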
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Nonprehensile manipulation is crucial for handling objects that are too thin, large, or otherwise ungraspable in unstructured environments. While conventional planning-based approaches struggle with complex contact modeling, learning-based methods have recently emerged as a promising alternative. However, existing learning-based approaches face two major limitations: they heavily rely on multi-view cameras and precise pose tracking, and they fail to generalize across varying physical conditions, such as changes in object mass and table friction. To address these challenges, we propose the Dynamics-Adaptive World Action Model (DyWA), a novel framework that enhances action learning by jointly predicting future states while adapting to dynamics variations based on historical trajectories. By unifying the modeling of geometry, state, physics, and robot actions, DyWA enables more robust policy learning under partial observability. Compared to baselines, our method improves the success rate by 31.5% using only single-view point cloud observations in the simulation. Furthermore, DyWA achieves an average success rate of 68% in real-world experiments, demonstrating its ability to generalize across diverse object geometries, adapt to varying table friction, and robustness in challenging scenarios such as half-filled water bottles and slippery surfaces.
Updated: 2025-07-25 07:49:01
标题: DyWA: 用于通用非抓取操作的动态自适应世界动作模型
摘要: 非抓取式操纵对于处理非结构化环境中过薄、过大或其他无法抓取的物体至关重要。虽然传统的基于规划的方法在复杂接触建模方面存在困难,但基于学习的方法最近成为一种有前途的替代方案。然而,现有的基于学习的方法面临两个主要限制:它们严重依赖多视角相机和精确的姿态跟踪,且无法在不同物理条件(如物体质量和桌面摩擦的变化)下泛化。为了解决这些挑战,我们提出了一种新颖的框架——动态自适应世界动作模型(DyWA),它通过基于历史轨迹联合预测未来状态并适应动力学变化来增强动作学习。通过统一几何、状态、物理和机器人动作的建模,DyWA在部分可观测条件下实现了更稳健的策略学习。与基线相比,我们的方法在仿真中仅使用单视角点云观测就将成功率提高了31.5%。此外,DyWA在真实世界实验中取得了68%的平均成功率,展示了其在不同物体几何形状间泛化、适应不同桌面摩擦的能力,以及在半满水瓶和易滑表面等挑战性场景中的鲁棒性。
更新时间: 2025-07-25 07:49:01
领域: cs.RO,cs.AI
Neural Ordinary Differential Equations for Learning and Extrapolating System Dynamics Across Bifurcations
Forecasting system behaviour near and across bifurcations is crucial for identifying potential shifts in dynamical systems. While machine learning has recently been used to learn critical transitions and bifurcation structures from data, most studies remain limited as they exclusively focus on discrete-time methods and local bifurcations. To address these limitations, we use Neural Ordinary Differential Equations which provide a continuous, data-driven framework for learning system dynamics. We apply our approach to a predator-prey system that features both local and global bifurcations, presenting a challenging test case. Our results show that Neural Ordinary Differential Equations can recover underlying bifurcation structures directly from timeseries data by learning parameter-dependent vector fields. Notably, we demonstrate that Neural Ordinary Differential Equations can forecast bifurcations even beyond the parameter regions represented in the training data. We also assess the method's performance under limited and noisy data conditions, finding that model accuracy depends more on the quality of information that can be inferred from the training data, than on the amount of data available.
Updated: 2025-07-25 07:44:34
标题: 神经常微分方程用于学习和外推跨越分岔的系统动力学
摘要: 预测系统在分岔附近及跨越分岔时的行为,对于识别动力系统的潜在转变至关重要。虽然最近已有工作利用机器学习从数据中学习临界转变和分岔结构,但大多数研究仍局限于离散时间方法和局部分岔。为了解决这些局限,我们使用神经常微分方程,为学习系统动力学提供连续的、数据驱动的框架。我们将该方法应用于一个同时包含局部和全局分岔的捕食者-猎物系统,这是一个具有挑战性的测试案例。我们的结果表明,神经常微分方程可以通过学习依赖参数的向量场,直接从时间序列数据中恢复潜在的分岔结构。值得注意的是,我们证明神经常微分方程甚至可以预测超出训练数据所覆盖参数区域的分岔。我们还评估了该方法在数据有限且含噪条件下的性能,发现模型的准确性更多地取决于能够从训练数据中推断出的信息质量,而非可用数据的数量。
更新时间: 2025-07-25 07:44:34
领域: cs.LG,math.DS
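A minimal sketch of the core ingredient, a vector field conditioned on the bifurcation parameter, is shown below using the torchdiffeq package (an assumed dependency); the architecture and training setup are illustrative, not the paper's:

import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed dependency: pip install torchdiffeq

class ParamODE(nn.Module):
    # Learns f(x; mu): feeding the bifurcation parameter mu into the network
    # lets one trained model be evaluated, and extrapolated, across parameter
    # values, so bifurcation structure can be read off the learned dynamics.
    def __init__(self, state_dim=2, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.Tanh(),
                               nn.Linear(hidden, state_dim))
        self.mu = 0.0  # set per trajectory before integrating

    def forward(self, t, x):
        mu = torch.full_like(x[..., :1], self.mu)
        return self.f(torch.cat([x, mu], dim=-1))

model = ParamODE()
model.mu = 0.8                       # bifurcation parameter for this run
x0 = torch.tensor([[1.0, 0.5]])
t = torch.linspace(0.0, 10.0, 200)
traj = odeint(model, x0, t)          # fit traj to observed timeseries data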
Dual Path Learning -- learning from noise and context for medical image denoising
Medical imaging plays a critical role in modern healthcare, enabling clinicians to accurately diagnose diseases and develop effective treatment plans. However, noise, often introduced by imaging devices, can degrade image quality, leading to misinterpretation and compromised clinical outcomes. Existing denoising approaches typically rely either on noise characteristics or on contextual information from the image. Moreover, they are commonly developed and evaluated for a single imaging modality and noise type. Motivated by Geng et al.'s CNCL, which integrates both noise and context, this study introduces a Dual-Pathway Learning (DPL) model architecture that effectively denoises medical images by leveraging both sources of information and fusing them to generate the final output. DPL is evaluated across multiple imaging modalities and various types of noise, demonstrating its robustness and generalizability. DPL improves PSNR by 3.35% compared to the baseline UNet when evaluated on Gaussian noise and trained across all modalities. The code is available at 10.5281/zenodo.15836053.
Updated: 2025-07-25 07:43:50
标题: 双路径学习——从噪声和背景中学习进行医学图像去噪
摘要: 医学影像在现代医疗保健中发挥着至关重要的作用,使临床医生能够准确诊断疾病并制定有效的治疗方案。然而,噪音通常由影像设备引入,可能降低图像质量,导致误解和临床结果受损。现有的去噪方法通常依赖于噪音特征或图像的上下文信息。此外,它们通常针对单一成像模态和噪音类型进行开发和评估。受到Geng等人的CNCL启发,该研究引入了一种双通道学习(DPL)模型架构,通过利用噪音和上下文两种信息源并将它们融合以生成最终输出来有效去噪医学图像。DPL在多个成像模态和各种类型的噪音中进行评估,展示了其鲁棒性和泛化能力。与基准UNet相比,当在高斯噪音上进行评估并跨所有模态进行训练时,DPL将PSNR提高了3.35%。代码可在10.5281/zenodo.15836053上找到。
更新时间: 2025-07-25 07:43:50
领域: cs.CV,cs.AI
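A toy two-pathway layout in PyTorch is sketched below: one branch estimates the noise map, the other predicts clean content from spatial context, and a 1x1 convolution fuses the two estimates. Channel counts and the fusion rule are assumptions, not the paper's architecture:

import torch
import torch.nn as nn

class DualPathDenoiser(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 3, padding=1))
        self.noise_path = branch()      # learns the noise component
        self.context_path = branch()    # learns clean content from context
        self.fuse = nn.Conv2d(2, 1, 1)  # fuses the two denoised estimates

    def forward(self, noisy):           # noisy: (B, 1, H, W)
        from_noise = noisy - self.noise_path(noisy)
        from_context = self.context_path(noisy)
        return self.fuse(torch.cat([from_noise, from_context], dim=1))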
KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment
Cyber threat intelligence (CTI) is a crucial tool to prevent sophisticated, organized, and weaponized cyber attacks. However, few studies have focused on the credibility assessment of CTI, and this work still requires manual analysis by cybersecurity experts. In this paper, we propose Knowledge Graph-based Verifier (KGV), the first framework integrating large language models (LLMs) with simple structured knowledge graphs (KGs) for automated CTI credibility assessment. Unlike entity-centric KGs, KGV constructs paragraph-level semantic graphs where nodes represent text segments connected through similarity analysis, which effectively enhances the semantic understanding ability of the model, reduces KG density and greatly improves response speed. Experimental results demonstrate that our KGV outperforms state-of-the-art fact reasoning methods on the CTI-200 dataset, achieving a 5.7% improvement in F1. Additionally, it shows strong scalability on factual QA and fake news detection datasets. Compared to entity-based knowledge graphs (KGs) for equivalent-length texts, our structurally simple KG reduces node quantities by nearly two-thirds while boosting precision by 1.7% and cutting response time by 46.7%. In addition, we have created and publicly released the first CTI credibility assessment dataset, CTI-200. Distinct from CTI identification datasets, CTI-200 refines CTI summaries and key sentences to focus specifically on credibility assessment.
Updated: 2025-07-25 07:41:37
标题: KGV:将大型语言模型与知识图谱集成,用于网络威胁情报可信度评估
摘要: 网络威胁情报(CTI)是防范复杂、有组织和武器化网络攻击的关键工具。然而,很少有研究关注CTI的可信度评估,这项工作仍需要网络安全专家进行手动分析。在本文中,我们提出了基于知识图的验证器(KGV),这是第一个将大型语言模型(LLMs)与简单结构化知识图(KGs)集成在一起,用于自动化CTI可信度评估的框架。与以实体为中心的知识图不同,KGV构建了段落级语义图,其中节点表示通过相似性分析连接的文本段,这有效地增强了模型的语义理解能力,减少了知识图的密度,并极大地提高了响应速度。实验结果表明,我们的KGV在CTI-200数据集上的表现优于最先进的事实推理方法,F1得分提高了5.7%。此外,它在事实问答和假新闻检测数据集上表现出很强的可扩展性。与等长文本的基于实体的知识图(KGs)相比,我们的结构简单的知识图减少了将近三分之二的节点数量,同时提高了1.7%的精度,并缩短了46.7%的响应时间。此外,我们创建并公开发布了第一个CTI可信度评估数据集CTI-200。与CTI识别数据集不同,CTI-200对CTI摘要和关键句进行了精炼,专注于可信度评估。
更新时间: 2025-07-25 07:41:37
领域: cs.CR,cs.IR
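The paragraph-level graph is simple to sketch: segments become nodes, and edges link pairs whose embedding similarity clears a threshold, which keeps the graph far sparser than an entity-centric KG over the same text. The embedding function and threshold below are assumptions:

import numpy as np

def build_paragraph_graph(segments, embed, threshold=0.75):
    # segments: list of text spans; embed: any sentence-embedding function.
    vecs = np.stack([embed(s) for s in segments])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = vecs @ vecs.T
    # Connect segment pairs whose cosine similarity exceeds the threshold.
    return [(i, j) for i in range(len(segments))
            for j in range(i + 1, len(segments)) if sim[i, j] >= threshold]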
How to Copy-Protect Malleable-Puncturable Cryptographic Functionalities Under Arbitrary Challenge Distributions
A quantum copy-protection scheme (Aaronson, CCC 2009) encodes a functionality into a quantum state such that given this state, no efficient adversary can create two (possibly entangled) quantum states that are both capable of running the functionality. There has been a recent line of works on constructing provably-secure copy-protection schemes for general classes of schemes in the plain model, and most recently the recent work of \c{C}akan and Goyal (IACR Eprint, 2025) showed how to copy-protect all cryptographically puncturable schemes with pseudorandom puncturing points. In this work, we show how to copy-protect even a larger class of schemes. We define a class of cryptographic schemes called malleable-puncturable schemes where the only requirement is that one can create a circuit that is capable of answering inputs at points that are unrelated to the challenge in the security game but does not help the adversary answer inputs related to the challenge. This is a flexible generalization of puncturable schemes, and can capture a wide range of primitives that was not known how to copy-protect prior to our work. Going further, we show that our scheme is secure against arbitrary high min-entropy challenge distributions whereas previous work has only considered schemes that are punctured at pseudorandom points.
Updated: 2025-07-25 07:40:44
标题: 如何在任意挑战分布下对可延展-可穿孔密码功能进行复制保护
摘要: 一种量子复制保护方案(Aaronson, CCC 2009)将功能编码到一个量子态中,以使得在给定该态的情况下,没有有效的对手能够创建两个(可能是纠缠的)量子态,二者都能运行该功能。最近,有一系列关于在普通模型中为一般类别的方案构建可证明安全的复制保护方案的工作,最近的Cakan和Goyal的工作(IACR Eprint,2025)展示了如何使用伪随机穿孔点复制保护所有密码学可穿孔方案。在这项工作中,我们展示了如何复制保护更大类别的方案。我们定义了一类称为可塑性可穿孔方案的密码学方案,其唯一要求是可以创建一个电路,能够回答与安全游戏中挑战无关的点的输入,但不能帮助对手回答与挑战相关的输入。这是可穿孔方案的灵活泛化,可以捕捉一系列以前不知道如何复制保护的原语。更进一步,我们展示了我们的方案对任意高最小熵挑战分布是安全的,而以前的工作只考虑在伪随机点上穿孔的方案。
更新时间: 2025-07-25 07:40:44
领域: cs.CR
ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs
GNN-to-MLP (G2M) methods have emerged as a promising approach to accelerate Graph Neural Networks (GNNs) by distilling their knowledge into simpler Multi-Layer Perceptrons (MLPs). These methods bridge the gap between the expressive power of GNNs and the computational efficiency of MLPs, making them well-suited for resource-constrained environments. However, existing G2M methods are limited by their inability to flexibly adjust inference cost and accuracy dynamically, a critical requirement for real-world applications where computational resources and time constraints can vary significantly. To address this, we introduce a Progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation (ProGMLP). ProGMLP employs a Progressive Training Structure (PTS), where multiple MLP students are trained in sequence, each building on the previous one. Furthermore, ProGMLP incorporates Progressive Knowledge Distillation (PKD) to iteratively refine the distillation process from GNNs to MLPs, and Progressive Mixup Augmentation (PMA) to enhance generalization by progressively generating harder mixed samples. Our approach is validated through comprehensive experiments on eight real-world graph datasets, demonstrating that ProGMLP maintains high accuracy while dynamically adapting to varying runtime scenarios, making it highly effective for deployment in diverse application settings.
Updated: 2025-07-25 07:35:09
标题: ProGMLP:一种具有高效权衡的GNN到MLP知识蒸馏的渐进式框架
摘要: GNN-to-MLP (G2M)方法通过将图神经网络(GNNs)的知识蒸馏到更简单的多层感知器(MLPs)中来加速GNN,已成为一种有前途的方法。这些方法弥合了GNNs的表达能力与MLPs的计算效率之间的差距,使其非常适合资源受限的环境。然而,现有的G2M方法无法动态灵活地调整推理成本与准确性之间的平衡,而这是计算资源和时间约束可能显著变化的现实应用中的关键要求。为了解决这个问题,我们引入了一个渐进式框架ProGMLP,旨在为GNN-to-MLP知识蒸馏提供灵活、按需的推理成本与准确性权衡。ProGMLP采用渐进训练结构(PTS),多个MLP学生依次训练,每个学生都在前一个学生的基础上构建。此外,ProGMLP还引入渐进知识蒸馏(PKD)来迭代完善从GNNs到MLPs的蒸馏过程,以及渐进混合增强(PMA),通过逐步生成更难的混合样本来增强泛化能力。通过对八个真实世界图数据集的全面实验验证了我们的方法:ProGMLP在保持高准确性的同时能够动态适应不同的运行时场景,使其能够高效部署于各种应用环境。
更新时间: 2025-07-25 07:35:09
领域: cs.LG
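The progressive structure can be sketched compactly: students are trained in sequence against frozen GNN logits, each warm-started from its predecessor, and inference stops after whichever student the compute budget allows. The warm-start rule and hyperparameters are assumptions, and the paper's mixup augmentation is omitted:

import torch
import torch.nn as nn
import torch.nn.functional as F

def train_students(gnn_logits, features, n_students=3, epochs=100):
    students, prev_state = [], None
    for _ in range(n_students):
        mlp = nn.Sequential(nn.Linear(features.shape[1], 128), nn.ReLU(),
                            nn.Linear(128, gnn_logits.shape[1]))
        if prev_state is not None:
            mlp.load_state_dict(prev_state)   # each student builds on the last
        opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
        for _ in range(epochs):
            # Distill the frozen GNN's soft predictions into the MLP.
            loss = F.kl_div(F.log_softmax(mlp(features), dim=-1),
                            F.softmax(gnn_logits, dim=-1),
                            reduction="batchmean")
            opt.zero_grad(); loss.backward(); opt.step()
        prev_state = mlp.state_dict()
        students.append(mlp)
    return students  # pick students[k] to trade accuracy against cost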
Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication
Artificial intelligence has surged in recent years, with advancements in machine learning rapidly impacting nearly every area of life. However, the growing complexity of these models has far outpaced advancements in available hardware accelerators, leading to significant computational and energy demands, primarily due to matrix multiplications, which dominate the compute workload. Maddness (i.e., Multiply-ADDitioN-lESS) presents a hash-based version of product quantization, which renders matrix multiplications into lookups and additions, eliminating the need for multipliers entirely. We present Stella Nera, the first Maddness-based accelerator achieving an energy efficiency of 161 TOp/s/W@0.55V, 25x better than conventional MatMul accelerators due to its small components and reduced computational complexity. We further enhance Maddness with a differentiable approximation, allowing for gradient-based fine-tuning and achieving an end-to-end performance of 92.5% Top-1 accuracy on CIFAR-10.
Updated: 2025-07-25 07:29:36
标题: Stella Nera:一种基于可微Maddness的硬件加速器,用于高效的近似矩阵乘法
摘要: 近年来,人工智能迅速发展,机器学习的进步迅速影响到生活的各个领域。然而,这些模型日益增长的复杂性远远超过了可用硬件加速器的进展,导致巨大的计算和能量需求,这主要源于在计算负载中占主导地位的矩阵乘法。Maddness(即Multiply-ADDitioN-lESS)提出了一种基于哈希的乘积量化方法,将矩阵乘法转化为查表和加法,完全消除了对乘法器的需求。我们提出了首个基于Maddness的加速器Stella Nera,其能效达到161 TOp/s/W@0.55V,由于组件小巧且计算复杂度降低,比传统的矩阵乘法加速器高25倍。我们进一步用可微分近似增强了Maddness,从而支持基于梯度的微调,并在CIFAR-10上实现了92.5%的Top-1准确率的端到端性能。
更新时间: 2025-07-25 07:29:36
领域: cs.AR,cs.CV,cs.LG,stat.ML
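A toy NumPy illustration of the underlying idea, replacing A @ B with per-subspace table lookups and additions, is given below. For clarity it encodes with nearest prototypes; real Maddness uses learned hash functions, and the hardware details of Stella Nera are of course not captured here:

import numpy as np

def encode(A, prototypes):
    # prototypes: (C, K, d) - K prototypes for each of C column subspaces of A.
    C, K, d = prototypes.shape
    codes = np.empty((A.shape[0], C), dtype=np.int64)
    for c in range(C):
        sub = A[:, c*d:(c+1)*d]                            # (n, d) slice
        dists = ((sub[:, None, :] - prototypes[c])**2).sum(-1)
        codes[:, c] = dists.argmin(1)                      # nearest prototype
    return codes

def lut_matmul(codes, prototypes, B):
    C, K, d = prototypes.shape
    # Precompute prototype/B dot products once per B: shape (C, K, m).
    lut = np.stack([prototypes[c] @ B[c*d:(c+1)*d] for c in range(C)])
    # Query time: only table lookups and additions, no multiplications.
    return sum(lut[c][codes[:, c]] for c in range(C))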
Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations
The growing popularity of large language models has raised concerns regarding the potential to misuse AI-generated text (AIGT). It becomes increasingly critical to establish an excellent AIGT detection method with high generalization and robustness. However, existing methods either focus on model generalization or concentrate on robustness. A unified mechanism that simultaneously addresses the challenges of generalization and robustness remains underexplored. In this paper, we argue that robustness can be viewed as a specific form of domain shift, and empirically reveal an intrinsic mechanism for model generalization in the AIGT detection task. We then propose a novel AIGT detection method (DP-Net) based on dynamic perturbations introduced by reinforcement learning with an elaborated reward and action design. Experimentally, extensive results show that the proposed DP-Net significantly outperforms several state-of-the-art AIGT detection methods in generalization capacity across three cross-domain scenarios. Meanwhile, DP-Net achieves the best robustness under two text adversarial attacks. The code is publicly available at https://github.com/CAU-ISS-Lab/AIGT-Detection-Evade-Detection/tree/main/DP-Net.
Updated: 2025-07-25 07:21:08
标题: 一石二鸟:通过动态扰动实现广义和强健的AI生成文本检测
摘要: 随着大型语言模型的日益普及,人们对人工智能生成文本(AIGT)可能被滥用的担忧日益加剧。建立一种具有高泛化性和鲁棒性的优秀AIGT检测方法变得越来越关键。然而,现有方法要么专注于模型的泛化性,要么专注于鲁棒性。同时解决泛化性和鲁棒性挑战的统一机制研究较少。在本文中,我们认为鲁棒性可以视为一种特定形式的领域转移,并在经验上揭示了AIGT检测任务模型泛化的固有机制。然后,我们提出了一种新颖的AIGT检测方法(DP-Net),通过引入由精心设计的奖励和行动的强化学习引入的动态扰动。实验结果表明,所提出的DP-Net在三个跨领域场景中的泛化能力显著优于一些最先进的AIGT检测方法。与此同时,DP-Net在两种文本对抗攻击下表现出最佳的鲁棒性。该代码可以在https://github.com/CAU-ISS-Lab/AIGT-Detection-Evade-Detection/tree/main/DP-Net 上公开获取。
更新时间: 2025-07-25 07:21:08
领域: cs.CL,cs.AI
Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
With rapid advancement and increasing accessibility of LLMs, fine-tuning aligned models has become a critical step for adapting them to real-world applications, which makes the safety of this fine-tuning process more important than ever. However, recent studies have highlighted a critical challenge: even when fine-tuning with seemingly benign downstream datasets, the safety of aligned LLMs can be compromised, making them more susceptible to malicious instructions. In this paper, we show that fine-tuning datasets often contain samples with safety-degrading features that are not easily identifiable on the surface. These samples can significantly degrade the safety alignment of LLMs during fine-tuning. To address this issue, we propose LARF, a Layer-Aware Representation Filtering method. This method identifies safety-sensitive layers within the LLM and leverages their representations to detect which data samples in the post-training dataset contain safety-degrading features. Experimental results demonstrate that LARF can effectively identify benign data with safety-degrading features. After removing such data, the safety alignment degradation caused by fine-tuning is mitigated. Please see our code at https://github.com/LLLeoLi/LARF.
Updated: 2025-07-25 07:20:24
标题: 层感知表示过滤:净化微调数据以保持LLM安全对齐
摘要: 随着LLM技术的快速发展和日益普及,对齐模型的微调已成为将它们适应现实世界应用的关键步骤,这使得这一微调过程的安全性比以往任何时候都更加重要。然而,最近的研究突出了一个关键挑战:即使在看似良性的下游数据集上进行微调,对齐模型的安全性也可能会受到影响,使其更容易受到恶意指令的影响。 在本文中,我们展示了微调数据集通常包含具有不易在表面上识别的安全降级特征的样本。这些样本在微调过程中可以显著降低LLM的安全对齐性。为解决这一问题,我们提出了一种名为LARF的Layer-Aware Representation Filtering方法。该方法识别LLM内的安全敏感层,并利用它们的表示来检测后训练数据集中包含安全降级特征的数据样本。 实验结果表明,LARF能够有效识别具有安全降级特征的良性数据。在删除此类数据后,微调引起的安全对齐降级得到缓解。请查看我们的代码:https://github.com/LLLeoLi/LARF。
更新时间: 2025-07-25 07:20:24
领域: cs.CR
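A hedged sketch of the filtering step: given activations from a safety-sensitive layer, score each candidate sample by its projection onto a safety-degrading direction estimated from reference harmful and benign prompts, and drop the top-scoring tail. The direction estimate, scoring rule, and threshold are assumptions based on the abstract, not the released implementation:

import torch

def filter_dataset(sample_reps, harmful_reps, benign_reps, drop_frac=0.05):
    # sample_reps: (N, d) hidden states of fine-tuning samples at the
    # safety-sensitive layer; harmful_reps / benign_reps: reference sets.
    direction = harmful_reps.mean(0) - benign_reps.mean(0)
    direction = direction / direction.norm()
    scores = sample_reps @ direction          # per-sample projection
    k = int(len(scores) * (1 - drop_frac))
    keep_idx = scores.argsort()[:k]           # drop the most harmful-like tail
    return keep_idx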
MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents dataflow between nodes. Owing to the heavy cross-node dependencies, the RL training system usually suffers from poor cluster scalability and low memory utilization. In this article, we introduce MindSpeed RL, an effective and efficient system for large-scale RL training. Unlike existing centralized methods, MindSpeed RL organizes the essential data dependencies in RL training, i.e., sample flow and resharding flow, from a distributed view. On the one hand, a distributed transfer dock strategy, which sets controllers and warehouses on the basis of the conventional replay buffer, is designed to release the dispatch overhead in the sample flow. A practical allgather-swap strategy is presented to eliminate redundant memory usage in resharding flow. In addition, MindSpeed RL further integrates numerous parallelization strategies and acceleration techniques for systematic optimization. Compared with existing state-of-the-art systems, comprehensive experiments on the RL training of popular Qwen2.5-Dense-7B/32B, Qwen3-MoE-30B, and DeepSeek-R1-MoE-671B show that MindSpeed RL increases the throughput by 1.42 to 3.97 times. Finally, we open-source MindSpeed RL and perform all the experiments on a super pod of Ascend with 384 neural processing units (NPUs) to demonstrate the powerful performance and reliability of Ascend.
Updated: 2025-07-25 07:11:49
标题: MindSpeed RL: 在Ascend NPU集群上实现可扩展和高效的RL训练的分布式数据流
摘要: 强化学习(RL)是一种越来越多地被用于对齐大型语言模型的范式。流行的RL算法利用多个工作者,并可以建模为一个图,其中每个节点是一个工作者的状态,每个边代表节点之间的数据流。由于跨节点依赖性较重,RL训练系统通常受到集群可伸缩性差和内存利用率低的影响。在本文中,我们介绍了MindSpeed RL,这是一个用于大规模RL训练的有效和高效系统。与现有的集中式方法不同,MindSpeed RL从分布式视角组织RL训练中的关键数据依赖,即样本流和重新分片流。一方面,设计了一种分布式传输舱策略,根据传统回放缓冲区设置控制器和仓库,以释放样本流中的调度开销。提出了一种实用的allgather-swap策略,以消除重新分片流中的冗余内存使用。此外,MindSpeed RL进一步整合了许多并行化策略和加速技术,进行系统优化。与现有最先进的系统相比,对流行的Qwen2.5-Dense-7B/32B、Qwen3-MoE-30B和DeepSeek-R1-MoE-671B的RL训练进行了全面实验,结果显示MindSpeed RL的吞吐量增加了1.42~3.97倍。最后,我们公开了MindSpeed RL,并在具有384个神经处理单元(NPU)的Ascend超级Pod上执行了所有实验,以展示Ascend的强大性能和可靠性。
更新时间: 2025-07-25 07:11:49
领域: cs.LG,cs.AI,CS
Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains
To gain deeper insights into a complex sensor system through the lens of causality, we present common and individual causal mechanism estimation (CICME), a novel three-step approach to inferring causal mechanisms from heterogeneous data collected across multiple domains. By leveraging the principle of Causal Transfer Learning (CTL), CICME is able to reliably detect domain-invariant causal mechanisms when provided with sufficient samples. The identified common causal mechanisms are further used to guide the estimation of the remaining causal mechanisms in each domain individually. The performance of CICME is evaluated on linear Gaussian models under scenarios inspired from a manufacturing process. Building upon existing continuous optimization-based causal discovery methods, we show that CICME leverages the benefits of applying causal discovery on the pooled data and repeatedly on data from individual domains, and it even outperforms both baseline methods under certain scenarios.
Updated: 2025-07-25 07:07:32
Domains: cs.LG,stat.ML
MedIQA: A Scalable Foundation Model for Prompt-Driven Medical Image Quality Assessment
Rapid advances in medical imaging technology underscore the critical need for precise and automated image quality assessment (IQA) to ensure diagnostic accuracy. Existing medical IQA methods, however, struggle to generalize across diverse modalities and clinical scenarios. In response, we introduce MedIQA, the first comprehensive foundation model for medical IQA, designed to handle variability in image dimensions, modalities, anatomical regions, and types. To support this, we developed a large-scale multi-modality dataset with plentiful manually annotated quality scores. Our model integrates a salient slice assessment module that focuses feature retrieval on diagnostically relevant regions, and employs an automatic prompt strategy that aligns upstream physical parameter pre-training with downstream expert annotation fine-tuning. Extensive experiments demonstrate that MedIQA significantly outperforms baselines in multiple downstream tasks, establishing a scalable framework for medical IQA and advancing diagnostic workflows and clinical decision-making.
Updated: 2025-07-25 07:02:47
Domains: cs.CV,cs.AI
A diffusion-based generative model for financial time series via geometric Brownian motion
We propose a novel diffusion-based generative framework for financial time series that incorporates geometric Brownian motion (GBM), the foundation of Black-Scholes theory, into the forward noising process. Unlike standard score-based models that treat price trajectories as generic numerical sequences, our method injects noise proportionally to asset prices at each time step, reflecting the heteroskedasticity observed in financial time series. By accurately balancing the drift and diffusion terms, we show that the resulting log-price process reduces to a variance-exploding stochastic differential equation, aligning with the formulation in score-based generative models. The reverse-time generative process is trained via denoising score matching using a Transformer-based architecture adapted from the Conditional Score-based Diffusion Imputation (CSDI) framework. Empirical evaluations on historical stock data demonstrate that our model reproduces key stylized facts (heavy-tailed return distributions, volatility clustering, and the leverage effect) more realistically than conventional diffusion models.
Updated: 2025-07-25 07:02:09
Domains: cs.LG,cs.AI,cs.NA,math.NA,60H10, 91G80, 91G60
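As an illustration of the forward process described above, the following minimal NumPy sketch (not the authors' code; the schedule and function names are assumptions) shows why additive variance-exploding noise on log-prices amounts to noise proportional to the asset price:

```python
import numpy as np

def ve_sigma(t, sigma_min=0.01, sigma_max=50.0):
    # Variance-exploding noise level at diffusion time t in [0, 1].
    return sigma_min * (sigma_max / sigma_min) ** t

def forward_noise_prices(prices, t, rng=None):
    # Additive Gaussian noise on log-prices is multiplicative on prices,
    # i.e., proportional to the price level, mirroring GBM dynamics:
    # dS_t = mu * S_t dt + sigma * S_t dW_t.
    rng = np.random.default_rng() if rng is None else rng
    log_p = np.log(prices)
    noisy_log_p = log_p + ve_sigma(t) * rng.standard_normal(log_p.shape)
    return np.exp(noisy_log_p)

# Example: perturb a short price path at mid-diffusion time.
path = np.array([100.0, 101.2, 99.8, 103.5])
print(forward_noise_prices(path, t=0.5))
```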
Adapting to Fragmented and Evolving Data: A Fisher Information Perspective
Modern machine learning systems operating in dynamic environments often face sequential covariate shift (SCS), where input distributions evolve over time while the conditional distribution remains stable. We introduce FADE (Fisher-based Adaptation to Dynamic Environments), a lightweight and theoretically grounded framework for robust learning under SCS. FADE employs a shift-aware regularization mechanism anchored in Fisher information geometry, guiding adaptation by modulating parameter updates based on sensitivity and stability. To detect significant distribution changes, we propose a Cramer-Rao-informed shift signal that integrates KL divergence with temporal Fisher dynamics. Unlike prior methods requiring task boundaries, target supervision, or experience replay, FADE operates online with fixed memory and no access to target labels. Evaluated on seven benchmarks spanning vision, language, and tabular data, FADE achieves up to 19% higher accuracy under severe shifts, outperforming methods such as TENT and DIW. FADE also generalizes naturally to federated learning by treating heterogeneous clients as temporally fragmented environments, enabling scalable and stable adaptation in decentralized settings. Theoretical analysis guarantees bounded regret and parameter consistency, while empirical results demonstrate FADE's robustness across modalities and shift intensities.
Updated: 2025-07-25 06:50:09
Domains: cs.LG
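The Fisher-anchored regularization can be pictured with an EWC-style diagonal penalty. This is a sketch of the general mechanism under that assumption, not FADE's exact modulation rule; all names are illustrative:

```python
import torch

def diagonal_fisher(model, loss_fn, batch):
    # One-batch estimate of the diagonal Fisher information via squared gradients.
    model.zero_grad()
    loss_fn(model, batch).backward()
    return {n: p.grad.detach() ** 2
            for n, p in model.named_parameters() if p.grad is not None}

def fisher_anchored_penalty(model, fisher, anchors, lam=1.0):
    # Pull sensitive (high-Fisher) parameters toward their anchor values,
    # so adaptation updates are modulated by sensitivity and stability.
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - anchors[n]) ** 2).sum()
    return lam * penalty
```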
PEMUTA: Pedagogically-Enriched Multi-Granular Undergraduate Thesis Assessment
The undergraduate thesis (UGTE) plays an indispensable role in assessing a student's cumulative academic development throughout their college years. Although large language models (LLMs) have advanced education intelligence, they typically focus on holistic assessment with only one single evaluation score, but ignore the intricate nuances across multifaceted criteria, limiting their ability to reflect structural criteria, pedagogical objectives, and diverse academic competencies. Meanwhile, pedagogical theories have long informed manual UGTE evaluation through multi-dimensional assessment of cognitive development, disciplinary thinking, and academic performance, yet remain underutilized in automated settings. Motivated by the research gap, we pioneer PEMUTA, a pedagogically-enriched framework that effectively activates domain-specific knowledge from LLMs for multi-granular UGTE assessment. Guided by Vygotsky's theory and Bloom's Taxonomy, PEMUTA incorporates a hierarchical prompting scheme that evaluates UGTEs across six fine-grained dimensions: Structure, Logic, Originality, Writing, Proficiency, and Rigor (SLOWPR), followed by holistic synthesis. Two in-context learning techniques, i.e., few-shot prompting and role-play prompting, are also incorporated to further enhance alignment with expert judgments without fine-tuning. We curate a dataset of authentic UGTEs with expert-provided SLOWPR-aligned annotations to support multi-granular UGTE assessment. Extensive experiments demonstrate that PEMUTA achieves strong alignment with expert evaluations, and exhibits strong potential for fine-grained, pedagogically-informed UGTE evaluations.
Updated: 2025-07-25 06:47:26
Domains: cs.CY,cs.AI
Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning
Motion planning is a crucial component of autonomous robot driving. While various trajectory datasets exist, effectively utilizing them for a target domain remains challenging due to differences in agent interactions and environmental characteristics. Conventional approaches, such as domain adaptation or ensemble learning, leverage multiple source datasets but suffer from domain imbalance, catastrophic forgetting, and high computational costs. To address these challenges, we propose Interaction-Merged Motion Planning (IMMP), a novel approach that leverages parameter checkpoints trained on different domains during adaptation to the target domain. IMMP follows a two-step process: pre-merging to capture agent behaviors and interactions, sufficiently extracting diverse information from the source domain, followed by merging to construct an adaptable model that efficiently transfers diverse interactions to the target domain. Our method is evaluated on various planning benchmarks and models, demonstrating superior performance compared to conventional approaches.
Updated: 2025-07-25 06:46:34
Domains: cs.RO,cs.AI,cs.CV,cs.LG
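The merging step can be pictured as weighted parameter averaging over per-domain checkpoints. IMMP's pre-merging and merging operators are richer than this, so treat the PyTorch snippet below as a schematic under that assumption:

```python
import torch

def merge_checkpoints(state_dicts, weights):
    # Weighted parameter-space average of checkpoints that share one architecture.
    total = float(sum(weights))
    return {key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts)) / total
            for key in state_dicts[0]}

# Usage (names hypothetical): bias the merge toward the closer source domain.
# merged = merge_checkpoints([ckpt_urban, ckpt_highway], weights=[0.7, 0.3])
# planner.load_state_dict(merged)
```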
Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations
Large language models (LLMs) and their associated agent-based frameworks have significantly advanced automated information extraction, a critical component of modern recommender systems. While these multitask frameworks are widely used in code generation, their application in data-centric research is still largely untapped. This paper presents Agent0, an LLM-driven, agent-based system designed to automate information extraction and feature construction from raw, unstructured text. Categorical features are crucial for large-scale recommender systems but are often expensive to acquire. Agent0 coordinates a group of interacting LLM agents to automatically identify the most valuable text aspects for subsequent tasks (such as models or AutoML pipelines). Beyond its feature engineering capabilities, Agent0 also offers an automated prompt-engineering tuning method that utilizes dynamic feedback loops from an oracle. Our findings demonstrate that this closed-loop methodology is both practical and effective for automated feature discovery, which is recognized as one of the most challenging phases in current recommender system development.
Updated: 2025-07-25 06:45:10
Domains: cs.IR,cs.LG
Reinforcement Learning via Conservative Agent for Environments with Random Delays
Real-world reinforcement learning applications are often hindered by delayed feedback from environments, which violates the Markov assumption and introduces significant challenges. Although numerous delay-compensating methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This transformation enables any state-of-the-art constant-delay method to be directly extended to the random-delay environments without modifying the algorithmic structure or sacrificing performance. We evaluate the conservative agent-based algorithm on continuous control tasks, and empirical results demonstrate that it significantly outperforms existing baseline algorithms in terms of asymptotic performance and sample efficiency.
Updated: 2025-07-25 06:41:06
Domains: cs.LG
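The reformulation can be sketched as an environment wrapper that withholds every observation until a fixed number of steps has passed, so any random delay bounded by d_max behaves like a constant delay. The wrapper below is illustrative only (an old gym-style step signature is assumed), not the paper's implementation:

```python
from collections import deque

class ConservativeDelayWrapper:
    # Converts random observation delays (at most d_max) into a constant delay
    # by releasing buffered observations in arrival order, one per step.
    def __init__(self, env, d_max):
        self.env, self.d_max = env, d_max
        self.buffer = deque()

    def reset(self):
        obs = self.env.reset()
        self.buffer = deque([obs] * self.d_max)  # pad so the release rate is constant
        return self.buffer.popleft()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(obs)  # each observation waits a fixed number of steps
        return self.buffer.popleft(), reward, done, info
```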
HIVMedQA: Benchmarking large language models for HIV medical decision support
Large language models (LLMs) are emerging as valuable tools to support clinicians in routine decision-making. HIV management is a compelling use case due to its complexity, including diverse treatment options, comorbidities, and adherence challenges. However, integrating LLMs into clinical practice raises concerns about accuracy, potential harm, and clinician acceptance. Despite their promise, AI applications in HIV care remain underexplored, and LLM benchmarking studies are scarce. This study evaluates the current capabilities of LLMs in HIV management, highlighting their strengths and limitations. We introduce HIVMedQA, a benchmark designed to assess open-ended medical question answering in HIV care. The dataset consists of curated, clinically relevant questions developed with input from an infectious disease physician. We evaluated seven general-purpose and three medically specialized LLMs, applying prompt engineering to enhance performance. Our evaluation framework incorporates both lexical similarity and an LLM-as-a-judge approach, extended to better reflect clinical relevance. We assessed performance across key dimensions: question comprehension, reasoning, knowledge recall, bias, potential harm, and factual accuracy. Results show that Gemini 2.5 Pro consistently outperformed other models across most dimensions. Notably, two of the top three models were proprietary. Performance declined as question complexity increased. Medically fine-tuned models did not always outperform general-purpose ones, and larger model size was not a reliable predictor of performance. Reasoning and comprehension were more challenging than factual recall, and cognitive biases such as recency and status quo were observed. These findings underscore the need for targeted development and evaluation to ensure safe, effective LLM integration in clinical care.
Updated: 2025-07-25 06:40:44
Domains: cs.CL,cs.AI
GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units
As AI workloads proliferate, optimizing arithmetic units is becoming increasingly important to reduce the footprint of digital systems. Conventional design flows, which often rely on manual or heuristics-based optimization, are limited in their ability to thoroughly explore the vast design space. In this paper, we introduce GENIAL, a machine learning-based framework for the automatic generation and optimization of arithmetic units, more specifically multipliers. At the core of GENIAL is a Transformer-based surrogate model trained in two stages, involving self-supervised pretraining followed by supervised finetuning, to robustly forecast key hardware metrics such as power and area from abstracted design representations. By inverting the surrogate model, GENIAL efficiently searches for new operand encodings that directly minimize power consumption in arithmetic units for specific input data distributions. Extensive experiments on large datasets demonstrate that GENIAL is consistently more sample efficient than other methods and converges faster towards optimized designs. This enables deploying a high-effort logic synthesis optimization flow in the loop, improving the accuracy of the surrogate model. Notably, GENIAL automatically discovers encodings that achieve up to 18% switching activity savings within multipliers on representative AI workloads compared with the conventional two's complement. We also demonstrate the versatility of our approach by achieving significant improvements on Finite State Machines, highlighting GENIAL's applicability to a wide spectrum of logic functions. Together, these advances mark a significant step toward automated Quality-of-Results-optimized combinational circuit generation for digital systems.
Updated: 2025-07-25 06:34:59
Domains: cs.LG,cs.AI,cs.AR
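Surrogate inversion can be sketched as gradient descent on a relaxed operand encoding through the frozen surrogate. The shapes, names, and sigmoid relaxation below are assumptions for illustration, not the paper's procedure:

```python
import torch

def invert_surrogate(surrogate, n_symbols=16, code_bits=8, steps=500, lr=0.1):
    # `surrogate` is assumed to map a relaxed (n_symbols x code_bits) encoding
    # matrix to a scalar predicted-power estimate.
    logits = torch.randn(n_symbols, code_bits, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        relaxed_code = torch.sigmoid(logits)  # continuous relaxation of bits
        surrogate(relaxed_code).backward()    # minimize predicted power
        opt.step()
    return (torch.sigmoid(logits) > 0.5).int()  # discretize the final encoding
```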
AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-Reconstruction
The rapid advancement of image-generation technologies has made it possible for anyone to create photorealistic images using generative models, raising significant security concerns. To mitigate malicious use, tracing the origin of such images is essential. Reconstruction-based attribution methods offer a promising solution, but they often suffer from reduced accuracy and high computational costs when applied to state-of-the-art (SOTA) models. To address these challenges, we propose AEDR (AutoEncoder Double-Reconstruction), a novel training-free attribution method designed for generative models with continuous autoencoders. Unlike existing reconstruction-based approaches that rely on the value of a single reconstruction loss, AEDR performs two consecutive reconstructions using the model's autoencoder, and adopts the ratio of these two reconstruction losses as the attribution signal. This signal is further calibrated using the image homogeneity metric to improve accuracy, which inherently cancels out absolute biases caused by image complexity, with autoencoder-based reconstruction ensuring superior computational efficiency. Experiments on eight top latent diffusion models show that AEDR achieves 25.5% higher attribution accuracy than existing reconstruction-based methods, while requiring only 1% of the computational time.
Updated: 2025-07-25 06:34:58
Domains: cs.CV,cs.CR
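The double-reconstruction signal reduces to a ratio of two reconstruction losses. A minimal PyTorch sketch follows (homogeneity calibration omitted; the decision threshold and its direction are calibrated empirically); `autoencoder` is assumed to be the candidate model's continuous autoencoder:

```python
import torch
import torch.nn.functional as F

def aedr_signal(autoencoder, image):
    # Two consecutive reconstructions through the candidate model's autoencoder;
    # the ratio of the second loss to the first serves as the attribution signal.
    with torch.no_grad():
        recon1 = autoencoder(image)
        recon2 = autoencoder(recon1)
    loss1 = F.mse_loss(recon1, image)
    loss2 = F.mse_loss(recon2, recon1)
    return (loss2 / loss1).item()
```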
CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address them, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP's logits as a guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. Also, CGD exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD's exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.
Updated: 2025-07-25 06:33:41
Domains: cs.MM,cs.CR,cs.LG,68T07,I.2.6
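The entropy-based separation step might look like the sketch below, assuming zero-shot CLIP class logits are available; the exact criterion and threshold selection in CGD may differ:

```python
import torch

def entropy_split(clip_logits, threshold):
    # clip_logits: (N, C) zero-shot class logits from a CLIP model.
    # Low-entropy (confident) predictions are treated as likely clean here;
    # this direction is an assumption for illustration.
    probs = torch.softmax(clip_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    clean_mask = entropy <= threshold
    return clean_mask, ~clean_mask
```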
Verbalized Representation Learning for Interpretable Few-Shot Generalization
Humans recognize objects after observing only a few examples, a remarkable capability enabled by their inherent language understanding of the real-world environment. Developing verbalized and interpretable representations can significantly improve model generalization in low-data settings. In this work, we propose Verbalized Representation Learning (VRL), a novel approach for automatically extracting human-interpretable features for object recognition using few-shot data. Our method uniquely captures inter-class differences and intra-class commonalities in the form of natural language by employing a Vision-Language Model (VLM) to identify key discriminative features between different classes and shared characteristics within the same class. These verbalized features are then mapped to numeric vectors through the VLM. The resulting feature vectors can be further utilized to train and infer with downstream classifiers. Experimental results show that, at the same model scale, VRL achieves a 24% absolute improvement over prior state-of-the-art methods while using 95% less data and a smaller model. Furthermore, compared to human-labeled attributes, the features learned by VRL exhibit a 20% absolute gain when used for downstream classification tasks. Code is available at: https://github.com/joeyy5588/VRL/tree/main.
Updated: 2025-07-25 06:33:25
Domains: cs.CV,cs.CL,cs.LG
Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model
Differentiated thyroid cancer (DTC) recurrence is a major public health concern, requiring classification and predictive models that are not only accurate but also interpretable and uncertainty aware. This study introduces a comprehensive framework for DTC recurrence classification using a dataset containing 383 patients and 16 clinical and pathological variables. Initially, 11 machine learning (ML) models were employed using the complete dataset, where the Support Vector Machine (SVM) model achieved the highest accuracy of 0.9481. To reduce complexity and redundancy, feature selection was carried out using the Boruta algorithm, and the same ML models were applied to the reduced dataset, where it was observed that the Logistic Regression (LR) model obtained the maximum accuracy of 0.9611. However, these ML models often lack uncertainty quantification, which is critical in clinical decision making. Therefore, to address this limitation, Bayesian Neural Networks (BNNs) with six varying prior distributions, including Normal(0,1), Normal(0,10), Laplace(0,1), Cauchy(0,1), Cauchy(0,2.5), and Horseshoe(1), were implemented on both the complete and reduced datasets. The BNN model with the Normal(0,10) prior distribution exhibited maximum accuracies of 0.9740 and 0.9870 before and after feature selection, respectively.
Updated: 2025-07-25 06:31:31
Domains: cs.LG,cs.AI
CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation
Large Language Models (LLMs) have shown remarkable performance in automated code generation. However, existing approaches often rely heavily on pre-defined test cases, which become impractical in scenarios where such cases are unavailable. While prior works explore filtering techniques between programs and test cases, they overlook the refinement of test cases. To address this limitation, we introduce CoCoEvo, a novel LLM-based co-evolution framework that simultaneously evolves programs and test cases. CoCoEvo eliminates the dependency on pre-defined test cases by generating both programs and test cases directly from natural language problem descriptions and function headers. The framework employs specialized evolutionary operators, including LLM-based crossover and mutation operators for program evolution, along with an additional test case generation operator for test case evolution. Additionally, we propose optimization strategies such as a crossover rate scheduler to balance exploration and convergence, and a multi-objective optimization method for test case selection. Experimental results on multiple state-of-the-art LLMs demonstrate that CoCoEvo surpasses existing methods, achieving state-of-the-art performance in automated code generation and testing. These results underscore the potential of co-evolutionary techniques in advancing the field of automated programming.
Updated: 2025-07-25 06:26:07
Domains: cs.SE,cs.AI
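One co-evolution generation can be skeletonized as below; `llm`, `fitness`, and the prompts are hypothetical stand-ins for the paper's LLM-based operators:

```python
import random

def cocoevo_generation(programs, tests, llm, fitness, crossover_rate=0.5):
    # Score programs against the current tests, keep the better half,
    # then refill the population with LLM-based crossover and mutation.
    survivors = sorted(programs, key=lambda p: fitness(p, tests),
                       reverse=True)[: max(2, len(programs) // 2)]
    children = []
    while len(survivors) + len(children) < len(programs):
        if random.random() < crossover_rate:
            a, b = random.sample(survivors, 2)
            children.append(llm(f"Combine these two solutions:\n{a}\n{b}"))
        else:
            children.append(llm(f"Mutate this solution:\n{random.choice(survivors)}"))
    # Tests evolve too: generate an additional test case each generation.
    new_tests = tests + [llm("Write a new edge-case test for the problem.")]
    return survivors + children, new_tests
```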
KASPER: Kolmogorov Arnold Networks for Stock Prediction and Explainable Regimes
Forecasting in financial markets remains a significant challenge due to their nonlinear and regime-dependent dynamics. Traditional deep learning models, such as long short-term memory networks and multilayer perceptrons, often struggle to generalize across shifting market conditions, highlighting the need for a more adaptive and interpretable approach. To address this, we introduce Kolmogorov-Arnold networks for stock prediction and explainable regimes (KASPER), a novel framework that integrates regime detection, sparse spline-based function modeling, and symbolic rule extraction. The framework identifies hidden market conditions using a Gumbel-Softmax-based mechanism, enabling regime-specific forecasting. For each regime, it employs Kolmogorov-Arnold networks with sparse spline activations to capture intricate price behaviors while maintaining robustness. Interpretability is achieved through symbolic learning based on Monte Carlo Shapley values, which extracts human-readable rules tailored to each regime. Applied to real-world financial time series from Yahoo Finance, the model achieves an $R^2$ score of 0.89, a Sharpe Ratio of 12.02, and a mean squared error as low as 0.0001, outperforming existing methods. This research establishes a new direction for regime-aware, transparent, and robust forecasting in financial markets.
Updated: 2025-07-25 06:21:24
Domains: cs.LG
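The Gumbel-Softmax regime-detection mechanism fits in a few lines of PyTorch; `regime_head` and the temperature below are assumptions:

```python
import torch.nn.functional as F

def regime_assignment(features, regime_head, tau=0.5):
    # Map market features to logits over K latent regimes, then draw a
    # straight-through hard sample: one regime-specific forecaster is picked
    # per step while gradients still flow to the detector.
    logits = regime_head(features)  # (batch, K)
    return F.gumbel_softmax(logits, tau=tau, hard=True)
```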
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
The capabilities of recent large language models (LLMs) to generate high-quality content indistinguishable by humans from human-written texts raises many concerns regarding their misuse. Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives. Their capabilities to generate personalized (in various aspects) content have also been evaluated and mostly found usable. However, a combination of personalization and disinformation abilities of LLMs has not been comprehensively studied yet. Such a dangerous combination should trigger integrated safety filters of the LLMs, if there are some. This study fills this gap by evaluating vulnerabilities of recent open and closed LLMs, and their willingness to generate personalized disinformation news articles in English. We further explore whether the LLMs can reliably meta-evaluate the personalization quality and whether the personalization affects the generated-texts detectability. Our results demonstrate the need for stronger safety-filters and disclaimers, as those are not properly functioning in most of the evaluated LLMs. Additionally, our study revealed that the personalization actually reduces the safety-filter activations; thus effectively functioning as a jailbreak. Such behavior must be urgently addressed by LLM developers and service providers.
Updated: 2025-07-25 06:20:38
Domains: cs.CL,cs.AI,cs.CY
Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies
Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems usually requires a significantly long training time due to their inherent complexity. Furthermore, deploying the trained policies in the real world demands a feature-rich environment along with multiple physical embodied agents, which may not be feasible due to monetary, physical, energy, or safety constraints. This work seeks to address these pain points by presenting a mixed-reality (MR) digital twin (DT) framework capable of: (i) boosting training speeds by selectively scaling parallelized simulation workloads on-demand, and (ii) immersing the MARL policies across hybrid simulation-to-reality (sim2real) experiments. The viability and performance of the proposed framework are highlighted through two representative use cases, which cover cooperative as well as competitive classes of MARL problems. We study the effect of: (i) agent and environment parallelization on training time, and (ii) systematic domain randomization on zero-shot sim2real transfer, across both case studies. Results indicate up to 76.3% reduction in training time with the proposed parallelization scheme and sim2real gap as low as 2.9% using the proposed deployment method.
Updated: 2025-07-25 06:15:33
Domains: cs.RO,cs.LG,cs.MA
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts
Recent LLMs are able to generate high-quality multilingual texts, indistinguishable for humans from authentic human-written ones. Research in machine-generated text detection is however mostly focused on the English language and longer texts, such as news articles, scientific papers or student essays. Social-media texts are usually much shorter and often feature informal language, grammatical errors, or distinct linguistic items (e.g., emoticons, hashtags). There is a gap in studying the ability of existing methods in detection of such texts, reflected also in the lack of existing multilingual benchmark datasets. To fill this gap we propose the first multilingual (22 languages) and multi-platform (5 social media platforms) dataset for benchmarking machine-generated text detection in the social-media domain, called MultiSocial. It contains 472,097 texts, of which about 58k are human-written and approximately the same amount is generated by each of 7 multilingual LLMs. We use this benchmark to compare existing detection methods in zero-shot as well as fine-tuned form. Our results indicate that the fine-tuned detectors have no problem to be trained on social-media texts and that the platform selection for training matters.
Updated: 2025-07-25 06:08:21
Domains: cs.CL,cs.AI
Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.
Updated: 2025-07-25 06:06:47
Domains: cs.LG,cs.AI
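In the offline setting, the greedy probing rule can be sketched as below; `marginal_gain` is a hypothetical oracle for the expected utility gain of probing an arm, and submodularity of the objective is what gives the greedy choice its provable bound:

```python
def greedy_probe_set(arms, budget, marginal_gain):
    # Repeatedly probe the arm with the largest marginal gain until the
    # budget is exhausted or no arm helps; with a submodular objective this
    # enjoys the classic (1 - 1/e)-style approximation guarantee.
    chosen = set()
    while len(chosen) < budget:
        candidates = [a for a in arms if a not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda a: marginal_gain(a, chosen))
        if marginal_gain(best, chosen) <= 0:
            break
        chosen.add(best)
    return chosen
```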
Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution
While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to specific elements, such as styles or objects, that matter most to stakeholders. To bridge this gap, we introduce concept-level attribution via a novel method called Concept-TRAK. Concept-TRAK extends influence functions with two key innovations: (1) a reformulated diffusion training loss based on diffusion posterior sampling, enabling robust, sample-specific attribution; and (2) a concept-aware reward function that emphasizes semantic relevance. We evaluate Concept-TRAK on the AbC benchmark, showing substantial improvements over prior methods. Through diverse case studies, ranging from identifying IP-protected and unsafe content to analyzing prompt engineering and compositional learning, we demonstrate how concept-level attribution yields actionable insights for responsible generative AI development and governance.
Updated: 2025-07-25 06:06:12
标题: Concept-TRAK:通过概念级别的归因理解扩散模型学习概念
摘要: 尽管扩散模型在图像生成方面表现出色,但它们日益被采用引发了关于版权问题和模型透明度的重要关注。现有的归因方法确定影响整个图像的训练示例,但在隔离对特定元素(如风格或对象)的贡献方面表现不佳,而这些对利益相关者最为重要。为了弥合这一差距,我们通过一种名为Concept-TRAK的新方法引入了\emph{概念级别的归因}。Concept-TRAK通过两个关键创新扩展了影响函数:(1)基于扩散后验抽样的改进扩散训练损失,实现了稳健的、样本特定的归因;(2)一个概念感知的奖励函数,强调语义相关性。我们在AbC基准上评估了Concept-TRAK,显示出与先前方法相比的显著改进。通过各种案例研究--从识别受知识产权保护和不安全内容到分析提示工程和组合学习--我们展示了概念级别的归因如何为负责任的生成式人工智能开发和治理提供可操作的见解。
更新时间: 2025-07-25 06:06:12
领域: cs.CV,cs.LG
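As a rough sketch of how a TRAK-style concept attribution score can be computed, assume per-example training gradients projected to a small dimension and a gradient of the concept-aware reward; all arrays below are synthetic stand-ins, not the paper's actual pipeline:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 500, 64                   # training examples, projected gradient dim
    G = rng.normal(size=(n, p))      # per-example training gradients (projected)
    g_concept = rng.normal(size=p)   # gradient of the concept-aware reward

    def trak_style_scores(G, g_query, lam=1e-3):
        # influence_i = g_i^T (G^T G + lam*I)^{-1} g_query
        H = G.T @ G + lam * np.eye(G.shape[1])
        return G @ np.linalg.solve(H, g_query)

    scores = trak_style_scores(G, g_concept)
    print(np.argsort(-scores)[:5])   # training examples most responsible for the concept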
Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling
Temporal Knowledge Graph (TKG) completion models traditionally assume access to the entire graph during training. This overlooks challenges stemming from the evolving nature of TKGs, such as: (i) the model's requirement to generalize and assimilate new knowledge, and (ii) the task of managing new or unseen entities that often have sparse connections. In this paper, we present an incremental training framework specifically designed for TKGs, aiming to address entities that are either not observed during training or have sparse connections. Our approach combines a model-agnostic enhancement layer with a weighted sampling strategy that can augment and improve any existing TKG completion method. The enhancement layer leverages a broader, global definition of entity similarity, which moves beyond the mere local neighborhood proximity of GNN-based methods. The weighted sampling strategy employed in training accentuates edges linked to infrequently occurring entities. We evaluate our method on two benchmark datasets, and demonstrate that our framework outperforms existing methods in total link prediction, inductive link prediction, and in addressing long-tail entities. Notably, our method achieves a 10% improvement and a 15% boost in MRR for these datasets. The results underscore the potential of our approach in mitigating catastrophic forgetting and enhancing the robustness of TKG completion methods, especially in an incremental training context.
Updated: 2025-07-25 06:02:48
标题: 朝着通过全局相似性和加权抽样改进时间知识图中的长尾实体预测
摘要: 时间知识图谱(Temporal Knowledge Graph,TKG)完成模型传统上假设在训练过程中可以访问整个图形。这忽视了源于TKG不断演变的挑战,例如:(i)模型需要泛化和吸收新知识,以及(ii)管理常常具有稀疏连接的新或未见实体的任务。在本文中,我们提出了一个专门为TKG设计的增量训练框架,旨在解决训练过程中未观察到或具有稀疏连接的实体。我们的方法结合了一个与模型无关的增强层和加权抽样策略,可以扩展并改进任何现有的TKG完成方法。增强层利用了更广泛的全局实体相似性定义,超越了基于GNN方法的仅仅局部邻域接近。训练中采用的加权抽样策略强调与不经常出现的实体相关联的边。我们在两个基准数据集上评估了我们的方法,并展示了我们的框架在总链接预测、归纳链接预测和解决长尾实体方面优于现有方法。值得注意的是,我们的方法在这些数据集上实现了10%的改进和15%的MRR提升。结果强调了我们的方法在缓解灾难性遗忘和增强TKG完成方法的稳健性方面的潜力,特别是在增量训练环境中。
更新时间: 2025-07-25 06:02:48
领域: cs.AI
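The weighted sampling idea is simple to state in code: up-weight edges that touch rarely occurring entities so that long-tail entities are seen more often during training. A minimal sketch on a synthetic edge list (the inverse-frequency weighting is an assumed form, not the paper's exact formula):

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)
    edges = [(int(rng.integers(0, 50)), int(rng.integers(0, 50))) for _ in range(1000)]

    freq = Counter()
    for h, t in edges:
        freq[h] += 1
        freq[t] += 1

    # Edges touching rare entities get proportionally higher sampling probability.
    w = np.array([1.0 / min(freq[h], freq[t]) for h, t in edges])
    w /= w.sum()
    batch = rng.choice(len(edges), size=128, p=w)   # one weighted training batch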
Secure Best Arm Identification in the Presence of a Copycat
Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player's goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax-optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with \emph{coded arms}. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.
Updated: 2025-07-25 06:00:44
标题: 存在模仿者情况下的安全最佳臂识别
摘要: 考虑带有安全约束的最佳臂识别问题。具体地,假设存在一个具有$K$臂和维度$d$的随机线性臂带设置。在每次臂拉动中,玩家将获得一个奖励,该奖励是臂与未知参数向量的点积和独立噪声的总和。玩家的目标是在$T$次臂拉动后识别最佳臂。此外,假设有一个观察臂拉动的模仿者Chloe。玩家希望让Chloe对最佳臂一无所知。尽管一个极小极大的最优算法以$\Omega\left(\frac{T}{\log(d)}\right)$的错误指数识别最佳臂,但它很容易向外部观察者透露其最佳臂估计,因为最佳臂被更频繁地玩。一个简单的安全算法,即所有臂都平等地玩,结果是一个$\Omega\left(\frac{T}{d}\right)$指数。在本文中,我们提出了一个使用“编码臂”进行玩的安全算法。该算法不需要任何密钥或密码原语,但却实现了一个$\Omega\left(\frac{T}{\log^2(d)}\right)$指数,同时几乎不透露有关最佳臂的任何信息。
更新时间: 2025-07-25 06:00:44
领域: cs.LG
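A toy illustration of the coded-arms idea: rather than pulling informative arms directly (which reveals the estimate), the player pulls random signed mixtures of the arm vectors, so the pull sequence is i.i.d. and leaks almost nothing, and recovers the parameter by least squares. This is an illustrative stand-in, not the paper's actual code construction:

    import numpy as np

    rng = np.random.default_rng(0)
    d, K, T = 8, 20, 2000
    arms = rng.normal(size=(K, d))
    theta = rng.normal(size=d)                             # unknown parameter

    X, y = [], []
    for _ in range(T):
        c = rng.choice([-1.0, 1.0], size=K) / np.sqrt(K)   # random codeword over arms
        x = c @ arms                                       # the coded arm actually played
        X.append(x)
        y.append(x @ theta + rng.normal(scale=0.1))        # noisy linear reward

    theta_hat, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    print("best arm estimate:", np.argmax(arms @ theta_hat))

Chloe observes only i.i.d. codewords, so the empirical pull distribution carries essentially no information about which arm the player believes is best.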
A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation
Augmenting large language models (LLMs) with external tools is a promising avenue for developing high-performance mathematical reasoning systems. Prior tool-augmented approaches typically finetune an LLM to select and invoke a single tool at each reasoning step and show promising results on simpler math reasoning benchmarks such as GSM8K. However, these approaches struggle with more complex math problems that require precise reasoning over multiple steps. To address this limitation, in this work, we propose Multi-TAG, a Multi-Tool AGgregation-based framework. Instead of relying on a single tool, Multi-TAG guides an LLM to concurrently invoke multiple tools at each reasoning step. It then aggregates their diverse outputs to verify and refine the reasoning process, enhancing solution robustness and accuracy. Notably, Multi-TAG is a finetuning-free, inference-only framework, making it readily applicable to any LLM backbone, including large open-weight models which are computationally expensive to finetune and proprietary frontier models which cannot be finetuned with custom recipes. We evaluate Multi-TAG on four challenging benchmarks: MATH500, AIME, AMC, and OlympiadBench. Across both open-weight and closed-source LLM backbones, Multi-TAG consistently and substantially outperforms state-of-the-art baselines, achieving average improvements of 6.0% to 7.5%.
Updated: 2025-07-25 05:57:47
标题: 一个工具箱,而不是一把锤子——Multi-TAG:用多工具聚合来扩展数学推理
摘要: 将大型语言模型(LLMs)与外部工具结合起来是开发高性能数学推理系统的一种有前途的途径。先前的工具增强方法通常是对LLM进行微调,以在每个推理步骤中选择和调用单个工具,并在简单的数学推理基准测试如GSM8K上显示出有希望的结果。然而,这些方法在需要对多个步骤进行精确推理的更复杂的数学问题上表现不佳。为了解决这一限制,本文提出了Multi-TAG,一种基于多工具聚合的框架。Multi-TAG不依赖于单个工具,而是引导LLM在每个推理步骤中同时调用多个工具。然后,它聚合它们的多样化输出来验证和完善推理过程,增强解决方案的稳健性和准确性。值得注意的是,Multi-TAG是一种无需微调,仅推理的框架,使其可以轻松应用于任何LLM骨干,包括计算昂贵的大型开放权重模型和无法使用自定义配方进行微调的专有前沿模型。我们在四个具有挑战性的基准测试(MATH500、AIME、AMC和OlympiadBench)上评估了Multi-TAG。在开放权重和封闭源LLM骨干上,Multi-TAG始终且显著地优于最先进的基线,平均改进幅度为6.0%至7.5%。
更新时间: 2025-07-25 05:57:47
领域: cs.CL,cs.AI,cs.LG
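A minimal, finetuning-free sketch of the aggregation step: several tool backends are invoked on the same subproblem and their answers are reconciled by majority vote. The tools here are trivial stand-ins; the paper's prompts, LLM calls, and real tools are abstracted away:

    from collections import Counter

    def tool_calculator(expr):          # stand-in for a code-interpreter tool
        return eval(expr, {"__builtins__": {}}, {})

    def tool_symbolic(expr):            # stand-in for a second, independent tool
        return eval(compile(expr, "<calc>", "eval"), {"__builtins__": {}}, {})

    def aggregate_step(subproblem, tools):
        # Invoke every tool; failing tools abstain; keep the majority answer.
        answers = []
        for tool in tools:
            try:
                answers.append(tool(subproblem))
            except Exception:
                pass
        return Counter(answers).most_common(1)[0][0]

    print(aggregate_step("(3 + 4) * 12", [tool_calculator, tool_symbolic]))   # -> 84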
Tell Me What to Track: Infusing Robust Language Guidance for Enhanced Referring Multi-Object Tracking
Referring multi-object tracking (RMOT) is an emerging cross-modal task that aims to localize an arbitrary number of targets based on a language expression and continuously track them in a video. This intricate task involves reasoning on multi-modal data and precise target localization with temporal association. However, prior studies overlook the imbalanced data distribution between newborn targets and existing targets due to the nature of the task. In addition, they only indirectly fuse multi-modal features, struggling to deliver clear guidance on newborn target detection. To solve the above issues, we introduce a collaborative matching strategy to alleviate the impact of the imbalance, boosting the ability to detect newborn targets while maintaining tracking performance. In the encoder, we integrate and enhance the cross-modal and multi-scale fusion, overcoming the bottlenecks in previous work, where limited multi-modal information is shared and interacted between feature maps. In the decoder, we also develop a referring-infused adaptation that provides explicit referring guidance through the query tokens. The experiments showcase the superior performance of our model (+3.42%) compared to prior works, demonstrating the effectiveness of our designs.
Updated: 2025-07-25 05:50:30
标题: 告诉我要跟踪什么:注入鲁棒的语言指导以增强指代多目标跟踪
摘要: Referring multi-object tracking (RMOT)是一项新兴的跨模态任务,旨在基于语言表达式定位任意数量的目标,并在视频中持续跟踪它们。这一复杂任务涉及对多模态数据和精确目标定位以及时间关联的推理。然而,先前的研究忽视了由于任务性质而存在的新生目标和现有目标之间的数据分布不平衡。此外,它们仅间接融合多模态特征,难以提供对新生目标检测的清晰指导。为了解决上述问题,我们采用协作匹配策略来缓解不平衡的影响,提高检测新生目标的能力同时保持跟踪性能。在编码器中,我们整合和增强了跨模态和多尺度融合,克服了先前工作中有限的多模态信息在特征图之间共享和交互的瓶颈。在解码器中,我们还开发了一个注入指引的适应性,通过查询令牌提供明确的指导。实验证明,我们的模型相较于先前的研究具有更优越的性能(+3.42%),证明了我们设计的有效性。
更新时间: 2025-07-25 05:50:30
领域: cs.CV,cs.AI
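The referring-infused decoder adaptation boils down to letting query tokens attend to the language features and injecting the result back through a residual connection. A minimal PyTorch sketch (the dimensions and residual placement are assumptions for illustration, not the paper's exact architecture):

    import torch
    import torch.nn as nn

    D = 256
    cross_attn = nn.MultiheadAttention(embed_dim=D, num_heads=8, batch_first=True)

    queries = torch.randn(2, 100, D)   # decoder query tokens (batch, num_queries, D)
    text = torch.randn(2, 20, D)       # encoded referring-expression tokens

    guided, _ = cross_attn(query=queries, key=text, value=text)
    queries = queries + guided         # explicit referring guidance via residual injection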
TiVy: Time Series Visual Summary for Scalable Visualization
Visualizing multiple time series presents fundamental tradeoffs between scalability and visual clarity. Time series capture the behavior of many large-scale real-world processes, from stock market trends to urban activities. Users often gain insights by visualizing them as line charts, juxtaposing or superposing multiple time series to compare them and identify trends and patterns. However, existing representations struggle with scalability when covering long time spans, leading to visual clutter from too many small multiples or overlapping lines. We propose TiVy, a new algorithm that summarizes time series using sequential patterns. It transforms the series into a set of symbolic sequences based on subsequence visual similarity using Dynamic Time Warping (DTW), then constructs a disjoint grouping of similar subsequences based on the frequent sequential patterns. The grouping result, a visual summary of time series, provides uncluttered superposition with fewer small multiples. Unlike common clustering techniques, TiVy extracts similar subsequences (of varying lengths) aligned in time. We also present an interactive time series visualization that renders large-scale time series in real-time. Our experimental evaluation shows that our algorithm (1) extracts clear and accurate patterns when visualizing time series data, and (2) achieves a significant speed-up (1000X) compared to straightforward DTW clustering. We also demonstrate the efficiency of our approach in exploring hidden structures in massive time series data in two usage scenarios.
Updated: 2025-07-25 05:50:01
标题: TiVy:可扩展可视化的时间序列视觉总结
摘要: 可视化多个时间序列存在基本的可伸缩性和视觉清晰度之间的权衡。时间序列捕捉了许多大规模现实世界过程的行为,从股市趋势到城市活动。用户通常通过将它们可视化为线形图来获得见解,将多个时间序列并置或叠加以比较它们并识别趋势和模式。然而,现有的表示形式在可伸缩性方面存在问题:当覆盖长时间跨度时,由于太多小的多重图或重叠线条而导致视觉混乱。我们提出了TiVy,一种使用顺序模式总结时间序列的新算法。它通过使用动态时间规整(DTW)将序列转换为一组基于子序列视觉相似性的符号序列,然后根据频繁的顺序模式构建相似子序列的不相交分组。分组结果,即时间序列的视觉摘要,提供了更少的小倍数的无杂乱叠加。与常见的聚类技术不同,TiVy提取在时间上对齐的相似子序列(长度各异)。我们还提供了一个交互式时间序列可视化,可以实时呈现大规模时间序列。我们的实验评估表明,我们的算法(1)在可视化时间序列数据时提取了清晰和准确的模式,(2)与直接的DTW聚类相比实现了显著的加速(1000倍)。我们还展示了我们的方法在两种使用场景中探索大规模时间序列数据中隐藏结构的高效性。
更新时间: 2025-07-25 05:50:01
领域: cs.GR,cs.LG
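The first stage, mapping subsequences to symbols by DTW similarity, can be sketched directly. The window length, distance threshold, and greedy prototype assignment below are illustrative assumptions rather than TiVy's exact procedure:

    import numpy as np

    def dtw(a, b):
        # Classic O(len(a)*len(b)) dynamic time warping distance.
        D = np.full((len(a) + 1, len(b) + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                D[i, j] = abs(a[i-1] - b[j-1]) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
        return D[-1, -1]

    def symbolize(series, win=20, thresh=5.0):
        # Assign each window to the first prototype within `thresh` DTW distance.
        protos, symbols = [], []
        for s in range(0, len(series) - win + 1, win):
            seg = series[s:s + win]
            for k, p in enumerate(protos):
                if dtw(seg, p) < thresh:
                    symbols.append(k)
                    break
            else:
                protos.append(seg)
                symbols.append(len(protos) - 1)
        return symbols   # symbolic sequence, ready for sequential-pattern mining

    print(symbolize(np.sin(np.linspace(0, 12 * np.pi, 600))))

The resulting symbol stream is what the frequent-sequential-pattern grouping then operates on.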
Underwater Waste Detection Using Deep Learning: A Performance Comparison of YOLOv7 to 10 and Faster RCNN
Underwater pollution is one of today's most significant environmental concerns, with vast volumes of garbage found in seas, rivers, and landscapes around the world. Accurate detection of these waste materials is crucial for successful waste management, environmental monitoring, and mitigation strategies. In this study, we investigated the performance of five cutting-edge object recognition algorithms, namely four YOLO (You Only Look Once) models (YOLOv7, YOLOv8, YOLOv9, and YOLOv10) and Faster Region-Convolutional Neural Network (R-CNN), to identify which model was most effective at recognizing materials in underwater situations. The models were thoroughly trained and tested on a large dataset containing fifteen different classes under diverse conditions, such as low visibility and variable depths. Among these models, YOLOv8 outperformed the others, with a mean Average Precision (mAP) of 80.9%, indicating a significant performance advantage. This increased performance is attributed to YOLOv8's architecture, which incorporates advanced features such as improved anchor-free mechanisms and self-supervised learning, allowing for more precise and efficient recognition of items in a variety of settings. These findings highlight the YOLOv8 model's potential as an effective tool in the global fight against pollution, improving both the detection capabilities and scalability of underwater cleanup operations.
Updated: 2025-07-25 05:36:37
标题: 深度学习在水下废物检测中的应用:YOLOv7至YOLOv10与Faster RCNN的性能比较
摘要: 水下污染是当今最重要的环境问题之一,世界各地的海洋、河流和景观中发现了大量垃圾。准确检测这些废弃物是成功的废物管理、环境监测和缓解策略的关键。在这项研究中,我们调查了五种前沿的物体识别算法的性能,即YOLO(You Only Look Once)模型,包括YOLOv7、YOLOv8、YOLOv9、YOLOv10和Faster Region-Convolutional Neural Network(R-CNN),以确定哪种模型在水下情况下最有效地识别材料。这些模型经过充分训练并在一个包含十五种不同类别的大型数据集上进行了测试,包括低能见度和不同深度等多种条件。在上述模型中,YOLOv8表现优于其他模型,平均精度(mAP)为80.9%,表明其性能显著。这种提高的性能归因于YOLOv8的架构,该架构包括改进的无锚机制和自监督学习等先进功能,使其能够更精确、更高效地识别各种环境中的物品。这些发现突显了YOLOv8模型作为全球污染防治的有效工具的潜力,提高了水下清理作业的检测能力和可扩展性。
更新时间: 2025-07-25 05:36:37
领域: cs.CV,cs.AI,cs.LG
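For reference, a model-by-model comparison like this is typically a few lines per model with the ultralytics package; a hedged sketch (the dataset YAML name is hypothetical, and the other YOLO versions follow the same pattern):

    # pip install ultralytics
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                               # pretrained checkpoint
    model.train(data="underwater_waste.yaml", epochs=100, imgsz=640)
    metrics = model.val()                                    # held-out evaluation
    print(metrics.box.map50)   # mAP@0.5 (metrics.box.map gives mAP@0.5:0.95)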
Improving Question Embeddings with Cognitive Representation Optimization for Knowledge Tracing
Knowledge tracing (KT) is designed to track changes in students' knowledge states and predict their future answers based on their historical answer records. Current research on KT modeling focuses on predicting future student performance from existing, unupdated records of student learning interactions. However, these methods ignore distractions in the response process (such as slipping and guessing) and ignore that static cognitive representations are temporary and limited. Most of them assume that there are no distractions during the answering process and that the recorded representation fully represents the student's understanding and proficiency in knowledge. This can lead to many dissonant and uncoordinated issues in the original record. Therefore, we propose a knowledge-tracing cognitive representation optimization (CRO-KT) model that uses dynamic programming algorithms to optimize the structure of the cognitive representation. This ensures that the structure matches the student's cognitive patterns in terms of practice difficulty. In addition, we use a synergistic optimization algorithm to optimize the cognitive representation of sub-target exercises based on the overall picture of exercise responses, treating all exercises with synergistic relationships as one goal. At the same time, the CRO-KT model integrates the relationship embedding learned from the bipartite graph with the optimized record representation in a weighted manner, which enhances students' cognitive expression ability. Finally, experiments were conducted on three public datasets to verify the effectiveness of the proposed cognitive representation optimization model.
Updated: 2025-07-25 05:26:24
标题: 优化认知表示以改进知识追踪中的问题嵌入
摘要: 知识追踪(KT)旨在跟踪学生知识状态的变化,并根据学生的历史答题记录预测其未来的答案。当前关于知识跟踪(KT)建模的研究主要集中在基于学生学习互动的现有但未更新记录来预测未来学生表现。然而,这些方法忽视了答题过程中的干扰(如失误和猜测),并且没有考虑到静态认知表示是暂时和有限的。大部分方法假设在答题过程中没有干扰,记录的表征完全代表学生的理解和知识熟练程度。这可能导致原始记录中许多不协调和不一致的问题。因此,我们提出了一种知识跟踪认知表示优化(CRO-KT)模型,该模型使用动态规划算法来优化认知表示的结构。这确保了结构与学生的认知模式在练习难度方面匹配。此外,我们使用协同优化算法来基于整体练习响应的情况来优化子目标练习的认知表示,考虑到所有具有协同关系的练习作为一个目标。同时,CRO-KT模型以加权方式将在二分图中学习到的关系嵌入与优化的记录表示相结合,以增强学生的认知表达能力。最后,我们在三个公共数据集上进行了实验,以验证所提出的认知表示优化模型的有效性。
更新时间: 2025-07-25 05:26:24
领域: cs.AI
Extending Group Relative Policy Optimization to Continuous Control: A Theoretical Framework for Robotic Reinforcement Learning
Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored, limiting its utility in robotics where continuous actions are essential. This paper presents a theoretical framework extending GRPO to continuous control environments, addressing challenges in high-dimensional action spaces, sparse rewards, and temporal dynamics. Our approach introduces trajectory-based policy clustering, state-aware advantage estimation, and regularized policy updates designed for robotic applications. We provide theoretical analysis of convergence properties and computational complexity, establishing a foundation for future empirical validation in robotic systems including locomotion and manipulation tasks.
Updated: 2025-07-25 05:25:40
标题: 将组相对策略优化扩展到连续控制:机器人强化学习的理论框架
摘要: Group Relative Policy Optimization (GRPO) 在离散动作空间中通过基于群体的优势估计消除价值函数的依赖关系,展现出了潜力。然而,其在连续控制方面的应用尚未被探索,这限制了其在机器人领域的实用性,因为连续动作是必不可少的。本文提出了一个理论框架,将GRPO扩展到连续控制环境,解决了高维动作空间、稀疏奖励和时间动态方面的挑战。我们的方法引入了基于轨迹的策略聚类、状态感知的优势估计和为机器人应用而设计的正则化策略更新。我们对收敛性和计算复杂性进行了理论分析,为未来在包括 locomotion 和 manipulation 任务在内的机器人系统中的经验验证奠定了基础。
更新时间: 2025-07-25 05:25:40
领域: cs.RO,cs.AI
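The value-free core of GRPO carries over to trajectories directly: each trajectory's advantage is its return standardized within a group of trajectories, replacing a learned critic. A minimal numpy sketch (the trajectory clustering and regularized update from the framework are omitted):

    import numpy as np

    def group_relative_advantages(returns, eps=1e-8):
        # Advantage = return standardized within its group (no value function).
        r = np.asarray(returns, dtype=float)
        return (r - r.mean()) / (r.std() + eps)

    # Returns of one group of trajectories sampled from the current policy:
    print(group_relative_advantages([1.2, 0.7, 2.5, 0.9]))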
Spike No More: Stabilizing the Pre-training of Large Language Models
Loss spikes often occur during pre-training of large language models. The spikes degrade the performance of large language models and sometimes ruin the pre-training. Since the pre-training needs a vast computational budget, we should avoid such spikes. Based on the assumption that the loss spike is caused by the sudden growth of the gradient norm, we explore factors to keep the gradient norm small through an analysis of the spectral norms of the Jacobian matrices for the sub-layers. Our findings suggest that stabilizing the pre-training process requires two conditions: small sub-layers and large shortcut. We conduct various experiments to empirically verify our theoretical analyses. Experimental results demonstrate that methods satisfying the conditions effectively prevent loss spikes during pre-training.
Updated: 2025-07-25 05:09:17
标题: 不再尖峰:稳定大型语言模型的预训练
摘要: Loss spikes在大型语言模型的预训练过程中经常发生。这些峰值会降低大型语言模型的性能,有时会破坏预训练过程。由于预训练需要巨大的计算预算,我们应该避免这种峰值。基于loss spike是由梯度范数突然增长引起的假设,我们通过分析子层的雅可比矩阵的谱范数来探索保持梯度范数小的因素。我们的研究结果表明,稳定预训练过程需要满足两个条件:小的子层和大的shortcut。我们进行了各种实验来经验性地验证我们的理论分析。实验结果表明,满足这些条件的方法有效地防止了预训练过程中的loss spikes。
更新时间: 2025-07-25 05:09:17
领域: cs.CL,cs.AI
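In practice the two conditions suggest an initialization recipe: shrink each residual sub-layer's output projection so that the identity shortcut dominates early training. A PyTorch sketch of that rescaling; the 1/sqrt(2N) depth scaling is one common choice, and matching modules by the name suffix "out_proj" is an assumption about the architecture's naming:

    import math
    import torch.nn as nn

    def shrink_sublayer_outputs(model, n_layers, base_std=0.02):
        # Small sub-layers, (relatively) large shortcut: re-init each residual
        # branch's output projection with a depth-scaled standard deviation.
        std = base_std / math.sqrt(2 * n_layers)
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear) and name.endswith("out_proj"):
                nn.init.normal_(module.weight, mean=0.0, std=std)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)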
On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem
We study a nonsmooth nonconvex optimization problem defined over nonconvex constraints, where the feasible set is given by the intersection of the closure of an open set and a smooth manifold. By endowing the open set with a Riemannian metric induced by a barrier function, we obtain a Riemannian subgradient flow formulated as a differential inclusion, which remains strictly within the interior of the feasible set. This continuous dynamical system unifies two classes of iterative optimization methods, namely the Hessian barrier method and the mirror descent scheme, by revealing that these methods can be interpreted as discrete approximations of the continuous flow. We explore the long-term behavior of the trajectories generated by this dynamical system and show that the known deficient convergence properties of the Hessian barrier method and mirror descent scheme can be interpreted in a unified and more insightful way through those of the continuous trajectory. For instance, the notorious spurious stationary points \cite{chen2024spurious} observed in the Hessian barrier method and mirror descent scheme are interpreted as stable equilibria of the dynamical system that do not correspond to real stationary points of the original optimization problem. We provide two sufficient conditions under which these spurious stationary points can be avoided when the strict complementarity condition holds. In the absence of these regularity conditions, we propose a random perturbation strategy that ensures the trajectory converges (subsequentially) to an approximate stationary point. Building on these insights, we introduce two iterative Riemannian subgradient methods, in the form of interior point methods, that generalize the existing Hessian barrier method and mirror descent scheme for solving nonsmooth nonconvex optimization problems.
Updated: 2025-07-25 05:02:24
标题: 对随机非凸约束问题内部镜像下降流的探索
摘要: 我们研究了一个在非凸约束下定义的非光滑非凸优化问题,其中可行集由一个开集的闭包和一个光滑流形的交集给出。通过赋予开集一个由障碍函数诱导的黎曼度量,我们得到了一个黎曼次梯度流,其形式化为微分包含,并且始终严格保持在可行集的内部。这个连续动力系统将两类迭代优化方法统一起来,即海森障碍方法和镜面下降方案,通过揭示这些方法可以解释为连续流的离散近似。我们探讨了这个动力系统产生的轨迹的长期行为,并且表明海森障碍和镜面下降方案的现有不足收敛性质可以通过这些连续轨迹更洞察地解释。例如,海森障碍方法和镜面下降方案中观察到的臭名昭著的虚假稳定点\cite{chen2024spurious}被解释为动力系统的稳定平衡点,这些平衡点并不对应原始优化问题的真实稳定点。我们提供了两个充分条件,以确保如果严格互补条件成立,则可以避免这些虚假稳定点。在没有这些正则条件的情况下,我们提出了一种随机扰动策略,以确保轨迹(逐步)收敛到一个近似稳定点。基于这些见解,我们引入了两种迭代黎曼次梯度方法,形式为内点方法,这扩展了现有的海森障碍方法和镜面下降方案,用于解决非光滑非凸优化问题。
更新时间: 2025-07-25 05:02:24
领域: math.OC,cs.LG
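For intuition, the log-barrier mirror map $h(x) = -\sum_i \log x_i$ gives a closed-form interior update on the positive orthant: solving $\nabla h(x^+) = \nabla h(x) - \eta g$ yields $x^+_i = x_i / (1 + \eta x_i g_i)$, which stays strictly interior whenever the step is admissible. A toy numpy sketch on an illustrative nonconvex objective (the objective and step-size backtracking are assumptions):

    import numpy as np

    def f(x):                     # illustrative nonconvex objective on x > 0
        return np.sum(np.sin(x) + 0.1 * x**2)

    def grad_f(x):
        return np.cos(x) + 0.2 * x

    x, eta = np.ones(5), 0.1      # strictly interior initialization
    for _ in range(200):
        g = grad_f(x)
        denom = 1.0 + eta * x * g
        if np.any(denom <= 0):    # shrink the step to stay strictly feasible
            eta *= 0.5
            continue
        x = x / denom             # mirror step under the log-barrier metric
    print(x, f(x))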
Geometric Origins of Bias in Deep Neural Networks: A Human Visual System Perspective
Bias formation in deep neural networks (DNNs) remains a critical yet poorly understood challenge, influencing both fairness and reliability in artificial intelligence systems. Inspired by the human visual system, which decouples object manifolds through hierarchical processing to achieve object recognition, we propose a geometric analysis framework linking the geometric complexity of class-specific perceptual manifolds in DNNs to model bias. Our findings reveal that differences in geometric complexity can lead to varying recognition capabilities across categories, introducing biases. To support this analysis, we present the Perceptual-Manifold-Geometry library, designed for calculating the geometric properties of perceptual manifolds. The toolkit has been downloaded and installed over 4,500 times. This work provides a novel geometric perspective on bias formation in modern learning systems and lays a theoretical foundation for developing more equitable and robust artificial intelligence.
Updated: 2025-07-25 04:47:04
标题: 深度神经网络中偏见的几何起源:人类视觉系统的视角
摘要: 在深度神经网络(DNNs)中,偏见形成仍然是一个关键但理解不足的挑战,影响人工智能系统中的公平性和可靠性。受人类视觉系统的启发,通过分层处理解耦对象流形以实现对象识别,我们提出了一个几何分析框架,将DNNs中特定类别感知流形的几何复杂性与模型偏见联系起来。我们的研究发现,几何复杂性的差异可能导致不同类别之间的识别能力不同,引入偏见。为了支持这一分析,我们提出了Perceptual-Manifold-Geometry库,用于计算感知流形的几何属性。该工具包已被下载并安装超过4500次。这项工作为现代学习系统中偏见形成提供了一种新颖的几何视角,并为开发更公平和稳健的人工智能奠定了理论基础。
更新时间: 2025-07-25 04:47:04
领域: cs.CV,cs.AI
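One concrete proxy for the geometric complexity of a class's perceptual manifold is the participation ratio of its feature covariance spectrum, $PR = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$, an effective dimensionality. The cited library computes richer geometric properties; this hedged sketch covers just that one quantity:

    import numpy as np

    def participation_ratio(features):
        # Effective dimensionality of one class's feature cloud (higher = more complex).
        lam = np.clip(np.linalg.eigvalsh(np.cov(features, rowvar=False)), 0.0, None)
        return lam.sum() ** 2 / (lam ** 2).sum()

    rng = np.random.default_rng(0)
    flat_class = rng.normal(size=(500, 128)) @ np.diag([1.0] * 4 + [0.01] * 124)
    round_class = rng.normal(size=(500, 128))
    print(participation_ratio(flat_class), participation_ratio(round_class))  # low vs. high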
MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model
Recent advances in natural language processing (NLP) have been driven by pretrained language models like BERT, RoBERTa, T5, and GPT. These models excel at understanding complex texts, but biomedical literature, with its domain-specific terminology, poses challenges that models like Word2Vec and bidirectional long short-term memory (Bi-LSTM) can't fully address. GPT and T5, despite capturing context, fall short in tasks needing bidirectional understanding, unlike BERT. Addressing this, we proposed MedicalBERT, a pretrained BERT model trained on a large biomedical dataset and equipped with domain-specific vocabulary that enhances the comprehension of biomedical terminology. The MedicalBERT model is further optimized and fine-tuned to address diverse tasks, including named entity recognition, relation extraction, question answering, sentence similarity, and document classification. Performance metrics such as the F1-score, accuracy, and Pearson correlation are employed to showcase the efficiency of our model in comparison to other BERT-based models such as BioBERT, SciBERT, and ClinicalBERT. MedicalBERT outperforms these models on most of the benchmarks, and surpasses the general-purpose BERT model by 5.67% on average across all the tasks evaluated. This work also underscores the potential of leveraging pretrained BERT models for medical NLP tasks, demonstrating the effectiveness of transfer learning techniques in capturing domain-specific information.
Updated: 2025-07-25 04:44:25
标题: MedicalBERT:利用预训练的基于BERT的模型增强生物医学自然语言处理
摘要: 最近自然语言处理(NLP)领域的最新进展受到了预训练语言模型(如BERT、RoBERTa、T5和GPT)的推动。这些模型擅长理解复杂文本,但生物医学文献中的领域特定术语给Word2Vec和双向长短期记忆(Bi-LSTM)等模型带来了挑战。与BERT不同,GPT和T5虽然捕捉了上下文,但在需要双向理解的任务中表现不佳。为此,我们提出了MedicalBERT,这是一个在大型生物医学数据集上训练的预训练BERT模型,配备了领域特定词汇,增强了对生物医学术语的理解。MedicalBERT模型进一步优化和微调,以解决命名实体识别、关系提取、问答、句子相似度和文档分类等多样化任务。我们使用F1分数、准确性和皮尔逊相关性等性能指标展示了我们的模型相对于其他基于BERT的模型(如BioBERT、SciBERT和ClinicalBERT)在效率上的优势。MedicalBERT在大多数基准测试中胜过这些模型,并在所有评估任务中平均超过通用BERT模型5.67%。这项工作还强调了利用预训练BERT模型进行医学NLP任务的潜力,展示了转移学习技术在捕捉领域特定信息方面的有效性。
更新时间: 2025-07-25 04:44:25
领域: cs.CL,cs.AI,cs.LG
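Using such a model downstream follows the standard transformers recipe; a hedged sketch for token-level NER (the checkpoint identifier and label count are hypothetical placeholders for whatever the authors release):

    from transformers import AutoTokenizer, AutoModelForTokenClassification

    name = "medicalbert-base"       # hypothetical checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForTokenClassification.from_pretrained(name, num_labels=5)

    inputs = tokenizer("Metformin reduces hepatic glucose production.",
                       return_tensors="pt")
    logits = model(**inputs).logits     # (1, seq_len, num_labels)
    tags = logits.argmax(-1)            # per-token entity predictions to decode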
From Cloud-Native to Trust-Native: A Protocol for Verifiable Multi-Agent Systems
As autonomous agents powered by large language models (LLMs) proliferate in high-stakes domains -- from pharmaceuticals to legal workflows -- the challenge is no longer just intelligence, but verifiability. We introduce TrustTrack, a protocol that embeds structural guarantees -- verifiable identity, policy commitments, and tamper-resistant behavioral logs -- directly into agent infrastructure. This enables a new systems paradigm: trust-native autonomy. By treating compliance as a design constraint rather than post-hoc oversight, TrustTrack reframes how intelligent agents operate across organizations and jurisdictions. We present the protocol design, system requirements, and use cases in regulated domains such as pharmaceutical R&D, legal automation, and AI-native collaboration. We argue that the Cloud -> AI -> Agent -> Trust transition represents the next architectural layer for autonomous systems.
Updated: 2025-07-25 04:38:38
标题: 从云原生到信任原生:一种用于可验证多智能体系统的协议
摘要: 随着由大型语言模型(LLMs)提供动力的自治代理在高风险领域广泛应用 - 从制药到法律工作流程 - 挑战不再仅仅是智能,而是可验证性。我们介绍了TrustTrack,这是一个协议,将结构性保证 - 可验证身份、政策承诺和防篡改的行为日志 - 直接嵌入到代理基础设施中。这使得一种新的系统范式成为可能:信任本地自治。通过将合规性视为设计约束而非事后监督,TrustTrack重新定义了智能代理在组织和司法管辖区域间的运作方式。我们提出了协议设计、系统要求和在受监管领域(如制药研发、法律自动化和AI原生协作)中的应用案例。我们认为,云-> AI -> 代理 -> 信任 过渡代表了自治系统的下一个架构层次。
更新时间: 2025-07-25 04:38:38
领域: cs.MA,cs.AI,cs.CR
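The tamper-resistant behavioral log is the most self-contained piece of the protocol; a minimal hash-chain sketch (the record schema is an assumption, and a production protocol would additionally sign each link with the agent's verifiable identity):

    import hashlib, json, time

    class BehavioralLog:
        def __init__(self, agent_id):
            self.agent_id = agent_id
            self.entries = []
            self.head = hashlib.sha256(agent_id.encode()).hexdigest()  # genesis link

        def append(self, action, payload):
            record = {"ts": time.time(), "action": action,
                      "payload": payload, "prev": self.head}
            self.head = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()).hexdigest()
            self.entries.append(record)

        def verify(self):
            # Recompute the chain; editing any entry breaks every later hash.
            h = hashlib.sha256(self.agent_id.encode()).hexdigest()
            for rec in self.entries:
                if rec["prev"] != h:
                    return False
                h = hashlib.sha256(json.dumps(rec, sort_keys=True).encode()).hexdigest()
            return h == self.head

    log = BehavioralLog("agent-42")
    log.append("tool_call", {"tool": "assay_db", "query": "compound X"})
    assert log.verify()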
TreeReader: A Hierarchical Academic Paper Reader Powered by Language Models
Efficiently navigating and understanding academic papers is crucial for scientific progress. Traditional linear formats like PDF and HTML can cause cognitive overload and obscure a paper's hierarchical structure, making it difficult to locate key information. While LLM-based chatbots offer summarization, they often lack nuanced understanding of specific sections, may produce unreliable information, and typically discard the document's navigational structure. Drawing insights from a formative study on academic reading practices, we introduce TreeReader, a novel language model-augmented paper reader. TreeReader decomposes papers into an interactive tree structure where each section is initially represented by an LLM-generated concise summary, with underlying details accessible on demand. This design allows users to quickly grasp core ideas, selectively explore sections of interest, and verify summaries against the source text. A user study was conducted to evaluate TreeReader's impact on reading efficiency and comprehension. TreeReader provides a more focused and efficient way to navigate and understand complex academic literature by bridging hierarchical summarization with interactive exploration.
Updated: 2025-07-25 04:31:09
标题: TreeReader:由语言模型驱动的层级学术论文阅读器
摘要: 高效地浏览和理解学术论文对于科学进步至关重要。传统的线性格式,如PDF和HTML,可能会导致认知负荷过大,并且遮蔽论文的层次结构,使关键信息难以定位。虽然基于LLM的聊天机器人提供了摘要,但它们通常缺乏对特定部分的微妙理解,可能会产生不可靠的信息,并且通常会丢弃文档的导航结构。借鉴学术阅读实践的形成性研究,我们介绍了TreeReader,一种新颖的语言模型增强型论文阅读器。TreeReader将论文分解为交互式树结构,其中每个部分最初由LLM生成的简洁摘要表示,底层细节可按需访问。这种设计允许用户快速掌握核心思想,选择性地探索感兴趣的部分,并根据源文本验证摘要。进行了用户研究以评估TreeReader对阅读效率和理解力的影响。TreeReader通过将层次摘要与交互式探索相结合,为导航和理解复杂学术文献提供了更加集中和高效的方式。
更新时间: 2025-07-25 04:31:09
领域: cs.HC,cs.AI,cs.CL
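The core data structure is a section tree whose nodes carry lazily generated summaries; a minimal sketch with a stub in place of the real LLM call (llm_summarize is hypothetical):

    from dataclasses import dataclass, field

    def llm_summarize(text):                 # stub for the real LLM call
        return text[:80] + "..."

    @dataclass
    class SectionNode:
        title: str
        body: str = ""
        children: list = field(default_factory=list)
        _summary: str = None

        @property
        def summary(self):                   # generated on demand, then cached
            if self._summary is None:
                self._summary = llm_summarize(self.body)
            return self._summary

    paper = SectionNode("Paper", children=[
        SectionNode("Introduction", "Efficiently navigating papers is crucial ..."),
        SectionNode("Method", "We decompose papers into an interactive tree ..."),
    ])
    for sec in paper.children:
        print(sec.title, "->", sec.summary)  # expand a node to verify against the body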
Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights
Fisher information matrices and neural tangent kernels (NTK) for 2-layer ReLU networks with random hidden weights are studied. We discuss the relation between the two notions as a linear transformation and derive the spectral decomposition of the NTK, with concrete forms for the eigenfunctions associated with the major eigenvalues. We also obtain an approximation formula for the functions represented by the 2-layer neural networks.
Updated: 2025-07-25 04:19:13
标题: 具有随机隐藏权重的简单ReLU网络的神经切线核与Fisher信息矩阵
摘要: 我们研究了具有随机隐藏权重的2层ReLU网络的Fisher信息矩阵和神经切线核(NTK)。我们讨论了两者之间作为线性变换的关系,并给出了NTK的谱分解以及主要特征值对应特征函数的具体形式。我们还获得了由2层神经网络表示的函数的近似公式。
更新时间: 2025-07-25 04:19:13
领域: cs.LG,stat.ML
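For context, both expectations involved have well-known closed forms for this architecture. Writing $f(x) = \frac{1}{\sqrt{m}} \sum_j a_j\,\sigma(w_j^\top x)$ with ReLU $\sigma$, $w_j \sim \mathcal{N}(0, I_d)$, and $\theta$ the angle between inputs $x$ and $x'$, the arc-cosine kernel identities (Cho and Saul) give, up to normalization constants,

    \mathbb{E}_{w}\big[\sigma(w^\top x)\,\sigma(w^\top x')\big]
        = \frac{\|x\|\,\|x'\|}{2\pi}\,\big(\sin\theta + (\pi - \theta)\cos\theta\big),
    \qquad
    \mathbb{E}_{w}\big[\sigma'(w^\top x)\,\sigma'(w^\top x')\big]
        = \frac{\pi - \theta}{2\pi},

and the infinite-width NTK with both layers trained combines them as

    \Theta(x, x') = \mathbb{E}\big[\sigma(w^\top x)\,\sigma(w^\top x')\big]
        + (x^\top x')\,\mathbb{E}\big[\sigma'(w^\top x)\,\sigma'(w^\top x')\big].

This reconstruction is a standard-form assumption for orientation; the paper's precise normalization and eigenfunction expressions should be taken from the source.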
CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction over Medium-range Forecast Periods
This study proposes a method that integrates convolutional neural networks (CNNs) with ensemble numerical weather prediction (NWP) models, enabling surface temperature forecasting at lead times beyond the short-range (five-day) forecast period. Owing to limited computational resources, operational medium-range temperature forecasts typically rely on low-resolution NWP models, which are prone to systematic and random errors. To resolve these limitations, the proposed method first reduces systematic errors through CNN-based post-processing (bias correction and spatial super-resolution) on each ensemble member, reconstructing high-resolution temperature fields from low-resolution model outputs. Second, it reduces random errors through ensemble averaging of the CNN-corrected members. This study also investigates whether the sequence of CNN correction and ensemble averaging affects the forecast accuracy. For comparison with the proposed method, we additionally conducted experiments with the CNN trained on ensemble-averaged forecasts. The first approach--CNN correction before ensemble averaging--consistently achieved higher accuracy than the reverse approach. Although based on low-resolution ensemble forecasts, the proposed method notably outperformed the high-resolution deterministic NWP models. These findings indicate that combining CNN-based correction with ensemble averaging effectively reduces both the systematic and random errors in NWP model outputs. The proposed approach is a practical and scalable solution for improving medium-range temperature forecasts, and is particularly valuable at operational centers with limited computational resources.
Updated: 2025-07-25 04:19:05
标题: 使用集合数值天气预报的基于CNN的中长期表面温度预报
摘要: 这项研究提出了一种方法,将卷积神经网络(CNNs)与集成数值天气预报(NWP)模型相结合,实现超出短期(五天)预报期的表面温度预测。由于计算资源有限,运行中期温度预报通常依赖于低分辨率NWP模型,这些模型容易出现系统性和随机误差。为了解决这些限制,所提出的方法首先通过基于CNN的后处理(偏差校正和空间超分辨率)来降低系统性误差,对每个集成成员进行重建,从低分辨率模型输出中重建高分辨率温度场。其次,通过对CNN校正成员进行集成平均来减少随机误差。该研究还调查了CNN校正和集成平均的顺序是否影响了预测准确性。为了与所提出的方法进行比较,我们还进行了使用集成平均预报训练的CNN的实验。第一种方法——在集成平均之前进行CNN校正——始终比反向方法实现了更高的准确性。尽管基于低分辨率的集成预报,所提出的方法明显优于高分辨率的确定性NWP模型。这些发现表明,将基于CNN的校正与集成平均相结合有效地减少了NWP模型输出中的系统性和随机误差。所提出的方法是改善中期温度预报的实用且可扩展的解决方案,特别适用于计算资源有限的操作中心。
更新时间: 2025-07-25 04:19:05
领域: physics.ao-ph,cs.AI,cs.LG
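The ordering result is easy to state in code: apply the trained correction network to each ensemble member and then average, rather than averaging first. A schematic numpy sketch with a nonlinear stand-in for the CNN (with a purely linear correction the two orderings would coincide):

    import numpy as np

    def cnn_correct(field):
        # Stand-in for the trained bias-correction / super-resolution CNN.
        return field - 0.02 * (field - 15.0) ** 2   # any nonlinear learned map

    rng = np.random.default_rng(0)
    members = rng.normal(15.0, 2.0, size=(50, 64, 64))   # K low-res ensemble members

    proposed = np.mean([cnn_correct(m) for m in members], axis=0)  # correct, then average
    baseline = cnn_correct(members.mean(axis=0))                   # average, then correct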
When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label
Class-imbalanced graph node classification is a practical yet underexplored research problem. Although recent studies have attempted to address this issue, they typically assume clean and reliable labels when processing class-imbalanced graphs. This assumption often violates the nature of real-world graphs, where labels frequently contain noise. Given this gap, this paper systematically investigates robust node classification for class-imbalanced graphs with noisy labels. We propose GraphALP, a novel Graph Augmentation framework based on Large language models (LLMs) and Pseudo-labeling techniques. Specifically, we design an LLM-based oversampling method to generate synthetic minority nodes, producing label-accurate minority nodes to alleviate class imbalance. Based on the class-balanced graphs, we develop a dynamically weighted pseudo-labeling method to obtain high-confidence pseudo labels to reduce label noise ratio. Additionally, we implement a secondary LLM-guided oversampling mechanism to mitigate potential class distribution skew caused by pseudo labels. Experimental results show that GraphALP achieves superior performance over state-of-the-art methods on class-imbalanced graphs with noisy labels.
Updated: 2025-07-25 04:04:58
标题: 当嘈杂标签遇上图中的类别不平衡:一种利用LLM和伪标签的图增强方法
摘要: 类不平衡的图节点分类是一个实际但尚未充分探讨的研究问题。尽管最近的研究尝试解决这个问题,但它们通常在处理类不平衡图时假设标签是干净可靠的。这种假设经常违反现实世界图的特性,其中标签经常包含噪音。鉴于这一差距,本文系统地研究了具有噪声标签的类不平衡图的鲁棒节点分类。我们提出了GraphALP,一种基于大型语言模型(LLMs)和伪标记技术的新颖图增强框架。具体而言,我们设计了一种基于LLM的过采样方法来生成合成的少数节点,产生准确标记的少数节点以减轻类不平衡。基于类平衡的图,我们开发了一种动态加权的伪标记方法,以获得高置信度的伪标签,从而降低标签噪声比率。此外,我们实施了一个次级LLM引导的过采样机制,以减轻由伪标签引起的潜在类分布偏斜。实验结果表明,GraphALP在具有噪声标签的类不平衡图上实现了比最先进方法更优越的性能。
更新时间: 2025-07-25 04:04:58
领域: cs.LG,cs.AI
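The dynamically weighted pseudo-labeling step can be sketched independently of the LLM components: keep only high-confidence predictions on unlabeled nodes and weight their loss contribution by that confidence. The threshold and the confidence-as-weight choice below are assumptions:

    import numpy as np

    def weighted_pseudo_labels(probs, tau=0.9):
        # probs: (num_unlabeled_nodes, num_classes) softmax outputs.
        conf = probs.max(axis=1)
        keep = np.where(conf >= tau)[0]
        return keep, probs[keep].argmax(axis=1), conf[keep]   # weight = confidence

    probs = np.random.default_rng(0).dirichlet(np.ones(4) * 0.3, size=1000)
    idx, labels, w = weighted_pseudo_labels(probs)
    print(len(idx), "pseudo-labeled nodes; mean weight", round(float(w.mean()), 3))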
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
Multilingual translation stands as a challenging task for large language models (LLMs), which must handle intricate language patterns and the stilted translations that arise in automated translation. In this paper, we introduce Seed-X, a family of open-source LLMs comprising instruct and reasoning models, pushing the limits of translation capability with 7B parameter size. The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages, harnessing the full potential of multilingual data. The instruct model is then finetuned to translate by Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs. Seed-X achieves performance comparable to leading closed-source models, including Gemini-2.5 and GPT-4o, across 28 languages, and significantly outperforms larger open-source models in both automatic metrics and human evaluations. We share the best practices through our optimization process, and make the parameters publicly available for advancing translation research and applications.
Updated: 2025-07-25 03:46:56
标题: Seed-X:具有70亿参数的强大多语言翻译LLM的构建
摘要: 多语言翻译对于大型语言模型(LLMs)来说是一个具有挑战性的任务,因为它们需要处理复杂的语言模式和生硬的翻译,这些问题在自动翻译中经常出现。在本文中,我们介绍了Seed-X,这是一个由指导和推理模型组成的开源LLMs系列,通过拥有7B参数大小的模型来推动翻译能力的极限。基础模型在一个涵盖28种语言的多样化、高质量数据集上进行了预训练,包括单语和双语内容,充分利用了多语言数据的潜力。然后通过Chain-of-Thought (CoT)推理对指导模型进行微调,通过强化学习(RL)进一步提高了在不同语言对之间的泛化能力。Seed-X在28种语言中实现了与领先的闭源模型(包括Gemini-2.5和GPT-4o)相媲美的性能,并在自动度量和人工评估方面明显优于更大的开源模型。我们通过我们的优化过程分享了最佳实践,并公开了参数,以推动翻译研究和应用。
更新时间: 2025-07-25 03:46:56
领域: cs.CL,cs.AI
MGHFT: Multi-Granularity Hierarchical Fusion Transformer for Cross-Modal Sticker Emotion Recognition
Although pre-trained visual models with text have demonstrated strong capabilities in visual feature extraction, sticker emotion understanding remains challenging due to its reliance on multi-view information, such as background knowledge and stylistic cues. To address this, we propose a novel multi-granularity hierarchical fusion transformer (MGHFT), with a multi-view sticker interpreter based on Multimodal Large Language Models. Specifically, inspired by the human ability to interpret sticker emotions from multiple views, we first use Multimodal Large Language Models to interpret stickers by providing rich textual context via multi-view descriptions. Then, we design a hierarchical fusion strategy to fuse the textual context into visual understanding, which builds upon a pyramid visual transformer to extract both global and local sticker features at multiple stages. Through contrastive learning and attention mechanisms, textual features are injected at different stages of the visual backbone, enhancing the fusion of global- and local-granularity visual semantics with textual guidance. Finally, we introduce a text-guided fusion attention mechanism to effectively integrate the overall multimodal features, enhancing semantic understanding. Extensive experiments on 2 public sticker emotion datasets demonstrate that MGHFT significantly outperforms existing sticker emotion recognition approaches, achieving higher accuracy and more fine-grained emotion recognition. Compared to the best pre-trained visual models, our MGHFT also obtains an obvious improvement, 5.4% on F1 and 4.0% on accuracy. The code is released at https://github.com/cccccj-03/MGHFT_ACMMM2025.
Updated: 2025-07-25 03:42:26
标题: MGHFT:跨模态贴纸情感识别的多粒度层次融合Transformer
摘要: 尽管具有文本的预训练视觉模型已经展示了强大的视觉特征提取能力,但由于其依赖于背景知识和风格线索等多视角信息,贴纸情感理解仍然具有挑战性。为了解决这个问题,我们提出了一种新颖的多粒度层次融合变压器(MGHFT),其中包含基于多模态大型语言模型的多视图贴纸解释器。具体来说,受到人类从多个视角解释贴纸情感的能力的启发,我们首先使用多模态大型语言模型通过多视角描述提供丰富的文本背景来解释贴纸。然后,我们设计了一种层次融合策略,将文本背景融合到视觉理解中,这建立在金字塔视觉变压器之上,在多个阶段提取全局和局部贴纸特征。通过对比学习和注意机制,在视觉骨干的不同阶段注入文本特征,增强了全局和局部粒度视觉语义与文本引导的融合。最后,我们引入了一个文本引导的融合注意机制,有效整合了整体多模态特征,增强了语义理解。在2个公共贴纸情感数据集上进行的大量实验表明,MGHFT明显优于现有的贴纸情感识别方法,实现了更高的准确性和更细粒度的情感识别。与最佳的预训练视觉模型相比,我们的MGHFT在F1上提高了5.4%,准确性提高了4.0%。代码发布在https://github.com/cccccj-03/MGHFT_ACMMM2025。
更新时间: 2025-07-25 03:42:26
领域: cs.CV,cs.AI
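The text-guided fusion attention described above is, at its core, cross-attention with visual queries and textual keys/values. A minimal PyTorch sketch of that pattern; the dimensions and the residual/norm placement are illustrative assumptions, not MGHFT's exact design:

```python
import torch
import torch.nn as nn

class TextGuidedFusion(nn.Module):
    """Text-guided fusion attention in its simplest form: visual tokens act
    as queries, text tokens as keys/values. Sizes are illustrative."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, text_tokens):
        # Visual stream queries the textual context from the MLLM descriptions.
        fused, _ = self.attn(visual_tokens, text_tokens, text_tokens)
        return self.norm(visual_tokens + fused)  # residual fusion

vis = torch.randn(2, 49, 256)  # e.g., 7x7 patch features from one pyramid stage
txt = torch.randn(2, 32, 256)  # tokens from the multi-view sticker descriptions
out = TextGuidedFusion()(vis, txt)
```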
Improving Count-Mean Sketch as the Leading Locally Differentially Private Frequency Estimator for Large Dictionaries
This paper identifies that a group of recent locally differentially private (LDP) algorithms for frequency estimation, including all the Hadamard-matrix-based algorithms, are equivalent to the private Count-Mean Sketch (CMS) algorithm with different parameters. Therefore, we revisit the private CMS, correct errors in the original CMS paper regarding expectation and variance, modify the CMS implementation to eliminate existing bias, and optimize CMS using randomized response (RR) as the perturbation method. The optimized CMS with RR is shown to outperform CMS variants with other known perturbations in reducing the worst-case mean squared error (MSE), $l_1$ loss, and $l_2$ loss. Additionally, we prove that pairwise-independent hashing is sufficient for CMS, reducing its communication cost to the logarithm of the cardinality of the set of all possible values (i.e., a dictionary). As a result, the optimized CMS with RR is proven theoretically and empirically to be the leading algorithm for reducing the aforementioned loss functions when dealing with a very large dictionary. Furthermore, we demonstrate that randomness is necessary to ensure the correctness of CMS, and that the communication cost of CMS, though low, is unavoidable regardless of whether the randomness is public or private.
Updated: 2025-07-25 03:40:10
标题: 改进计数均值草图作为大型字典中领先的局部差分私有频率估计器
摘要: 本文指出,一组最新的本地差分隐私(LDP)频率估计算法,包括所有基于哈达玛矩阵的算法,都等价于带有不同参数的隐私计数均值草图(CMS)算法。因此,我们重新审视隐私CMS,纠正原始CMS论文中关于期望和方差的错误,修改CMS实现以消除现有偏差,并使用随机响应(RR)作为扰动方法对CMS进行优化。结果表明,采用RR的优化CMS在降低最坏情况均方误差(MSE)、$l_1$损失和$l_2$损失方面优于采用其他已知扰动的CMS变体。此外,我们证明成对独立哈希对CMS已经足够,可将其通信成本降低到所有可能取值集合(即字典)基数的对数。因此,采用RR的优化CMS在理论和实验上都被证明是处理超大字典时降低上述损失函数的领先算法。最后,我们证明随机性对保证CMS的正确性是必要的,而CMS的通信成本虽低,却无法避免,无论随机性是公开的还是私有的。
更新时间: 2025-07-25 03:40:10
领域: cs.CR
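The private CMS pipeline the abstract analyzes can be simulated end to end in a few lines. A toy sketch assuming pairwise-independent affine hashing and generalized randomized response (GRR) as the perturbation; the paper's exact estimator and its collision-bias correction are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, eps = 16, 64, 2.0                    # hash rows, buckets per row, privacy budget
p = np.exp(eps) / (np.exp(eps) + m - 1)    # GRR: prob. of keeping the true bucket
q = 1.0 / (np.exp(eps) + m - 1)            # GRR: prob. of any specific other bucket

# Pairwise-independent hash family: random affine maps modulo a prime.
PRIME = 2_147_483_647
a = rng.integers(1, PRIME, size=k)
b = rng.integers(0, PRIME, size=k)
def h(j, x):
    return int((a[j] * x + b[j]) % PRIME % m)

def report(x):
    """One user's LDP report: pick a hash row, hash, perturb with GRR."""
    j = int(rng.integers(k))
    true_bucket = h(j, x)
    if rng.random() < p:
        return j, true_bucket
    other = int(rng.integers(m - 1))       # uniform over the m-1 other buckets
    return j, other if other < true_bucket else other + 1

# Simulate: value 7 has true frequency 0.3; the rest of the dictionary is large.
n = 200_000
data = np.where(rng.random(n) < 0.3, 7, rng.integers(8, 100_000, size=n))
sketch = np.zeros((k, m)); rows = np.zeros(k)
for x in data:
    j, bkt = report(int(x)); sketch[j, bkt] += 1; rows[j] += 1

# Debias each row with the GRR moments, then average rows ("Count-MEAN").
# The small hash-collision bias (~1/m) is left uncorrected in this toy version.
est = np.mean([(sketch[j, h(j, 7)] - rows[j] * q) / ((p - q) * rows[j])
               for j in range(k)])
print(f"estimated frequency of value 7: {est:.3f}")  # ~0.30 + O(1/m)
```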
Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction
Accurate prediction of blood-brain barrier permeability (BBBP) is essential for central nervous system (CNS) drug development. While graph neural networks (GNNs) have advanced molecular property prediction, they often rely on molecular topology and neglect the three-dimensional geometric information crucial for modeling transport mechanisms. This paper introduces the geometric multi-color message-passing graph neural network (GMC-MPNN), a novel framework that enhances standard message-passing architectures by explicitly incorporating atomic-level geometric features and long-range interactions. Our model constructs weighted colored subgraphs based on atom types to capture the spatial relationships and chemical context that govern BBB permeability. We evaluated GMC-MPNN on three benchmark datasets for both classification and regression tasks, using rigorous scaffold-based splitting to ensure a robust assessment of generalization. The results demonstrate that GMC-MPNN consistently outperforms existing state-of-the-art models, achieving superior performance in both classifying compounds as permeable/non-permeable (AUC-ROC of 0.9704 and 0.9685) and in regressing continuous permeability values (RMSE of 0.4609, Pearson correlation of 0.7759). An ablation study further quantified the impact of specific atom-pair interactions, revealing that the model's predictive power derives from its ability to learn from both common and rare, but chemically significant, functional motifs. By integrating spatial geometry into the graph representation, GMC-MPNN sets a new performance benchmark and offers a more accurate and generalizable tool for drug discovery pipelines.
Updated: 2025-07-25 03:38:46
标题: 几何多色信息传递图神经网络用于血脑屏障通透性预测
摘要: 准确预测血脑屏障通透性(BBBP)对中枢神经系统(CNS)药物开发至关重要。虽然图神经网络(GNNs)推动了分子性质预测的发展,但它们通常依赖分子拓扑结构,忽略了对建模转运机制至关重要的三维几何信息。本文介绍了几何多色信息传递图神经网络(GMC-MPNN),这是一个通过显式整合原子级几何特征和长程相互作用来增强标准信息传递架构的新框架。我们的模型基于原子类型构建加权彩色子图,以捕获决定BBB通透性的空间关系和化学环境。我们在三个基准数据集上评估了GMC-MPNN的分类和回归任务,并使用严格的基于分子骨架(scaffold)的数据划分来确保对泛化性能的稳健评估。结果表明,GMC-MPNN始终优于现有的最先进模型,在将化合物分类为可渗透/不可渗透(AUC-ROC分别为0.9704和0.9685)和回归连续渗透性值(RMSE为0.4609,Pearson相关系数为0.7759)两项任务上均表现出色。消融研究进一步量化了特定原子对相互作用的影响,表明模型的预测能力来自于它既能从常见模体、也能从罕见但具有化学意义的功能模体中学习。通过将空间几何信息整合到图表示中,GMC-MPNN设立了新的性能基准,并为药物发现流程提供了更准确、更具泛化性的工具。
更新时间: 2025-07-25 03:38:46
领域: cs.LG
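The weighted colored subgraphs can be illustrated with a toy descriptor: for each element-pair "color", accumulate a distance-weighted kernel over atom pairs. The exponential kernel and the length scale `tau` below are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def colored_pair_feature(coords, elements, pair=("C", "N"), tau=2.0):
    """Toy stand-in for a weighted colored subgraph descriptor: sum an
    exponential distance kernel over all atom pairs of the given element
    'colors'. Kernel choice and length scale `tau` are assumptions."""
    coords = np.asarray(coords, dtype=float)
    idx_a = [i for i, e in enumerate(elements) if e == pair[0]]
    idx_b = [i for i, e in enumerate(elements) if e == pair[1]]
    return sum(np.exp(-np.linalg.norm(coords[i] - coords[j]) / tau)
               for i in idx_a for j in idx_b if i != j)

# E.g., one C-N feature for a 4-atom fragment (coordinates in angstroms):
coords = [[0, 0, 0], [1.5, 0, 0], [0, 1.4, 0], [2.2, 1.1, 0]]
print(colored_pair_feature(coords, ["C", "C", "N", "N"]))
```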
WiSE-OD: Benchmarking Robustness in Infrared Object Detection
Object detection (OD) in infrared (IR) imagery is critical for low-light and nighttime applications. However, the scarcity of large-scale IR datasets forces models to rely on weights pre-trained on RGB images. While fine-tuning on IR improves accuracy, it often compromises robustness under distribution shifts due to the inherent modality gap between RGB and IR. To address this, we introduce LLVIP-C and FLIR-C, two cross-modality out-of-distribution (OOD) benchmarks built by applying corruption to standard IR datasets. Additionally, to fully leverage the complementary knowledge from RGB and infrared trained models, we propose WiSE-OD, a weight-space ensembling method with two variants: WiSE-OD$_{ZS}$, which combines RGB zero-shot and IR fine-tuned weights, and WiSE-OD$_{LP}$, which blends zero-shot and linear probing. Evaluated across three RGB-pretrained detectors and two robust baselines, WiSE-OD improves both cross-modality and corruption robustness without any additional training or inference cost.
Updated: 2025-07-25 03:33:50
标题: WiSE-OD:红外目标检测中鲁棒性的基准测试
摘要: 红外(IR)图像中的目标检测(OD)对低光和夜间应用至关重要。然而,大规模红外数据集的稀缺迫使模型依赖在RGB图像上预训练的权重。虽然在红外数据上微调可以提高准确性,但由于RGB与红外之间固有的模态差距,往往会牺牲分布偏移下的鲁棒性。为了解决这个问题,我们引入了LLVIP-C和FLIR-C,这是两个通过对标准红外数据集施加损坏(corruption)而构建的跨模态分布外(OOD)基准。此外,为了充分利用RGB与红外训练模型的互补知识,我们提出了WiSE-OD,这是一种权重空间集成方法,包含两种变体:WiSE-OD$_{ZS}$结合RGB零样本权重与红外微调权重,WiSE-OD$_{LP}$则融合零样本权重与线性探测权重。在三个RGB预训练检测器和两个鲁棒基线上的评估表明,WiSE-OD在不增加任何训练或推理成本的情况下,同时提高了跨模态鲁棒性和抗损坏鲁棒性。
更新时间: 2025-07-25 03:33:50
领域: cs.CV,cs.AI
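Weight-space ensembling itself reduces to a linear interpolation over two checkpoints with identical architectures, which is why it adds no training or inference cost. A minimal sketch of a WiSE-style merge; file names and the commented usage are placeholders, and floating-point parameters are assumed:

```python
import torch

def wise_merge(zero_shot_sd, fine_tuned_sd, alpha=0.5):
    """WiSE-style weight-space ensembling: linearly interpolate two state
    dicts of the same architecture. alpha=0 keeps the RGB zero-shot weights,
    alpha=1 the IR fine-tuned ones."""
    return {k: (1 - alpha) * zero_shot_sd[k] + alpha * fine_tuned_sd[k]
            for k in zero_shot_sd}

# Hypothetical usage (paths and the detector object are placeholders):
# merged = wise_merge(torch.load("rgb_zero_shot.pt"),
#                     torch.load("ir_fine_tuned.pt"), alpha=0.5)
# detector.load_state_dict(merged)
```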
Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses
The diagnostic value of electrocardiogram (ECG) lies in its dynamic characteristics, ranging from rhythm fluctuations to subtle waveform deformations that evolve across time and frequency domains. However, supervised ECG models tend to overfit dominant and repetitive patterns, overlooking fine-grained but clinically critical cues, a phenomenon known as Simplicity Bias (SB), where models favor easily learnable signals over subtle but informative ones. In this work, we first empirically demonstrate the presence of SB in ECG analyses and its negative impact on diagnostic performance, while simultaneously discovering that self-supervised learning (SSL) can alleviate it, providing a promising direction for tackling the bias. Following the SSL paradigm, we propose a novel method comprising two key components: 1) Temporal-Frequency aware Filters to capture temporal-frequency features reflecting the dynamic characteristics of ECG signals, and 2) building on this, Multi-Grained Prototype Reconstruction for coarse and fine representation learning across dual domains, further mitigating SB. To advance SSL in ECG analyses, we curate a large-scale multi-site ECG dataset with 1.53 million recordings from over 300 clinical centers. Experiments on three downstream tasks across six ECG datasets demonstrate that our method effectively reduces SB and achieves state-of-the-art performance. Code and dataset will be released publicly.
Updated: 2025-07-25 03:25:33
标题: 感知心脏的掩码自编码器:揭示ECG分析中的简单性偏差
摘要: 心电图(ECG)的诊断价值在于其动态特性,从节律波动到在时域和频域上不断演变的细微波形形变。然而,有监督的心电图模型往往过度拟合主导性和重复性模式,忽视细粒度但临床关键的线索,这种现象被称为简单性偏差(Simplicity Bias, SB),即模型偏好易于学习的信号而非细微但信息丰富的信号。在这项工作中,我们首先通过实验证明了心电图分析中SB的存在及其对诊断性能的负面影响,同时发现自监督学习(SSL)可以缓解这一偏差,为解决该问题提供了一个有前景的方向。遵循SSL范式,我们提出了一种包含两个关键组件的新方法:1)时频感知滤波器,用于捕获反映心电信号动态特性的时频特征;2)在此基础上的多粒度原型重构,用于在时频双域进行粗粒度和细粒度表示学习,进一步减轻SB。为了推动SSL在心电图分析中的发展,我们构建了一个大规模多中心心电图数据集,包含来自300多个临床中心的153万条记录。在六个心电图数据集上的三个下游任务实验表明,我们的方法有效减轻了SB,并取得了最先进的性能。代码和数据集将公开发布。
更新时间: 2025-07-25 03:25:33
领域: eess.SP,cs.AI,cs.LG
Uncovering Cross-Linguistic Disparities in LLMs using Sparse Autoencoders
Multilingual large language models (LLMs) exhibit strong cross-linguistic generalization, yet medium to low resource languages underperform on common benchmarks such as ARC-Challenge, MMLU, and HellaSwag. We analyze activation patterns in Gemma-2-2B across all 26 residual layers and 10 languages: Chinese (zh), Russian (ru), Spanish (es), Italian (it), medium to low resource languages including Indonesian (id), Catalan (ca), Marathi (mr), Malayalam (ml), and Hindi (hi), with English (en) as the reference. Using Sparse Autoencoders (SAEs), we reveal systematic disparities in activation patterns. Medium to low resource languages receive up to 26.27 percent lower activations in early layers, with a persistent gap of 19.89 percent in deeper layers. To address this, we apply activation-aware fine-tuning via Low-Rank Adaptation (LoRA), leading to substantial activation gains, such as 87.69 percent for Malayalam and 86.32 percent for Hindi, while maintaining English retention at approximately 91 percent. After fine-tuning, benchmark results show modest but consistent improvements, highlighting activation alignment as a key factor in enhancing multilingual LLM performance.
Updated: 2025-07-25 03:22:50
标题: 利用稀疏自动编码器揭示LLMs中的跨语言差异
摘要: 多语言大型语言模型(LLMs)表现出强大的跨语言泛化能力,然而中低资源语言在ARC-Challenge、MMLU和HellaSwag等常见基准测试中表现不佳。我们分析了Gemma-2-2B中所有26个残差层和10种语言(中文(zh)、俄语(ru)、西班牙语(es)、意大利语(it)、印度尼西亚语(id)、加泰罗尼亚语(ca)、马拉地语(mr)、马拉雅拉姆语(ml)和印地语(hi)等中低资源语言,以英语(en)为参考)的激活模式。通过使用稀疏自动编码器(SAEs),我们揭示了激活模式中的系统性差异。中低资源语言在早期层面的激活量最多比英语低26.27%,在深层次的激活量仍存在19.89%的差距。为了解决这个问题,我们通过低秩适应(LoRA)应用激活感知微调,实现了大幅度的激活增益,如马拉雅拉姆语增加了87.69%、印地语增加了86.32%,同时保持英语的保留率在约91%左右。微调后,基准测试结果显示了一些适度但一致的改进,突显了激活对齐作为提高多语言LLM性能的关键因素。
更新时间: 2025-07-25 03:22:50
领域: cs.CL,cs.AI
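The SAE probing described above reduces to training a sparse bottleneck on residual-stream activations and comparing mean feature activation mass per language. A minimal sketch; the width 2304 matches Gemma-2-2B's published hidden size but is assumed here, and the feature count and random stand-in activations are illustrative:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE for probing residual-stream activations (ReLU + L1)."""
    def __init__(self, d_model=2304, d_features=16384):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f

def sae_loss(x, x_hat, f, l1=1e-3):
    return ((x - x_hat) ** 2).mean() + l1 * f.abs().mean()

# The cross-lingual comparison is then a mean-activation ratio per language
# (random tensors stand in for activations collected from the model):
sae = SparseAutoencoder()
acts_en = torch.randn(1024, 2304)     # activations on English prompts
acts_ml = torch.randn(1024, 2304)     # activations on Malayalam prompts
_, f_en = sae(acts_en)
_, f_ml = sae(acts_ml)
gap = 100 * (1 - f_ml.abs().mean() / f_en.abs().mean())
print(f"activation gap vs English: {gap.item():.2f}%")  # ~0 for random stand-ins
```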
From Conditional to Unconditional Independence: Testing Conditional Independence via Transport Maps
Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings, owing to the complexity of conditional structures. We propose a novel method for testing conditional independence by transforming it into an unconditional independence test problem. We achieve this by constructing two transport maps that transform conditional independence into unconditional independence, which substantially simplifies the problem. These transport maps are estimated from data using conditional continuous normalizing flow models. Within this framework, we derive a test statistic and prove its asymptotic validity under both the null and alternative hypotheses. A permutation-based procedure is employed to evaluate the significance of the test. We validate the proposed method through extensive simulations and real-data analysis. Our numerical studies demonstrate the practical effectiveness of the proposed method for conditional independence testing.
Updated: 2025-07-25 03:07:14
标题: 从条件独立到无条件独立:通过传输映射测试条件独立性
摘要: 在统计学中,检验在给定第三个随机向量的条件下两个随机向量之间的条件独立性是一个基础且具有挑战性的问题,尤其是在多元非参数设定下,条件结构十分复杂。我们提出了一种新方法,将条件独立性检验转化为无条件独立性检验问题。我们通过构建两个传输映射,将条件独立性转化为无条件独立性,从而大大简化了问题。这些传输映射利用条件连续标准化流模型从数据中估计得到。在这一框架下,我们推导出一个检验统计量,并证明了其在原假设和备择假设下的渐近有效性。我们采用基于置换的程序来评估检验的显著性,并通过大量模拟和真实数据分析验证了所提出的方法。数值研究表明,该方法在条件独立性检验中具有实际有效性。
更新时间: 2025-07-25 03:07:14
领域: stat.ML,cs.LG,stat.ME,62G05, 62G08, 68T07
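The reduction from conditional to unconditional testing can be shown schematically: map X and Y to quantities that no longer depend on Z, then permutation-test those quantities unconditionally. The sketch below substitutes simple regression residuals for the paper's conditional normalizing-flow transport maps, so it is a simplified stand-in rather than the proposed method:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def permutation_ci_test(X, Y, Z, n_perm=500, seed=0):
    """Schematic test of X independent of Y given Z: regress out Z from X
    and Y, then permutation-test the residuals unconditionally.
    X, Y: (n,) arrays; Z: (n, d) array."""
    rng = np.random.default_rng(seed)
    rx = X - RandomForestRegressor(random_state=0).fit(Z, X).predict(Z)
    ry = Y - RandomForestRegressor(random_state=0).fit(Z, Y).predict(Z)
    stat = abs(np.corrcoef(rx, ry)[0, 1])
    null = [abs(np.corrcoef(rng.permutation(rx), ry)[0, 1]) for _ in range(n_perm)]
    return (1 + sum(s >= stat for s in null)) / (1 + n_perm)  # permutation p-value

# Toy usage: X and Y share only the confounder Z, so the test should not reject.
rng = np.random.default_rng(1)
Z = rng.standard_normal((500, 2))
X = Z[:, 0] + 0.5 * rng.standard_normal(500)
Y = Z[:, 0] - Z[:, 1] + 0.5 * rng.standard_normal(500)
print(permutation_ci_test(X, Y, Z))  # large p-value expected
```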
A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions
Retrieval-Augmented Generation (RAG) represents a major advancement in natural language processing (NLP), combining large language models (LLMs) with information retrieval systems to enhance factual grounding, accuracy, and contextual relevance. This paper presents a comprehensive systematic review of RAG, tracing its evolution from early developments in open domain question answering to recent state-of-the-art implementations across diverse applications. The review begins by outlining the motivations behind RAG, particularly its ability to mitigate hallucinations and outdated knowledge in parametric models. Core technical components-retrieval mechanisms, sequence-to-sequence generation models, and fusion strategies are examined in detail. A year-by-year analysis highlights key milestones and research trends, providing insight into RAG's rapid growth. The paper further explores the deployment of RAG in enterprise systems, addressing practical challenges related to retrieval of proprietary data, security, and scalability. A comparative evaluation of RAG implementations is conducted, benchmarking performance on retrieval accuracy, generation fluency, latency, and computational efficiency. Persistent challenges such as retrieval quality, privacy concerns, and integration overhead are critically assessed. Finally, the review highlights emerging solutions, including hybrid retrieval approaches, privacy-preserving techniques, optimized fusion strategies, and agentic RAG architectures. These innovations point toward a future of more reliable, efficient, and context-aware knowledge-intensive NLP systems.
Updated: 2025-07-25 03:05:46
标题: 一个关于关键检索增强生成(RAG)系统的系统性综述:进展、空白和未来方向
摘要: 检索增强生成(RAG)代表了自然语言处理(NLP)的重大进展,它将大型语言模型(LLMs)与信息检索系统相结合,以增强事实依据、准确性和上下文相关性。本文对RAG进行了全面的系统性综述,追溯了其从开放域问答的早期发展到近期在各类应用中的最新实现。综述首先概述了RAG背后的动机,特别是其缓解参数化模型中幻觉和知识过时问题的能力,并详细考察了核心技术组件:检索机制、序列到序列生成模型和融合策略。逐年分析突出了关键里程碑和研究趋势,揭示了RAG的快速发展。本文进一步探讨了RAG在企业系统中的部署,讨论了专有数据检索、安全性和可扩展性等实际挑战,并对各类RAG实现进行了比较评估,在检索准确性、生成流畅性、延迟和计算效率方面开展基准测试,同时批判性地评估了检索质量、隐私问题和集成开销等持续存在的挑战。最后,综述重点介绍了新兴解决方案,包括混合检索方法、隐私保护技术、优化的融合策略以及智能体(agentic)RAG架构。这些创新预示着未来将出现更可靠、更高效且具备上下文感知能力的知识密集型NLP系统。
更新时间: 2025-07-25 03:05:46
领域: cs.CL,cs.LG
Probably Approximately Correct Causal Discovery
The discovery of causal relationships is a foundational problem in artificial intelligence, statistics, epidemiology, economics, and beyond. While elegant theories exist for accurate causal discovery given infinite data, real-world applications are inherently resource-constrained. Effective methods for inferring causal relationships from observational data must perform well under finite data and time constraints, where "performing well" implies achieving high, though not perfect accuracy. In his seminal paper A Theory of the Learnable, Valiant highlighted the importance of resource constraints in supervised machine learning, introducing the concept of Probably Approximately Correct (PAC) learning as an alternative to exact learning. Inspired by Valiant's work, we propose the Probably Approximately Correct Causal (PACC) Discovery framework, which extends PAC learning principles to the causal field. This framework emphasizes both computational and sample efficiency for established causal methods such as propensity score techniques and instrumental variable approaches. Furthermore, we show that it can also provide theoretical guarantees for other widely used methods, such as the Self-Controlled Case Series (SCCS) method, which had previously lacked such guarantees.
Updated: 2025-07-25 02:51:15
标题: 可能近似正确的因果发现
摘要: 发现因果关系是人工智能、统计学、流行病学、经济学等领域的基础性问题。虽然在数据无限的情形下已有优雅的精确因果发现理论,但现实应用本质上受资源约束。从观测数据推断因果关系的有效方法必须在有限的数据和时间约束下表现良好,这里的“表现良好”意味着达到较高(尽管并不完美)的准确性。在其开创性论文《A Theory of the Learnable》中,Valiant强调了监督机器学习中资源约束的重要性,提出了“可能近似正确”(PAC)学习作为精确学习的替代。受Valiant工作的启发,我们提出了可能近似正确因果(PACC)发现框架,将PAC学习原则扩展到因果领域。该框架强调既有因果方法(如倾向得分技术和工具变量方法)的计算效率与样本效率。此外,我们还证明它可以为其他广泛使用的方法提供理论保证,例如此前缺乏此类保证的自身对照病例系列(SCCS)方法。
更新时间: 2025-07-25 02:51:15
领域: stat.ML,cs.LG
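The "probably approximately correct" guarantee at the heart of the framework has the familiar PAC shape. As a generic illustration only: the Hoeffding-based bound below is textbook material, not the paper's specific result, with generic symbols for a bounded causal-effect estimator:

```latex
% For an i.i.d. average \hat{\tau}_n of quantities bounded in [a, b],
% estimating a causal effect \tau:
\[
  P\bigl(|\hat{\tau}_n - \tau| \le \epsilon\bigr) \;\ge\; 1 - \delta
  \qquad \text{whenever} \qquad
  n \;\ge\; \frac{(b-a)^2}{2\epsilon^2}\,\ln\frac{2}{\delta}.
\]
% Worked example: with b - a = 1 and \epsilon = \delta = 0.05,
% n \ge \ln(40)/0.005 \approx 738 samples suffice.
```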
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
Controllable generative models for images and videos have seen significant success, yet 3D scene generation, especially in unbounded scenarios like autonomous driving, remains underdeveloped. Existing methods lack flexible controllability and often rely on dense view data collection in controlled environments, limiting their generalizability across common datasets (e.g., nuScenes). In this paper, we introduce MagicDrive3D, a novel framework for controllable 3D street scene generation that combines video-based view synthesis with 3D representation (3DGS) generation. It supports multi-condition control, including road maps, 3D objects, and text descriptions. Unlike previous approaches that require 3D representation before training, MagicDrive3D first trains a multi-view video generation model to synthesize diverse street views. This method utilizes routinely collected autonomous driving data, reducing data acquisition challenges and enriching 3D scene generation. In the 3DGS generation step, we introduce Fault-Tolerant Gaussian Splatting to address minor errors and use monocular depth for better initialization, alongside appearance modeling to manage exposure discrepancies across viewpoints. Experiments show that MagicDrive3D generates diverse, high-quality 3D driving scenes, supports any-view rendering, and enhances downstream tasks like BEV segmentation, demonstrating its potential for autonomous driving simulation and beyond.
Updated: 2025-07-25 02:48:16
标题: MagicDrive3D:街景中任意视角渲染的可控3D生成
摘要: 面向图像和视频的可控生成模型已取得显著成功,但3D场景生成,尤其是在自动驾驶等无界场景中,仍然欠发达。现有方法缺乏灵活的可控性,且通常依赖在受控环境中采集的密集视角数据,限制了它们在常见数据集(如nuScenes)上的泛化能力。本文介绍了MagicDrive3D,一个用于可控3D街景生成的新框架,它将基于视频的视图合成与3D表示(3DGS)生成相结合,支持包括道路地图、3D物体和文本描述在内的多条件控制。与以往需要先获得3D表示再训练的方法不同,MagicDrive3D首先训练一个多视角视频生成模型来合成多样化的街景视图。该方法利用常规采集的自动驾驶数据,降低了数据获取难度,并丰富了3D场景生成。在3DGS生成阶段,我们引入容错高斯泼溅(Fault-Tolerant Gaussian Splatting)来处理细小误差,利用单目深度获得更好的初始化,并通过外观建模来处理不同视角之间的曝光差异。实验表明,MagicDrive3D能够生成多样化、高质量的3D驾驶场景,支持任意视角渲染,并提升BEV分割等下游任务,展示了其在自动驾驶仿真及其他领域的潜力。
更新时间: 2025-07-25 02:48:16
领域: cs.CV,cs.AI
HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling
Discrete speech tokenization is a fundamental component in speech codecs. However, in large-scale speech-to-speech systems, the complexity of parallel streams from multiple quantizers and the computational cost of high-time-dimensional codecs pose significant challenges. In this paper, we introduce HH-Codec, a neural codec that achieves extreme compression at 24 tokens per second for 24 kHz audio while relying on single-quantizer inference. Our approach involves a carefully designed Vector Quantization space for Spoken Language Modeling, optimizing compression efficiency while minimizing information loss. Building on this, we propose an asymmetric encoder-decoder architecture (Audio-VQ-Mel-Audio) that leverages dual supervision and progressive training to enhance reconstruction stability and fidelity. HH-Codec achieves state-of-the-art performance in speech reconstruction with an ultra-low bandwidth of 0.3 kbps. We further evaluate its effectiveness in codebook utilization and generative model adaptation, with extensive ablations validating the necessity of each module. HH-Codec is available at https://github.com/opendilab/HH-Codec.
Updated: 2025-07-25 02:44:30
标题: HH-Codec:高压缩高保真度离散神经编解码器用于口语建模
摘要: 离散语音标记化是语音编解码器的基本组成部分。然而,在大规模语音到语音系统中,多个量化器产生的并行流的复杂性以及高时间维度编解码器的计算成本带来了重大挑战。本文介绍了HH-Codec,这是一种神经编解码器,仅依赖单量化器推理,即可对24 kHz音频实现每秒24个标记的极致压缩。我们的方法为口语语言建模精心设计了向量量化空间,在优化压缩效率的同时最大限度地减少信息损失。在此基础上,我们提出了一种非对称编码器-解码器架构(Audio-VQ-Mel-Audio),利用双重监督和渐进式训练来增强重建的稳定性和保真度。HH-Codec以0.3 kbps的超低带宽在语音重建方面实现了最先进的性能。我们进一步评估了其在码本利用率和生成模型适配方面的有效性,并通过大量消融实验验证了每个模块的必要性。HH-Codec可在https://github.com/opendilab/HH-Codec获取。
更新时间: 2025-07-25 02:44:30
领域: cs.SD,cs.AI,eess.AS
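The single-quantizer inference that HH-Codec relies on is built around a vector-quantization bottleneck. A minimal VQ layer with the standard straight-through estimator; the codebook size, dimension, and commitment weight beta below are illustrative choices, not the paper's:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Single-codebook vector quantizer with a straight-through estimator,
    the basic mechanism behind single-quantizer inference. Sizes are toys."""
    def __init__(self, codebook_size=8192, dim=256, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
        self.beta = beta

    def forward(self, z):                                   # z: (batch, time, dim)
        codes = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        idx = torch.cdist(z, codes).argmin(dim=-1)          # discrete speech tokens
        q = self.codebook(idx)
        loss = ((q - z.detach()) ** 2).mean() \
             + self.beta * ((q.detach() - z) ** 2).mean()   # codebook + commitment
        q = z + (q - z).detach()                            # straight-through gradient
        return q, idx, loss

# 24 tokens per second of 24 kHz audio means the encoder (not shown here)
# downsamples by a factor of 1000 before this bottleneck.
q, idx, vq_loss = VectorQuantizer()(torch.randn(2, 24, 256))
```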
PLEIADES: Building Temporal Kernels with Orthogonal Polynomials
We introduce a class of neural networks named PLEIADES (PoLynomial Expansion In Adaptive Distributed Event-based Systems), which contains temporal convolution kernels generated from orthogonal polynomial basis functions. We focus on interfacing these networks with event-based data to perform online spatiotemporal classification and detection with low latency. By virtue of using structured temporal kernels and event-based data, we have the freedom to vary the sample rate of the data along with the discretization step-size of the network without additional finetuning. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs. We achieved: 1) 99.59% accuracy with 192K parameters on the DVS128 hand gesture recognition dataset and 100% with a small additional output filter; 2) 99.58% test accuracy with 277K parameters on the AIS 2024 eye tracking challenge; and 3) 0.556 mAP with 576k parameters on the PROPHESEE 1 Megapixel Automotive Detection Dataset.
Updated: 2025-07-25 02:20:03
标题: PLEIADES:使用正交多项式构建时间核
摘要: 我们介绍了一类名为PLEIADES(PoLynomial Expansion In Adaptive Distributed Event-based Systems)的神经网络,其中包含由正交多项式基函数生成的时间卷积核。我们专注于将这些网络与基于事件的数据对接,以低延迟执行在线时空分类和检测。得益于结构化的时间核和基于事件的数据,我们可以自由地改变数据的采样率以及网络的离散化步长,而无需额外微调。我们在三个基于事件的基准上进行了实验,均以明显优势取得了最先进的结果,且内存和计算成本显著更低。我们取得了:1)在DVS128手势识别数据集上,以192K参数达到99.59%的准确率,加上一个小型额外输出滤波器后达到100%;2)在AIS 2024眼动追踪挑战中,以277K参数达到99.58%的测试准确率;3)在PROPHESEE 1兆像素汽车检测数据集上,以576K参数达到0.556的mAP。
更新时间: 2025-07-25 02:20:03
领域: cs.LG,cs.AI
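The core trick, temporal kernels expanded in an orthogonal polynomial basis, fits in a short module: only the mixing coefficients are learned, and because the basis is continuous in time it can be re-evaluated on a finer or coarser grid to change the sample rate without retraining. A sketch assuming a Legendre basis and depthwise convolution; the exact PLEIADES parameterization may differ:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from numpy.polynomial.legendre import Legendre

class PolyTemporalConv(nn.Module):
    """Depthwise temporal convolution whose kernels are linear combinations
    of a fixed Legendre basis; only the coefficients are trainable."""
    def __init__(self, channels=8, degree=4, kernel_len=32):
        super().__init__()
        t = np.linspace(-1.0, 1.0, kernel_len)
        basis = np.stack([Legendre.basis(i)(t) for i in range(degree + 1)])
        self.register_buffer("basis", torch.tensor(basis, dtype=torch.float32))
        self.coef = nn.Parameter(0.1 * torch.randn(channels, degree + 1))

    def forward(self, x):                  # x: (batch, channels, time)
        kernels = self.coef @ self.basis   # (channels, kernel_len)
        return F.conv1d(x, kernels.unsqueeze(1), groups=x.shape[1])

y = PolyTemporalConv()(torch.randn(4, 8, 128))  # -> (4, 8, 97)
```

Re-discretizing amounts to rebuilding `basis` with a new `kernel_len` while keeping the learned `coef`.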
Bridging Quantum and Classical Computing in Drug Design: Architecture Principles for Improved Molecule Generation
Hybrid quantum-classical machine learning offers a path to leverage noisy intermediate-scale quantum (NISQ) devices for drug discovery, but optimal model architectures remain unclear. We systematically optimize the quantum-classical bridge architecture of generative adversarial networks (GANs) for molecule discovery using multi-objective Bayesian optimization. Our optimized model (BO-QGAN) significantly improves performance, achieving a 2.27-fold higher Drug Candidate Score (DCS) than prior quantum-hybrid benchmarks and 2.21-fold higher than the classical baseline, while reducing parameter count by more than 60%. Key findings favor layering multiple (3-4) shallow (4-8 qubit) quantum circuits sequentially, while classical architecture shows less sensitivity above a minimum capacity. This work provides the first empirically-grounded architectural guidelines for hybrid models, enabling more effective integration of current quantum computers into pharmaceutical research pipelines.
Updated: 2025-07-25 02:17:57
标题: 在药物设计中架起量子和经典计算的桥梁:用于改善分子生成的架构原则
摘要: 混合量子-经典机器学习为利用含噪中等规模量子(NISQ)设备进行药物发现提供了一条途径,但最优模型架构仍不明确。我们使用多目标贝叶斯优化,系统地优化了用于分子发现的生成对抗网络(GANs)的量子-经典桥接架构。我们优化得到的模型(BO-QGAN)显著提升了性能,其药物候选得分(DCS)达到先前量子混合基准的2.27倍、经典基线的2.21倍,同时将参数量减少了60%以上。关键发现支持依次堆叠多个(3-4个)浅层(4-8量子比特)量子电路,而经典部分的架构在超过某一最小容量后敏感性较低。这项工作为混合模型提供了首个基于实证的架构指南,使当前的量子计算机能够更有效地融入制药研究流程。
更新时间: 2025-07-25 02:17:57
领域: cs.LG,cs.AI,q-bio.BM
A Survey on State-of-the-art Deep Learning Applications and Challenges
Deep learning, a branch of artificial intelligence, is a data-driven method that uses multiple layers of interconnected units or neurons to learn intricate patterns and representations directly from raw input data. Empowered by this learning capability, it has become a powerful tool for solving complex problems and is the core driver of many groundbreaking technologies and innovations. Building a deep learning model is challenging due to the algorithm's complexity and the dynamic nature of real-world problems. Several studies have reviewed deep learning concepts and applications. However, the studies mostly focused on the types of deep learning models and convolutional neural network architectures, offering limited coverage of the state-of-the-art deep learning models and their applications in solving complex problems across different domains. Therefore, motivated by the limitations, this study aims to comprehensively review the state-of-the-art deep learning models in computer vision, natural language processing, time series analysis and pervasive computing, and robotics. We highlight the key features of the models and their effectiveness in solving the problems within each domain. Furthermore, this study presents the fundamentals of deep learning, various deep learning model types and prominent convolutional neural network architectures. Finally, challenges and future directions in deep learning research are discussed to offer a broader perspective for future researchers.
Updated: 2025-07-25 02:03:21
标题: 最新深度学习应用与挑战综述
摘要: 深度学习是人工智能的一个分支,是一种数据驱动方法,它使用多层相互连接的单元(神经元)直接从原始输入数据中学习复杂的模式和表示。凭借这种学习能力,它已成为解决复杂问题的强大工具,也是许多突破性技术与创新的核心驱动力。由于算法的复杂性和现实问题的动态性,构建深度学习模型具有挑战性。已有多项研究综述了深度学习的概念和应用,但这些研究大多集中于深度学习模型的类型和卷积神经网络架构,对最新的深度学习模型及其在不同领域解决复杂问题的应用覆盖有限。因此,针对这些局限,本研究旨在全面综述计算机视觉、自然语言处理、时间序列分析与普适计算以及机器人领域的最新深度学习模型,强调各模型的关键特性及其在各领域解决问题的有效性。此外,本研究还介绍了深度学习的基础知识、各类深度学习模型以及代表性的卷积神经网络架构。最后,讨论了深度学习研究中的挑战和未来方向,为后续研究者提供更广阔的视角。
更新时间: 2025-07-25 02:03:21
领域: cs.LG
High Performance Space Debris Tracking in Complex Skylight Backgrounds with a Large-Scale Dataset
With the rapid development of space exploration, space debris has attracted more attention due to the extreme threat it potentially poses, leading to the need for real-time and accurate debris tracking. However, existing methods are mainly based on traditional signal processing, which cannot effectively handle complex backgrounds and dense space debris. In this paper, we propose a deep learning-based Space Debris Tracking Network (SDT-Net) to achieve highly accurate debris tracking. SDT-Net effectively represents the features of debris, enhancing the efficiency and stability of end-to-end model learning. To train and evaluate this model effectively, we also produce a large-scale dataset, the Space Debris Tracking Dataset (SDTD), via a novel observation-based data simulation scheme. SDTD contains 18,040 video sequences with a total of 62,562 frames and covers 250,000 synthetic space debris objects. Extensive experiments validate the effectiveness of our model and the challenging nature of our dataset. Furthermore, we test our model on real data from the Antarctic Station, achieving a MOTA score of 73.2%, which demonstrates its strong transferability to real-world scenarios. Our dataset and code will be released soon.
Updated: 2025-07-25 01:56:35
标题: 大规模数据集下复杂天空背景中高性能空间碎片跟踪
摘要: 随着太空探索的快速发展,太空碎片由于其潜在的极端威胁引起了更多关注,这导致了对实时和准确的碎片跟踪的需求。然而,现有的方法主要基于传统的信号处理,无法有效处理复杂的背景和密集的太空碎片。本文提出了一种基于深度学习的太空碎片跟踪网络(SDT-Net),以实现高度准确的碎片跟踪。SDT-Net有效地表征了碎片的特征,增强了端到端模型学习的效率和稳定性。为了有效训练和评估这个模型,我们还通过一种新颖的基于观测的数据模拟方案制作了一个大规模数据集太空碎片跟踪数据集(SDTD)。SDTD包含18,040个视频序列,共62,562帧,涵盖了250,000个合成太空碎片。大量实验证实了我们模型的有效性,以及我们数据集的挑战性。此外,我们在南极站的真实数据上测试了我们的模型,实现了73.2%的MOTA得分,展示了其在真实场景中的强大可转移性。我们的数据集和代码将很快发布。
更新时间: 2025-07-25 01:56:35
领域: cs.CV,cs.AI
Success in Humanoid Reinforcement Learning under Partial Observation
Reinforcement learning has been widely applied to robotic control, but effective policy learning under partial observability remains a major challenge, especially in high-dimensional tasks like humanoid locomotion. To date, no prior work has demonstrated stable training of humanoid policies with incomplete state information in the benchmark Gymnasium Humanoid-v4 environment. The objective in this environment is to walk forward as fast as possible without falling, with rewards provided for staying upright and moving forward, and penalties incurred for excessive actions and external contact forces. This research presents the first successful instance of learning under partial observability in this environment. The learned policy achieves performance comparable to state-of-the-art results with full state access, despite using only one-third to two-thirds of the original states. Moreover, the policy exhibits adaptability to robot properties, such as variations in body part masses. The key to this success is a novel history encoder that processes a fixed-length sequence of past observations in parallel. Integrated into a standard model-free algorithm, the encoder enables performance on par with fully observed baselines. We hypothesize that it reconstructs essential contextual information from recent observations, thereby enabling robust decision-making.
Updated: 2025-07-25 01:51:12
标题: 在部分观测条件下的人形机器人强化学习成功
摘要: 强化学习已被广泛应用于机器人控制,但在部分可观测条件下有效学习策略仍是一大挑战,尤其是在人形机器人运动控制等高维任务中。迄今为止,尚无工作在基准Gymnasium Humanoid-v4环境中展示过在状态信息不完整情况下人形策略的稳定训练。该环境的目标是在不摔倒的前提下尽可能快地向前行走:保持直立和向前移动会获得奖励,过度动作和外部接触力则会受到惩罚。本研究展示了在该环境中部分可观测条件下学习的首个成功实例。尽管仅使用原始状态的三分之一到三分之二,学习到的策略性能仍可与拥有完整状态信息的最新结果相媲美。此外,该策略对机器人属性(如身体部件质量的变化)表现出适应性。成功的关键在于一种新颖的历史编码器,它并行处理固定长度的历史观测序列。将其集成到标准的无模型算法中后,性能可与完全可观测的基线持平。我们推测它能从近期观测中重建关键的上下文信息,从而实现稳健的决策。
更新时间: 2025-07-25 01:51:12
领域: cs.AI,cs.RO
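The history encoder's defining property is that it consumes a fixed-length window of past observations in parallel rather than recurrently. A minimal sketch; the MLP layout, window length, and dimensions are assumptions the abstract does not commit to:

```python
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Encodes a fixed-length window of past partial observations in one
    parallel pass (all steps at once, not step by step)."""
    def __init__(self, obs_dim=15, history_len=16, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                              # (B, history_len * obs_dim)
            nn.Linear(history_len * obs_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, obs_window):                     # (B, history_len, obs_dim)
        return self.net(obs_window)

# The context vector is concatenated with the current partial observation and
# fed to a standard model-free actor-critic (e.g., SAC):
ctx = HistoryEncoder()(torch.randn(32, 16, 15))        # -> (32, 64)
```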
A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges
AI-based Intelligent Tutoring Systems (ITS) have significant potential to transform teaching and learning. As efforts continue to design, develop, and integrate ITS into educational contexts, mixed results about their effectiveness have emerged. This paper provides a comprehensive review to understand how ITS operate in real educational settings and to identify the associated challenges in their application and evaluation. We use a systematic literature review method to analyze numerous qualified studies published from 2010 to 2025, examining domains such as pedagogical strategies, NLP, adaptive learning, student modeling, and domain-specific applications of ITS. The results reveal a complex landscape regarding the effectiveness of ITS, highlighting both advancements and persistent challenges. The study also identifies a need for greater scientific rigor in experimental design and data analysis. Based on these findings, suggestions for future research and practical implications are proposed.
Updated: 2025-07-25 01:43:07
标题: 基于人工智能的智能辅导系统综述:应用与挑战
摘要: 基于人工智能的智能辅导系统(ITS)具有显著的潜力改变教学和学习。随着努力继续设计、开发和将ITS整合到教育环境中,关于其有效性的混合结果已经出现。本文提供了一项全面审查,以了解ITS在实际教育环境中的运作方式,并识别其应用和评估中涉及的挑战。我们使用系统文献综述方法分析了从2010年到2025年发表的许多合格研究,研究领域包括教学策略、自然语言处理、自适应学习、学生建模以及ITS的领域特定应用。结果显示ITS的有效性存在复杂的格局,突显了进展和持续的挑战。研究还指出需要在实验设计和数据分析方面加强科学严谨性。基于这些发现,提出了未来研究和实际应用建议。
更新时间: 2025-07-25 01:43:07
领域: cs.IR,cs.AI,cs.ET
Why Isn't Relational Learning Taking Over the World?
AI seems to be taking over the world with systems that model pixels, words, and phonemes. The world is arguably made up not of pixels, words, and phonemes, but of entities (objects, things, including events) with properties and relations among them. Surely we should model these, not the perception or description of them. You might suspect that concentrating on modeling words and pixels is because all of the (valuable) data in the world is in terms of text and images. If you look into almost any company you will find that their most valuable data is in spreadsheets, databases and other relational formats. These are not the forms that are studied in introductory machine learning, but are full of product numbers, student numbers, transaction numbers and other identifiers that can't be interpreted naively as numbers. The field that studies this sort of data has various names, including relational learning, statistical relational AI, and many others. This paper explains why relational learning is not taking over the world -- except in a few cases with restricted relations -- and what needs to be done to bring it to its rightful prominence.
Updated: 2025-07-25 01:25:51
标题: 为什么关系学习没有占据世界?
摘要: 人工智能似乎正凭借对像素、单词和音素建模的系统接管世界。然而可以说,世界并非由像素、单词和音素构成,而是由具有属性及相互关系的实体(对象、事物,包括事件)构成的。我们理应对这些实体本身建模,而不是对它们的感知或描述建模。你也许会猜想,专注于对单词和像素建模,是因为世界上所有(有价值的)数据都是以文本和图像形式存在的。但只要深入考察几乎任何一家公司,你都会发现其最有价值的数据存放在电子表格、数据库和其他关系型格式中。这些并不是入门机器学习课程所研究的数据形式,而是充满了产品编号、学生编号、交易编号等无法简单当作数字来解释的标识符。研究这类数据的领域有多种名称,包括关系学习、统计关系人工智能等。本文解释了为什么关系学习并没有接管世界(除了少数关系受限的场景),以及需要做些什么才能使其获得应有的地位。
更新时间: 2025-07-25 01:25:51
领域: cs.AI,cs.DB,cs.LG
VIBE: Video-Input Brain Encoder for fMRI Response Modeling
We present VIBE, a two-stage Transformer that fuses multi-modal video, audio, and text features to predict fMRI activity. Representations from open-source models (Qwen2.5, BEATs, Whisper, SlowFast, V-JEPA) are merged by a modality-fusion transformer and temporally decoded by a prediction transformer with rotary embeddings. Trained on 65 hours of movie data from the CNeuroMod dataset and ensembled across 20 seeds, VIBE attains mean parcel-wise Pearson correlations of 0.3225 on in-distribution Friends S07 and 0.2125 on six out-of-distribution films. An earlier iteration of the same architecture obtained 0.3198 and 0.2096, respectively, winning Phase-1 and placing second overall in the Algonauts 2025 Challenge.
Updated: 2025-07-25 01:14:19
标题: VIBE: 用于fMRI响应建模的视频输入脑编码器
摘要: 我们提出了VIBE,一个两阶段的Transformer,将多模态视频、音频和文本特征融合起来,以预测fMRI活动。来自开源模型(Qwen2.5,BEATs,Whisper,SlowFast,V-JEPA)的表示通过一个模态融合Transformer进行合并,然后通过一个带有旋转嵌入的预测Transformer进行时间解码。在来自CNeuroMod数据集的65小时电影数据上训练,并在20个种子之间进行集成,VIBE在分布内的《老友记》第7季上获得了0.3225的平均parcel-wise Pearson相关性,在六部分布外的电影上达到了0.2125。同一架构的早期版本分别获得了0.3198和0.2096,在Algonauts 2025挑战中获得第一阶段的胜利,并在总体排名中获得第二名。
更新时间: 2025-07-25 01:14:19
领域: cs.LG,cs.AI,cs.CV
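The reported metric, mean parcel-wise Pearson correlation, is straightforward to compute. A small sketch with toy data standing in for predicted and measured fMRI time series:

```python
import numpy as np

def parcelwise_pearson(pred, target):
    """Mean Pearson correlation across parcels; pred/target: (time, parcels)."""
    p = pred - pred.mean(0)
    t = target - target.mean(0)
    r = (p * t).sum(0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0))
    return r.mean()

# Toy check: correlating a signal with a noisy copy of itself gives ~1/sqrt(2).
rng = np.random.default_rng(0)
y = rng.standard_normal((500, 1000))                    # 1000 parcels
print(parcelwise_pearson(y + rng.standard_normal(y.shape), y))  # ~0.707
```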
Value-Based Deep RL Scales Predictably
Scaling data and compute is critical to the success of modern ML. However, scaling demands predictability: we want methods to not only perform well with more compute or data, but also have their performance be predictable from small-scale runs, without running the large-scale experiment. In this paper, we show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. First, we show that data and compute requirements to attain a given performance level lie on a Pareto frontier, controlled by the updates-to-data (UTD) ratio. By estimating this frontier, we can predict this data requirement when given more compute, and this compute requirement when given more data. Second, we determine the optimal allocation of a total resource budget across data and compute for a given performance and use it to determine hyperparameters that maximize performance for a given budget. Third, this scaling is enabled by first estimating predictable relationships between hyperparameters, which is used to manage effects of overfitting and plasticity loss unique to RL. We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI gym, and IsaacGym, when extrapolating to higher levels of data, compute, budget, or performance.
Updated: 2025-07-25 01:10:02
Domains: cs.LG
A Neuroscience-Inspired Dual-Process Model of Compositional Generalization
Systematic compositional generalization - constructing and understanding novel combinations of known building blocks - remains a core challenge for AI systems. Human cognition achieves this flexibility via the interplay of the hippocampus (HPC) and prefrontal cortex (PFC): the hippocampus rapidly encodes episodes, and the prefrontal cortex consolidates them into reusable schemas for reasoning. Drawing on these insights, we present MIRAGE (Meta-Inference with Rules and Abstractions from Generalized Experience), a framework that achieves systematic generalization on compositional tasks. MIRAGE has two interacting modules mirroring the brain's deliberative HPC-PFC loop and intuitive neocortical pattern recognition. (1) The meta-trained Transformer Neural Decomposer, paralleling neocortical "System 1" computation, is trained on a task-agnostic stream of randomly sampled compositional grammars and applies one decomposition step per pass, with successive passes iteratively refining the sequence representation. (2) The Schema Engine, analogous to the HPC-PFC "System 2" loop, dynamically extracts, ranks, and applies reusable schemas, storing variable bindings in episodic memory and expanding them when needed. By explicitly equipping the Transformer component of MIRAGE with actively managed schematic structures, our model performs systematic compositional operations through explicit schema application and transformation, relying solely on frozen weights when solving entirely novel tasks. This approach demonstrates systematic compositional generalization on the SCAN benchmark, achieving > 99% accuracy on all task splits with only 1.19M parameters in the transformer module. Ablation studies confirm that MIRAGE's systematicity critically depends on the quality of extracted schemas and the model's iterative refinement process.
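A toy sketch (ours) of the one-decomposition-step-per-pass idea on a SCAN-like command; the rewrite rules below are invented stand-ins for the trained decomposer and the Schema Engine:

```python
# Each pass applies a single schema; successive passes refine the sequence
# until no schema fires, mimicking iterative refinement at a cartoon level.
def one_pass(tokens: list[str]) -> list[str]:
    """Apply one schema per pass; later passes refine the result."""
    if "twice" in tokens:                        # schema: X twice -> X X
        i = tokens.index("twice")
        return tokens[:i - 1] + [tokens[i - 1], tokens[i - 1]] + tokens[i + 1:]
    if "and" in tokens:                          # schema: X and Y -> X Y
        i = tokens.index("and")
        return tokens[:i] + tokens[i + 1:]
    lexicon = {"jump": "JUMP", "walk": "WALK"}   # lexical schemas
    if any(t in lexicon for t in tokens):
        return [lexicon.get(t, t) for t in tokens]
    return tokens

seq = "jump and walk twice".split()
while True:                                      # successive refinement passes
    nxt = one_pass(seq)
    if nxt == seq:
        break
    seq = nxt
print(seq)  # ['JUMP', 'WALK', 'WALK']
```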
Updated: 2025-07-25 01:02:07
Domains: cs.AI,cs.NE
Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise
Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when only a team reward is received, especially in environments with sparse rewards. A powerful way to mitigate this issue is to craft dense individual rewards that guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, and thus perform less effectively than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both the individual action distribution and the human-expertise preference distribution. LIGHT then designs individual intrinsic rewards for each agent based on an actionable representational transformation relevant to Q-learning, so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate that our method outperforms representative baselines in both performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.
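One way to picture the shaping idea is an intrinsic bonus for aligning an agent's action distribution with a human preference distribution; the KL form below is our illustration, not the paper's Q-learning-based construction:

```python
# Hedged sketch: shape the sparse team reward with an intrinsic bonus that
# rewards agreement between the agent's policy and an expertise prior.
import numpy as np

def intrinsic_reward(policy_probs: np.ndarray,
                     expert_prefs: np.ndarray,
                     beta: float = 0.1) -> float:
    """Negative KL(policy || expert preference), scaled by beta."""
    p = np.clip(policy_probs, 1e-8, 1.0)
    q = np.clip(expert_prefs, 1e-8, 1.0)
    return float(-beta * np.sum(p * np.log(p / q)))

policy = np.array([0.70, 0.20, 0.10])   # agent's action distribution (toy)
expert = np.array([0.60, 0.30, 0.10])   # human preference over actions (toy)
team_reward = 0.0                       # sparse team reward at this step
shaped = team_reward + intrinsic_reward(policy, expert)
print(round(shaped, 4))
```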
Updated: 2025-07-25 00:59:10
Domains: cs.LG,cs.AI,cs.MA
Estimation of conditional average treatment effects on distributed confidential data
The estimation of conditional average treatment effects (CATEs) is an important topic in many scientific fields. CATEs can be estimated with high accuracy if data distributed across multiple parties are centralized. However, it is difficult to aggregate such data owing to confidentiality or privacy concerns. To address this issue, we propose data collaboration double machine learning, a method for estimating CATE models using privacy-preserving fusion data constructed from distributed sources, and evaluate its performance through simulations. We make three main contributions. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data, providing robustness to model mis-specification compared to parametric approaches. Second, it enables collaborative estimation across different time points and parties by accumulating a knowledge base. Third, our method performs as well as or better than existing methods in simulations using synthetic, semi-synthetic, and real-world datasets.
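For orientation, a minimal double machine learning CATE sketch on pooled data looks as follows; this is the standard cross-fitted recipe, not the paper's privacy-preserving, multi-party fusion construction:

```python
# Standard DML/R-learner sketch: cross-fitted nuisance models, then a
# residual-on-residual regression that yields a semi-parametric CATE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
true_tau = 1.0 + X[:, 0]                                # heterogeneous effect
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 1])))     # covariate-dependent treatment
Y = X[:, 2] + true_tau * T + rng.normal(size=n)

# Stage 1: cross-fitted nuisance estimates of E[Y|X] and E[T|X].
m_hat = cross_val_predict(
    RandomForestRegressor(n_estimators=200, random_state=0), X, Y, cv=5)
e_hat = cross_val_predict(
    RandomForestClassifier(n_estimators=200, random_state=0), X, T, cv=5,
    method="predict_proba")[:, 1]

# Stage 2: regress Y-residuals on T-residuals with tau(X) = b0 + X @ b.
Ry, Rt = Y - m_hat, T - e_hat
F = np.column_stack([Rt, X * Rt[:, None]])
fit = LinearRegression(fit_intercept=False).fit(F, Ry)
tau_hat = fit.coef_[0] + X @ fit.coef_[1:]
print("corr(tau_hat, true_tau):",
      round(float(np.corrcoef(tau_hat, true_tau)[0, 1]), 3))
```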
Updated: 2025-07-25 00:50:45
Domains: stat.ME,cs.CR,cs.LG
Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning
Background: Hypertensive kidney disease (HKD) patients in intensive care units (ICUs) face high short-term mortality, but tailored risk prediction tools are lacking. Early identification of high-risk individuals is crucial for clinical decision-making. Methods: We developed a machine learning framework to predict 30-day in-hospital mortality among ICU patients with HKD using early clinical data from the MIMIC-IV v2.2 database. A cohort of 1,366 adults was curated with strict criteria, excluding malignancy cases. Eighteen clinical features, including vital signs, labs, comorbidities, and therapies, were selected via random forest importance and mutual information filtering. Several models were trained and compared with stratified five-fold cross-validation; CatBoost demonstrated the best performance. Results: CatBoost achieved an AUROC of 0.88 on the independent test set, with sensitivity of 0.811 and specificity of 0.798. SHAP values and Accumulated Local Effects (ALE) plots showed the model relied on meaningful predictors such as altered consciousness, vasopressor use, and coagulation status. Additionally, the DREAM algorithm was integrated to estimate patient-specific posterior risk distributions, allowing clinicians to assess both predicted mortality and its uncertainty. Conclusions: We present an interpretable machine learning pipeline for early, real-time risk assessment in ICU patients with HKD. By combining high predictive performance with uncertainty quantification, our model supports individualized triage and transparent clinical decisions. This approach shows promise for clinical deployment and merits external validation in broader critical care populations.
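A schematic version of the cross-validated modeling loop, on synthetic stand-in data (MIMIC-IV requires credentialed access, and the real feature list and preprocessing follow the paper, not this sketch):

```python
# Stratified 5-fold CV with a CatBoost classifier, reporting mean AUROC.
# The data below are synthetic placeholders with the paper's dimensions.
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1366, 18))   # 1,366 patients, 18 selected features
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0] - 0.5 * X[:, 1])))  # toy outcome

aucs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True,
                              random_state=0).split(X, y):
    model = CatBoostClassifier(iterations=300, depth=4,
                               random_seed=0, verbose=False)
    model.fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))

print("mean AUROC:", round(float(np.mean(aucs)), 3))
```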
Updated: 2025-07-25 00:48:23
Domains: cs.LG
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Despite widespread adoption, the impact of AI tools on software development in the wild remains understudied. We conduct a randomized controlled trial (RCT) to understand how AI tools at the February-June 2025 frontier affect the productivity of experienced open-source developers. 16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early-2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%: AI tooling slowed developers down. This slowdown also contradicts predictions from experts in economics (39% shorter) and ML (38% shorter). To understand this result, we collect and evaluate evidence for 20 properties of our setting that a priori could contribute to the observed slowdown effect, for example the size and quality standards of projects, or prior developer experience with AI tooling. Although the influence of experimental artifacts cannot be entirely ruled out, the robustness of the slowdown effect across our analyses suggests it is unlikely to be primarily a function of our experimental design.
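The headline estimate has the shape of a simple log-time regression; the sketch below uses made-up task durations calibrated to reproduce a 19% effect and is not the study's actual estimator:

```python
# Regress log completion time on treatment assignment; exponentiating the
# coefficient gives the multiplicative effect on completion time.
import numpy as np

rng = np.random.default_rng(0)
n = 246
ai_allowed = rng.integers(0, 2, size=n)               # random assignment
log_minutes = 4.0 + 0.174 * ai_allowed + rng.normal(0, 0.8, n)
# (exp(0.174) - 1 is roughly the 19% slowdown reported in the abstract)

Z = np.column_stack([np.ones(n), ai_allowed])         # OLS design matrix
beta, *_ = np.linalg.lstsq(Z, log_minutes, rcond=None)
print(f"estimated effect on completion time: {np.exp(beta[1]) - 1:+.1%}")
```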
Updated: 2025-07-25 00:43:07
Domains: cs.AI,cs.HC,cs.SE,I.2
Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models
Weak-to-Strong generalization (W2SG) is a new trend to elicit the full capabilities of a strong model with supervision from a weak model. While existing W2SG studies focus on simple tasks like binary classification, we extend this paradigm to complex interactive decision-making environments. Specifically, we fine-tune a strong model with trajectories of intermediate actions generated by a weak model. Motivated by the human learning process, we propose to generalize not only success knowledge but also failure experience, so that the strong model can learn from failed trajectories accumulated by weak models. To effectively and efficiently elicit the potential of strong agents, we further construct "trajectory trees," a hierarchical representation that organizes weak-model-generated action trajectories, coupled with Monte Carlo Tree Search (MCTS) to optimize the strong model. Through theoretical analysis, we provide formal guarantees for the effectiveness of our method in improving W2SG performance. Our empirical evaluations demonstrate substantial improvements in reasoning and decision-making capabilities across diverse task domains, validating the scalability and robustness of our proposed framework. Our code is available at: https://github.com/yeruimeng/TraTree
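The generic skeleton behind MCTS over trajectory trees, including learning from failed trajectories, can be sketched as follows; the paper's scoring, expansion, and fine-tuning loop are not reproduced here:

```python
# Organize weak-model trajectories into a prefix tree and select branches
# with a UCB rule; failed trajectories (outcome 0) still shape the values.
import math

class Node:
    def __init__(self):
        self.children: dict[str, "Node"] = {}
        self.visits = 0
        self.value = 0.0   # running mean of trajectory outcomes

def insert(root: Node, actions: list[str], outcome: float) -> None:
    """Add one (possibly failed) trajectory and propagate its outcome."""
    node = root
    for a in actions:
        node.visits += 1
        node.value += (outcome - node.value) / node.visits
        node = node.children.setdefault(a, Node())
    node.visits += 1
    node.value += (outcome - node.value) / node.visits

def ucb_child(node: Node, c: float = 1.4) -> str:
    """UCB1 selection over a node's children."""
    return max(node.children,
               key=lambda a: node.children[a].value
               + c * math.sqrt(math.log(node.visits)
                               / node.children[a].visits))

root = Node()
insert(root, ["open", "search", "click"], outcome=0.0)  # failed trajectory
insert(root, ["open", "filter", "click"], outcome=1.0)  # successful one
print(ucb_child(root.children["open"]))  # prefers the promising branch
```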
Updated: 2025-07-25 00:17:09
Domains: cs.LG
PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning
Retrieval-augmented generation (RAG) often falls short when the retrieved context includes confusing, semi-relevant passages, or when answering a question requires deep contextual understanding and reasoning. We propose an efficient fine-tuning framework, called PrismRAG, that (i) trains the model with distractor-aware QA pairs mixing gold evidence with subtle distractor passages, and (ii) instills reasoning-centric habits that make the LLM plan, rationalize, and synthesize without relying on extensive human-engineered instructions. Evaluated across 12 open-book RAG QA benchmarks spanning diverse application domains and scenarios, PrismRAG improves average factuality by 5.4%, outperforming state-of-the-art solutions.
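A distractor-aware training example could be assembled as below; the field names, prompt template, and sample passages are our assumptions, not PrismRAG's actual data format:

```python
# Mix the gold passage among semi-relevant distractors, shuffle to hide its
# position, and keep the supervision grounded only in the gold evidence.
import random

def build_example(question: str, gold: str, distractors: list[str],
                  answer: str, seed: int = 0) -> dict:
    passages = distractors + [gold]
    random.Random(seed).shuffle(passages)        # hide the gold position
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return {
        "prompt": f"Context:\n{context}\n\nQuestion: {question}",
        "target": answer,                        # supervision stays faithful
    }

ex = build_example(
    question="When did the station open?",
    gold="Midtown station opened to the public in March 1911.",
    distractors=[
        "A nearby station was renovated in 1911 after a fire.",  # semi-relevant
        "The line was extended northward in 1923.",
    ],
    answer="March 1911",
)
print(ex["prompt"][:120])
```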
Updated: 2025-07-25 00:15:31
Domains: cs.CL,cs.AI,cs.LG
Resonant-Tunnelling Diode Reservoir Computing System for Image Recognition
As artificial intelligence continues to push into real-time, edge-based and resource-constrained environments, there is an urgent need for novel, hardware-efficient computational models. In this study, we present and validate a neuromorphic computing architecture based on resonant-tunnelling diodes (RTDs), which exhibit the nonlinear characteristics ideal for physical reservoir computing (RC). We theoretically formulate and numerically implement an RTD-based RC system and demonstrate its effectiveness on two image recognition benchmarks: handwritten digit classification and object recognition using the Fruit-360 dataset. Our results show that this circuit-level architecture delivers promising performance while adhering to the principles of next-generation RC, eliminating random connectivity in favour of a deterministic nonlinear transformation of input signals.
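A toy numerical sketch of the idea, with an invented N-shaped I-V curve standing in for the paper's device model, a fixed (untrained) input mask as the deterministic transformation, and a ridge readout as the only trained component:

```python
# Map inputs through an RTD-like nonlinearity to get reservoir features,
# then train only a linear ridge readout. All constants are illustrative.
import numpy as np

def rtd_iv(v: np.ndarray) -> np.ndarray:
    """Simplified N-shaped I-V curve: a resonant peak plus a diode tail."""
    return v * np.exp(-2.0 * v) + 0.05 * (np.exp(v) - 1.0)

def reservoir_features(x: np.ndarray, n_nodes: int = 50,
                       seed: int = 0) -> np.ndarray:
    """Deterministic fixed-mask projection through the RTD nonlinearity."""
    rng = np.random.default_rng(seed)            # fixed mask, never trained
    mask = rng.uniform(0.2, 1.0, size=(x.shape[1], n_nodes)) * (4.0 / x.shape[1])
    return rtd_iv(x @ mask)                      # drive span covers the peak

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 64))            # stand-in "image" vectors
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2     # nonlinear toy target

H = reservoir_features(X)
lam = 1e-3                                       # ridge regularizer
W = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
print("train MSE:", round(float(np.mean((H @ W - y) ** 2)), 4))
```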
Updated: 2025-07-25 00:08:12
Domains: cs.LG,physics.app-ph