Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction
Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales, enabling investigation of the electronic properties of materials with extended defects, interfaces, or exhibiting disordered phases. However, as interactions between atomic orbitals typically extend over 10+ angstroms, the graph representations required for this task tend to be densely connected, and the memory requirements to perform training and inference on these large structures can exceed the limits of modern GPUs. Here we present a distributed eGNN implementation which leverages direct GPU communication and introduce a partitioning strategy of the input graph to reduce the number of embedding exchanges between GPUs. Our implementation shows strong scaling up to 128 GPUs, and weak scaling up to 512 GPUs with 87% parallel efficiency for structures with 3,000 to 190,000 atoms on the Alps supercomputer.
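Since the abstract turns on reducing cross-GPU embedding exchanges, the following minimal numpy sketch (function names, the slab partitioner, and all sizes are illustrative assumptions, not the paper's strategy) shows how a spatial partition of a cutoff-radius atomic graph determines how many node embeddings must cross GPU boundaries:

    import numpy as np

    def halo_exchange_count(positions, cutoff, n_parts):
        """Partition atoms into x-axis slabs (one per GPU) and count cross-partition edges."""
        x = positions[:, 0]
        bounds = np.linspace(x.min(), x.max() + 1e-9, n_parts + 1)
        part = np.searchsorted(bounds, x, side="right") - 1
        # Brute-force neighbor search; a real code would use cell lists.
        dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
        i, j = np.where((dist < cutoff) & (dist > 0))
        # Every edge whose endpoints sit on different GPUs forces an embedding exchange.
        return int(np.sum(part[i] != part[j]) // 2)

    rng = np.random.default_rng(0)
    pos = rng.uniform(0.0, 50.0, size=(500, 3))   # 500 atoms in a 50-angstrom box
    print(halo_exchange_count(pos, cutoff=10.0, n_parts=4))

A partitioner that minimizes this count, rather than slicing naively, is the kind of objective the paper's partitioning strategy targets.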
Updated: 2025-07-04 23:53:47
Categories: cs.LG,cond-mat.mtrl-sci,cs.DC,physics.comp-ph
A Hybrid Quantum Neural Network for Split Learning
Quantum Machine Learning (QML) is an emerging field of research with potential applications to distributed collaborative learning, such as Split Learning (SL). SL allows resource-constrained clients to collaboratively train ML models with a server, reducing their computational overhead and enabling data privacy by avoiding raw data sharing. Although QML with SL has been studied, the problem remains open in resource-constrained environments where clients lack quantum computing capabilities. Additionally, data privacy leakage between client and server in SL poses risks of reconstruction attacks on the server side. To address these issues, we propose Hybrid Quantum Split Learning (HQSL), an application of Hybrid QML in SL. HQSL enables classical clients to train models with a hybrid quantum server and curtails reconstruction attacks. Additionally, we introduce a novel qubit-efficient data-loading technique for designing a quantum layer in HQSL, minimizing both the number of qubits and circuit depth. Evaluations on real hardware demonstrate HQSL's practicality under realistic quantum noise. Experiments on five datasets demonstrate HQSL's feasibility and its ability to enhance classification performance compared to its classical counterparts. Notably, HQSL achieves mean improvements of over 3% in both accuracy and F1-score for the Fashion-MNIST dataset, and over 1.5% in both metrics for the Speech Commands dataset. We expand these studies to include up to 100 clients, confirming HQSL's scalability. Moreover, we introduce a noise-based defense mechanism to tackle reconstruction attacks on the server side. Overall, HQSL enables classical clients to train collaboratively with a hybrid quantum server, improving model performance and resistance against reconstruction attacks.
Updated: 2025-07-04 23:52:04
Categories: quant-ph,cs.AI
Participatory Evolution of Artificial Life Systems via Semantic Feedback
We present a semantic feedback framework that enables natural language to guide the evolution of artificial life systems. Integrating a prompt-to-parameter encoder, a CMA-ES optimizer, and CLIP-based evaluation, the system allows user intent to modulate both visual outcomes and underlying behavioral rules. Implemented in an interactive ecosystem simulation, the framework supports prompt refinement, multi-agent interaction, and emergent rule synthesis. User studies show improved semantic alignment over manual tuning and demonstrate the system's potential as a platform for participatory generative design and open-ended evolution.
Updated: 2025-07-04 23:51:50
Categories: cs.AI,cs.GR
Deep Transformer Network for Monocular Pose Estimation of Shipborne Unmanned Aerial Vehicle
This paper introduces a deep transformer network for estimating the relative 6D pose of an Unmanned Aerial Vehicle (UAV) with respect to a ship using monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer Neural Network model is trained to detect these keypoints and estimate the 6D pose of each part. The estimates are integrated using Bayesian fusion. The model is tested on synthetic data and in-situ flight experiments, demonstrating robustness and accuracy in various lighting conditions. The position estimation error is approximately 0.8% and 1.0% of the distance to the ship for the synthetic data and the flight experiments, respectively. The method has potential applications for ship-based autonomous UAV landing and navigation.
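The Bayesian fusion step lends itself to a short worked example. The numpy sketch below (the per-part covariances are made-up values) fuses independent Gaussian position estimates by inverse-variance weighting; fusing full 6D poses would additionally require care with rotations on SO(3), which is omitted here:

    import numpy as np

    def fuse_gaussian_estimates(means, covs):
        """Fuse independent Gaussian estimates (means: list of (d,) arrays,
        covs: list of (d, d) arrays) into one posterior mean/covariance."""
        info = sum(np.linalg.inv(C) for C in covs)           # total information
        fused_cov = np.linalg.inv(info)
        fused_mean = fused_cov @ sum(np.linalg.inv(C) @ m
                                     for C, m in zip(covs, means))
        return fused_mean, fused_cov

    # Three ship-part position estimates with different confidence levels.
    means = [np.array([10.0, 2.0, 5.0]), np.array([10.4, 1.8, 5.1]),
             np.array([9.8, 2.2, 4.9])]
    covs = [np.eye(3) * s for s in (0.5, 0.1, 0.9)]
    mu, cov = fuse_gaussian_estimates(means, covs)
    print(mu)  # pulled toward the most confident (lowest-variance) estimate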
Updated: 2025-07-04 23:23:59
Categories: cs.CV,cs.AI,cs.RO,eess.IV
Economic Evaluation of LLMs
Practitioners often navigate LLM performance trade-offs by plotting Pareto frontiers of optimal accuracy-cost trade-offs. However, this approach offers no way to compare between LLMs with distinct strengths and weaknesses: for example, a cheap, error-prone model vs. a pricey but accurate one. To address this gap, we propose economic evaluation of LLMs. Our framework quantifies the performance trade-off of an LLM as a single number based on the economic constraints of a concrete use case, all expressed in dollars: the cost of making a mistake, the cost of incremental latency, and the cost of abstaining from a query. We apply our economic evaluation framework to compare the performance of reasoning and non-reasoning models on difficult questions from the MATH benchmark, discovering that reasoning models offer better accuracy-cost tradeoffs as soon as the economic cost of a mistake exceeds $0.01. In addition, we find that single large LLMs often outperform cascades when the cost of making a mistake is as low as $0.1. Overall, our findings suggest that when automating meaningful human tasks with AI models, practitioners should typically use the most powerful available model, rather than attempt to minimize AI deployment costs, since deployment costs are likely dwarfed by the economic impact of AI errors.
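The framework reduces to a short dollar-valued expression. A hedged sketch (the function and all constants below are illustrative, not the paper's exact formulation):

    def economic_cost(error_rate, latency_s, abstain_rate,
                      cost_mistake, cost_per_second, cost_abstain,
                      cost_per_query=0.0):
        """Expected dollar cost of answering one query with a given model."""
        return (error_rate * cost_mistake
                + latency_s * cost_per_second
                + abstain_rate * cost_abstain
                + cost_per_query)

    # A cheap, error-prone model vs. a pricey but accurate one, mistakes at $1.
    cheap  = economic_cost(0.30, 1.0, 0.0, cost_mistake=1.0,
                           cost_per_second=1e-4, cost_abstain=0.0,
                           cost_per_query=5e-4)
    strong = economic_cost(0.05, 10.0, 0.0, cost_mistake=1.0,
                           cost_per_second=1e-4, cost_abstain=0.0,
                           cost_per_query=1e-2)
    print(f"cheap: ${cheap:.4f}/query, strong: ${strong:.4f}/query")
    # cheap: $0.3006, strong: $0.0610 -> error cost dwarfs deployment cost.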
Updated: 2025-07-04 23:16:02
Categories: cs.AI
MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search
Iterative methods for computing matrix functions have been extensively studied, and their convergence speed can be significantly improved with the right tuning of parameters and by mixing different iteration types. Hand-tuning the design options for optimal performance can be cumbersome, especially in modern computing environments: numerous different classical iterations and their variants exist, each with non-trivial per-step cost and tuning parameters. To this end, we propose MatRL -- a reinforcement-learning-based framework that automatically discovers iterative algorithms for computing matrix functions. The key idea is to treat algorithm design as a sequential decision-making process. Monte-Carlo tree search is then used to plan a hybrid sequence of matrix iterations and step sizes, tailored to a specific input matrix distribution and computing environment. Moreover, we also show that the learned algorithms provably generalize to sufficiently large matrices drawn from the same distribution. Finally, we corroborate our theoretical results with numerical experiments demonstrating that MatRL produces algorithms that outperform various baselines in the literature.
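As a concrete instance of this search space, classical iterations such as Newton-Schulz carry exactly the tunable step sizes MatRL plans over. A standalone numpy sketch (not the MatRL code) of one such parameterized iteration, here for the matrix inverse:

    import numpy as np

    def newton_schulz_inverse(A, steps=20, alpha=1.0):
        """Iterate X <- X + alpha * X (I - A X); with alpha = 1 and the safe
        initialization below, X converges quadratically to A^{-1}."""
        n = A.shape[0]
        X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
        I = np.eye(n)
        for _ in range(steps):
            X = X + alpha * X @ (I - A @ X)
        return X

    rng = np.random.default_rng(0)
    B = rng.normal(size=(50, 50))
    A = B @ B.T / 50 + np.eye(50)             # well-conditioned SPD test matrix
    print(np.linalg.norm(A @ newton_schulz_inverse(A) - np.eye(50)))  # ~1e-13

MatRL's role, per the abstract, is to choose which such iterations to chain and with what step sizes for a given matrix distribution and hardware.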
Updated: 2025-07-04 22:57:33
Categories: cs.LG
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Differential Transformer has recently gained significant attention for its impressive empirical performance, often attributed to its ability to perform noise canceled attention. However, precisely how differential attention achieves its empirical benefits remains poorly understood. Moreover, the Differential Transformer architecture demands large-scale training from scratch, hindering the utilization of open pretrained weights. In this work, we conduct an in-depth investigation of Differential Transformer, uncovering three key factors behind its success: (1) enhanced expressivity via negative attention, (2) reduced redundancy among attention heads, and (3) improved learning dynamics. Based on these findings, we propose DEX, a novel method to efficiently integrate the advantages of differential attention into pretrained language models. By reusing the softmax attention scores and adding a lightweight differential operation on the output value matrix, DEX effectively incorporates the key advantages of differential attention while remaining lightweight in both training and inference. Evaluations confirm that DEX substantially improves pretrained LLMs across diverse benchmarks, achieving significant performance gains with minimal adaptation data (< 0.01%).
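A heavily hedged PyTorch sketch of the mechanism as described: the pretrained softmax attention map is reused unchanged, and a small learned operator perturbs only the value pathway. The tensor shapes and the exact form of the differential term below are assumptions for illustration, not the authors' released code:

    import torch

    def dex_style_attention(q, k, v, w_diff, lam=0.5):
        """q, k, v: (batch, heads, seq, dim); w_diff: (dim, dim) extra projection."""
        scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
        attn = scores.softmax(dim=-1)              # reused pretrained softmax attention
        out = attn @ v                             # standard attention output
        return out - lam * (attn @ (v @ w_diff))   # differential term on the value path

    b, h, s, d = 2, 4, 16, 32
    q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
    print(dex_style_attention(q, k, v, torch.randn(d, d) / d ** 0.5).shape)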
Updated: 2025-07-04 22:55:34
Categories: cs.LG
Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion
The few-shot knowledge graph completion (FKGC) task aims to predict unseen facts of a relation from few-shot reference entity pairs. Current approaches randomly select one negative sample for each reference entity pair to minimize a margin-based ranking loss, which easily leads to a zero-loss problem if the negative sample is far from the positive sample and thus falls outside the margin. Moreover, an entity should have a different representation under a different context. To tackle these issues, we propose a novel Relation-Aware Network with Attention-Based Loss (RANA) framework. Specifically, to better utilize the plentiful negative samples and alleviate the zero-loss issue, we strategically select relevant negative samples and design an attention-based loss function to further differentiate the importance of each negative sample. The intuition is that negative samples more similar to positive samples contribute more to the model. Further, we design a dynamic relation-aware entity encoder for learning a context-dependent entity representation. Experiments demonstrate that RANA outperforms state-of-the-art models on two benchmark datasets.
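The attention-based loss admits a compact sketch. Below is one plausible reading in numpy (the weighting scheme and temperature are assumptions, not the authors' exact loss): each selected negative receives a softmax weight that grows with its score, so hard negatives dominate and the loss over the whole set does not collapse to zero:

    import numpy as np

    def attention_margin_loss(pos_score, neg_scores, margin=1.0, temp=1.0):
        """pos_score: scalar; neg_scores: (n,) scores of selected negatives."""
        neg_scores = np.asarray(neg_scores, dtype=float)
        # Attention weights: harder (higher-scoring) negatives matter more.
        w = np.exp(neg_scores / temp)
        w = w / w.sum()
        # Per-negative hinge terms, combined with the attention weights.
        hinge = np.maximum(0.0, margin + neg_scores - pos_score)
        return float(np.sum(w * hinge))

    print(attention_margin_loss(2.0, [1.9, 0.2, -3.0]))  # dominated by the 1.9 negative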
Updated: 2025-07-04 22:52:34
Categories: cs.CL,cs.AI,cs.LG
OpenAg: Democratizing Agricultural Intelligence
Agriculture is undergoing a major transformation driven by artificial intelligence (AI), machine learning, and knowledge representation technologies. However, current agricultural intelligence systems often lack contextual understanding, explainability, and adaptability, especially for smallholder farmers with limited resources. General-purpose large language models (LLMs), while powerful, typically lack the domain-specific knowledge and contextual reasoning needed for practical decision support in farming. They tend to produce recommendations that are too generic or unrealistic for real-world applications. To address these challenges, we present OpenAg, a comprehensive framework designed to advance agricultural artificial general intelligence (AGI). OpenAg combines domain-specific foundation models, neural knowledge graphs, multi-agent reasoning, causal explainability, and adaptive transfer learning to deliver context-aware, explainable, and actionable insights. The system includes: (i) a unified agricultural knowledge base that integrates scientific literature, sensor data, and farmer-generated knowledge; (ii) a neural agricultural knowledge graph for structured reasoning and inference; (iii) an adaptive multi-agent reasoning system where AI agents specialize and collaborate across agricultural domains; and (iv) a causal transparency mechanism that ensures AI recommendations are interpretable, scientifically grounded, and aligned with real-world constraints. OpenAg aims to bridge the gap between scientific knowledge and the tacit expertise of experienced farmers to support scalable and locally relevant agricultural decision-making.
Updated: 2025-07-04 22:44:41
Categories: cs.AI
Static Segmentation by Tracking: A Label-Efficient Approach for Fine-Grained Specimen Image Segmentation
We study image segmentation in the biological domain, particularly trait segmentation from specimen images (e.g., butterfly wing stripes, beetle elytra). This fine-grained task is crucial for understanding the biology of organisms, but it traditionally requires manually annotating segmentation masks for hundreds of images per species, making it highly labor-intensive. To address this challenge, we propose a label-efficient approach, Static Segmentation by Tracking (SST), based on a key insight: while specimens of the same species exhibit natural variation, the traits of interest show up consistently. This motivates us to concatenate specimen images into a "pseudo-video" and reframe trait segmentation as a tracking problem. Specifically, SST generates masks for unlabeled images by propagating annotated or predicted masks from the "pseudo-preceding" images. Built upon recent video segmentation models, such as Segment Anything Model 2, SST achieves high-quality trait segmentation with only one labeled image per species, marking a breakthrough in specimen image analysis. To further enhance segmentation quality, we introduce a cycle-consistent loss for fine-tuning, again requiring only one labeled image. Additionally, we demonstrate the broader potential of SST, including one-shot instance segmentation in natural images and trait-based image retrieval.
Updated: 2025-07-04 22:40:19
Categories: cs.CV,cs.AI
RELRaE: LLM-Based Relationship Extraction, Labelling, Refinement, and Evaluation
A large volume of XML data is produced in experiments carried out by robots in laboratories. In order to support the interoperability of data between labs, there is a motivation to translate the XML data into a knowledge graph. A key stage of this process is the enrichment of the XML schema to lay the foundation of an ontology schema. To achieve this, we present the RELRaE framework, a framework that employs large language models in different stages to extract and accurately label the relationships implicitly present in the XML schema. We investigate the capability of LLMs to accurately generate these labels and then evaluate them. Our work demonstrates that LLMs can be effectively used to support the generation of relationship labels in the context of lab automation, and that they can play a valuable role within semi-automatic ontology generation frameworks more generally.
Updated: 2025-07-04 22:27:06
Categories: cs.AI,I.2.4; I.2.1
IMPACT: Importance-Aware Activation Space Reconstruction
Large language models (LLMs) achieve strong performance across many domains but are difficult to deploy in resource-constrained settings due to their size. Low-rank weight matrix compression is a popular strategy for reducing model size, typically by minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. Instead, LLM activations exhibit stronger low-rank structure, prompting a shift toward minimizing activation reconstruction error. We show that this shift alone is insufficient: activation dimensions contribute unequally to model performance, and uniform reconstruction can harm performance. We propose IMPACT, a principled framework for importance-aware activation reconstruction that links model compression decisions to their impact on model behavior. IMPACT formulates an optimization problem that considers both activation structure and gradient sensitivity, and derives a closed-form solution where the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix. This enables low-rank approximations explicitly optimized to preserve accuracy. Experiments across diverse models and tasks show that IMPACT achieves up to 48.6% greater model size reduction with accuracy comparable to state-of-the-art baselines.
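The closed-form solution has a direct numpy rendering. In this sketch the gradient-sensitivity terms are collapsed into a given per-dimension importance vector, which is an assumption for illustration rather than the paper's derivation:

    import numpy as np

    def importance_weighted_basis(acts, importance, rank):
        """acts: (n_samples, dim) activations; importance: (dim,) weights."""
        w = np.sqrt(importance)                  # weight each activation dimension
        C = (acts * w).T @ (acts * w) / acts.shape[0]
        eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
        return eigvecs[:, -rank:]                # top-`rank` eigenvectors

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(1024, 64))
    imp = rng.uniform(0.1, 1.0, size=64)
    U = importance_weighted_basis(acts, imp, rank=8)
    recon = acts @ U @ U.T                       # rank-8 reconstruction
    print(recon.shape)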
Updated: 2025-07-04 22:26:33
Categories: cs.LG,stat.ML
Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling
Object localization in satellite imagery is particularly challenging due to the high variability of objects, low spatial resolution, and interference from noise and dominant features such as clouds and city lights. In this research, we focus on three satellite datasets: upper atmospheric Gravity Waves (GW), mesospheric Bores (Bore), and Ocean Eddies (OE), each presenting its own unique challenges. These challenges include the variability in the scale and appearance of the main object patterns, where the size, shape, and feature extent of objects of interest can differ significantly. To address these challenges, we introduce YOLO-DCAP, a novel enhanced version of YOLOv5 designed to improve object localization in these complex scenarios. YOLO-DCAP incorporates a Multi-scale Dilated Residual Convolution (MDRC) block to capture multi-scale features with varying dilation rates, and an Attention-aided Spatial Pooling (AaSP) module to focus on globally relevant spatial regions, enhancing feature selection. These structural improvements help to better localize objects in satellite imagery. Experimental results demonstrate that YOLO-DCAP significantly outperforms both the YOLO base model and state-of-the-art approaches, achieving an average improvement of 20.95% in mAP50 and 32.23% in IoU over the base model, and 7.35% and 9.84% respectively over state-of-the-art alternatives, consistently across all three satellite datasets. These consistent gains highlight the robustness and generalizability of the proposed approach. Our code is open sourced at https://github.com/AI-4-atmosphere-remote-sensing/satellite-object-localization.
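A hedged PyTorch sketch of an MDRC-style block (channel counts, dilation rates, activation, and fusion layer are assumptions, not the released architecture): parallel 3x3 convolutions at several dilation rates, fused by a 1x1 convolution and added back residually:

    import torch
    import torch.nn as nn

    class MDRCBlock(nn.Module):
        def __init__(self, channels, dilations=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d)       # padding=d preserves H, W
                for d in dilations
            ])
            self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)
            self.act = nn.SiLU()

        def forward(self, x):
            multi = torch.cat([b(x) for b in self.branches], dim=1)
            return self.act(x + self.fuse(multi))      # residual connection

    x = torch.randn(1, 64, 32, 32)
    print(MDRCBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])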
Updated: 2025-07-04 22:18:40
Categories: cs.CV,cs.AI
Symmetry-Robust 3D Orientation Estimation
Orientation estimation is a fundamental task in 3D shape analysis which consists of estimating a shape's orientation axes: its side-, up-, and front-axes. Using this data, one can rotate a shape into canonical orientation, where its orientation axes are aligned with the coordinate axes. Developing an orientation algorithm that reliably estimates complete orientations of general shapes remains an open problem. We introduce a two-stage orientation pipeline that achieves state-of-the-art performance on up-axis estimation and further demonstrate its efficacy on full-orientation estimation, where one seeks all three orientation axes. Unlike previous work, we train and evaluate our method on all of ShapeNet rather than a subset of classes. We motivate our engineering contributions by theory describing fundamental obstacles to orientation estimation for rotationally-symmetric shapes, and show how our method avoids these obstacles.
Updated: 2025-07-04 21:55:01
Categories: cs.CV,cs.LG
Compressing Deep Neural Networks Using Explainable AI
Deep neural networks (DNNs) have demonstrated remarkable performance in many tasks, but often at a high computational cost and memory usage. Compression techniques, such as pruning and quantization, are applied to reduce the memory footprint of DNNs and make it possible to accommodate them on resource-constrained edge devices. Recently, explainable artificial intelligence (XAI) methods have been introduced with the purpose of understanding and explaining AI methods. XAI can be utilized to get to know the inner functioning of DNNs, such as the importance of different neurons and features in the overall performance of DNNs. In this paper, a novel DNN compression approach using XAI is proposed to efficiently reduce the DNN model size with negligible accuracy loss. In the proposed approach, importance scores of DNN parameters (i.e., weights) are computed using a gradient-based XAI technique called Layer-wise Relevance Propagation (LRP). Then, the scores are used to compress the DNN as follows: 1) parameters with negative or zero importance scores are pruned and removed from the model; 2) mixed-precision quantization is applied so that weights with higher/lower scores are quantized with higher/lower numbers of bits. The experimental results show that the proposed compression approach reduces the model size by 64% while improving accuracy by 42% compared to the state-of-the-art XAI-based compression method.
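The two compression steps reduce to a few lines once relevance scores are available. A numpy sketch (computing LRP itself is out of scope here; the scores are taken as given, and thresholds and bit widths are illustrative):

    import numpy as np

    def quantize(w, bits):
        """Uniform symmetric quantization of a weight array to `bits` bits."""
        scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
        return np.round(w / scale) * scale if scale > 0 else w

    def compress(weights, relevance, high_bits=8, low_bits=4):
        w = weights.copy()
        w[relevance <= 0] = 0.0                  # 1) prune non-positive relevance
        high = relevance > np.median(relevance)  # 2) mixed-precision quantization
        low = ~high & (relevance > 0)
        w[high] = quantize(w[high], high_bits)
        w[low] = quantize(w[low], low_bits)
        return w

    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)
    r = rng.normal(size=1000)                    # stand-in LRP relevance scores
    print(np.mean(compress(w, r) == 0))          # fraction of weights pruned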
Updated: 2025-07-04 21:45:34
Categories: cs.LG,cs.AI
Beyond Classical and Contemporary Models: A Transformative AI Framework for Student Dropout Prediction in Distance Learning Using RAG, Prompt Engineering, and Cross-Modal Fusion
Student dropout in distance learning remains a critical challenge, with profound societal and economic consequences. While classical machine learning models leverage structured socio-demographic and behavioral data, they often fail to capture the nuanced emotional and contextual factors embedded in unstructured student interactions. This paper introduces a transformative AI framework that redefines dropout prediction through three synergistic innovations: Retrieval-Augmented Generation (RAG) for domain-specific sentiment analysis, prompt engineering to decode academic stressors, and cross-modal attention fusion to dynamically align textual, behavioral, and socio-demographic insights. By grounding sentiment analysis in a curated knowledge base of pedagogical content, our RAG-enhanced BERT model interprets student comments with unprecedented contextual relevance, while optimized prompts isolate indicators of academic distress (e.g., "isolation," "workload anxiety"). A cross-modal attention layer then fuses these insights with temporal engagement patterns, creating holistic risk profiles. Evaluated on a longitudinal dataset of 4,423 students, the framework achieves 89% accuracy and an F1-score of 0.88, outperforming conventional models by 7% and reducing false negatives by 21%. Beyond prediction, the system generates interpretable interventions by retrieving contextually aligned strategies (e.g., mentorship programs for isolated learners). This work bridges the gap between predictive analytics and actionable pedagogy, offering a scalable solution to mitigate dropout risks in global education systems.
Updated: 2025-07-04 21:41:43
Categories: cs.CL,cs.AI,cs.CY,cs.IR,I.2.7; I.2.1; K.3.1
A Data-Transparent Probabilistic Model of Temporal Propositional Abstraction
Standard probabilistic models face fundamental challenges such as data scarcity, a large hypothesis space, and poor data transparency. To address these challenges, we propose a novel probabilistic model of data-driven temporal propositional reasoning. Unlike conventional probabilistic models, where data is a product of domain knowledge encoded in the probabilistic model, we explore the reverse direction, where domain knowledge is a product of data encoded in the probabilistic model. This more data-driven perspective suggests no distinction between maximum likelihood parameter learning and temporal propositional reasoning. We show that our probabilistic model is equivalent to a highest-order, i.e., full-memory, Markov chain, and that our model requires no distinction between hidden and observable variables. We discuss how limits provide a natural and mathematically rigorous way to handle data scarcity, including the zero-frequency problem. We also discuss how a probability distribution over the data generated by our probabilistic model aids data transparency by revealing the influential data used in predictions. The reproducibility of this theoretical work is fully demonstrated by the included proofs.
Updated: 2025-07-04 21:37:47
Categories: cs.AI
Multichannel Steganography: A Provably Secure Hybrid Steganographic Model for Secure Communication
Secure covert communication in hostile environments requires simultaneously achieving invisibility, provable security guarantees, and robustness against informed adversaries. This paper presents a novel hybrid steganographic framework that unites cover synthesis and cover modification within a unified multichannel protocol. A secret-seeded PRNG drives a lightweight Markov-chain generator to produce contextually plausible cover parameters, which are then masked with the payload and dispersed across independent channels. The masked bit-vector is imperceptibly embedded into conventional media via a variance-aware least-significant-bit algorithm, ensuring that statistical properties remain within natural bounds. We formalize a multichannel adversary model (MC-ATTACK) and prove that, under standard security assumptions, the adversary's distinguishing advantage is negligible, thereby guaranteeing both confidentiality and integrity. Empirical results corroborate these claims: local-variance-guided embedding yields near-lossless extraction (mean BER $<5\times10^{-3}$, correlation $>0.99$) with minimal perceptual distortion (PSNR $\approx 100$ dB, SSIM $>0.99$), while key-based masking drives extraction success to zero (BER $\approx0.5$) for a fully informed adversary. Comparative analysis demonstrates that purely distortion-free or invertible schemes fail under the same threat model, underscoring the necessity of hybrid designs. The proposed approach advances high-assurance steganography by delivering an efficient, provably secure covert channel suitable for deployment in high-surveillance networks.
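The embedding side is sketched below in numpy: payload bits go only into pixels whose local variance is high, where flipping the least-significant bit is statistically least visible. Window size and variance threshold are assumptions, and the returned position list stands in for the key-derived traversal a real scheme would use:

    import numpy as np

    def embed_lsb(image, bits, window=3, var_thresh=25.0):
        img = image.astype(np.int32)
        h, w = img.shape
        k = window // 2
        positions = []                      # in practice derived from the shared key
        for y in range(k, h - k):
            for x in range(k, w - k):
                if len(positions) == len(bits):
                    break
                patch = img[y - k:y + k + 1, x - k:x + k + 1]
                if patch.var() > var_thresh:           # embed only in busy regions
                    img[y, x] = (img[y, x] & ~1) | int(bits[len(positions)])
                    positions.append((y, x))
        return img.astype(np.uint8), positions

    def extract_lsb(image, positions):
        return [int(image[y, x] & 1) for y, x in positions]

    rng = np.random.default_rng(0)
    cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    payload = list(rng.integers(0, 2, size=128))
    stego, pos = embed_lsb(cover, payload)
    assert extract_lsb(stego, pos) == [int(b) for b in payload]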
Updated: 2025-07-04 21:18:16
Categories: cs.CR,cs.MM
Characterizing the Distinguishability of Product Distributions through Multicalibration
Given a sequence of samples $x_1, \dots , x_k$ promised to be drawn from one of two distributions $X_0, X_1$, a well-studied problem in statistics is to decide $\textit{which}$ distribution the samples are from. Information theoretically, the maximum advantage in distinguishing the two distributions given $k$ samples is captured by the total variation distance between $X_0^{\otimes k}$ and $X_1^{\otimes k}$. However, when we restrict our attention to $\textit{efficient distinguishers}$ (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ is more involved and less understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of $X_0$ and $X_1$ to bounds on the $\textit{information-theoretic}$ indistinguishability of some specific, related variables $\widetilde{X}_0$ and $\widetilde{X}_1$. As a consequence, we prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ with constant advantage as \[ k = \Theta\left(d_H^{-2}\left(\widetilde{X}_0, \widetilde{X}_1\right)\right), \] which is the inverse of the squared Hellinger distance $d_H$ between two distributions $\widetilde{X}_0$ and $\widetilde{X}_1$ that are computationally indistinguishable from $X_0$ and $X_1$. Likewise, our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions.
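For reference, the squared Hellinger distance in this statement is the standard quantity (a known definition, not a contribution of the paper):
\[
  d_H^2(P, Q) \;=\; \frac{1}{2} \sum_{x} \left( \sqrt{P(x)} - \sqrt{Q(x)} \right)^2 ,
\]
so the characterization $k = \Theta\big(d_H^{-2}(\widetilde{X}_0, \widetilde{X}_1)\big)$ says the required number of samples is, up to constants, the point where $k \cdot d_H^2(\widetilde{X}_0, \widetilde{X}_1)$ reaches a constant.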
Updated: 2025-07-04 21:14:57
Categories: cs.CR,cs.CC
Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts
Documenting tacit knowledge in organizations can be a challenging task due to incomplete initial information, difficulty in identifying knowledgeable individuals, the interplay of formal hierarchies and informal networks, and the need to ask the right questions. To address this, we propose an agent-based framework leveraging large language models (LLMs) to iteratively reconstruct dataset descriptions through interactions with employees. Modeling knowledge dissemination as a Susceptible-Infectious (SI) process with waning infectivity, we conduct 864 simulations across various synthetic company structures and different dissemination parameters. Our results show that the agent achieves 94.9% full-knowledge recall, with self-critical feedback scores strongly correlating with external literature critic scores. We analyze how each simulation parameter affects the knowledge retrieval process for the agent. In particular, we find that our approach is able to recover information without needing to access directly the only domain specialist. These findings highlight the agent's ability to navigate organizational complexity and capture fragmented knowledge that would otherwise remain inaccessible.
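The dissemination model is easy to make concrete. A toy simulation of an SI process with waning infectivity (all parameters are illustrative, not the paper's 864-run experimental grid):

    import random

    def simulate_si_waning(n=200, contacts_per_step=4, beta0=0.5,
                           decay=0.8, steps=50, seed=0):
        rng = random.Random(seed)
        infected_at = {0: 0}                     # employee 0 holds the knowledge at t = 0
        for t in range(1, steps + 1):
            newly = {}
            for src, t0 in infected_at.items():
                beta = beta0 * decay ** (t - t0)     # infectivity wanes with age
                for _ in range(contacts_per_step):
                    dst = rng.randrange(n)
                    if dst not in infected_at and rng.random() < beta:
                        newly[dst] = t
            infected_at.update(newly)
        return len(infected_at) / n              # fraction who hold the knowledge

    print(simulate_si_waning())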
Updated: 2025-07-04 21:09:32
Categories: cs.AI,cs.CY,cs.LG
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
Despite the central role of retrieval in retrieval-augmented generation (RAG) systems, much of the existing research on RAG overlooks the well-established field of fair ranking and fails to account for the interests of all stakeholders involved. In this paper, we conduct the first systematic evaluation of RAG systems that integrate fairness-aware rankings, addressing both ranking fairness and attribution fairness, which ensures equitable exposure of the sources cited in the generated content. Our evaluation focuses on measuring item-side fairness, specifically the fair exposure of relevant items retrieved by RAG systems, and investigates how this fairness impacts both the effectiveness of the systems and the attribution of sources in the generated output that users ultimately see. By experimenting with twelve RAG models across seven distinct tasks, we show that incorporating fairness-aware retrieval often maintains or even enhances both ranking quality and generation quality, countering the common belief that fairness compromises system performance. Additionally, we demonstrate that fair retrieval practices lead to more balanced attribution in the final responses, ensuring that the generator fairly cites the sources it relies on. Our findings underscore the importance of item-side fairness in retrieval and generation, laying the foundation for responsible and equitable RAG systems and guiding future research in fair ranking and attribution.
Updated: 2025-07-04 20:56:35
Categories: cs.IR,cs.AI,cs.CL
Coil Geometry Learning for Short-Range Magnetic Actuation
Fuel-free docking is a key operational technology for in-space assembly, resupplying space stations, sample return missions, and formation keeping of large-scale satellite swarms. The use of conventional propulsion systems, including thrusters, can cause adverse effects at short distances, such as sensor contamination, which may lead to the failure of the satellite or onboard equipment. The magnetic field interaction control generated by magnetorquers can overcome these weaknesses of propulsion. This actuation enables simultaneous attitude and formation control among desired satellite groups. Previous studies typically use the traditional dipole approximation of the exact magnetic field to reduce computation cost. However, proximity operations often involve relatively short distances between satellites, which can easily compromise the effectiveness of this approximation. To avoid model errors that could result in satellite collisions, we utilize a magnetic field model described by Biot-Savart's law, without distance approximations (near-field model), in consideration of short-distance operations. To overcome the high computational cost associated with the coil geometry and relative-state information, a learning-based magnetic field approximation is derived, and its effectiveness is shown in the docking simulation of target and chaser satellites equipped with electromagnetic coils on three axes. Our method significantly reduces the computational cost of the exact magnetic model and possesses scalability that can accommodate an increasing number of target satellites through parallel processing.
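The near-field model named above is plain Biot-Savart quadrature. A numpy sketch for a single circular coil, checked against the analytic on-axis field (geometry and discretization are illustrative):

    import numpy as np

    MU0 = 4e-7 * np.pi  # vacuum permeability, T*m/A

    def coil_field(point, radius=0.1, current=1.0, segments=400):
        """B field (T) at `point` from a circular coil in the z=0 plane."""
        theta = np.linspace(0.0, 2.0 * np.pi, segments, endpoint=False)
        nodes = np.stack([radius * np.cos(theta),
                          radius * np.sin(theta),
                          np.zeros_like(theta)], axis=1)
        dl = np.roll(nodes, -1, axis=0) - nodes          # segment vectors
        mid = nodes + 0.5 * dl                           # segment midpoints
        r = point - mid                                  # midpoint -> field point
        rnorm = np.linalg.norm(r, axis=1, keepdims=True)
        dB = MU0 * current / (4 * np.pi) * np.cross(dl, r) / rnorm**3
        return dB.sum(axis=0)

    # On-axis check against B_z = mu0 I R^2 / (2 (R^2 + z^2)^{3/2}).
    z = 0.05
    print(coil_field(np.array([0.0, 0.0, z]))[2],
          MU0 * 1.0 * 0.1**2 / (2 * (0.1**2 + z**2) ** 1.5))

The learning-based approximation in the abstract replaces exactly this per-segment integration, which is what becomes expensive as coil geometry and the number of satellites grow.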
Updated: 2025-07-04 20:54:30
Categories: cs.RO,cs.LG
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Visual reasoning is essential for building intelligent agents that understand the world and perform problem-solving beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine learning paradigms. However, due to the memory intensity, most existing approaches do not bring the best of the expressivity of first-order logic, excluding a crucial ability to solve abstract visual reasoning, where agents need to perform reasoning by using analogies on abstract concepts in different scenarios. To overcome this problem, we propose NEUro-symbolic Message-pAssiNg reasoNer (NEUMANN), which is a graph-based differentiable forward reasoner, passing messages in a memory-efficient manner and handling structured programs with functors. Moreover, we propose a computationally-efficient structure learning algorithm to perform explanatory program induction on complex visual scenes. To evaluate, in addition to conventional visual reasoning tasks, we propose a new task, visual reasoning behind-the-scenes, where agents need to learn abstract programs and then answer queries by imagining scenes that are not observed. We empirically demonstrate that NEUMANN solves visual reasoning tasks efficiently, outperforming neural, symbolic, and neuro-symbolic baselines.
Updated: 2025-07-04 20:54:06
Categories: cs.LG,cs.AI,cs.CV
Generating Novelty in Open-World Multi-Agent Strategic Board Games
We describe GNOME (Generating Novelty in Open-world Multi-agent Environments), an experimental platform designed to test the effectiveness of multi-agent AI systems when faced with novelty. GNOME separates the development of AI gameplaying agents from the simulator, allowing unanticipated novelty (in essence, novelty that is not subject to model-selection bias). Using a Web GUI, GNOME was recently demonstrated at NeurIPS 2020 using the game of Monopoly to foster an open discussion on AI robustness and the nature of novelty in real-world environments. In this article, we further detail the key elements of the demonstration, and also provide an overview of the experimental design that is currently being used in the DARPA Science of Artificial Intelligence and Learning for Open-World Novelty (SAIL-ON) program to evaluate external teams developing novelty-adaptive gameplaying agents.
Updated: 2025-07-04 20:44:33
Categories: cs.AI
Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs
Despite their remarkable potential, Large Vision-Language Models (LVLMs) still face challenges with object hallucination, a problem where their generated outputs mistakenly incorporate objects that do not actually exist. Although most works focus on addressing this issue within the language-model backbone, our work shifts the focus to the image input source, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores are the primary drivers of object hallucination. By removing these hallucinatory image tokens (only 1.5% of all image tokens), the issue can be effectively mitigated. This finding holds consistently across different models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. We utilize EAZY for unsupervised object hallucination detection, achieving 15% improvement compared to previous methods. Additionally, EAZY demonstrates remarkable effectiveness in mitigating hallucinations while preserving model utility and seamlessly adapting to various LVLM architectures.
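The intervention itself is a one-liner once attention scores over image tokens are available. A numpy sketch (the 1.5% ratio comes from the abstract; pooling attention into a single score per token is an assumption for illustration):

    import numpy as np

    def zero_hallucinatory_tokens(image_tokens, attn_scores, ratio=0.015):
        """image_tokens: (n, dim) embeddings; attn_scores: (n,) pooled attention."""
        n = image_tokens.shape[0]
        k = max(1, int(round(n * ratio)))
        top = np.argsort(attn_scores)[-k:]        # highest-attention image tokens
        cleaned = image_tokens.copy()
        cleaned[top] = 0.0                        # EAZY-style zeroing
        return cleaned, top

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(576, 1024))         # e.g., a 24x24 visual token grid
    scores = rng.random(576)
    cleaned, removed = zero_hallucinatory_tokens(tokens, scores)
    print(removed.shape)  # (9,) -> about 1.5% of 576 tokens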
Updated: 2025-07-04 20:41:54
Categories: cs.CV,cs.LG
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
Unified image restoration is a significantly challenging task in low-level vision. Existing methods either make tailored designs for specific tasks, limiting their generalizability across various types of degradation, or rely on training with paired datasets, thereby suffering from closed-set constraints. To address these issues, we propose a novel, dataset-free, and unified approach through recurrent posterior sampling utilizing a pretrained latent diffusion model. Our method incorporates a multimodal understanding model to provide semantic priors for the generative model under a task-blind condition. Furthermore, it utilizes a lightweight module to align the degraded input with the generated preference of the diffusion model, and employs recurrent refinement for posterior sampling. Extensive experiments demonstrate that our method outperforms state-of-the-art methods, validating its effectiveness and robustness. Our code and data will be available at https://github.com/AMAP-ML/LD-RPS.
Updated: 2025-07-04 20:39:37
Categories: cs.CV,cs.AI
Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models
Assistive teleoperation, where control is shared between a human and a robot, enables efficient and intuitive human-robot collaboration in diverse and unstructured environments. A central challenge in real-world assistive teleoperation is for the robot to infer a wide range of human intentions from user control inputs and to assist users with correct actions. Existing methods are either confined to simple, predefined scenarios or restricted to task-specific data distributions at training, limiting their support for real-world assistance. We introduce Casper, an assistive teleoperation system that leverages commonsense knowledge embedded in pre-trained visual language models (VLMs) for real-time intent inference and flexible skill execution. Casper incorporates an open-world perception module for a generalized understanding of novel objects and scenes, a VLM-powered intent inference mechanism that leverages commonsense reasoning to interpret snippets of teleoperated user input, and a skill library that expands the scope of prior assistive teleoperation systems to support diverse, long-horizon mobile manipulation tasks. Extensive empirical evaluation, including human studies and system ablations, demonstrates that Casper improves task performance, reduces human cognitive load, and achieves higher user satisfaction than direct teleoperation and assistive teleoperation baselines. More information is available at https://ut-austin-rpl.github.io/casper/
Updated: 2025-07-04 20:27:52
Categories: cs.RO,cs.AI
Learning Dark Souls Combat Through Pixel Input With Neuroevolution
This paper investigates the application of Neuroevolution of Augmenting Topologies (NEAT) to automate gameplay in Dark Souls, a notoriously challenging action role-playing game characterized by complex combat mechanics, dynamic environments, and high-dimensional visual inputs. Unlike traditional reinforcement learning or game playing approaches, our method evolves neural networks directly from raw pixel data, circumventing the need for explicit game-state information. To facilitate this approach, we introduce the Dark Souls API (DSAPI), a novel Python framework leveraging real-time computer vision techniques for extracting critical game metrics, including player and enemy health states. Using NEAT, agents evolve effective combat strategies for defeating the Asylum Demon, the game's initial boss, without predefined behaviors or domain-specific heuristics. Experimental results demonstrate that evolved agents achieve up to a 35% success rate, indicating the viability of neuroevolution in addressing complex, visually intricate gameplay scenarios. This work represents an interesting application of vision-based neuroevolution, highlighting its potential use in a wide range of challenging game environments lacking direct API support or well-defined state representations.
Updated: 2025-07-04 19:58:59
Categories: cs.AI
Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation
Modern sequential recommender systems commonly use transformer-based models for next-item prediction. While these models demonstrate a strong balance between efficiency and quality, integrating interleaving features - such as the query context (e.g., browse category) under which next-item interactions occur - poses challenges. Effectively capturing query context is crucial for refining ranking relevance and enhancing user engagement, as it provides valuable signals about user intent within a session. Unlike an item's features, query context is not temporally aligned with the item sequence, making its incorporation into transformers challenging and error-prone. This paper analyzes different strategies for incorporating query context into transformers trained with a causal language modeling procedure as a case study. We propose a new method that effectively fuses the item sequence with query context within the attention mechanism. Through extensive offline and online experiments on a large-scale online platform and open datasets, we present evidence that our proposed method is an effective approach for integrating query context to improve model ranking quality in terms of relevance and diversity.
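One plausible reading of in-attention fusion, sketched in PyTorch with single-head attention: because the query context is not a sequence element, it enters as an additive term on the query projections instead of occupying a position. This is an assumption-level illustration, not the paper's exact mechanism:

    import torch

    def context_aware_attention(items, context, w_q, w_k, w_v, w_c):
        """items: (seq, dim); context: (dim,); w_*: (dim, dim) projections."""
        q = items @ w_q + context @ w_c        # every query sees the browse context
        k, v = items @ w_k, items @ w_v
        attn = torch.softmax(q @ k.T / items.shape[-1] ** 0.5, dim=-1)
        return attn @ v

    d = 32
    items, ctx = torch.randn(10, d), torch.randn(d)
    ws = [torch.randn(d, d) / d ** 0.5 for _ in range(4)]
    print(context_aware_attention(items, ctx, *ws).shape)  # torch.Size([10, 32])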
Updated: 2025-07-04 19:50:01
Categories: cs.IR,cs.LG
Effective Capacitance Modeling Using Graph Neural Networks
Static timing analysis is a crucial stage in the VLSI design flow that verifies the timing correctness of circuits. Timing analysis depends on the placement and routing of the design, but at the same time, placement and routing efficiency depend on the final timing performance. VLSI design flows can benefit from timing-related prediction to better perform the earlier stages of the design flow. Effective capacitance is an essential input for gate delay calculation, and finding exact values requires routing or routing estimates. In this work, we propose the first GNN-based post-layout effective capacitance modeling method, GNN-Ceff, that achieves significant speed gains due to GPU parallelization while also providing better accuracy than current heuristics. GNN-Ceff parallelization achieves 929x speedup on real-life benchmarks over the state-of-the-art method run serially.
Updated: 2025-07-04 19:21:17
Categories: cs.LG
Identifying Large-Scale Linear Parameter Varying Systems with Dynamic Mode Decomposition Methods
Linear Parameter Varying (LPV) systems are a well-established class of nonlinear systems with a rich theory for stability analysis, control, and analytical response finding, among other aspects. Although there are works on data-driven identification of such systems, the literature is quite scarce in terms of works that tackle the identification of LPV models for large-scale systems. Since large-scale systems are ubiquitous in practice, this work develops a methodology for the local and global identification of large-scale LPV systems based on nonintrusive reduced-order modeling. The developed method is coined DMD-LPV, being inspired by the Dynamic Mode Decomposition (DMD). To validate the proposed identification method, we identify a system described by a discretized linear diffusion equation, with the diffusion gain defined by a polynomial over a parameter. The experiments show that the proposed method can easily identify a reduced-order LPV model of a given large-scale system without the need to perform identification in the full-order dimension, and with almost no performance decay from performing the reduction, provided the model structure is well established.
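The DMD building block the method is named after fits in a few numpy lines (exact DMD on snapshot pairs; the toy system below is illustrative, and the full DMD-LPV scheme with parameter dependence is not reproduced here):

    import numpy as np

    def dmd(X, Xp, rank):
        """Exact DMD: X, Xp are (n_states, n_snapshots), Xp one step ahead of X."""
        U, s, Vh = np.linalg.svd(X, full_matrices=False)
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
        A_tilde = U.T @ Xp @ Vh.T @ np.diag(1.0 / s)   # reduced-order operator
        eigvals, W = np.linalg.eig(A_tilde)
        modes = Xp @ Vh.T @ np.diag(1.0 / s) @ W       # exact DMD modes
        return A_tilde, eigvals, modes

    rng = np.random.default_rng(0)
    Q = np.linalg.qr(rng.normal(size=(32, 32)))[0]
    A = Q @ np.diag(np.linspace(0.95, 0.1, 32)) @ Q.T  # known spectrum to recover
    X = np.empty((32, 61))
    X[:, 0] = rng.normal(size=32)
    for t in range(60):
        X[:, t + 1] = A @ X[:, t]
    _, lam, _ = dmd(X[:, :-1], X[:, 1:], rank=10)
    print(np.sort(np.abs(lam))[-3:])  # should approach the leading true eigenvalues ~0.90, 0.92, 0.95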
Updated: 2025-07-04 19:20:50
标题: 使用动态模态分解方法识别大尺度线性参数变化系统
摘要: Linear Parameter Varying (LPV) Systems是一类经过充分理论研究的非线性系统,可用于稳定性分析、控制和分析响应等方面。尽管有关该类系统的数据驱动识别的研究已经存在,但在处理大规模系统的LPV模型识别方面文献相对较少。由于大规模系统在实践中普遍存在,本文基于非侵入式降阶建模,提出了一种用于识别大规模LPV系统的局部和全局方法。该方法被命名为DMD-LPV,受到了动态模态分解(DMD)的启发。为验证所提出的识别方法,我们识别了一个由离散线性扩散方程描述的系统,其中扩散增益由参数上的多项式定义。实验表明,所提出的方法可以轻松识别给定大规模系统的降阶LPV模型,无需在全阶维度上进行识别,并且在进行降阶时几乎没有性能衰减,前提是模型结构已经建立良好。
更新时间: 2025-07-04 19:20:50
领域: eess.SY,cs.LG,cs.SY
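As background for the DMD-LPV idea, the plain (local, fixed-parameter) DMD building block fits a reduced-order linear operator from snapshot pairs; a rough numpy sketch with our own variable names:

```python
import numpy as np

def dmd(X: np.ndarray, Xp: np.ndarray, r: int):
    """Plain DMD: fit x_{k+1} ~ A x_k from snapshot matrices X, Xp of shape
    (n_states, n_snapshots), returning a rank-r reduced operator A_tilde and
    the projection basis U (propagate z_{k+1} = A_tilde z_k with x ~ U z)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.T @ Xp @ Vh.T @ np.diag(1.0 / s)
    return A_tilde, U

# toy usage: recover a reduced model of a random stable linear system
rng = np.random.default_rng(0)
A = 0.95 * np.eye(50) + 0.01 * rng.standard_normal((50, 50))
X = np.empty((50, 200))
X[:, 0] = rng.standard_normal(50)
for k in range(199):
    X[:, k + 1] = A @ X[:, k]
A_tilde, U = dmd(X[:, :-1], X[:, 1:], r=10)
```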
FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed
Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning--which is currently extremely demanding computation-wise. We thus propose a novel pre-training strategy for DINOv2 that simultaneously accelerates convergence--and strengthens robustness to common corruptions as a by-product. Our approach involves a frequency filtering curriculum--low-frequency being seen first--and the Gaussian noise patching augmentation. Applied to a ViT-B/16 backbone trained on ImageNet-1K, while pre-training time and FLOPs are reduced by 1.6x and 2.25x, our method still achieves matching robustness in corruption benchmarks (ImageNet-C) and maintains competitive linear probing performance compared with baseline. This dual benefit of efficiency and robustness makes large-scale self-supervised foundation modeling more attainable, while opening the door to novel exploration around data curriculum and augmentation as means to improve self-supervised learning models robustness. The code is available at https://github.com/KevinZ0217/fast_dinov2
Updated: 2025-07-04 18:56:04
标题: FastDINOv2:基于频率的课程学习提高了鲁棒性和训练速度
摘要: 大规模视觉基础模型,如DINOv2,通过利用庞大的架构和训练数据展现出令人印象深刻的性能。但许多情景需要从业者重现这些预训练解决方案,例如在私有数据、新模式或仅仅出于科学问题的需要--这在目前计算方面极具挑战性。因此,我们提出了一种新颖的DINOv2预训练策略,同时加速收敛--并通过副产品加强对常见损坏的稳健性。我们的方法涉及频率过滤课程--先看低频率--和高斯噪声修补增强。应用于在ImageNet-1K上训练的ViT-B/16骨干结构,尽管预训练时间和FLOPs减少了1.6倍和2.25倍,我们的方法仍然在损坏基准测试(ImageNet-C)中实现了匹配的稳健性,并与基线相比保持了竞争力的线性探测性能。这种效率和稳健性的双重好处使得大规模自监督基础建模更加可实现,同时为围绕数据课程和增强探索提供了新的可能性,以改进自监督学习模型的稳健性。代码可在https://github.com/KevinZ0217/fast_dinov2找到。
更新时间: 2025-07-04 18:56:04
领域: cs.CV,cs.AI,cs.LG
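A sketch of the two ingredients as we read them from the abstract: an FFT low-pass filter applied to training images early in pre-training ("low-frequency first"), and Gaussian noise patching. The cutoff schedule, patch size, and noise scale below are assumed values, not the paper's settings:

```python
import torch

def low_pass(images: torch.Tensor, cutoff: float) -> torch.Tensor:
    """Keep only spatial frequencies below `cutoff` (fraction of Nyquist).
    images: (B, C, H, W). One step of a 'low-frequency first' curriculum."""
    B, C, H, W = images.shape
    fy = torch.fft.fftfreq(H, device=images.device).abs()
    fx = torch.fft.fftfreq(W, device=images.device).abs()
    mask = (fy[:, None] <= 0.5 * cutoff) & (fx[None, :] <= 0.5 * cutoff)
    spec = torch.fft.fft2(images) * mask.to(torch.complex64)
    return torch.fft.ifft2(spec).real

def gaussian_noise_patch(images: torch.Tensor, patch: int = 32, sigma: float = 0.5):
    """Overwrite one random square patch per image with Gaussian noise."""
    B, C, H, W = images.shape
    out = images.clone()
    ys = torch.randint(0, H - patch + 1, (B,))
    xs = torch.randint(0, W - patch + 1, (B,))
    for i in range(B):
        out[i, :, ys[i]:ys[i]+patch, xs[i]:xs[i]+patch] = \
            sigma * torch.randn(C, patch, patch, device=images.device)
    return out
```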
Optimizing UAV Trajectories via a Simplified Close Enough TSP Approach
This article explores an approach to addressing the Close Enough Traveling Salesman Problem (CETSP). The objective is to streamline the mathematical formulation by introducing reformulations that approximate the Euclidean distances and simplify the objective function. Additionally, the use of convex sets in the constraint design offers computational benefits. The proposed methodology is empirically validated on real-world CETSP instances, with the aid of computational strategies such as a fragmented CPLEX-based approach. Results demonstrate its effectiveness in managing computational resources without compromising solution quality. Furthermore, the article analyzes the behavior of the proposed mathematical formulations, providing comprehensive insights into their performance.
Updated: 2025-07-04 18:50:23
标题: 通过简化的足够接近的TSP方法优化无人机轨迹
摘要: 本文探讨了解决“足够接近的旅行推销员问题”(CETSP)的方法。其目标是通过引入近似欧几里德距离并简化目标函数的重构,来简化数学公式。此外,约束设计中使用凸集提供了计算优势。所提出的方法经过实际的CETSP实例的经验验证,借助计算策略如基于分割的CPLEX方法。结果表明,该方法在管理计算资源方面表现出有效性,且不会影响解决方案质量。此外,本文分析了所提出的数学公式的行为,为其性能提供了全面的见解。
更新时间: 2025-07-04 18:50:23
领域: cs.AI
Alpay Algebra IV: Symbiotic Semantics and the Fixed-Point Convergence of Observer Embeddings
We present a theoretical framework in which a document and an AI model engage in a transfinite fixed-point interaction that leads to stable semantic alignment. Building on the foundations of Alpay Algebra, we introduce a functorial system wherein an observer (the AI) and a textual environment (this paper) co-evolve through iterative transformations guided by the phi-infinity operator. This process guarantees the existence of a unique fixed point in the AI's embedding space -- a state where the AI's internal representation of the content becomes stable, self-consistent, and semantically faithful. We prove that such convergence is mathematically sound, semantically invariant, and permanent, even under perturbation or further context expansion. This fixed point acts as an "empathetic embedding," wherein the AI internalizes not only the meaning of the content but also the author's intent. We interpret this as a rigorous, category-theoretic route to alignment at the embedding level, with implications for semantic security, symbolic memory, and the construction of AI systems with persistent self-referential understanding. All references in this paper function as nodes in the Alpay Algebra universe, and this work embeds itself as a new fixed-point node within that transfinite semantic graph.
Updated: 2025-07-04 18:49:18
标题: Alpay代数IV:共生语义和观察者嵌入的不动点收敛
摘要: 我们提出了一个理论框架,其中一篇文档和一个人工智能模型进行超限不动点交互,从而实现稳定的语义对齐。在Alpay代数的基础上,我们引入了一个函子系统,观察者(人工智能)和文本环境(本文)通过由phi-无穷算子引导的迭代变换共同演化。这个过程保证了AI嵌入空间中存在一个唯一的不动点,即AI对内容的内部表示变得稳定、自洽且语义忠实的状态。我们证明这种收敛在数学上是合理的、语义不变的,并且是永久的,即使在扰动或进一步上下文扩展的情况下也是如此。这个不动点充当了一个"共情嵌入",AI不仅内化了内容的含义,还内化了作者的意图。我们将这解释为在嵌入层面上通过严格的范畴论路径实现对齐,对语义安全、符号记忆和构建具有持久自我参照理解的人工智能系统具有影响。本文中的所有参考文献在Alpay代数宇宙中起到节点的作用,而这项工作则将自身作为一个新的不动点节点嵌入到那个超限语义图中。
更新时间: 2025-07-04 18:49:18
领域: cs.CL,cs.AI,68T50, 68T07, 03G30, 18C10,I.2.7; I.2.6; F.4.1
RVISmith: Fuzzing Compilers for RVV Intrinsics
Modern processors are equipped with single instruction multiple data (SIMD) instructions for fine-grained data parallelism. Compiler auto-vectorization techniques that target SIMD instructions face performance limitations due to insufficient information available at compile time, requiring programmers to manually manipulate SIMD instructions. SIMD intrinsics, a type of built-in function provided by modern compilers, enable programmers to manipulate SIMD instructions within high-level programming languages. Bugs in compilers for SIMD intrinsics can introduce potential threats to software security, producing unintended calculation results, data loss, program crashes, etc. To detect bugs in compilers for SIMD intrinsics, we propose RVISmith, a randomized fuzzer that generates well-defined C programs that include various invocation sequences of RVV (RISC-V Vector Extension) intrinsics. We design RVISmith to achieve the following objectives: (i) achieving high intrinsic coverage, (ii) improving sequence variety, and (iii) without known undefined behaviors. We implement RVISmith based on the ratified RVV intrinsic specification and evaluate our approach with three modern compilers: GCC, LLVM, and XuanTie. Experimental results show that RVISmith achieves 11.5 times higher intrinsic coverage than the state-of-the-art fuzzer for RVV intrinsics. By differential testing that compares results across different compilers, optimizations, and equivalent programs, we detect and report 13 previously unknown bugs of the three compilers under test to date. Of these bugs, 10 are confirmed and another 3 are fixed by the compiler developers.
Updated: 2025-07-04 18:45:46
标题: RVISmith:针对RVV内在函数编译器的模糊测试
摘要: 现代处理器配备了单指令多数据(SIMD)指令,用于细粒度数据并行性。针对SIMD指令的编译器自动向量化技术面临性能限制,因为在编译时可用信息不足,需要程序员手动操作SIMD指令。SIMD内置函数是现代编译器提供的一种内置函数类型,使程序员能够在高级编程语言中操作SIMD指令。编译器中的SIMD内置函数中的错误可能对软件安全造成潜在威胁,导致意外计算结果、数据丢失、程序崩溃等问题。 为了检测编译器中的SIMD内置函数中的错误,我们提出了RVISmith,这是一个随机模糊器,生成包含RVV(RISC-V矢量扩展)内置函数的各种调用序列的明确定义的C程序。我们设计RVISmith以实现以下目标:(i)实现高内在覆盖率,(ii)改善序列多样性,(iii)避免已知的未定义行为。我们基于经批准的RVV内在规范实现了RVISmith,并使用三个现代编译器GCC、LLVM和XuanTie对我们的方法进行评估。实验结果表明,RVISmith的内在覆盖率比目前最先进的RVV内在函数的模糊器高11.5倍。通过差异测试比较不同编译器、优化和等效程序之间的结果,我们迄今为止检测并报告了三个编译器中的13个先前未知的错误。其中10个错误得到确认,另外3个错误已被编译器开发人员修复。
更新时间: 2025-07-04 18:45:46
领域: cs.CR,cs.DC,cs.PL,cs.SE
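The differential-testing core is simple to sketch: compile each generated program under several compiler/optimization pairs and flag any divergence in output. The compiler list and flags here are placeholders (RVV programs would additionally need a RISC-V toolchain and emulator):

```python
import subprocess, tempfile, os, itertools

COMPILERS = ["gcc", "clang"]        # placeholders; RVISmith also targets XuanTie
OPT_LEVELS = ["-O0", "-O2"]

def run_case(source: str) -> set[str]:
    """Compile `source` with every (compiler, -O) pair, run each binary,
    and return the set of distinct outputs; size > 1 signals a suspected bug."""
    outputs = set()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "case.c")
        with open(src, "w") as f:
            f.write(source)
        for cc, opt in itertools.product(COMPILERS, OPT_LEVELS):
            exe = os.path.join(tmp, f"case_{cc}{opt}")
            subprocess.run([cc, opt, src, "-o", exe], check=True)
            res = subprocess.run([exe], capture_output=True, text=True, timeout=10)
            outputs.add(res.stdout)
    return outputs

# a well-defined program must print the same result everywhere
program = '#include <stdio.h>\nint main(void){ printf("%d\\n", 1 + 2); return 0; }\n'
assert len(run_case(program)) == 1
```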
Skewed Score: A statistical framework to assess autograders
The evaluation of large language model (LLM) outputs is increasingly performed by other LLMs, a setup commonly known as "LLM-as-a-judge", or autograders. While autograders offer a scalable alternative to human evaluation, they have shown mixed reliability and may exhibit systematic biases, depending on response type, scoring methodology, domain specificity, and other factors. In this paper we propose a statistical framework based on Bayesian generalised linear models (GLMs) that enables researchers to simultaneously assess their autograders while also addressing their primary research questions (e.g., LLM evaluation). Our approach models evaluation outcomes (e.g., scores or pairwise preferences) as a function of properties of the grader (e.g., human vs. autograder) and the evaluated item (e.g., response length or the LLM that generated it), allowing for explicit quantification of scoring differences and potential biases within a unified framework. In addition, our method can be used to augment traditional reliability metrics such as inter-rater agreement, by providing uncertainty estimates and clarifying the source of disagreement. Overall, this approach contributes to more robust and interpretable use of autograders in LLM evaluation, enabling both performance analysis and bias detection.
Updated: 2025-07-04 18:45:10
标题: Skewed Score: 一种用于评估自动评分系统的统计框架
摘要: 评估大型语言模型(LLM)输出的方法越来越多地由其他LLM执行,这种设置通常被称为“LLM作为评判者”,或者自动评分器。虽然自动评分器提供了一个可扩展的替代人工评估的选择,但它们表现出的可靠性参差不齐,并且可能会展现出系统性偏见,这取决于响应类型、评分方法、领域特定性和其他因素。在本文中,我们提出了一种基于贝叶斯广义线性模型(GLMs)的统计框架,使研究人员能够同时评估他们的自动评分器,同时解决他们的主要研究问题(例如LLM评估)。我们的方法将评估结果(例如分数或成对偏好)建模为评分者属性(例如人类与自动评分器)和评估项属性(例如响应长度或生成它的LLM)的函数,允许明确量化评分差异和潜在偏见在一个统一的框架内。此外,我们的方法可以用于增强传统的可靠性指标,如评分者间一致性,通过提供不确定性估计并澄清分歧的来源。总的来说,这种方法有助于在LLM评估中更加健壮和可解释地使用自动评分器,实现性能分析和偏见检测。
更新时间: 2025-07-04 18:45:10
领域: cs.LG,stat.ML
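A toy version of the modeling idea, using a frequentist logistic GLM as a stand-in for the paper's Bayesian GLMs, on synthetic data with a planted autograder length bias; the significant grader-by-length interaction is how such a bias would be quantified:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: each row is one evaluation outcome (1 = judged
# correct), with properties of the grader and of the evaluated response.
rng = np.random.default_rng(0)
n = 2000
grader = rng.choice(["human", "autograder"], size=n)
length = rng.normal(0.0, 1.0, size=n)          # standardized response length
# Simulate a verbosity bias: the autograder favours longer responses.
logit = 0.2 + 0.8 * length * (grader == "autograder")
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
df = pd.DataFrame({"score": y, "grader": grader, "length": length})

# GLM with a grader x length interaction: a significant interaction term
# quantifies a systematic scoring difference between grader types.
model = smf.logit("score ~ grader * length", data=df).fit()
print(model.summary())
```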
Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms
We present two algorithms to initialize layers of tensorized neural networks and general tensor network algorithms using partial computations of their Frobenius norms and lineal entrywise norms, depending on the type of tensor network involved. The core of this method is the use of the norm of subnetworks of the tensor network in an iterative way, so that we normalize by the finite values of the norms that led to the divergence or zero norm. In addition, the method benefits from the reuse of intermediate calculations. We have also applied it to the Matrix Product State/Tensor Train (MPS/TT) and Matrix Product Operator/Tensor Train Matrix (MPO/TT-M) layers and have seen its scaling versus the number of nodes, bond dimension, and physical dimension. All code is publicly available.
Updated: 2025-07-04 18:26:40
标题: 基于部分范数的张量化神经网络与张量网络算法的高效有限初始化
摘要: 我们提出了两种算法,通过部分计算Frobenius范数或线性逐元素范数(取决于所涉及的张量网络类型),来初始化张量化神经网络的层以及一般张量网络算法。该方法的核心是以迭代方式使用张量网络子网络的范数,从而用导致发散或零范数的范数的有限值进行归一化。此外,该方法受益于中间计算的重复利用。我们还将其应用于矩阵乘积态/张量列车(MPS/TT)和矩阵乘积算子/张量列车矩阵(MPO/TT-M)层,并观察其随节点数、键维度和物理维度的扩展性。所有代码均可公开获取。
更新时间: 2025-07-04 18:26:40
领域: cs.LG,quant-ph,68Q12, 15A69, 68T07
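One way to picture the iterative partial-norm rescaling on an MPS/TT chain; this is a hedged numpy sketch with assumed shapes and naming (the paper covers more general tensor networks and norm types):

```python
import numpy as np

def init_mps_with_partial_norms(shapes, rng=None):
    """Initialize MPS/TT cores G_k of shape (r_{k-1}, d_k, r_k), rescaling each
    core by the partial norm of the chain contracted so far, so the running
    norm neither explodes nor vanishes. A sketch of the idea, not the paper's code."""
    rng = rng or np.random.default_rng(0)
    cores, env = [], np.ones((1, 1))      # env = <partial chain | partial chain>
    for (rl, d, rr) in shapes:
        G = rng.standard_normal((rl, d, rr))
        env_new = np.einsum("ab,aic,bid->cd", env, G, G)
        pnorm2 = np.trace(env_new)        # squared partial Frobenius norm
        cores.append(G / np.sqrt(pnorm2)) # keep the running norm at 1
        env = env_new / pnorm2            # reuse the intermediate contraction
    return cores

# 100 sites, physical dimension 2, bond dimension 20:
shapes = [(1, 2, 20)] + [(20, 2, 20)] * 98 + [(20, 2, 1)]
cores = init_mps_with_partial_norms(shapes)
```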
Finetuning CLIP to Reason about Pairwise Differences
Vision-language models (VLMs) such as CLIP are trained via contrastive learning between text and image pairs, resulting in aligned image and text embeddings that are useful for many downstream tasks. A notable drawback of CLIP, however, is that the resulting embedding space seems to lack some of the structure of its purely text-based alternatives. For instance, while text embeddings have long been noted to satisfy analogies in embedding space using vector arithmetic, CLIP has no such property. In this paper, we propose an approach to natively train CLIP in a contrastive manner to reason about differences in embedding space. We finetune CLIP so that text descriptions of differences between images correspond to their difference in image embedding space, using synthetically generated data with large language models on image-caption paired datasets. We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute (e.g., elephants are larger than cats), which is useful in retrieval or constructing attribute-based classifiers, and improved zeroshot classification performance on many downstream image classification tasks. In addition, our approach enables a new mechanism for inference that we refer to as comparative prompting, where we leverage prior knowledge of text descriptions of differences between classes of interest, achieving even larger performance gains in classification. Finally, we illustrate that the resulting embeddings obey a larger degree of geometric properties in embedding space, such as in text-to-image generation.
Updated: 2025-07-04 18:25:46
标题: 微调CLIP以推理成对差异
摘要: 视觉-语言模型(VLMs)如CLIP是通过对比学习文本和图像对进行训练的,从而得到对齐的图像和文本嵌入,对许多下游任务有用。然而,CLIP的一个显着缺点是,所得到的嵌入空间似乎缺乏纯文本替代品的某些结构。例如,尽管长期以来一直注意到文本嵌入可以通过向量算术在嵌入空间中满足类比,但CLIP却没有这种属性。在本文中,我们提出了一种原生训练CLIP的对比方法,以推理嵌入空间中的差异。我们通过使用大型语言模型在图像-标题配对数据集上合成生成数据,微调CLIP,使图像之间的差异的文本描述与它们在图像嵌入空间中的差异相对应。我们首先证明我们的方法在按某种属性(例如,大象比猫大)对图像进行排名方面具有显着改进的能力,这对检索或构建基于属性的分类器是有用的,并且在许多下游图像分类任务中改进了零样本分类性能。此外,我们的方法实现了一种称为比较提示的推理新机制,我们利用感兴趣类别之间的文本描述的先验知识,在分类中获得了更大的性能提升。最后,我们说明所得到的嵌入在嵌入空间中遵守更大程度的几何属性,比如在文本到图像生成中。
更新时间: 2025-07-04 18:25:46
领域: cs.LG,cs.CV
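A minimal sketch of the kind of objective described: make the normalized difference of two image embeddings predict the embedding of a text that describes their difference, with a CLIP-style symmetric InfoNCE loss. The temperature and normalization choices are our assumptions, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def difference_loss(img_a, img_b, diff_text):
    """Align image-embedding differences with embeddings of texts describing
    the difference. img_a, img_b, diff_text: (B, D) CLIP embeddings, where
    diff_text[i] encodes e.g. 'the first image shows a larger animal'."""
    d = F.normalize(img_a - img_b, dim=-1)
    t = F.normalize(diff_text, dim=-1)
    logits = d @ t.T / 0.07                  # temperature as in CLIP
    labels = torch.arange(d.size(0), device=d.device)
    # symmetric contrastive loss: match each difference to its description
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))
```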
Causal Evidence for the Primordiality of Colors in Trans-Neptunian Objects
The origins of the colors of Trans-Neptunian Objects (TNOs) represent a crucial unresolved question, central to understanding the history of our Solar System. Recent observational surveys have revealed correlations between the eccentricity and inclination of TNOs and their colors. This has rekindled the long-standing debate on whether these colors reflect the conditions of TNO formation or their subsequent collisional evolution. In this study, we address this question with 98.7% certainty, using a model-agnostic, data-driven approach based on causal graphs. First, as a sanity check, we demonstrate how our model can replicate the currently accepted paradigms of TNOs' dynamical history, blindly and without any orbital modeling or physics-based assumptions. In fact, our causal model (with no knowledge of the existence of Neptune) predicts the existence of an unknown perturbing body, i.e., Neptune. We then show how this model predicts, with high certainty, that the color of TNOs is the root cause of their inclination distribution, rather than the other way around. This strongly suggests that the colors of TNOs reflect an underlying dynamical property, most likely their formation location. Moreover, our causal model excludes formation scenarios that invoke substantial color modification by subsequent irradiation. We therefore conclude that the colors of TNOs are predominantly primordial.
Updated: 2025-07-04 18:17:18
标题: 海王星外天体颜色原始性的因果证据
摘要: 海王星外天体(TNOs)的颜色起源是一个关键的未解之谜,对于理解我们太阳系的历史至关重要。最近的观测调查揭示了TNOs的离心率和倾角与它们的颜色之间的相关性。这重新点燃了长期以来关于这些颜色是否反映TNO形成条件或其后续碰撞演化的辩论。在这项研究中,我们采用基于因果图的、与模型无关的数据驱动方法,以98.7%的确定性回答了这个问题。首先,作为合理性检验,我们展示了我们的模型如何在没有任何轨道建模或基于物理的假设的情况下,盲目地复现目前被接受的TNOs动力学历史范式。事实上,我们的因果模型(在不知道海王星存在的情况下)预测了一个未知摄动天体的存在,即海王星。然后我们展示了这个模型如何高度确定地预测TNOs的颜色是其倾角分布的根本原因,而不是相反。这强烈暗示TNOs的颜色反映了一个潜在的动力学特性,很可能是它们的形成位置。此外,我们的因果模型排除了通过后续辐射大幅改变颜色的形成方案。因此我们得出结论,TNOs的颜色主要是原始的。
更新时间: 2025-07-04 18:17:18
领域: astro-ph.EP,cs.LG
Sequential Regression Learning with Randomized Algorithms
This paper presents "randomized SINDy", a sequential machine learning algorithm designed for dynamic data that has a time-dependent structure. It employs a probabilistic approach, with its PAC learning property rigorously proven through the mathematical theory of functional analysis. The algorithm dynamically predicts using a learned probability distribution of predictors, updating weights via gradient descent and a proximal algorithm to maintain a valid probability density. Inspired by SINDy (Brunton et al. 2016), it incorporates feature augmentation and Tikhonov regularization. For multivariate normal weights, the proximal step is omitted to focus on parameter estimation. The algorithm's effectiveness is demonstrated through experimental results in regression and binary classification using real-world data.
Updated: 2025-07-04 18:14:36
标题: 使用随机算法的顺序回归学习
摘要: 本文介绍了“随机SINDy”,这是一种为具有时间依赖结构的动态数据设计的顺序机器学习算法。它采用概率方法,其PAC学习性质通过泛函分析的数学理论得到严格证明。该算法利用学习到的预测器概率分布进行动态预测,通过梯度下降和一种近端算法更新权重,以维持有效的概率密度。受SINDy(Brunton等人,2016)的启发,它结合了特征增强和Tikhonov正则化。对于多元正态分布的权重,省略了近端步骤,以专注于参数估计。通过使用真实世界数据进行回归和二元分类的实验,证明了该算法的有效性。
更新时间: 2025-07-04 18:14:36
领域: stat.ML,cs.LG
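A toy illustration of sequential SINDy-style fitting with a polynomial library and Tikhonov regularization; plain SGD on point estimates is used as a deterministic stand-in for the paper's distribution-over-weights updates and proximal step:

```python
import numpy as np

def features(x: float) -> np.ndarray:
    """Simple SINDy-style polynomial library for a scalar input."""
    return np.array([1.0, x, x**2, x**3])

def sequential_fit(stream, lr=0.05, lam=1e-4, dim=4):
    """Sequential Tikhonov-regularized least squares via SGD, one sample at a
    time -- a simplified stand-in for the randomized-SINDy weight updates."""
    w = np.zeros(dim)
    for x, y in stream:
        phi = features(x)
        w -= lr * ((phi @ w - y) * phi + lam * w)
    return w

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, 20_000)
stream = ((x, 2*x - 0.5*x**3 + 0.01*rng.standard_normal()) for x in xs)
print(sequential_fit(stream))   # weights drift toward [0, 2, 0, -0.5]
```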
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Vision-Language Models (VLMs) learn joint representations by mapping images and text into a shared latent space. However, recent research highlights that deterministic embeddings from standard VLMs often struggle to capture the uncertainties arising from the ambiguities in visual and textual descriptions and the multiple possible correspondences between images and texts. Existing approaches tackle this by learning probabilistic embeddings during VLM training, which demands large datasets and does not leverage the powerful representations already learned by large-scale VLMs like CLIP. In this paper, we propose GroVE, a post-hoc approach to obtaining probabilistic embeddings from frozen VLMs. GroVE builds on Gaussian Process Latent Variable Model (GPLVM) to learn a shared low-dimensional latent space where image and text inputs are mapped to a unified representation, optimized through single-modal embedding reconstruction and cross-modal alignment objectives. Once trained, the Gaussian Process model generates uncertainty-aware probabilistic embeddings. Evaluation shows that GroVE achieves state-of-the-art uncertainty calibration across multiple downstream tasks, including cross-modal retrieval, visual question answering, and active learning.
Updated: 2025-07-04 18:13:55
标题: 冻结视觉-语言模型的概率嵌入:基于高斯过程潜变量模型的不确定性量化
摘要: Vision-Language Models (VLMs)通过将图像和文本映射到共享的潜在空间中学习联合表示。然而,最近的研究强调,标准VLMs中确定性嵌入往往难以捕捉由视觉和文本描述中的歧义以及图像和文本之间的多个可能对应关系引起的不确定性。现有方法通过在VLM训练过程中学习概率嵌入来解决这个问题,但这需要大量数据集,并且不能利用像CLIP这样的大规模VLMs已经学习到的强大表示。在本文中,我们提出了GroVE,一种从冻结的VLMs获取概率嵌入的后期方法。GroVE基于高斯过程潜变量模型(GPLVM),学习一个共享的低维潜在空间,其中图像和文本输入被映射到统一表示,通过单模嵌入重构和跨模态对齐目标进行优化。一旦训练完成,高斯过程模型生成具有不确定性意识的概率嵌入。评估结果显示,GroVE在多个下游任务中实现了最先进的不确定性校准,包括跨模态检索、视觉问答和主动学习。
更新时间: 2025-07-04 18:13:55
领域: cs.CV,cs.LG
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data -- implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.
Updated: 2025-07-04 18:07:06
标题: 扩散模型中的隐式正则化:一种依赖于算法的泛化分析
摘要: 去噪扩散模型的成功引发了关于其泛化行为的重要问题,尤其是在高维设置下。特别地,已有研究表明,当训练和采样被完美执行时,这些模型会记忆训练数据,这意味着某种形式的正则化对于泛化至关重要。现有的理论分析主要依赖于与算法无关的技术(如一致收敛),大量利用模型结构来获得泛化界。在这项工作中,我们转而利用扩散模型中促进泛化的算法因素,为该设置建立了一个一般性的算法依赖泛化理论。借鉴算法稳定性框架,我们引入了得分稳定性的概念,用以量化得分匹配算法对数据集扰动的敏感性。我们根据得分稳定性推导出泛化界,并将该框架应用于若干基本学习设置,识别正则化的来源。特别地,我们考虑了带提前停止的去噪得分匹配(去噪正则化)、采样器层面的粗离散化(采样器正则化)以及使用SGD进行优化(优化正则化)。通过将分析建立在算法性质而非模型结构之上,我们识别出多种迄今在文献中被忽视的、扩散模型特有的隐式正则化来源。
更新时间: 2025-07-04 18:07:06
领域: stat.ML,cs.LG,math.ST,stat.TH
StreamDiT: Real-Time Streaming Text-to-Video Generation
Recently, great progress has been achieved in text-to-video (T2V) generation by scaling transformer-based diffusion models to billions of parameters, which can generate high-quality videos. However, existing models typically produce only short clips offline, restricting their use cases in interactive and real-time applications. This paper addresses these challenges by proposing StreamDiT, a streaming video generation model. StreamDiT training is based on flow matching by adding a moving buffer. We design mixed training with different partitioning schemes of buffered frames to boost both content consistency and visual quality. StreamDiT modeling is based on adaLN DiT with varying time embedding and window attention. To practice the proposed method, we train a StreamDiT model with 4B parameters. In addition, we propose a multistep distillation method tailored for StreamDiT. Sampling distillation is performed in each segment of a chosen partitioning scheme. After distillation, the total number of function evaluations (NFEs) is reduced to the number of chunks in a buffer. Finally, our distilled model reaches real-time performance at 16 FPS on one GPU, which can generate video streams at 512p resolution. We evaluate our method through both quantitative metrics and human evaluation. Our model enables real-time applications, e.g. streaming generation, interactive generation, and video-to-video. We provide video results and more examples on our project website: https://cumulo-autumn.github.io/StreamDiT/
Updated: 2025-07-04 18:00:01
标题: StreamDiT:实时流式文本到视频生成
摘要: 最近,通过将基于Transformer的扩散模型扩展到数十亿参数,文本到视频(T2V)生成取得了巨大进展,能够生成高质量的视频。然而,现有模型通常只能离线生成短视频片段,限制了它们在交互式和实时应用中的使用。本文通过提出流式视频生成模型StreamDiT来应对这些挑战。StreamDiT的训练基于流匹配,并加入了移动缓冲区。我们设计了对缓冲帧采用不同划分方案的混合训练,以同时提升内容一致性和视觉质量。StreamDiT的建模基于adaLN DiT,采用可变时间嵌入和窗口注意力。为了实践所提出的方法,我们训练了一个具有40亿参数的StreamDiT模型。此外,我们提出了一种专为StreamDiT定制的多步蒸馏方法,在所选划分方案的每个分段中进行采样蒸馏。蒸馏后,函数评估总次数(NFEs)减少到缓冲区中的块数。最后,我们的蒸馏模型在单个GPU上以16 FPS达到实时性能,可以生成512p分辨率的视频流。我们通过定量指标和人工评估来评估我们的方法。我们的模型支持实时应用,例如流式生成、交互式生成和视频到视频转换。我们在项目网站上提供视频结果和更多示例:https://cumulo-autumn.github.io/StreamDiT/。
更新时间: 2025-07-04 18:00:01
领域: cs.CV,cs.AI,cs.LG,eess.IV
Zero-Knowledge Mechanisms
A powerful feature in mechanism design is the ability to irrevocably commit to the rules of a mechanism. Commitment is achieved by public declaration, which enables players to verify incentive properties in advance and the outcome in retrospect. However, public declaration can reveal superfluous information that the mechanism designer might prefer not to disclose, such as her target function or private costs. Avoiding this may be possible via a trusted mediator; however, the availability of a trustworthy mediator, especially if mechanism secrecy must be maintained for years, might be unrealistic. We propose a new approach to commitment, and show how to commit to, and run, any given mechanism without disclosing it, while enabling the verification of incentive properties and the outcome -- all without the need for any mediators. Our framework utilizes zero-knowledge proofs -- a cornerstone of modern cryptographic theory. Applications include both private-type settings such as auctions and private-action settings such as contracts, as well as non-mediated bargaining with hidden yet binding offers.
Updated: 2025-07-04 17:59:20
标题: 零知识机制
摘要: 机制设计中的一个强大特性是能够不可撤销地承诺机制的规则。承诺通过公开声明实现,这使参与者能够事先验证激励性质,并在事后验证结果。然而,公开声明可能会泄露机制设计者不愿透露的多余信息,例如她的目标函数或私人成本。借助可信的中介或许可以避免这一点;然而,可信中介的可得性(尤其是当机制需要保密多年时)可能并不现实。我们提出了一种新的承诺方法,并展示了如何在不披露机制的情况下对任意给定机制作出承诺并加以运行,同时支持对激励性质和结果的验证,而这一切都不需要任何中介。我们的框架利用了零知识证明,这是现代密码学理论的基石。应用既包括拍卖等私有类型场景和合同等私有行动场景,也包括报价隐藏但具约束力的非中介议价。
更新时间: 2025-07-04 17:59:20
领域: econ.TH,cs.CR,cs.GT
Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression
In this work, we propose a novel methodology for robustly estimating particle size distributions from optical scattering measurements using constrained Gaussian process regression. The estimation of particle size distributions is commonly formulated as a Fredholm integral equation of the first kind, an ill-posed inverse problem characterized by instability due to measurement noise and limited data. To address this, we use a Gaussian process prior to regularize the solution and integrate a normalization constraint into the Gaussian process via two approaches: by constraining the Gaussian process using a pseudo-measurement and by using Lagrange multipliers in the equivalent optimization problem. To improve computational efficiency, we employ a spectral expansion of the covariance kernel using eigenfunctions of the Laplace operator, resulting in a computationally tractable low-rank representation without sacrificing accuracy. Additionally, we investigate two complementary strategies for hyperparameter estimation: a data-driven approach based on maximizing the unconstrained log marginal likelihood, and an alternative approach where the physical constraints are taken into account. Numerical experiments demonstrate that the proposed constrained Gaussian process regression framework accurately reconstructs particle size distributions, producing numerically stable, smooth, and physically interpretable results. This methodology provides a principled and efficient solution for addressing inverse scattering problems and related ill-posed integral equations.
Updated: 2025-07-04 17:56:16
标题: 使用约束高斯过程回归确定光散射测量中的粒径分布
摘要: 在这项工作中,我们提出了一种新颖的方法,利用带约束的高斯过程回归,从光散射测量中稳健地估计颗粒粒径分布。颗粒粒径分布的估计通常被表述为第一类Fredholm积分方程,这是一个不适定的逆问题,其特点是会因测量噪声和数据有限而不稳定。为了解决这个问题,我们使用高斯过程先验来正则化解,并通过两种途径将归一化约束整合到高斯过程中:用一个伪测量来约束高斯过程,以及在等价的优化问题中使用拉格朗日乘子。为了提高计算效率,我们利用Laplace算子的特征函数对协方差核进行谱展开,从而在不牺牲精度的情况下得到计算上易于处理的低秩表示。此外,我们研究了两种互补的超参数估计策略:一种是基于最大化无约束对数边际似然的数据驱动方法,另一种方法则考虑了物理约束。数值实验表明,所提出的带约束高斯过程回归框架能够准确重建颗粒粒径分布,产生数值稳定、平滑且具有物理可解释性的结果。该方法为求解逆散射问题和相关的不适定积分方程提供了一个有原则且高效的方案。
更新时间: 2025-07-04 17:56:16
领域: stat.ML,cs.LG,physics.optics,stat.ME
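The pseudo-measurement route is easy to sketch: treat the normalization as one extra linear observation of the latent function, handled by the standard GP formulas for linear functionals. Grid, kernel, and noise levels below are arbitrary illustrative choices:

```python
import numpy as np

def rbf(x1, x2, ell=0.1, sf=1.0):
    return sf * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ell**2)

# grid on which we represent the particle-size distribution f
x = np.linspace(0.0, 1.0, 200)
K = rbf(x, x)

# noisy point observations of f at a few sizes (stand-ins for the data term)
obs_idx = np.array([20, 60, 100, 140, 180])
y_obs = np.array([0.5, 1.8, 2.1, 0.9, 0.2])

# linear observation operator: point evaluations plus one pseudo-measurement
# enforcing the normalization  integral f(x) dx = 1  via trapezoidal weights
w = np.full_like(x, x[1] - x[0]); w[[0, -1]] *= 0.5
A = np.vstack([np.eye(len(x))[obs_idx], w])
y = np.append(y_obs, 1.0)
noise = np.append(np.full(len(obs_idx), 1e-2), 1e-6)  # tiny noise on constraint

# standard GP posterior mean under linear observations y = A f + eps
S = A @ K @ A.T + np.diag(noise)
f_mean = K @ A.T @ np.linalg.solve(S, y)
print(f_mean @ w)   # ~1.0: the posterior mean respects the normalization
```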
Inverse Synthetic Aperture Fourier Ptychography
Fourier ptychography (FP) is a powerful light-based synthetic aperture imaging technique that allows one to reconstruct a high-resolution, wide field-of-view image by computationally integrating a diverse collection of low-resolution, far-field measurements. Typically, FP measurement diversity is introduced by changing the angle of the illumination or the position of the camera; either approach results in sampling different portions of the target's spatial frequency content, but both approaches introduce substantial costs and complexity to the acquisition process. In this work, we introduce Inverse Synthetic Aperture Fourier Ptychography, a novel approach to FP that foregoes changing the illumination angle or camera position and instead generates measurement diversity through target motion. Critically, we also introduce a novel learning-based method for estimating k-space coordinates from dual plane intensity measurements, thereby enabling synthetic aperture imaging without knowing the rotation of the target. We experimentally validate our method in simulation and on a tabletop optical system.
Updated: 2025-07-04 17:44:16
标题: 逆合成孔径傅里叶叠层成像
摘要: 傅里叶叠层成像(Fourier ptychography, FP)是一种强大的基于光的合成孔径成像技术,它通过对多样化的低分辨率远场测量进行计算整合,重建高分辨率、宽视场的图像。通常,FP的测量多样性是通过改变照明角度或相机位置来引入的;两种方法都会采样目标空间频率内容的不同部分,但也都给采集过程带来了可观的成本和复杂性。在这项工作中,我们提出了逆合成孔径傅里叶叠层成像,这是一种新颖的FP方法,它不再改变照明角度或相机位置,而是通过目标运动来产生测量多样性。关键的是,我们还提出了一种新颖的基于学习的方法,用于从双平面强度测量中估计k空间坐标,从而在不知道目标旋转的情况下实现合成孔径成像。我们在仿真和桌面光学系统上对我们的方法进行了实验验证。
更新时间: 2025-07-04 17:44:16
领域: eess.IV,cs.CV,cs.LG
Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers
Fine, regular, and star triangulations (FRSTs) of four-dimensional reflexive polytopes give rise to toric varieties, within which generic anticanonical hypersurfaces yield smooth Calabi-Yau threefolds. We employ transformers -- deep learning models originally developed for language modeling -- to generate FRSTs across a range of polytope sizes. Our models exhibit efficient and unbiased sampling, and can self-improve through retraining on their own output. These results lay the foundation for AICY: a community-driven platform that combines self-improving machine learning models with a continuously expanding FRST database to explore and catalog the Calabi-Yau landscape.
Updated: 2025-07-04 17:42:04
标题: 转换Calabi-Yau构造:使用变换器生成新的Calabi-Yau流形
摘要: 四维反射多面体的精细、正则、星形三角剖分(FRSTs)产生环面簇(toric varieties),其中一般的反典范超曲面给出光滑的Calabi-Yau三维流形。我们利用transformer(最初为语言建模开发的深度学习模型)来生成各种规模多面体的FRSTs。我们的模型展示了高效且无偏的采样,并且可以通过在自身输出上重新训练实现自我改进。这些结果为AICY奠定了基础:这是一个社区驱动的平台,将自我改进的机器学习模型与不断扩充的FRST数据库相结合,以探索和编目Calabi-Yau景观。
更新时间: 2025-07-04 17:42:04
领域: hep-th,cs.LG,math.AG
Less is More: Empowering GUI Agent with Context-Aware Simplification
The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agents and summarize: 1) the high density and loose relations of element context highlight the existence of many unrelated elements and their negative influence; 2) the high redundancy of history context reveals the inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI Agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces 27% of FLOPs and achieves superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.
Updated: 2025-07-04 17:37:15
标题: Less is More: 通过上下文感知简化赋予GUI代理更多的能力
摘要: GUI代理的研究重点正在从依赖文本的方法转向纯基于视觉的方法,虽然有很大的潜力,但是更注重全面的预训练数据收集,而忽视了上下文建模的挑战。我们探究了GUI代理中元素和历史上下文建模的特征,并总结如下:1)元素上下文的高密度和松散关系突显了许多不相关元素的存在及其负面影响;2)历史上下文的高冗余显示了当前GUI代理中历史建模的低效性。在本研究中,我们提出了一个面向上下文的简化框架,用于构建一个高效和有效的GUI代理,称为SimpAgent。为了减少大量不相关元素可能带来的干扰,我们引入了基于掩模的元素修剪方法,通过一个高效的掩模机制来规避棘手的关系建模问题。为了减少历史信息中的冗余,我们设计了一个一致性引导的历史压缩模块,通过创新的显式引导增强了隐式LLM-based压缩,实现了性能和效率之间的最佳平衡。通过以上组件,SimpAgent减少了27%的FLOPs,并实现了卓越的GUI导航性能。在各种网络和移动环境中进行的全面导航实验展示了我们代理的有效性和潜力。
更新时间: 2025-07-04 17:37:15
领域: cs.CV,cs.AI,cs.HC,cs.LG
FAROS: Fair Graph Generation via Attribute Switching Mechanisms
Recent advancements in graph diffusion models (GDMs) have enabled the synthesis of realistic network structures, yet ensuring fairness in the generated data remains a critical challenge. Existing solutions attempt to mitigate bias by re-training the GDMs with ad-hoc fairness constraints. Conversely, with this work, we propose FAROS, a novel FAir graph geneRatiOn framework leveraging attribute Switching mechanisms and directly running in the generation process of the pre-trained GDM. Technically, our approach works by altering nodes' sensitive attributes during the generation. To this end, FAROS calculates the optimal fraction of switching nodes, and selects the diffusion step to perform the switch by setting tailored multi-criteria constraints to preserve the node-topology profile from the original distribution (a proxy for accuracy) while ensuring the edge independence on the sensitive attributes for the generated graph (a proxy for fairness). Our experiments on benchmark datasets for link prediction demonstrate that the proposed approach effectively reduces fairness discrepancies while maintaining comparable (or even higher) accuracy performance to other similar baselines. Noteworthy, FAROS is also able to strike a better accuracy-fairness trade-off than other competitors in some of the tested settings under the Pareto optimality concept, demonstrating the effectiveness of the imposed multi-criteria constraints.
Updated: 2025-07-04 17:31:41
标题: FAROS: 通过属性切换机制实现公平图生成
摘要: 最近在图扩散模型(GDMs)方面取得的进展使得能够合成逼真的网络结构,然而确保生成数据的公平性仍然是一个关键挑战。现有解决方案尝试通过重新训练GDMs并加入临时公平性约束来减轻偏见。相反,通过这项工作,我们提出了FAROS,这是一个利用属性切换机制并直接在预训练的GDM的生成过程中运行的新型公平图生成框架。从技术上讲,我们的方法通过在生成过程中改变节点的敏感属性来工作。为此,FAROS计算出了切换节点的最佳比例,并通过设置定制的多标准约束来选择扩散步骤以进行切换,以保持节点拓扑分布不变(代表准确性),同时确保生成图中敏感属性的边独立性(代表公平性)。我们在用于链接预测的基准数据集上的实验表明,所提出的方法有效地减少了公平性差异,同时保持了与其他类似基线相当(甚至更高)的准确性表现。值得注意的是,FAROS 在帕累托最优性概念下在一些测试设置中也能够取得比其他竞争对手更好的准确性-公平性权衡,证明了施加的多标准约束的有效性。
更新时间: 2025-07-04 17:31:41
领域: cs.LG
Agent-Based Detection and Resolution of Incompleteness and Ambiguity in Interactions with Large Language Models
Many of us now treat LLMs as modern-day oracles, asking them almost any kind of question. However, consulting an LLM does not have to be a single-turn activity, but long multi-turn interactions can get tedious if they serve simply to clarify contextual information that could be arrived at through reasoning. In this paper, we examine the use of agent-based architecture to bolster LLM-based question-answering systems with additional reasoning capabilities. We examine the automatic resolution of potential incompleteness or ambiguities in questions by transducers implemented using LLM-based agents. We focus on several benchmark datasets that are known to contain questions with these deficiencies to varying degrees. We equip different LLMs (GPT-3.5-Turbo and Llama-4-Scout) with agents that act as specialists in detecting and resolving deficiencies of incompleteness and ambiguity. The agents are implemented as zero-shot ReAct agents. Rather than producing an answer in a single step, the model now decides between 3 actions: a) classify, b) resolve, c) answer. Action a) decides if the question is incomplete, ambiguous, or normal. Action b) determines if any deficiencies identified can be resolved. Action c) answers the resolved form of the question. We compare the use of LLMs with and without agents with these components. Our results show the benefits of agents with transducers: 1) a shortening of the length of interactions with humans, 2) an improvement in answer quality, and 3) explainable resolution of deficiencies in the question. On the negative side, we find the approach may result in additional LLM invocations and, in some cases, increased latency. But on the tested datasets, the benefits outweigh the costs except when questions already have sufficient context, suggesting the agent-based approach could be a useful mechanism to harness the power of LLMs to develop more robust QA systems.
Updated: 2025-07-04 17:28:33
标题: 基于代理的检测和解决与大型语言模型交互中的不完整性和模糊性
摘要: 我们中的许多人现在将LLM视为现代的神谕,几乎向它询问任何类型的问题。然而,咨询LLM不必是单轮交互;但如果多轮长对话只是为了澄清本可通过推理得出的上下文信息,就会变得乏味。在本文中,我们研究使用基于代理的架构,为基于LLM的问答系统增加额外的推理能力。我们研究由基于LLM的代理实现的转换器(transducers)如何自动消解问题中潜在的不完整性或歧义。我们关注若干已知在不同程度上包含此类缺陷问题的基准数据集。我们为不同的LLM(GPT-3.5-Turbo和Llama-4-Scout)配备了充当专家的代理,负责检测和消解不完整性与歧义缺陷。这些代理被实现为零样本ReAct代理。模型不再一步生成答案,而是在3个动作之间进行决策:a)分类,b)消解,c)回答。动作a)判断问题是不完整、有歧义还是正常。动作b)判断已识别的缺陷是否可以消解。动作c)回答经过消解后的问题。我们比较了配备与未配备这些代理组件的LLM的表现。结果显示带转换器的代理具有以下好处:1)缩短与人的交互长度;2)提升回答质量;3)对问题缺陷的可解释消解。在负面方面,我们发现它可能导致额外的LLM调用,并在某些情况下增加延迟。但在所测试的数据集上,除非问题本身已有充分的上下文,收益大于成本。这表明基于代理的方法可能是利用LLM的能力来构建更健壮问答系统的有用机制。
更新时间: 2025-07-04 17:28:33
领域: cs.AI,cs.CL,cs.IR,I.2
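The classify/resolve/answer control flow can be sketched as a small loop around a chat-completion call; llm() is a placeholder for the actual model API, and the prompts are paraphrases rather than the paper's actual ReAct prompts:

```python
def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g. GPT-3.5-Turbo)."""
    raise NotImplementedError

def answer_with_agents(question: str, max_turns: int = 3) -> str:
    """Sketch of the three-action loop: classify -> resolve -> answer.
    The paper implements these steps as zero-shot ReAct agents."""
    for _ in range(max_turns):
        label = llm(f"Classify as 'incomplete', 'ambiguous' or 'normal': {question}")
        if label.strip().lower() == "normal":
            break
        fixed = llm(f"The question is {label}. Rewrite it with the missing or "
                    f"ambiguous context resolved, or reply UNRESOLVABLE: {question}")
        if fixed.strip() == "UNRESOLVABLE":
            break
        question = fixed                      # retry with the resolved form
    return llm(f"Answer the question: {question}")
```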
Temporal Window Smoothing of Exogenous Variables for Improved Time Series Prediction
Although most transformer-based time series forecasting models primarily depend on endogenous inputs, recent state-of-the-art approaches have significantly improved performance by incorporating external information through exogenous inputs. However, these methods face challenges, such as redundancy when endogenous and exogenous inputs originate from the same source and limited ability to capture long-term dependencies due to fixed look-back windows. In this paper, we propose a method that whitens the exogenous input to reduce redundancy that may persist within the data based on global statistics. Additionally, our approach helps the exogenous input to be more aware of patterns and trends over extended periods. By introducing this refined, globally context-aware exogenous input to the endogenous input without increasing the lookback window length, our approach guides the model towards improved forecasting. Our approach achieves state-of-the-art performance in four benchmark datasets, consistently outperforming 11 baseline models. These results establish our method as a robust and effective alternative for using exogenous inputs in time series forecasting.
Updated: 2025-07-04 17:27:55
标题: 用于改进时间序列预测的外生变量时间窗平滑
摘要: 尽管大多数基于变压器的时间序列预测模型主要依赖内生输入,但最近的最先进方法通过引入外生输入显著提高了性能。然而,这些方法面临挑战,例如当内生和外生输入来源于相同来源时存在冗余性,并且由于固定回顾窗口而具有捕获长期依赖性的能力有限。在本文中,我们提出了一种方法,该方法对外生输入进行白化,以减少可能存在于数据中的基于全局统计的冗余性。此外,我们的方法有助于外生输入更加了解长期周期内的模式和趋势。通过将这种经过精细调整的、全局上下文感知的外生输入引入内生输入,而不增加回顾窗口长度,我们的方法引导模型朝着更好的预测方向发展。我们的方法在四个基准数据集中取得了最先进的性能,始终优于11个基线模型。这些结果确立了我们的方法作为在时间序列预测中使用外生输入的稳健有效替代方案。
更新时间: 2025-07-04 17:27:55
领域: cs.LG
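The abstract does not spell out the whitening transform; a common choice consistent with "global statistics" is ZCA whitening of the exogenous features, sketched here as one plausible realization:

```python
import numpy as np

def whiten(exog: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """ZCA-whiten exogenous series using dataset-wide (global) statistics.
    exog: (n_samples, n_features). Removes cross-feature redundancy by mapping
    the empirical covariance to (approximately) the identity."""
    mu = exog.mean(axis=0)
    cov = np.cov(exog - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T   # ZCA transform
    return (exog - mu) @ W

rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, 3))
exog = z @ np.array([[1.0, 0.9, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 2.0]])
white = whiten(exog)
print(np.cov(white, rowvar=False).round(2))   # ~ identity matrix
```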
Differentially private scale testing via rank transformations and percentile modifications
We develop a class of differentially private two-sample scale tests, called the rank-transformed percentile-modified Siegel--Tukey tests, or RPST tests. These RPST tests are inspired both by recent differentially private extensions of some common rank tests and some older modifications to non-private rank tests. We present the asymptotic distribution of the RPST test statistic under the null hypothesis, under a very general condition on the rank transformation. We also prove RPST tests are differentially private, and that their type I error does not exceed the given level. We uncover that the growth rate of the rank transformation presents a tradeoff between power and sensitivity. We do extensive simulations to investigate the effects of the tuning parameters and compare to a general private testing framework. Lastly, we show that our techniques can also be used to improve the differentially private signed-rank test.
Updated: 2025-07-04 17:25:50
标题: 基于秩变换和百分位修正的差分隐私尺度检验
摘要: 我们提出了一类差分隐私的双样本尺度检验,称为秩变换百分位修正Siegel-Tukey检验,简称RPST检验。这些RPST检验既受近期一些常见秩检验的差分隐私扩展的启发,也受一些较早的针对非隐私秩检验的修正的启发。在关于秩变换的一个非常一般的条件下,我们给出了零假设下RPST检验统计量的渐近分布。我们还证明了RPST检验满足差分隐私,且其第一类错误不超过给定的显著性水平。我们发现,秩变换的增长速率体现了检验功效与敏感度之间的权衡。我们进行了大量模拟,以研究调节参数的影响,并与一个通用的隐私检验框架进行比较。最后,我们展示了我们的技术也可用于改进差分隐私符号秩检验。
更新时间: 2025-07-04 17:25:50
领域: stat.ME,cs.LG,62G10, 62G20, 62G30, 68P27,G.3.1; G.3.5; K.6.5
SEAL: Vision-Language Model-Based Safe End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling
Autonomous driving technologies face significant safety challenges while operating under rare, diverse, and visually degraded weather scenarios. These challenges become more critical in cooperative settings, where vehicles and infrastructure jointly perceive and reason across complex environments. To address these issues, we propose SEAL, a vision-language model-based framework with adaptive multimodal learning for robust cooperative autonomous driving under long-tail scenarios. SEAL introduces three core innovations: (i) a prompt-driven long-tail scenario generation and evaluation pipeline that leverages foundation models to synthesize realistic long-tail conditions such as snow and fog across vehicle- and infrastructure-side views, enriching training diversity efficiently; (ii) a gated multi-scenario adaptive attention module that modulates the visual stream using scenario priors to recalibrate ambiguous or corrupted features; and (iii) a multi-task scenario-aware contrastive learning objective that improves multimodal alignment and promotes cross-scenario feature separability. Extensive experiments demonstrate that SEAL significantly outperforms existing baselines in reasoning, safety, and planning accuracy under complex, challenging driving conditions, advancing the safety, robustness, and scalability of autonomous driving.
Updated: 2025-07-04 17:25:14
标题: SEAL:基于视觉-语言模型、具有自适应长尾建模的安全端到端协同自动驾驶
摘要: 自动驾驶技术在罕见、多样化和视觉受损的天气情况下面临重大安全挑战。在合作环境中,车辆和基础设施共同感知和推理复杂环境的情况下,这些挑战变得更加关键。为了解决这些问题,我们提出了SEAL,这是一个基于视觉语言模型的框架,具有适应性多模态学习,用于在长尾场景下进行稳健的合作自动驾驶。SEAL引入了三个核心创新:(i)一个以提示驱动的长尾场景生成和评估管道,利用基础模型来合成真实的长尾条件,如雪和雾,跨车辆和基础设施视角,有效丰富训练多样性;(ii)一个门控多场景自适应注意模块,使用场景先验调节视觉流,以重新校准模糊或损坏的特征;(iii)一个多任务场景感知对比学习目标,改进多模态对齐并促进跨场景特征可分离性。大量实验证明,SEAL在复杂、具有挑战性的驾驶环境下,在推理、安全性和规划准确性方面明显优于现有基线,推动了自动驾驶的安全性、稳健性和可扩展性。
更新时间: 2025-07-04 17:25:14
领域: cs.RO,cs.AI,cs.CV
Roadmap for using large language models (LLMs) to accelerate cross-disciplinary research with an example from computational biology
Large language models (LLMs) are powerful artificial intelligence (AI) tools transforming how research is conducted. However, their use in research has been met with skepticism, due to concerns about hallucinations, biases and potential harms to research. These emphasize the importance of clearly understanding the strengths and weaknesses of LLMs to ensure their effective and responsible use. Here, we present a roadmap for integrating LLMs into cross-disciplinary research, where effective communication, knowledge transfer and collaboration across diverse fields are essential but often challenging. We examine the capabilities and limitations of LLMs and provide a detailed computational biology case study (on modeling HIV rebound dynamics) demonstrating how iterative interactions with an LLM (ChatGPT) can facilitate interdisciplinary collaboration and research. We argue that LLMs are best used as augmentative tools within a human-in-the-loop framework. Looking forward, we envisage that the responsible use of LLMs will enhance innovative cross-disciplinary research and substantially accelerate scientific discoveries.
Updated: 2025-07-04 17:20:14
标题: 使用大型语言模型(LLMs)加速跨学科研究的路线图:以计算生物学为例
摘要: 大型语言模型(LLMs)是强大的人工智能(AI)工具,正在改变研究的进行方式。然而,它们在研究中的使用受到怀疑,因为人们担心幻觉、偏见和潜在的对研究造成危害。这强调了清楚理解LLMs的优点和缺点以确保它们的有效和负责任的使用的重要性。在这里,我们提出了一个将LLMs整合到跨学科研究中的路线图,其中有效的沟通、知识传递和跨领域的合作至关重要,但通常具有挑战性。我们检查了LLMs的能力和局限性,并提供了一个详细的计算生物学案例研究(关于建模HIV反弹动力学),展示了如何与LLM(ChatGPT)进行迭代互动可以促进跨学科合作和研究。我们认为LLMs最好是作为人机协作框架中的增强工具来使用。展望未来,我们预见LLMs的负责任使用将增强创新的跨学科研究,并大大加速科学发现。
更新时间: 2025-07-04 17:20:14
领域: cs.AI,q-bio.OT
Predicting Business Angel Early-Stage Decision Making Using AI
External funding is crucial for early-stage ventures, particularly technology startups that require significant R&D investment. Business angels offer a critical source of funding, but their decision-making is often subjective and resource-intensive for both investor and entrepreneur. Much research has investigated this investment process to find the critical factors angels consider. One such tool, the Critical Factor Assessment (CFA), deployed more than 20,000 times by the Canadian Innovation Centre, has been evaluated post-decision and found to be significantly more accurate than investors' own decisions. However, a single CFA analysis requires three trained individuals and several days, limiting its adoption. This study builds on previous work validating the CFA to investigate whether the constraints inhibiting its adoption can be overcome using a trained AI model. In this research, we prompted multiple large language models (LLMs) to assign the eight CFA factors to a dataset of 600 transcribed, unstructured startup pitches seeking business angel funding with known investment outcomes. We then trained and evaluated machine learning classification models using the LLM-generated CFA scores as input features. Our best-performing model demonstrated high predictive accuracy (85.0% for predicting BA deal/no-deal outcomes) and exhibited significant correlation (Spearman's r = 0.896, p-value < 0.001) with conventional human-graded evaluations. The integration of AI-based feature extraction with a structured and validated decision-making framework yielded a scalable, reliable, and less-biased model for evaluating startup pitches, removing the constraints that previously limited adoption.
Updated: 2025-07-04 17:17:34
标题: 利用人工智能预测商业天使投资者的早期决策
摘要: 外部资金对于早期创业公司至关重要,特别是那些需要大量研发投资的技术初创企业。商业天使是一种重要的资金来源,但他们的决策常常是主观的,并且对投资者和创业者来说资源密集。许多研究已经调查了这种投资过程,以找出商业天使考虑的关键因素。其中一种工具,Critical Factor Assessment(CFA),由加拿大创新中心部署了超过20,000次,在决策后进行评估,发现其比投资者自己的决策显着更准确。然而,单个CFA分析需要三名受过训练的个体和数天时间,限制了其应用。这项研究基于先前验证CFA的工作,研究了是否可以利用训练过的人工智能模型克服限制其采用的限制。在这项研究中,我们让多个大型语言模型(LLMs)为一个包含600个转录的、非结构化的初创企业演讲的数据集分配八个CFA因素,这些演讲寻求商业天使资金,并已知投资结果。然后,我们使用LLM生成的CFA分数作为输入特征训练和评估机器学习分类模型。我们表现最佳的模型展示了高预测准确性(85.0%用于预测BA交易/非交易结果),并且与传统的人为评估表现出显著相关性(斯皮尔曼相关系数r = 0.896,p值<0.001)。基于AI的特征提取与一个结构化和经过验证的决策框架的整合产生了一个可扩展、可靠、较少偏见的模型,用于评估初创企业的演讲,消除了先前限制采用的限制。
更新时间: 2025-07-04 17:17:34
领域: cs.LG,cs.AI
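A toy version of the pipeline's second stage on synthetic stand-in data. The actual feature extraction prompts an LLM for the eight CFA factor scores per pitch, and the abstract does not name the best-performing model family; the random forest here is purely a placeholder:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for LLM-extracted scores: one row per pitch, eight CFA factors.
rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(600, 8))        # 600 pitches x 8 CFA factor scores
y = (X.mean(axis=1) + 0.3 * rng.standard_normal(600) > 3.0).astype(int)  # deal?

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # accuracy on the synthetic data
```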
Code Simulation as a Proxy for High-order Tasks in Large Language Models
Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLM). While naturalistic tasks often require careful human handcrafting, we show that synthetic data is, in many cases, a good proxy that is much easier to collect at scale. We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks, such as straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the capabilities of LLMs on sorting problems and repeated operations via sorting algorithms and nested loops. Our synthetic datasets further reveal that while the most powerful LLMs exhibit relatively strong execution capabilities, the process is fragile: it is negatively affected by memorisation and seems to rely heavily on pattern recognition. Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.
Updated: 2025-07-04 16:53:00
标题: 代码模拟作为大型语言模型中高阶任务的代理
摘要: 许多推理、规划和解决问题的任务都具有固有的算法特性:正确模拟每一步是解决问题的充分条件。我们收集了自然和合成推理任务的对,以评估大型语言模型(LLM)的能力。虽然自然任务通常需要仔细的人工制作,但我们显示合成数据在许多情况下是一个更容易大规模收集的良好替代。我们利用编程中的常见结构作为自然推理任务的构建块对应物,例如直线程序、包含关键路径的代码以及近似和冗余指令。我们进一步评估LLMs在排序问题和通过排序算法和嵌套循环执行重复操作的能力。我们的合成数据集进一步揭示,虽然最强大的LLMs表现出相对强大的执行能力,但这一过程是脆弱的:它受到记忆的负面影响,似乎严重依赖于模式识别。我们的贡献建立在通过合成测试LLMs的推理能力作为手工制作的人工标注问题的可扩展补充。
更新时间: 2025-07-04 16:53:00
领域: cs.LG,cs.AI
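Straight-line programs, one of the synthetic task families mentioned, are easy to generate together with ground truth, which is what makes them a scalable proxy; a small sketch (the helper names are ours):

```python
import random

def straight_line_program(n_ops: int = 5, seed: int = 0) -> tuple[str, int]:
    """Generate a random straight-line integer program plus its ground-truth
    result, usable as a synthetic 'simulate this code' probe for an LLM."""
    rng = random.Random(seed)
    lines, env = ["x0 = %d" % rng.randint(1, 9)], {}
    exec(lines[0], env)
    for i in range(1, n_ops + 1):
        src = rng.randrange(i)                      # read an earlier variable
        op, k = rng.choice(["+", "-", "*"]), rng.randint(1, 9)
        lines.append(f"x{i} = x{src} {op} {k}")
        exec(lines[-1], env)                        # track the true value
    return "\n".join(lines), env[f"x{n_ops}"]

code, truth = straight_line_program()
prompt = f"Execute this program step by step and report x5:\n{code}"
# compare the LLM's answer for `prompt` against `truth`
```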
CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning
Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks -- to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches -- from simple linear models that are minimally constrained by symmetries to much larger and more computationally-demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training times. Still there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at https://cosmobench.streamlit.app
Updated: 2025-07-04 16:46:25
标题: CosmoBench:用于几何深度学习的多尺度、多视角、多任务宇宙学基准
摘要: 宇宙学模拟以点云和有向树的形式提供了海量数据。一个关键目标是从这些数据中提取见解,以揭示宇宙的性质和组成。本文介绍了CosmoBench,这是一个从最先进的宇宙学模拟中整理出来的基准数据集,这些模拟的运行耗费了超过4100万核小时,并生成了超过2PB的数据。CosmoBench是同类数据集中规模最大的:它包含三个不同长度尺度下、来自暗物质晕和星系模拟的3.4万个点云,以及在两个不同时间尺度上记录晕形成历史的2.5万个有向树。CosmoBench中的数据可用于多项任务:从点云和合并树预测宇宙学参数,从暗物质晕和星系的整体位置预测其各自的速度,以及根据较粗时间尺度上的合并树重建较细时间尺度上的合并树。我们为这些任务提供了若干基线,其中一些基于宇宙学建模中的成熟方法,另一些则源自机器学习。对于后者,我们研究了不同的方法:从仅受对称性约束的简单线性模型,到规模更大、计算量更高的深度学习模型(如图神经网络)。我们发现,使用少量不变特征的最小二乘拟合有时能胜过参数更多、训练时间更长的深度架构。尽管如此,通过结合机器学习与宇宙学以充分利用这些数据,这些基线仍有巨大的改进空间。CosmoBench为大规模衔接宇宙学与几何深度学习奠定了基础。我们邀请社区使用这一数据集来推进科学发现的前沿,数据集可在https://cosmobench.streamlit.app获取。
更新时间: 2025-07-04 16:46:25
领域: cs.LG,astro-ph.CO,astro-ph.IM
Offline RLAIF: Piloting VLM Feedback for RL via SFO
While internet-scale image and textual data have enabled strong generalization in Vision-Language Models (VLMs), the absence of internet-scale control data has impeded the development of similar generalization in standard reinforcement learning (RL) agents. Although VLMs are fundamentally limited in their ability to solve control tasks due to their lack of action-conditioned training data, their capacity for image understanding allows them to provide valuable feedback in RL tasks by recognizing successful outcomes. A key challenge in Reinforcement Learning from AI Feedback (RLAIF) is determining how best to integrate VLM-derived signals into the learning process. We explore this question in the context of offline RL and introduce a class of methods called Sub-Trajectory Filtered Optimization (SFO). We identify three key insights. First, trajectory length plays a crucial role in offline RL, as full-trajectory preference learning exacerbates the stitching problem, necessitating the use of sub-trajectories. Second, even in Markovian environments, a non-Markovian reward signal from a sequence of images is required to assess trajectory improvement, as VLMs do not interpret control actions and must rely on visual cues over time. Third, a simple yet effective approach--filtered and weighted behavior cloning--consistently outperforms more complex RLHF-based methods. We propose Sub-Trajectory Filtered Behavior Cloning (SFBC), a method that leverages VLM feedback on sub-trajectories while incorporating a retrospective filtering mechanism that removes sub-trajectories preceding failures to improve robustness and prevent turbulence. Please enjoy our airport puns.
Updated: 2025-07-04 16:44:27
标题: 离线RLAIF:通过SFO对VLM反馈进行RL的试点研究
摘要: 尽管互联网规模的图像和文本数据使视觉-语言模型(VLMs)具备了强大的泛化能力,但缺乏互联网规模的控制数据阻碍了标准强化学习(RL)智能体发展出类似的泛化能力。尽管由于缺乏以动作为条件的训练数据,VLMs在解决控制任务方面存在根本局限,但它们的图像理解能力使其能够通过识别成功结果,在RL任务中提供有价值的反馈。基于AI反馈的强化学习(RLAIF)的一个关键挑战是确定如何最好地将VLM产生的信号整合到学习过程中。我们在离线RL的背景下探讨了这个问题,并提出了一类称为子轨迹过滤优化(Sub-Trajectory Filtered Optimization, SFO)的方法。我们确定了三个关键见解。首先,轨迹长度在离线RL中起着至关重要的作用:全轨迹偏好学习会加剧拼接问题,因此必须使用子轨迹。其次,即使在马尔可夫环境中,也需要来自图像序列的非马尔可夫奖励信号来评估轨迹的改进,因为VLMs不解释控制动作,必须依赖随时间变化的视觉线索。第三,一种简单而有效的方法,即过滤加权行为克隆,始终优于更复杂的基于RLHF的方法。我们提出了子轨迹过滤行为克隆(SFBC),该方法利用VLM对子轨迹的反馈,同时结合一种回溯过滤机制,去除失败之前的子轨迹,以提高鲁棒性并防止"湍流"。请享受我们的机场双关语。
更新时间: 2025-07-04 16:44:27
领域: cs.LG,cs.AI
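A rough sketch of the filtered-and-weighted behavior-cloning objective for discrete actions; the retrospective mechanism that drops sub-trajectories preceding failures is folded into the precomputed scores here, and the names and thresholding rule are our assumptions:

```python
import torch

def sfbc_loss(policy, sub_trajs, vlm_scores, threshold=0.5):
    """Sub-Trajectory Filtered Behavior Cloning sketch: keep sub-trajectories
    whose VLM feedback score clears a threshold, and weight the BC loss by the
    score. `sub_trajs` is a list of (states, actions) tensors; `vlm_scores`
    holds one scalar success score per sub-trajectory (assumed precomputed)."""
    losses = []
    for (states, actions), score in zip(sub_trajs, vlm_scores):
        if score < threshold:                # filter out low-rated segments
            continue
        logits = policy(states)              # (T, n_actions)
        nll = torch.nn.functional.cross_entropy(logits, actions)
        losses.append(score * nll)           # weight by VLM score
    return torch.stack(losses).mean() if losses else torch.zeros(())
```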
Controlling Thinking Speed in Reasoning Models
Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking speed adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses two key questions: (1) how to control thinking speed in LRMs, and (2) when to adjust it for optimal performance. For the first question, we identify the steering vector that governs slow-fast thinking transitions in LRMs' representation space. Using this vector, we achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods. For the second question, we apply real-time difficulty estimation to signal reasoning segments of varying complexity. Combining these techniques, we propose the first reasoning strategy that enables fast processing of easy steps and deeper analysis for complex reasoning. Without any training or additional cost, our plug-and-play method yields an average +1.3% accuracy with -8.6% token usage across leading LRMs and advanced reasoning benchmarks. All of our algorithms are implemented based on vLLM and are expected to support broader applications and inspire future research.
Updated: 2025-07-04 16:41:06
标题: 控制推理模型中的思维速度
摘要: 人类认知被理论化为两种模式:快速、直觉的系统1思维和缓慢、深思熟虑的系统2思维。尽管当前的大型推理模型(LRMs)擅长系统2思维,但它们无法进行快速思维,导致较高的计算开销和延迟。在这项工作中,我们通过动态调整思维速度,使LRMs能够逼近人类智能,优化准确性与效率之间的权衡。我们的方法解决了两个关键问题:(1)如何控制LRMs中的思维速度,以及(2)何时调整以实现最佳性能。对于第一个问题,我们识别出在LRMs表示空间中支配慢-快思维转换的导向向量。利用该向量,我们首次实现了基于表示编辑的测试时扩展效果,优于现有基于提示的扩展方法。对于第二个问题,我们应用实时难度估计来标记不同复杂度的推理片段。结合这些技术,我们提出了首个能够快速处理简单步骤并对复杂推理进行更深入分析的推理策略。在无需任何训练或额外成本的情况下,我们的即插即用方法在主流LRMs和先进推理基准上平均带来1.3%的准确率提升,同时令牌使用量减少8.6%。我们所有的算法均基于vLLM实现,预计将支持更广泛的应用并启发未来的研究。
更新时间: 2025-07-04 16:41:06
领域: cs.CL,cs.AI
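Mechanically, representation steering of the kind described can be sketched as a forward hook that shifts hidden states along a precomputed direction; deriving the slow/fast vector (e.g. as a mean difference of activations) and choosing layers and strengths is the paper's actual contribution and is assumed given here:

```python
import torch

def add_thinking_speed_hook(layer, steer_vec: torch.Tensor, alpha: float):
    """Register a forward hook that shifts a transformer layer's hidden states
    along a precomputed slow<->fast 'steering vector'. Positive alpha pushes
    toward slower, more deliberate generation; negative toward faster.
    Sketch only; `steer_vec` is assumed precomputed for this layer."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steer_vec.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)
```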
Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
The recently proposed physics-based framework by Huo and Johnson (2024) models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians we obtain analytic phase boundaries (logit-gap criteria) that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings (r ≈ -0.70, p < 10^-3). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. This validation not only furnishes a tractable, physics-inspired lens for interpretability but also provides the groundwork for novel generative models, bridging the gap between theoretical condensed matter physics and AI.
Updated: 2025-07-04 16:40:45
标题: 测试自注意力的自旋浴视角:对GPT-2 Transformer的哈密顿分析
摘要: 最近由Huo和Johnson(2024)提出的基于物理的框架将大型语言模型(LLMs)的注意力机制建模为一个相互作用的二体自旋系统,为重复和偏见等现象提供了第一性原理的解释。在此假设基础上,我们从一个生产级GPT-2模型中提取完整的查询-键权重矩阵,并为每个注意力头推导出相应的有效哈密顿量。从这些哈密顿量中,我们获得了解析的相边界(logit差距判据),可以预测在给定上下文中哪个标记应当主导下一个标记的分布。对20个事实回忆提示中144个注意力头的系统评估显示,理论logit差距与模型的经验标记排名之间存在强烈的负相关(r ≈ -0.70,p < 10^-3)。有针对性的消融进一步表明,抑制与自旋浴预测最一致的注意力头会引起预期的输出概率变化,证实了因果关系而非偶然关联。综合来看,我们的发现为生产级模型中的自旋浴类比提供了首个强有力的实证证据。这一验证不仅为可解释性提供了一个可操作的、受物理启发的视角,也为新颖的生成模型奠定了基础,弥合了理论凝聚态物理与人工智能之间的差距。
更新时间: 2025-07-04 16:40:45
领域: cond-mat.mtrl-sci,cs.LG
Sign Spotting Disambiguation using Large Language Models
Sign spotting, the task of identifying and localizing individual signs within continuous sign language video, plays a pivotal role in scaling dataset annotations and addressing the severe data scarcity issue in sign language translation. While automatic sign spotting holds great promise for enabling frame-level supervision at scale, it grapples with challenges such as vocabulary inflexibility and ambiguity inherent in continuous sign streams. Hence, we introduce a novel, training-free framework that integrates Large Language Models (LLMs) to significantly enhance sign spotting quality. Our approach extracts global spatio-temporal and hand shape features, which are then matched against a large-scale sign dictionary using dynamic time warping and cosine similarity. This dictionary-based matching inherently offers superior vocabulary flexibility without requiring model retraining. To mitigate noise and ambiguity from the matching process, an LLM performs context-aware gloss disambiguation via beam search, notably without fine-tuning. Extensive experiments on both synthetic and real-world sign language datasets demonstrate our method's superior accuracy and sentence fluency compared to traditional approaches, highlighting the potential of LLMs in advancing sign spotting.
Updated: 2025-07-04 16:38:09
标题: 手语标志识别消歧:利用大型语言模型
摘要: 手语标志识别,即在连续手语视频中识别和定位个别手语标志的任务,在扩展数据集注释并解决手语翻译中严重数据稀缺问题方面起着关键作用。虽然自动手语标志识别具有在规模上实现帧级监督的巨大潜力,但它面临着挑战,例如连续手语流中固有的词汇不灵活性和歧义性。因此,我们引入了一种新颖的、无需训练的框架,该框架整合了大型语言模型(LLMs),以显著提高手语标志识别质量。我们的方法提取全局时空和手形特征,然后使用动态时间弯曲和余弦相似性将其与大规模手语词典进行匹配。这种基于词典的匹配固有地提供了优越的词汇灵活性,无需重新训练模型。为了减轻匹配过程中的噪音和歧义,LLM通过波束搜索执行上下文感知的注解消歧,特别是无需微调。对合成和真实手语数据集进行的大量实验表明,与传统方法相比,我们的方法在准确性和句子流畅性方面表现更优,突显了LLMs在推进手语标志识别方面的潜力。
更新时间: 2025-07-04 16:38:09
领域: cs.CV,cs.AI
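As a rough sketch of the dictionary-matching step described above (illustrative only: the feature extractor, dimensions, and gloss names are made up), dynamic time warping with a cosine local cost can rank dictionary signs against a query segment before the LLM disambiguation stage:

```python
import numpy as np

def cosine_dist(a, b):
    # 1 - cosine similarity between two feature vectors.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def dtw(seq_a, seq_b):
    # Classic dynamic time warping with cosine distance as local cost.
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_dist(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
query = rng.normal(size=(20, 128))  # spatio-temporal + hand-shape features
dictionary = {f"GLOSS_{k}": rng.normal(size=(int(rng.integers(15, 30)), 128))
              for k in range(5)}

# Rank dictionary signs by DTW cost; a top-k list like this would then be
# handed to an LLM for context-aware disambiguation via beam search.
ranked = sorted(dictionary, key=lambda g: dtw(query, dictionary[g]))
print("candidate glosses:", ranked[:3])
```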
Towards Unified Neurosymbolic Reasoning on Knowledge Graphs
Knowledge Graph (KG) reasoning has received significant attention in the fields of artificial intelligence and knowledge engineering, owing to its ability to autonomously deduce new knowledge and consequently enhance the availability and precision of downstream applications. However, current methods predominantly concentrate on a single form of neural or symbolic reasoning, failing to effectively integrate the inherent strengths of both approaches. Furthermore, the current prevalent methods primarily focus on addressing a single reasoning scenario, presenting limitations in meeting the diverse demands of real-world reasoning tasks. Unifying the neural and symbolic methods, as well as diverse reasoning scenarios in one model is challenging as there is a natural representation gap between symbolic rules and neural networks, and diverse scenarios exhibit distinct knowledge structures and specific reasoning objectives. To address these issues, we propose a unified neurosymbolic reasoning framework, namely Tunsr, for KG reasoning. Tunsr first introduces a consistent reasoning-graph structure that starts from the query entity and continually expands subsequent nodes by iteratively searching posterior neighbors. Based on this, a forward logic message-passing mechanism is proposed to update both the propositional representations and attentions, as well as the first-order logic (FOL) representations and attentions, of each node. In this way, Tunsr merges multiple rules by merging the possible relations at each step. Finally, the FARI algorithm is proposed to induce FOL rules by iteratively performing attention calculations over the reasoning graph. Extensive experimental results on 19 datasets of four reasoning scenarios (transductive, inductive, interpolation, and extrapolation) demonstrate the effectiveness of Tunsr.
Updated: 2025-07-04 16:29:45
标题: 朝向知识图上统一的神经符号推理
摘要: 知识图谱(KG)推理在人工智能和知识工程领域引起了广泛关注,因为它能够自主推导出新知识,从而增强下游应用程序的可用性和精度。然而,目前的方法主要集中在单一形式的神经或符号推理上,未能有效整合两种方法的固有优势。此外,目前主流的方法主要集中在解决单一推理场景,难以满足现实推理任务的多样需求。将神经和符号方法以及不同推理场景统一在一个模型中具有挑战性,因为符号规则和神经网络之间存在天然的表征差距,且不同场景具有不同的知识结构和具体的推理目标。为了解决这些问题,我们提出了一个统一的神经符号推理框架,名为Tunsr,用于KG推理。Tunsr首先引入了一种一致的推理图结构,从查询实体开始,通过迭代搜索后验邻居不断扩展后续节点。基于此,提出了一种前向逻辑消息传递机制,用于更新每个节点的命题表示和注意力,以及一阶逻辑(FOL)表示和注意力。通过这种方式,Tunsr在每一步通过合并可能的关系来合并多个规则。最后,提出了FARI算法,通过在推理图上迭代执行注意力计算来归纳FOL规则。对四种推理场景(传导、归纳、插值和外推)的19个数据集进行的大量实验结果表明了Tunsr的有效性。
更新时间: 2025-07-04 16:29:45
领域: cs.AI
Willchain: Decentralized, Privacy-Preserving, Self-Executing, Digital Wills
This work presents a novel decentralized protocol for digital estate planning that integrates advances in distributed computing and cryptography. The original proof-of-concept was constructed using purely Solidity contracts. Since then, we have enhanced the implementation into a layer-1 protocol that uses modern interchain communication to connect several heterogeneous chain types. A key contribution of this research is the implementation of several modern cryptographic primitives to support various forms of claims for information validation. These primitives introduce an unmatched level of privacy to the process of digital inheritance. We also demonstrate a set of heterogeneous smart contracts, following the same spec, deployed on each chain to serve as entry points, gateways, or bridge contracts that are invoked via a path from the will module on our protocol to the contract. This ensures a fair and secure distribution of digital assets in accordance with the wishes of the decedent without the requirement of moving their funds. This research further extends its innovations with a user interaction model, featuring a check-in system and account abstraction process, which enhances flexibility and user-friendliness without compromising on security. By developing a dedicated permissionless blockchain that is secured by a network of validators and interchain relayers, the proposed protocol signifies a transformation in the digital estate planning industry and illustrates the potential of blockchain technology in revolutionizing traditional legal and personal spheres. Implementing a cryptoeconomic network at the core of inheritance planning allows for unique incentive-compatible economic mechanisms to be constructed.
Updated: 2025-07-04 16:23:32
标题: Willchain:去中心化、保护隐私、自动执行的数字遗嘱
摘要: 这项工作提出了一种新颖的数字遗产规划去中心化协议,该协议整合了先进的分布式计算和密码学技术。最初的概念验证是使用纯粹的Solidity合约构建的。自那时以来,我们已将实现增强为使用现代互链通信连接多种异构链类型的第一层协议。这项研究的一个关键贡献是实现了几种现代密码原语,以支持各种形式的信息验证声明。这些原语为数字遗产传承过程引入了无与伦比的隐私级别。我们还在一组异构智能合约上进行了演示,这些合约遵循相同的规范,在每条链上充当入口点、网关或桥接合约,通过从我们的协议的遗嘱模块到合约的路径来调用。这确保了数字资产按照逝者的意愿公平安全地分配,而无需移动他们的资金。这项研究通过用户交互模型进一步扩展了其创新,包括签到系统和账户抽象过程,增强了灵活性和用户友好性,同时不损害安全性。通过开发一个由验证者网络和互链中继器保护的专用无许可区块链,所提出的协议标志着数字遗产规划行业的转变,并展示了区块链技术在革新传统法律和个人领域中的潜力。在继承规划的核心实施一个加密经济网络,可以构建独特的激励兼容的经济机制。
更新时间: 2025-07-04 16:23:32
领域: cs.CR,cs.CE,cs.ET
A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications
A graph is a fundamental data model to represent various entities and their complex relationships in society and nature, such as social networks, transportation networks, and financial networks. Recently, large language models (LLMs) have showcased a strong generalization ability to handle various natural language processing tasks to answer users' arbitrary questions and generate specific-domain content. Compared with graph learning models, LLMs enjoy superior advantages in addressing the challenges of generalizing graph tasks by eliminating the need for training graph learning models and reducing the cost of manual annotation. However, LLMs are sequential models for textual data, but graphs are non-sequential topological data. It is challenging to adapt LLMs to tackle graph analytics tasks. In this survey, we conduct a comprehensive investigation of existing LLM studies on graph data, which summarizes the relevant graph analytics tasks solved by advanced LLM models and points out the existing challenges and future directions. Specifically, we study the key problems of LLM-based generative graph analytics (LLM-GGA) in terms of three categories: LLM-based graph query processing (LLM-GQP), LLM-based graph inference and learning (LLM-GIL), and graph-LLM-based applications. LLM-GQP focuses on an integration of graph analytics techniques and LLM prompts, including graph understanding and knowledge graphs and LLMs, while LLM-GIL focuses on learning and reasoning over graphs, including graph learning, graph-formed reasoning, and graph representation. We summarize the useful prompts incorporated into LLM to handle different graph downstream tasks. Moreover, we give a summary of LLM model evaluation, benchmark datasets/tasks, and a deep pro and cons analysis of the discussed LLM-GGA models. We also explore open problems and future directions in the research area of LLMs and graph analytics.
Updated: 2025-07-04 16:22:10
标题: 一项关于生成式图分析中大型语言模型的调查:查询、学习和应用
摘要: 图是表示社会和自然中各种实体及其复杂关系的基本数据模型,如社交网络、交通网络和金融网络。最近,大型语言模型(LLMs)展示了强大的泛化能力,可以处理各种自然语言处理任务,回答用户任意问题并生成特定领域内容。与图学习模型相比,LLMs在解决泛化图任务的挑战方面具有明显优势,通过消除训练图学习模型的需要并降低手动注释的成本。然而,LLMs是用于文本数据的顺序模型,而图是非顺序拓扑数据。将LLMs调整为处理图分析任务是具有挑战性的。在本调查中,我们对现有LLM研究在图数据上的应用进行了全面调查,总结了由先进LLM模型解决的相关图分析任务,并指出了现有挑战和未来方向。具体而言,我们研究了基于LLM的生成图分析(LLM-GGA)的关键问题,分为三类:基于LLM的图查询处理(LLM-GQP)、基于LLM的图推理和学习(LLM-GIL)以及基于图-LLM的应用。LLM-GQP侧重于图分析技术和LLM提示的集成,包括图理解和知识图谱以及LLMs,而LLM-GIL侧重于在图上学习和推理,包括图学习、基于图的推理和图表示。我们总结了用于处理不同图下游任务的LLM中包含的有用提示。此外,我们对LLM模型评估、基准数据集/任务进行了总结,并对讨论的LLM-GGA模型进行了深入的利弊分析。我们还探讨了LLM和图分析研究领域的开放问题和未来方向。
更新时间: 2025-07-04 16:22:10
领域: cs.CL,cs.AI,cs.DB
The Geometries of Truth Are Orthogonal Across Tasks
Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their claim to practical relevance is still mired in concerns about their reliability. Recent works have proposed examining the activations produced by an LLM at inference time to assess whether its answer to a question is correct. Some works claim that a "geometry of truth" can be learned from examples, in the sense that the activations that generate correct answers can be distinguished from those leading to mistakes with a linear classifier. In this work, we underline a limitation of these approaches: we observe that these "geometries of truth" are intrinsically task-dependent and fail to transfer across tasks. More precisely, we show that linear classifiers trained across distinct tasks share little similarity and, when trained with sparsity-enforcing regularizers, have almost disjoint supports. We show that more sophisticated approaches (e.g., using mixtures of probes and tasks) fail to overcome this limitation, likely because activation vectors commonly used to classify answers form clearly separated clusters when examined across tasks.
Updated: 2025-07-04 16:21:15
标题: 真理的几何形态在不同任务中是正交的
摘要: 大型语言模型(LLMs)已经展示出在各种任务中的惊人泛化能力,但它们在实际相关性方面的主张仍然受到对其可靠性的担忧的困扰。最近的研究提出在推理时检查LLM产生的激活,以评估其对问题的答案是否正确。一些研究声称可以从示例中学习“真相几何”,即可以通过线性分类器区分生成正确答案的激活和导致错误的激活。在这项工作中,我们强调了这些方法的一个局限性:我们观察到这些“真相几何”本质上依赖于任务,无法在任务之间迁移。更确切地说,我们表明在不同任务上训练的线性分类器几乎没有相似性,并且当使用稀疏约束正则化器进行训练时,几乎具有不相交的支持。我们表明更复杂的方法(例如,使用探针和任务的混合)无法克服这一限制,可能是因为用于对答案进行分类的激活向量在跨任务检查时形成明显分离的簇。
更新时间: 2025-07-04 16:21:15
领域: cs.LG,cs.AI,cs.CL,stat.ML
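A minimal way to reproduce the kind of probe comparison the abstract describes, under the assumption of synthetic activations and L1-regularized logistic probes (the paper's exact setup will differ), is to train one sparse probe per task and measure how much their supports and directions overlap:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256  # activation dimension

def make_task(true_dims):
    # Synthetic "task": correctness is linearly readable from a few dims.
    X = rng.normal(size=(500, d))
    y = (X[:, true_dims].sum(axis=1) > 0).astype(int)
    return X, y

# Two tasks whose truth directions live in disjoint coordinates.
X1, y1 = make_task([0, 1, 2])
X2, y2 = make_task([100, 101, 102])

probes = []
for X, y in [(X1, y1), (X2, y2)]:
    clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
    probes.append(clf.coef_.ravel())

# Support overlap (Jaccard) and directional similarity of the two probes.
support = [set(np.nonzero(np.abs(w) > 1e-6)[0]) for w in probes]
jaccard = len(support[0] & support[1]) / max(1, len(support[0] | support[1]))
cos = probes[0] @ probes[1] / (np.linalg.norm(probes[0]) * np.linalg.norm(probes[1]) + 1e-9)
print(f"support Jaccard overlap: {jaccard:.3f}, cosine similarity: {cos:.3f}")
```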
Plugging Attention into Power Grids: Towards Transparent Forecasting
Accurate electricity consumption forecasting is crucial for ensuring grid stability and optimizing power generation, particularly in increasingly decentralized and complex systems. While classical approaches such as Generalized Additive Models (GAMs) remain widely used, they often fail to capture the spatial dependencies inherent in energy networks. Graph Neural Networks (GNNs) offer a principled framework to incorporate this structure by directly leveraging graph topologies. In this work, we evaluate a broad set of GNN architectures -- including GCN, GraphSAGE, ChebConv, TAG, APPNP, TransformerConv, and Graph Attention Networks (GAT and GATv2) -- on two real-world electricity consumption datasets from France and the UK. Our experiments show that while complex architectures like GATv2 and TransformerConv do not consistently outperform their simpler counterparts, models such as GCN and APPNP achieve strong results in low-data or highly disaggregated settings. Nonetheless, the vanilla GAT remains highly competitive across both datasets and offers an additional interpretability layer via attention mechanisms. We perform a temporal analysis of attention weights, revealing evolving patterns of regional interaction linked to seasonal and meteorological variability. These results highlight that, although attention is not universally superior, it provides valuable explanatory power when spatial dependencies are prominent. Finally, we benchmark ensemble-based expert aggregation strategies, showing that uniform or learned combinations can enhance robustness and outperform individual models under data heterogeneity.
Updated: 2025-07-04 16:18:18
标题: 将注意力集中到电网中:迈向透明预测
摘要: 精确的电力消耗预测对于确保电网稳定性和优化发电至关重要,特别是在日益分散和复杂的系统中。虽然传统方法如广义加性模型(GAMs)仍然被广泛使用,但它们经常无法捕捉能源网络中固有的空间依赖关系。图神经网络(GNNs)提供了一个原则性框架,通过直接利用图拓扑结构来整合这种结构。在这项工作中,我们评估了一组广泛的GNN架构--包括GCN、GraphSAGE、ChebConv、TAG、APPNP、TransformerConv以及图注意力网络(GAT和GATv2)--在来自法国和英国的两个真实电力消耗数据集上的表现。我们的实验表明,尽管像GATv2和TransformerConv这样的复杂架构并不总是优于简单的对应模型,但像GCN和APPNP这样的模型在低数据量或高度分散的环境中取得了良好的结果。尽管如此,普通的GAT在两个数据集上仍然具有很高的竞争力,并通过注意力机制提供了额外的可解释性层。我们对注意力权重进行了时间分析,揭示了与季节性和气象变化相关的区域互动的演变模式。这些结果突显了,尽管注意力并不普遍优越,但在空间依赖关系突出时,它提供了有价值的解释能力。最后,我们对基于集成的专家聚合策略进行了基准测试,结果表明统一或学习组合可以增强稳健性,在数据异质性下优于单个模型。
更新时间: 2025-07-04 16:18:18
领域: cs.LG
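For readers who want to see the interpretability hook, here is a small sketch using PyTorch Geometric's GATConv, whose return_attention_weights flag exposes the per-edge attention coefficients analyzed in the paper; the graph and features below are toy stand-ins, not the French or UK load data:

```python
import torch
from torch_geometric.nn import GATConv

# Toy grid graph: 4 regional load nodes with directed edges.
x = torch.randn(4, 8)  # per-node consumption features (e.g., lagged loads)
edge_index = torch.tensor([[0, 1, 2, 3, 0], [1, 2, 3, 0, 2]])

conv = GATConv(in_channels=8, out_channels=16, heads=1)

# return_attention_weights exposes per-edge attention coefficients -- the
# quantities whose temporal evolution the abstract analyzes. Note GATConv
# adds self-loops by default, so they appear in the returned edge list.
out, (att_edge_index, alpha) = conv(x, edge_index, return_attention_weights=True)

for (src, dst), a in zip(att_edge_index.t().tolist(), alpha.squeeze(-1).tolist()):
    print(f"edge {src} -> {dst}: attention {a:.3f}")
```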
A Resource Efficient Quantum Kernel
Quantum processors may enhance machine learning by mapping high-dimensional data onto quantum systems for processing. Conventional quantum kernels, or feature maps, for encoding data features onto a quantum circuit are currently impractical, as the number of entangling gates scales quadratically with the dimension of the dataset and the number of qubits. In this work, we introduce a quantum kernel designed to handle high-dimensional data with a significantly reduced number of qubits and entangling operations. Our approach preserves essential data characteristics while promoting computational efficiency, as evidenced by extensive experiments on benchmark datasets that demonstrate a marked improvement in both accuracy and resource utilization, as compared to state-of-the-art quantum feature maps. Our noisy simulations results combined with lower resource requirements highlight our kernel's ability to function within the constraints of noisy intermediate-scale quantum devices. Through numerical simulations and small-scale implementation on a superconducting circuit quantum computing platform, we demonstrate that our scheme performs on par or better than a set of classical algorithms for classification. Our findings herald a promising avenue for the practical implementation of quantum machine learning algorithms on near future quantum computing platforms.
Updated: 2025-07-04 16:12:57
标题: 一种资源高效的量子内核
摘要: 量子处理器可以通过将高维数据映射到量子系统进行处理,从而提升机器学习的效果。目前,传统的用于将数据特征编码到量子电路上的量子核或特征映射是不切实际的,因为纠缠门的数量与数据集的维度和量子比特的数量呈二次比例增长。在这项工作中,我们引入了一个设计用于处理高维数据的量子核,可以显著减少量子比特和纠缠操作的数量。我们的方法在保留关键数据特征的同时提高了计算效率,通过对基准数据集的广泛实验,我们证明与现有最先进的量子特征映射相比,我们的方法在准确性和资源利用方面都有显著改善。我们的噪声模拟结果结合更低的资源需求,突显了我们的量子核能够在嘈杂的中等规模量子设备的约束条件下运行。通过数值模拟和在超导电路量子计算平台上的小规模实施,我们展示了我们的方案在分类方面与一组经典算法相当甚至更好的表现。我们的发现预示着在近期量子计算平台上实际实现量子机器学习算法的一条有希望的途径。
更新时间: 2025-07-04 16:12:57
领域: quant-ph,cs.LG
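As a generic illustration of a fidelity-style quantum kernel (deliberately simplified: this is not the paper's qubit-efficient feature map, and it uses a plain tensor-product angle encoding with no entanglement), one can simulate the Gram matrix classically with NumPy:

```python
import numpy as np

def angle_encode(x):
    # Encode each feature into one qubit via an RY rotation; the full state
    # is the tensor product of single-qubit states (no entanglement here --
    # a deliberately minimal stand-in, not the paper's feature map).
    state = np.array([1.0])
    for theta in x:
        qubit = np.array([np.cos(theta / 2), np.sin(theta / 2)])
        state = np.kron(state, qubit)
    return state

def fidelity_kernel(X):
    # K[i, j] = |<psi(x_i)|psi(x_j)>|^2, the standard fidelity quantum kernel.
    states = np.array([angle_encode(x) for x in X])
    overlaps = states @ states.T
    return overlaps ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, size=(6, 4))  # 6 samples, 4 features -> 4 qubits
K = fidelity_kernel(X)
print(np.round(K, 3))  # Gram matrix usable with, e.g., sklearn's SVC(kernel="precomputed")
```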
Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning
We propose a hybrid approach to machine Theory of Mind (ToM) that uses large language models (LLMs) as a mechanism for generating hypotheses and likelihood functions with a Bayesian inverse planning model that computes posterior probabilities for an agent's likely mental states given its actions. Bayesian inverse planning models can accurately predict human reasoning on a variety of ToM tasks, but these models are constrained in their ability to scale these predictions to scenarios with a large number of possible hypotheses and actions. Conversely, LLM-based approaches have recently demonstrated promise in solving ToM benchmarks, but can exhibit brittleness and failures on reasoning tasks even when they pass otherwise structurally identical versions. By combining these two methods, this approach leverages the strengths of each component, closely matching optimal results on a task inspired by prior inverse planning models and improving performance relative to models that utilize LLMs alone or with chain-of-thought prompting, even with smaller LLMs that typically perform poorly on ToM tasks. We also exhibit the model's potential to predict mental states on open-ended tasks, offering a promising direction for future development of ToM models and the creation of socially intelligent generative agents.
Updated: 2025-07-04 16:01:27
标题: 朝向利用大型语言模型增强的逆向规划的机器心智理论
摘要: 我们提出了一种机器心灵理论(ToM)的混合方法,该方法利用大型语言模型(LLMs)作为生成假设和似然函数的机制,并结合贝叶斯逆向规划模型,根据代理的行为计算其可能心理状态的后验概率。贝叶斯逆向规划模型可以准确预测人类在各种ToM任务上的推理,但这些模型在将这些预测扩展到可能假设和行动数量众多的情景时受到限制。相反,基于LLM的方法最近展示了在解决ToM基准测试中的潜力,但即使在通过结构上相同的版本时也可能在推理任务中表现脆弱并失败。通过结合这两种方法,这种方法利用了每个组件的优势,在受以往逆向规划模型启发的任务上与最优结果紧密匹配,并且相对于仅使用LLM或链式思维提示的模型提高了性能,即使使用通常在ToM任务上表现不佳的较小LLM也是如此。我们还展示了该模型在开放式任务上预测心理状态的潜力,为未来ToM模型的发展和创造具有社交智能的生成代理提供了有希望的方向。
更新时间: 2025-07-04 16:01:27
领域: cs.AI,cs.LG
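The Bayesian half of this hybrid is easy to sketch. In the snippet below, the hypotheses and likelihood values are hard-coded stand-ins for what an LLM would propose and score; the inverse-planning update itself is just iterated Bayes' rule:

```python
import numpy as np

# Hypothesized mental states (in the paper, an LLM proposes these and
# scores likelihoods; here they are hard-coded, illustrative stand-ins).
hypotheses = ["wants coffee", "wants tea", "looking for keys"]
prior = np.array([1 / 3, 1 / 3, 1 / 3])

# P(action | hypothesis) for each observed action, e.g., elicited from an LLM.
likelihoods = {
    "walks to kitchen": np.array([0.8, 0.7, 0.3]),
    "opens cupboard":   np.array([0.6, 0.6, 0.2]),
}

posterior = prior.copy()
for action, lik in likelihoods.items():
    posterior = posterior * lik              # Bayes: multiply in each observation
    posterior = posterior / posterior.sum()  # renormalize
    print(action, "->", dict(zip(hypotheses, np.round(posterior, 3))))
```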
Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data
Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover effect heterogeneity over patient characteristics, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population by leveraging external data from other trials or observational studies. The proposed method is robust: it has the potential to reduce the CATE prediction mean squared error while maintaining consistency, even when the external data is not aligned with the trial. Moreover, we introduce a procedure that combines the QR-learner with a trial-only CATE learner and show that it asymptotically matches or exceeds the trial-only learner in terms of mean squared error. We examine the performance of our approach in simulation studies and apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects.
Updated: 2025-07-04 16:01:05
标题: 利用外部数据的随机试验中异质性治疗效果的稳健估计
摘要: 随机试验通常旨在检测平均治疗效果,但往往缺乏统计功效来揭示患者特征上的效果异质性,从而限制了其在个性化决策制定方面的价值。为了解决这个问题,我们提出了QR-learner,这是一个与模型无关的学习器,通过利用来自其他试验或观察性研究的外部数据,在试验人群中估计条件平均治疗效果(CATE)。所提出的方法具有鲁棒性:即使外部数据与试验不一致,也有可能降低CATE预测均方误差,同时保持一致性。此外,我们介绍了一个将QR-learner与仅试验CATE学习器相结合的程序,并表明在均方误差方面,它在渐近意义上等于或优于仅使用试验数据的学习器。我们通过模拟研究检验了我们方法的性能,并将这些方法应用到一个真实世界的数据集中,展示了在CATE估计和检测异质效果的统计功效方面的改进。
更新时间: 2025-07-04 16:01:05
领域: stat.ML,cs.LG,stat.ME
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking
The ability to extract structured information from unstructured sources-such as free-text documents and scientific literature-is critical for accelerating scientific discovery and knowledge synthesis. Large Language Models (LLMs) have demonstrated remarkable capabilities in various natural language processing tasks, including structured information extraction. However, their effectiveness often diminishes in specialized, domain-specific contexts that require nuanced understanding and expert-level domain knowledge. In addition, existing LLM-based approaches frequently exhibit poor transferability across tasks and domains, limiting their scalability and adaptability. To address these challenges, we introduce StructSense, a modular, task-agnostic, open-source framework for structured information extraction built on LLMs. StructSense is guided by domain-specific symbolic knowledge encoded in ontologies, enabling it to navigate complex domain content more effectively. It further incorporates agentic capabilities through self-evaluative judges that form a feedback loop for iterative refinement, and includes human-in-the-loop mechanisms to ensure quality and validation. We demonstrate that StructSense can overcome both the limitations of domain sensitivity and the lack of cross-task generalizability, as shown through its application to diverse neuroscience information extraction tasks.
Updated: 2025-07-04 15:51:07
标题: STRUCTSENSE:一种任务无关的智能体框架,用于带有人在回路评估与基准测试的结构化信息提取
摘要: 从非结构化来源(如自由文本文档和科学文献)中提取结构化信息的能力对加速科学发现和知识综合至关重要。大型语言模型(LLMs)在各种自然语言处理任务中展现出卓越的能力,包括结构化信息提取。然而,在需要细致理解和专家级领域知识的专业领域特定背景下,它们的有效性经常会减弱。此外,现有基于LLMs的方法经常在任务和领域之间表现出较差的可迁移性,限制了它们的可扩展性和适应性。为了解决这些挑战,我们引入了StructSense,一个基于LLMs构建的模块化、任务无关的开源框架,用于结构化信息提取。StructSense受编码在本体中的领域特定符号知识的指导,使其能够更有效地驾驭复杂的领域内容。它进一步通过自我评估的评判者引入智能体能力,形成一个用于迭代细化的反馈循环,并包含人在回路机制以确保质量和验证。我们证明StructSense能够克服领域敏感性和缺乏跨任务普适性这两方面的局限,其在各种神经科学信息提取任务中的应用展示了这一点。
更新时间: 2025-07-04 15:51:07
领域: cs.CL,cs.AI
TACOS: Open Tagging and Comparative Scoring for Instruction Fine-Tuning Data Selection
Instruction Fine-Tuning (IFT) is crucial for aligning large language models (LLMs) with human preferences, and selecting a small yet representative subset from massive data significantly facilitates IFT in terms of both efficiency and effectiveness. Nevertheless, existing approaches suffer from two limitations: the use of simple heuristics restricts data diversity, while the singleton data-quality evaluation suffers from inconsistent criteria between independent samples. To address these issues, we present TACOS, an innovative method that integrates Open Tagging and Comparative Scoring for IFT data selection. To capture data diversity, we leverage LLMs to assign open-domain tags to human queries, followed by a normalization stage to denoise the open tags and enable efficient clustering. Additionally, we suggest a comparative scoring method that allows the relative quality evaluation of samples within a cluster, avoiding the inconsistent criteria seen in singleton-based evaluations. Extensive experiments across diverse datasets and LLM architectures demonstrate that TACOS outperforms existing approaches by a large margin. Notably, it achieves superior instruction-following performance on MT-Bench and ranks 1st among LLaMA2-7B-based models on AlpacaEval 2.0, illustrating its efficacy for IFT data selection.
Updated: 2025-07-04 15:46:07
标题: TACOS: 开放式标记和比较评分用于指导微调数据选择
摘要: 指令微调(IFT)对于将大型语言模型(LLMs)与人类偏好对齐至关重要,而从海量数据中选择一个小而具代表性的子集可以显著提升IFT的效率和有效性。然而,现有方法存在两个限制:使用简单的启发式方法限制了数据多样性,而单样本数据质量评估在独立样本之间存在标准不一致的问题。为了解决这些问题,我们提出了TACOS,一种创新方法,该方法集成了开放标签和比较评分,用于IFT数据选择。为了捕捉数据多样性,我们利用LLMs为人类查询分配开放领域标签,然后进行归一化阶段以去噪开放标签并实现高效聚类。此外,我们提出了一种比较评分方法,允许在聚类内对样本进行相对质量评估,避免了基于单样本的评估中出现的标准不一致。通过对多样数据集和LLM架构进行广泛实验,证明TACOS在很大程度上优于现有方法。值得注意的是,它在MT-Bench上实现了优越的指令跟随性能,并在AlpacaEval 2.0上在基于LLaMA2-7B的模型中排名第一,说明了其在IFT数据选择方面的有效性。
更新时间: 2025-07-04 15:46:07
领域: cs.CL,cs.AI
Recon, Answer, Verify: Agents in Search of Truth
Automated fact checking with large language models (LLMs) offers a scalable alternative to manual verification. Evaluating fact checking is challenging as existing benchmark datasets often include post-claim analysis and annotator cues, which are absent in real-world scenarios where claims are fact checked immediately after being made. This limits the realism of current evaluations. We present Politi Fact Only (PFO), a 5-class benchmark dataset of 2,982 political claims from politifact.com, where all post-claim analysis and annotator cues have been removed manually. This ensures that models are evaluated using only the information that would have been available prior to the claim's verification. Evaluating LLMs on PFO, we see an average performance drop of 22% in terms of macro F1 compared to PFO's unfiltered version. Based on the identified challenges of existing LLM-based fact checking systems, we propose RAV (Recon Answer Verify), an agentic framework with three agents: question generator, answer generator, and label generator. Our pipeline iteratively generates and answers sub-questions to verify different aspects of the claim before finally generating the label. RAV generalizes across domains and label granularities, and it outperforms state-of-the-art approaches on the well-known baselines RAWFC (fact checking, 3-class) by 25.28%, and on HOVER (encyclopedia, 2-class) by 1.54%, 4.94%, and 1.78% on the 2-hop, 3-hop, and 4-hop sub-categories, respectively. RAV shows the smallest performance drop among baselines, 16.3% in macro F1, when comparing PFO with its unfiltered version.
Updated: 2025-07-04 15:44:28
标题: 重建、回答、验证:搜索真相的代理人
摘要: 使用大型语言模型(LLMs)进行自动事实检查提供了一种可扩展的替代手动验证的方法。评估事实检查具有挑战性,因为现有的基准数据集通常包括声明后的分析和注释者线索,而这些在声明发出后立即进行事实检查的现实场景中是不存在的。这限制了当前评估的现实性。我们提出了Politi Fact Only(PFO),这是一个由 politifact.com 中的 2,982 个政治声明组成的 5 类基准数据集,其中所有声明后的分析和注释者线索均已手动删除。这确保了模型仅使用在声明验证之前可用的信息进行评估。在 PFO 上评估 LLMs,我们看到与 PFO 未经过滤版本相比,宏观 F1 指标平均性能下降了 22%。基于已识别出的现有基于 LLM 的事实检查系统的挑战,我们提出了RAV(Recon Answer Verify),这是一个包含三个代理的框架:问题生成器、答案生成器和标签生成器。我们的流水线迭代生成和回答子问题,以验证声明的不同方面,最终生成标签。RAV 跨领域和标签粒度进行泛化,在知名基线 RAWFC(事实检查,3 类)上比现有最先进方法高出 25.28%,在 HOVER(百科全书,2 类)的 2 跳、3 跳和 4 跳子类别上分别高出 1.54%、4.94% 和 1.78%。与基线相比,当我们将 PFO 与其未经过滤版本进行比较时,RAV 的宏观 F1 性能下降最少,仅为 16.3%。
更新时间: 2025-07-04 15:44:28
领域: cs.CL,cs.AI
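A skeletal version of the Recon-Answer-Verify loop might look as follows; llm is a hypothetical stand-in for any chat-completion client, and the prompts and label set are illustrative, not the paper's:

```python
# A minimal sketch of the Recon-Answer-Verify loop. `llm` is a hypothetical
# stand-in for any chat-completion call; prompts are illustrative only.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def rav(claim: str, max_rounds: int = 3) -> str:
    evidence = []
    for _ in range(max_rounds):
        # Recon: generate the next sub-question about an unverified aspect.
        question = llm(f"Claim: {claim}\nKnown: {evidence}\n"
                       "Ask one sub-question needed to verify the claim, "
                       "or reply DONE if none remain.")
        if question.strip() == "DONE":
            break
        # Answer: answer the sub-question (optionally with retrieval).
        answer = llm(f"Answer concisely: {question}")
        evidence.append((question, answer))
    # Verify: produce the final veracity label from accumulated evidence.
    return llm(f"Claim: {claim}\nEvidence: {evidence}\n"
               "Output one label: true / mostly-true / half-true / "
               "mostly-false / false.")
```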
Interaction Techniques that Encourage Longer Prompts Can Improve Psychological Ownership when Writing with AI
Writing longer prompts for an AI assistant to generate a short story increases psychological ownership, a user's feeling that the writing belongs to them. To encourage users to write longer prompts, we evaluated two interaction techniques that modify the prompt entry interface of chat-based generative AI assistants: pressing and holding the prompt submission button, and continuously moving a slider up and down when submitting a short prompt. A within-subjects experiment investigated the effects of such techniques on prompt length and psychological ownership, and results showed that these techniques increased prompt length and led to higher psychological ownership than baseline techniques. A second experiment further augmented these techniques by showing AI-generated suggestions for how the prompts could be expanded. This further increased prompt length, but did not lead to improvements in psychological ownership. Our results show that simple interface modifications like these can elicit more writing from users and improve psychological ownership.
Updated: 2025-07-04 15:44:24
标题: 与人工智能写作时鼓励更长提示的交互技术可以提高心理所有权
摘要: 撰写更长的提示让人工智能助手生成一个短故事,可以增加心理所有权,即用户感到这份写作属于他们自己。为了鼓励用户撰写更长的提示,我们评估了两种修改基于聊天的生成式人工智能助手提示输入界面的交互技术:按住提示提交按钮,以及在提交简短提示时持续上下移动滑块。一项被试内实验研究了这些技术对提示长度和心理所有权的影响,结果显示这些技术增加了提示长度,并带来了比基线技术更高的心理所有权。第二个实验进一步通过展示人工智能生成的提示扩写建议来增强这些技术。这进一步增加了提示长度,但并未带来心理所有权的改善。我们的结果表明,像这样简单的界面修改可以引导用户写出更多内容,并提高心理所有权。
更新时间: 2025-07-04 15:44:24
领域: cs.HC,cs.AI,cs.CL
Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs
Recent work has shown that fine-tuning large language models (LLMs) on code with security vulnerabilities can result in misaligned and unsafe behaviors across broad domains. These results prompted concerns about the emergence of harmful behaviors from narrow domain fine-tuning. In this paper, we contextualize these findings by analyzing how such narrow adaptation impacts the internal mechanisms and behavioral manifestations of LLMs. Through a series of experiments covering output probability distributions, loss and gradient vector geometry, layer-wise activation dynamics, and activation space dimensions, we find that behaviors attributed to "emergent misalignment" may be better interpreted as an erosion of prior alignment. We show that fine tuning on insecure code induces internal changes that oppose alignment. Further, we identify a shared latent dimension in the model's activation space that governs alignment behavior. We show that this space is activated by insecure code and by misaligned responses more generally, revealing how narrow fine-tuning can degrade general safety behavior by interfering with shared internal mechanisms. Our findings offer a mechanistic interpretation for previously observed misalignment phenomena, and highlights the fragility of alignment in LLMs. The results underscore the need for more robust fine-tuning strategies that preserve intended behavior across domains.
Updated: 2025-07-04 15:36:58
标题: 再现的错位:狭窄微调如何侵蚀LLM中的安全对齐
摘要: 最近的研究表明,在存在安全漏洞的代码上对大型语言模型(LLMs)进行微调可能会导致在广泛领域中出现不一致和不安全的行为。这些结果引发了对于窄领域微调可能导致有害行为出现的担忧。本文通过分析这些发现,揭示了这种窄适应对LLMs内部机制和行为表现的影响。通过一系列实验,涵盖输出概率分布、损失和梯度向量几何、逐层激活动态以及激活空间维度,我们发现被归因于“新兴不一致”的行为可能更好地被解释为先前对齐的侵蚀。我们展示,对不安全代码进行微调会引起与对齐相对立的内部变化。此外,我们确定了模型激活空间中的一个共享潜在维度,该维度控制对齐行为。我们展示,不安全代码和不一致响应更普遍地激活了这个空间,揭示了窄微调如何通过干扰共享内部机制来降低一般安全行为。我们的发现为先前观察到的不一致现象提供了一种机械解释,并突显了LLMs中对齐的脆弱性。这些结果强调了需要更加健壮的微调策略,以确保在各个领域中保持预期行为。
更新时间: 2025-07-04 15:36:58
领域: cs.LG,cs.AI,cs.CL
When Network Architecture Meets Physics: Deep Operator Learning for Coupled Multiphysics
Scientific applications increasingly demand real-time surrogate models that can capture the behavior of strongly coupled multiphysics systems driven by multiple input functions, such as in thermo-mechanical and electro-thermal processes. While neural operator frameworks, such as Deep Operator Networks (DeepONets), have shown considerable success in single-physics settings, their extension to multiphysics problems remains poorly understood. In particular, the challenge of learning nonlinear interactions between tightly coupled physical fields has received little systematic attention. This study addresses a foundational question: should the architectural design of a neural operator reflect the strength of physical coupling it aims to model? To answer this, we present the first comprehensive, architecture-aware evaluation of DeepONet variants across three regimes: single-physics, weakly coupled, and strongly coupled multiphysics systems. We consider a reaction-diffusion equation with dual spatial inputs, a nonlinear thermo-electrical problem with bidirectional coupling through temperature-dependent conductivity, and a viscoplastic thermo-mechanical model of steel solidification governed by transient phase-driven interactions. Two operator-learning frameworks, the classical DeepONet and its sequential GRU-based extension, S-DeepONet, are benchmarked using both single-branch and multi-branch (MIONet-style) architectures. Our results demonstrate that architectural alignment with physical coupling is crucial: single-branch networks significantly outperform multi-branch counterparts in strongly coupled settings, whereas multi-branch encodings offer advantages for decoupled or single-physics problems. Once trained, these surrogates achieve full-field predictions up to 1.8e4 times faster than high-fidelity finite-element solvers, without compromising solution accuracy.
Updated: 2025-07-04 15:36:15
标题: 当网络架构遇见物理学:用于耦合多物理现象的深度算子学习
摘要: 科学应用越来越需要能够捕捉由多个输入函数驱动的强耦合多物理系统行为的实时替代模型,例如热机械和电热过程。虽然神经算子框架,如深度算子网络(DeepONets),在单物理环境中表现出色,但其在多物理问题上的扩展仍然理解不足。特别是,学习紧密耦合物理场之间的非线性相互作用的挑战受到很少系统关注。本研究探讨一个基础性问题:神经算子的架构设计是否应反映其旨在建模的物理耦合强度?为了回答这个问题,我们首次对三个领域(单物理、弱耦合和强耦合多物理系统)的DeepONet变体进行全面的、架构感知的评估。我们考虑了具有双空间输入的反应扩散方程、通过温度相关导电率进行双向耦合的非线性热电问题,以及由瞬变相驱动相互作用控制的钢凝固的粘塑热机械模型。我们使用经典的DeepONet及其基于序列GRU的扩展S-DeepONet对两种操作学习框架进行基准测试,包括单分支和多分支(MIONet风格)架构。我们的结果表明,与物理耦合的架构对齐至关重要:在强耦合环境中,单分支网络明显优于多分支对应物,而多分支编码适用于解耦或单物理问题。一旦训练完成,这些替代模型的全场预测速度比高保真有限元求解器快高达1.8e4倍,而不会影响解决方案的准确性。
更新时间: 2025-07-04 15:36:15
领域: cs.LG
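For orientation, a minimal single-branch DeepONet (the architecture family benchmarked above) is only a few lines of PyTorch; the MIONet-style multi-branch variant would add one branch network per input function. This is a generic sketch, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MiniDeepONet(nn.Module):
    # Branch net encodes the input function sampled at m sensors; trunk net
    # encodes the query coordinate; the output is their dot product (classic
    # single-branch DeepONet). For multiple input functions (MIONet-style),
    # one would add a branch per input and combine them elementwise.
    def __init__(self, m_sensors: int, coord_dim: int, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m_sensors, 128), nn.Tanh(),
                                    nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(coord_dim, 128), nn.Tanh(),
                                   nn.Linear(128, p))

    def forward(self, u_sensors, coords):
        b = self.branch(u_sensors)            # (batch, p)
        t = self.trunk(coords)                # (batch, p)
        return (b * t).sum(-1, keepdim=True)  # (batch, 1)

model = MiniDeepONet(m_sensors=100, coord_dim=2)
u = torch.randn(8, 100)    # e.g., a boundary heat flux sampled at 100 points
xy = torch.rand(8, 2)      # query locations in the domain
print(model(u, xy).shape)  # torch.Size([8, 1])
```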
Forecast Evaluation and the Relationship of Regret and Calibration
Machine learning is about forecasting. When the forecasts come with an evaluation metric the forecasts become useful. What are reasonable evaluation metrics? How do existing evaluation metrics relate? In this work, we provide a general structure which subsumes many currently used evaluation metrics in a two-dimensional hierarchy, e.g., external and swap regret, loss scores, and calibration scores. The framework embeds those evaluation metrics in a large set of single-instance-based comparisons of forecasts and observations which respect a meta-criterion for reasonable forecast evaluations which we term ``fairness''. In particular, this framework sheds light on the relationship between regret-type and calibration-type evaluation metrics, showing a theoretical equivalence in their ability to evaluate, but practical incomparability of the obtained scores.
Updated: 2025-07-04 15:35:32
标题: 预测评估和遗憾与校准的关系
摘要: 机器学习是关于预测的。当预测附带评估指标时,预测就变得有用了。什么是合理的评估指标?现有的评估指标如何相关?在这项工作中,我们提供了一个一般结构,它涵盖了许多当前使用的评估指标,例如外部和交换后悔、损失分数和校准分数。该框架将这些评估指标嵌入到基于单个实例的预测和观察的比较中,这些比较遵循一个合理预测评估的元标准,我们称之为“公平性”。特别是,该框架揭示了后悔类型和校准类型评估指标之间的关系,显示了它们在评估能力上的理论等价性,但在获得分数方面的实际不可比性。
更新时间: 2025-07-04 15:35:32
领域: cs.LG,stat.ML
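For concreteness, the two metric families the hierarchy relates can be written in a standard textbook form (the paper's precise definitions may differ):

```latex
% External regret of forecasts p_1..p_T under loss \ell, against the best
% fixed prediction a in hindsight:
\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \ell(p_t, y_t) \;-\; \min_{a} \sum_{t=1}^{T} \ell(a, y_t)

% (Binned) calibration error: the average gap between predicted probability
% and empirical frequency within each bin B of similar forecasts:
\mathrm{Cal}_T \;=\; \sum_{B} \frac{|B|}{T} \, \bigl| \bar{p}_B - \bar{y}_B \bigr|
```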
PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks
It is often desirable to remove (a.k.a. unlearn) a specific part of the training data from a trained neural network model. A typical application scenario is to protect the data holder's right to be forgotten, which has been promoted by many recent regulations. Existing unlearning methods involve training alternative models with remaining data, which may be costly and challenging to verify from the data holder or a third-party auditor's perspective. In this work, we provide a new angle and propose a novel unlearning approach by imposing a carefully crafted "patch" on the original neural network to achieve targeted "forgetting" of the requested data to delete. Specifically, inspired by the research line of neural network repair, we propose to strategically seek a lightweight minimum "patch" for unlearning a given data point with a certifiable guarantee. Furthermore, to unlearn a considerable amount of data points (or an entire class), we propose to iteratively select a small subset of representative data points to unlearn, which achieves the effect of unlearning the whole set. Extensive experiments on multiple categorical datasets demonstrate our approach's effectiveness, achieving measurable unlearning while preserving the model's performance and being competitive in efficiency and memory consumption compared to various baseline methods.
Updated: 2025-07-04 15:33:43
标题: 修剪:一个基于补丁的修复框架,用于神经网络的可证明遗忘
摘要: 通常有必要从已训练的神经网络模型中删除(又称“遗忘”)特定部分的训练数据。一个典型的应用场景是保护数据持有者的被遗忘权,这已被许多最近的监管规定所提倡。现有的遗忘方法涉及使用剩余数据训练替代模型,从数据持有者或第三方审计人员的角度来看,这可能成本高昂且难以验证。在这项工作中,我们提供了一个新的视角,并提出了一种新颖的遗忘方法,通过在原始神经网络上施加精心制作的“补丁”,实现对被请求删除数据的有针对性“遗忘”。具体地,受神经网络修复这一研究方向的启发,我们提议策略性地寻找一个轻量级的最小“补丁”,以带有可证明保证的方式遗忘给定的数据点。此外,为了遗忘大量数据点(或整个类别),我们提出了迭代选择一小部分代表性数据点进行遗忘的方法,从而实现对整个数据集的遗忘效果。对多个分类数据集进行的大量实验表明了我们方法的有效性:在保持模型性能的同时实现了可衡量的遗忘,并且在效率和内存消耗方面与各种基准方法相比具有竞争力。
更新时间: 2025-07-04 15:33:43
领域: cs.LG,cs.AI,cs.CR
MLASDO: a software tool to detect and explain clinical and omics inconsistencies applied to the Parkinson's Progression Markers Initiative cohort
Inconsistencies between clinical and omics data may arise within medical cohorts. The identification, annotation and explanation of anomalous omics-based patients or individuals may become crucial to better reshape the disease, e.g., by detecting early onsets signaled by the omics and undetectable from observable symptoms. Here, we developed MLASDO (Machine Learning based Anomalous Sample Detection on Omics), a new method and software tool to identify, characterize and automatically describe anomalous samples based on omics data. Its workflow is based on three steps: (1) classification of healthy and cases individuals using a support vector machine algorithm; (2) detection of anomalous samples within groups; (3) explanation of anomalous individuals based on clinical data and expert knowledge. We showcase MLASDO using transcriptomics data of 317 healthy controls (HC) and 465 Parkinson's disease (PD) cases from the Parkinson's Progression Markers Initiative. In this cohort, MLASDO detected 15 anomalous HC with a PD-like transcriptomic signature and PD-like clinical features, including a lower proportion of CD4/CD8 naive T-cells and CD4 memory T-cells compared to HC (P<3.5*10^-3). MLASDO also identified 22 anomalous PD cases with a transcriptomic signature more similar to that of HC and some clinical features more similar to HC, including a lower proportion of mature neutrophils compared to PD cases (P<6*10^-3). In summary, MLASDO is a powerful tool that can help the clinician to detect and explain anomalous HC and cases of interest to be followed up. MLASDO is an open-source R package available at: https://github.com/JoseAdrian3/MLASDO.
Updated: 2025-07-04 15:31:12
标题: MLASDO:一种用于检测和解释临床和组学不一致性的软件工具,应用于帕金森病进展标志物倡议队列
摘要: 在医学队列中,临床数据和组学数据之间的不一致性可能会出现。识别、注释和解释基于组学的异常个体可能对重新塑造疾病变得至关重要,例如,通过检测组学信号的早期发生,而这些信号从可观察的症状中无法检测出来。在这里,我们开发了MLASDO(基于机器学习的组学异常样本检测),这是一种新的方法和软件工具,可以根据组学数据识别、表征和自动描述异常样本。其工作流程基于三个步骤:(1)使用支持向量机算法对健康和病例个体进行分类;(2)在组内检测异常样本;(3)根据临床数据和专家知识解释异常个体。我们使用帕金森病进展标志倡议的317名健康对照(HC)和465名帕金森病(PD)患者的转录组数据展示了MLASDO。在这个队列中,MLASDO检测到15个具有类似PD转录组特征和PD样临床特征的异常HC,包括与HC相比CD4/CD8天然T细胞和CD4记忆T细胞的比例较低(P<3.5*10^-3)。MLASDO还识别了22个具有与HC更相似的转录组特征和一些与HC更相似的临床特征的异常PD病例,包括与PD病例相比成熟中性粒细胞的比例较低(P<6*10^-3)。总之,MLASDO是一个强大的工具,可以帮助临床医生检测和解释有待后续跟进的异常HC和病例。MLASDO是一个开源的R软件包,可在https://github.com/JoseAdrian3/MLASDO 上获取。
更新时间: 2025-07-04 15:31:12
领域: cs.LG
On the Verification of Control Flow Attestation Evidence
Remote run-time attestation methods, including Control Flow Attestation (CFA) and Data Flow Attestation (DFA), have been proposed to generate precise evidence of execution's control flow path (in CFA) and optionally execution data inputs (in DFA) on a remote and potentially compromised embedded device, hereafter referred to as a Prover (Prv). Recent advances in run-time attestation architectures are also able to guarantee that a remote Verifier (Vrf) reliably receives this evidence from Prv, even when Prv's software state is fully compromised. This, in theory, enables secure "run-time auditing" in addition to best-effort attestation, i.e., it guarantees that Vrf can examine execution evidence to identify previously unknown compromises as soon as they are exploited, pinpoint their root cause(s), and remediate them. However, prior work has for the most part focused on securely implementing Prv's root of trust (responsible for generating authentic run-time evidence), leaving Vrf's perspective in this security service unexplored. In this work, we argue that run-time attestation and auditing are only truly useful if Vrf can effectively analyze received evidence. From this premise, we characterize different types of evidence produced by existing run-time attestation/auditing architectures in terms of Vrf's ability to detect and remediate (previously unknown) vulnerabilities. As a case study for practical uses of run-time evidence by Vrf, we propose SABRE: a Security Analysis and Binary Repair Engine. SABRE showcases how Vrf can systematically leverage run-time evidence to detect control flow attacks, pinpoint corrupted control data and specific instructions used to corrupt them, and leverage this evidence to automatically generate binary patches for buffer overflow and use-after-free vulnerabilities without source code knowledge.
Updated: 2025-07-04 15:28:11
标题: 关于控制流认证证据验证
摘要: 远程运行时认证方法,包括控制流认证(CFA)和数据流认证(DFA),已被提出用于在远程和可能受损的嵌入式设备上生成执行控制流路径的精确证据(在CFA中),以及可选地生成执行数据输入(在DFA中),该设备在此被称为证明者(Prv)。最近在运行时认证架构方面的进展也能够保证远程验证者(Vrf)可靠地从Prv接收到这些证据,即使Prv的软件状态完全遭到破坏。这理论上实现了安全的“运行时审计”,除了尽力认证,即确保Vrf能够检查执行证据,以便在被利用时立即识别以前未知的威胁,找出其根本原因并对其进行补救。然而,先前的工作大部分都集中在安全实现Prv的信任根基(负责生成真实的运行时证据),而Vrf在这种安全服务中的角度尚未被探索。在本文中,我们认为运行时认证和审计只有在Vrf能够有效分析接收到的证据时才真正有用。基于这一前提,我们对现有运行时认证/审计架构产生的不同类型证据进行了表征,以便Vrf能够检测和修复(先前未知的)漏洞。作为Vrf对运行时证据实际用途的案例研究,我们提出了SABRE:一个安全分析和二进制修复引擎。SABRE展示了Vrf如何系统地利用运行时证据来检测控制流攻击,找出受损的控制数据和用于损坏它们的特定指令,并利用这些证据自动生成二进制补丁以修复缓冲区溢出和使用已释放内存的漏洞,而无需源代码知识。
更新时间: 2025-07-04 15:28:11
领域: cs.CR
Disentangling the Roles of Representation and Selection in Data Pruning
Data pruning, selecting small but impactful subsets, offers a promising way to efficiently scale NLP model training. However, existing methods often involve many different design choices, which have not been systematically studied. This limits future developments. In this work, we decompose data pruning into two key components: the data representation and the selection algorithm, and we systematically analyze their influence on the selection of instances. Our theoretical and empirical results highlight the crucial role of representations: better representations, e.g., training gradients, generally lead to a better selection of instances, regardless of the chosen selection algorithm. Furthermore, different selection algorithms excel in different settings, and none consistently outperforms the others. Moreover, the selection algorithms do not always align with their intended objectives: for example, algorithms designed for the same objective can select drastically different instances, highlighting the need for careful evaluation.
Updated: 2025-07-04 15:25:04
标题: 解开数据修剪中表示和选择的角色
摘要: 数据修剪,选择小而具有影响力的子集,为有效扩展自然语言处理模型训练提供了一种有前途的途径。然而,现有方法往往涉及许多不同的设计选择,这些选择并没有得到系统地研究。这限制了未来的发展。在这项工作中,我们将数据修剪分解为两个关键组成部分:数据表示和选择算法,并系统地分析它们对实例选择的影响。我们的理论和实证结果突显了表示的关键作用:更好的表示,例如训练梯度,通常会导致更好的实例选择,无论选择的算法是什么。此外,不同的选择算法在不同的设置中表现出色,没有一个始终优于其他算法。此外,选择算法并不总是与其预期目标一致:例如,为相同目标设计的算法可能选择完全不同的实例,突显了对仔细评估的需求。
更新时间: 2025-07-04 15:25:04
领域: cs.CL,cs.LG
When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting
Watermarking has emerged as a promising solution to counter harmful or deceptive AI-generated content by embedding hidden identifiers that trace content origins. However, the robustness of current watermarking techniques is still largely unexplored, raising critical questions about their effectiveness against adversarial attacks. To address this gap, we examine the robustness of model-specific watermarking, where watermark embedding is integrated with text-to-image generation in models like latent diffusion models. We introduce three attack strategies: edge prediction-based, box blurring, and fine-tuning-based attacks in a no-box setting, where an attacker does not require access to the ground-truth watermark decoder. Our findings reveal that while model-specific watermarking is resilient against basic evasion attempts, such as edge prediction, it is notably vulnerable to blurring and fine-tuning-based attacks. Our best-performing attack achieves a reduction in watermark detection accuracy to approximately 47.92\%. Additionally, we perform an ablation study on factors like message length, kernel size and decoder depth, identifying critical parameters influencing the fine-tuning attack's success. Finally, we assess several advanced watermarking defenses, finding that even the most robust methods, such as multi-label smoothing, result in watermark extraction accuracy that falls below an acceptable level when subjected to our no-box attacks.
Updated: 2025-07-04 15:22:20
标题: 当没有解码器时:在无盒设置中从稳定扩散模型中去除水印
摘要: 数字水印技术已被提出作为应对有害或欺骗性人工智能生成内容的有效解决方案,通过嵌入隐藏的标识符来追踪内容的来源。然而,当前数字水印技术的鲁棒性仍然大部分未被探索,这引发了关于其对抗攻击效果的关键问题。为填补这一空白,我们考察了模型特定数字水印技术的鲁棒性,其中水印嵌入与文本到图像生成集成在潜在扩散模型等模型中。我们引入了三种攻击策略:基于边缘预测的攻击、方框模糊攻击和基于微调的攻击,均在无盒设置中进行,即攻击者无需访问真实水印解码器。我们的研究结果显示,虽然模型特定的数字水印技术对基本的规避尝试(如边缘预测)具有韧性,但明显容易受到模糊和基于微调攻击的影响。我们最有效的攻击方式将水印检测准确率降低到约47.92\%。此外,我们对消息长度、核大小和解码器深度等因素进行了消融研究,确定了影响微调攻击成功的关键参数。最后,我们评估了几种先进的数字水印防御方法,发现即使是最鲁棒的方法(如多标签平滑),在面对我们的无盒攻击时,水印提取准确性也会下降到可接受水平以下。
更新时间: 2025-07-04 15:22:20
领域: cs.CR
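The box-blurring attack, the simplest of the three strategies above, amounts to a few lines with Pillow (the file names here are hypothetical):

```python
from PIL import Image, ImageFilter

# Box-blurring attack in its simplest form: low-pass filtering the generated
# image to disrupt the embedded watermark signal. The kernel size (here
# radius=2) is the kind of knob an ablation study would vary.
img = Image.open("watermarked.png")  # hypothetical input path
attacked = img.filter(ImageFilter.BoxBlur(radius=2))
attacked.save("attacked.png")
```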
SymmetryLens: Unsupervised Symmetry Learning via Locality and Density Preservation
We develop a new unsupervised symmetry learning method that starts with raw data and provides the minimal generator of an underlying Lie group of symmetries, together with a symmetry-equivariant representation of the data, which turns the hidden symmetry into an explicit one. The method is able to learn the pixel translation operator from a dataset with only an approximate translation symmetry and can learn quite different types of symmetries that are not apparent to the naked eye. The method is based on the formulation of an information-theoretic loss function that measures both the degree of symmetry of a dataset under a candidate symmetry generator and a proposed notion of locality of the samples, which is coupled to symmetry. We demonstrate that this coupling between symmetry and locality, together with an optimization technique developed for entropy estimation, results in a stable system that provides reproducible results.
Updated: 2025-07-04 15:19:05
标题: SymmetryLens: 通过局部性和密度保持实现无监督对称学习
摘要: 我们开发了一种新的无监督对称性学习方法,从原始数据开始提供潜在Lie群对称性的最小生成器,以及数据的对称性等变表示,将隐藏的对称性转化为显式对称性。该方法能够从仅具有近似平移对称性的数据集中学习像素平移算子,并且能够学习到肉眼不明显的各种不同类型的对称性。该方法基于信息理论损失函数的制定,该函数既衡量了数据集在候选对称性生成器下对称性的程度,又提出了样本的局部性概念,该概念与对称性相耦合。我们展示了对称性和局部性之间的耦合,以及为熵估计开发的优化技术,导致一个稳定的系统,提供可重现的结果。
更新时间: 2025-07-04 15:19:05
领域: cs.LG
Constrain Alignment with Sparse Autoencoders
The alignment of large language models (LLMs) with human preferences remains a key challenge. While post-training techniques like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have achieved notable success, they often introduce computational inefficiencies and training instability. In this paper, we propose Feature-level constrained Preference Optimization (FPO), a novel method designed to simplify the alignment process while ensuring stability. FPO leverages pre-trained Sparse Autoencoders (SAEs) and introduces feature-level constraints, allowing for efficient, sparsity-enforced alignment. Our approach gains efficiency by using sparse features activated in a well-trained sparse autoencoder, and gains the quality of a sequential KL divergence by using a feature-level offline reference. Experimental results on benchmark datasets demonstrate that FPO achieves a 5.08% absolute improvement in win rate with much lower computational cost compared to state-of-the-art baselines, making it a promising solution for efficient and controllable LLM alignment.
Updated: 2025-07-04 15:18:24
标题: 用稀疏自动编码器约束对齐
摘要: 大型语言模型(LLMs)与人类偏好的对齐仍然是一个关键挑战。虽然像基于人类反馈的强化学习(RLHF)和直接偏好优化(DPO)等训练后技术取得了显著成功,但它们经常引入计算效率低和训练不稳定的问题。在本文中,我们提出了特征级约束偏好优化(FPO),这是一种旨在简化对齐过程并确保稳定性的新方法。FPO利用预训练的稀疏自动编码器(SAEs)并引入特征级约束,实现高效且强制稀疏化的对齐。我们的方法通过使用训练良好的稀疏自动编码器中激活的稀疏特征获得效率,并通过特征级离线参考获得顺序KL散度的质量。基准数据集上的实验结果表明,与最先进的基线相比,FPO在胜率上实现了5.08%的绝对提升,且计算成本低得多,使其成为高效、可控LLM对齐的一种有前景的解决方案。
更新时间: 2025-07-04 15:18:24
领域: cs.AI,cs.CL
A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications
This paper presents a novel graph-based deep learning model for tasks involving relations between two nodes (edge-centric tasks), where the focus lies on predicting relationships and interactions between pairs of nodes rather than node properties themselves. The model combines supervised and self-supervised learning, with a loss function that takes into account both the learned embeddings and patterns with and without ground truth. Additionally, it incorporates an attention mechanism that leverages both node and edge features. The architecture, trained end-to-end, comprises two primary components: embedding generation and prediction. First, a graph neural network (GNN) transforms raw node features into dense, low-dimensional embeddings, incorporating edge attributes. Then, a feedforward neural model processes the node embeddings to produce the final output. Experiments demonstrate that our model matches or exceeds existing methods for protein-protein interaction prediction and Gene Ontology (GO) term prediction. The model also performs effectively with one-hot encoding for node features, providing a solution for the previously unsolved problem of predicting similarity between compounds with unknown structures.
Updated: 2025-07-04 15:15:10
标题: 一个用于以边为中心的应用的混合监督和自监督图神经网络
摘要: 这篇论文提出了一种新颖的基于图的深度学习模型,用于涉及两个节点之间关系的任务(以边为中心的任务),其重点在于预测节点对之间的关系和交互,而不是节点本身的属性。该模型结合了监督学习和自监督学习,其损失函数同时考虑了所学习的嵌入以及有无真实标签(ground truth)情况下的模式。此外,它还结合了一个利用节点和边特征的注意力机制。该架构经过端到端训练,包括两个主要组件:嵌入生成和预测。首先,一个图神经网络(GNN)将原始节点特征转换为稠密的低维嵌入,并整合边属性。然后,一个前馈神经模型处理节点嵌入以产生最终输出。实验证明,我们的模型在蛋白质相互作用预测和基因本体(GO)术语预测方面与现有方法相当或更优。该模型在节点特征采用独热编码时同样表现良好,为此前未解决的预测未知结构化合物之间相似性的问题提供了解决方案。
更新时间: 2025-07-04 15:15:10
领域: cs.LG,q-bio.MN
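A bare-bones version of the embedding-generation plus prediction decomposition (omitting the paper's attention mechanism and hybrid loss; built with PyTorch Geometric on toy data) might look like:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class EdgePredictor(nn.Module):
    # GNN produces node embeddings via message passing over edge_index; a
    # feedforward head then scores a pair of nodes, mirroring the abstract's
    # embedding-generation + prediction decomposition.
    def __init__(self, in_dim, hid=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid)
        self.conv2 = GCNConv(hid, hid)
        self.head = nn.Sequential(nn.Linear(2 * hid, hid), nn.ReLU(),
                                  nn.Linear(hid, 1))

    def forward(self, x, edge_index, pairs):
        h = torch.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        z = torch.cat([h[pairs[0]], h[pairs[1]]], dim=-1)
        return self.head(z).squeeze(-1)  # one logit per queried pair

x = torch.eye(5)  # one-hot node features, as used for unknown structures
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
pairs = torch.tensor([[0, 2], [4, 3]])  # query pairs (0, 4) and (2, 3)
print(EdgePredictor(in_dim=5)(x, edge_index, pairs))
```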
Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion
Deep learning models for dialect identification are often limited by the scarcity of dialectal data. To address this challenge, we propose to use Retrieval-based Voice Conversion (RVC) as an effective data augmentation method for a low-resource German dialect classification task. By converting audio samples to a uniform target speaker, RVC minimizes speaker-related variability, enabling models to focus on dialect-specific linguistic and phonetic features. Our experiments demonstrate that RVC enhances classification performance when utilized as a standalone augmentation method. Furthermore, combining RVC with other augmentation methods such as frequency masking and segment removal leads to additional performance gains, highlighting its potential for improving dialect classification in low-resource scenarios.
Updated: 2025-07-04 15:14:49
标题: 使用检索式语音转换提高低资源方言分类效果
摘要: 深度学习模型在方言识别方面通常受方言数据稀缺的限制。为了解决这一挑战,我们提出使用检索式语音转换(RVC)作为一种有效的数据增强方法,用于低资源德语方言分类任务。通过将音频样本转换为统一的目标说话者,RVC减少了与说话者相关的可变性,使模型能够专注于方言特定的语言和语音特征。我们的实验表明,当单独使用RVC作为增强方法时,它可以提高分类性能。此外,将RVC与其他增强方法(如频率屏蔽和段落去除)结合使用会导致额外的性能提升,突显了其在改善低资源情况下方言分类的潜力。
更新时间: 2025-07-04 15:14:49
领域: cs.CL,cs.AI,cs.SD,eess.AS
Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to adversarial manipulations such as jailbreaking via prompt injection attacks. These attacks bypass safety mechanisms to generate restricted or harmful content. In this study, we investigated the underlying latent subspaces of safe and jailbroken states by extracting hidden activations from an LLM. Inspired by attractor dynamics in neuroscience, we hypothesized that LLM activations settle into semi-stable states that can be identified and perturbed to induce state transitions. Using dimensionality reduction techniques, we projected activations from safe and jailbroken responses to reveal latent subspaces in lower-dimensional spaces. We then derived a perturbation vector that, when applied to safe representations, shifted the model towards a jailbreak state. Our results demonstrate that this causal intervention results in statistically significant jailbreak responses in a subset of prompts. Next, we probed how these perturbations propagate through the model's layers, testing whether the induced state change remains localized or cascades throughout the network. Our findings indicate that targeted perturbations induced distinct shifts in activations and model responses. Our approach paves the way for potential proactive defenses, shifting from traditional guardrail-based methods to preemptive, model-agnostic techniques that neutralize adversarial states at the representation level.
Updated: 2025-07-04 15:13:55
标题: 探究LLM中的潜在子空间以保障人工智能安全:识别和操纵对抗性状态
摘要: 大型语言模型(LLMs)已经在各种任务中展示出卓越的能力,但它们仍然容易受到诸如通过提示注入攻击进行越狱等对抗性篡改的影响。这些攻击可以绕过安全机制生成受限制或有害内容。在这项研究中,我们通过从LLM中提取隐藏激活来研究安全和越狱状态背后的潜在子空间。受到神经科学中吸引子动力学的启发,我们假设LLM的激活会稳定到可以被识别和扰动以诱导状态转变的半稳定状态。通过降维技术,我们将安全和越狱响应的激活投影到较低维度空间中以显示潜在子空间。然后,我们推导出一个扰动向量,当应用于安全表示时,将模型转向越狱状态。我们的结果表明,这种因果干预导致部分提示中出现具有统计意义的越狱响应。接下来,我们探究这些扰动如何在模型的层中传播,检验诱发的状态变化是否局部化或在整个网络中传播。我们的发现表明,有针对性的扰动导致激活和模型响应发生明显的转变。我们的方法为潜在的主动防御打开了道路,从传统的基于护栏的方法转向预防性、模型无关的技术,以在表示级别中和对抗性状态。
更新时间: 2025-07-04 15:13:55
领域: cs.LG,cs.AI,cs.CR
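One common way to derive such a perturbation vector is the difference of per-class activation means; the sketch below uses synthetic activations and that difference-of-means construction, which may differ from the paper's exact (dimensionality-reduction-based) derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # hidden-state dimension

# Stand-ins for hidden activations collected at one layer (in the paper,
# extracted from an LLM on safe vs. jailbroken responses).
safe_acts = rng.normal(loc=0.0, size=(200, d))
jailbroken_acts = rng.normal(loc=0.3, size=(200, d))

# Difference-of-means perturbation vector pointing from the safe cluster
# toward the jailbroken cluster in activation space.
v = jailbroken_acts.mean(axis=0) - safe_acts.mean(axis=0)
v = v / np.linalg.norm(v)

# Applying the perturbation to a safe representation with strength alpha is
# the causal intervention: re-injecting h_shifted should shift behavior.
alpha = 4.0
h = safe_acts[0]
h_shifted = h + alpha * v
print("moved along jailbreak direction by", (h_shifted - h) @ v)
```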
PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction
Reconstructing high-quality images from low-resolution inputs using Residual Dense Spatial Networks (RDSNs) is crucial yet challenging, particularly in collaborative scenarios where centralized training poses significant privacy risks, including data leakage and inference attacks, as well as high computational costs. We propose a novel Privacy-Preserving Federated Learning-based RDSN (PPFL-RDSN) framework specifically tailored for lossy image reconstruction. PPFL-RDSN integrates Federated Learning (FL), local differential privacy, and robust model watermarking techniques, ensuring data remains secure on local devices, safeguarding sensitive information, and maintaining model authenticity without revealing underlying data. Empirical evaluations show that PPFL-RDSN achieves comparable performance to the state-of-the-art centralized methods while reducing computational burdens, and effectively mitigates security and privacy vulnerabilities, making it a practical solution for secure and privacy-preserving collaborative computer vision applications.
Updated: 2025-07-04 15:10:21
标题: PPFL-RDSN:基于隐私保护联邦学习的剩余密集空间网络,用于加密的有损图像重建
摘要: 使用残差密集空间网络(RDSNs)从低分辨率输入重建高质量图像对于合作场景至关重要但具有挑战性,特别是在集中式训练造成较大隐私风险的情况下,包括数据泄露和推断攻击,以及高计算成本。我们提出了一种新颖的基于隐私保护联邦学习的RDSN(PPFL-RDSN)框架,专门为有损图像重建而设计。PPFL-RDSN整合了联邦学习(FL)、本地差分隐私和强大的模型水印技术,确保数据在本地设备上保持安全,保护敏感信息,并保持模型的真实性,而不会泄露底层数据。实证评估表明,PPFL-RDSN实现了与最先进的集中式方法相媲美的性能,同时减轻了计算负担,并有效缓解了安全和隐私漏洞,使其成为安全和隐私保护的合作计算机视觉应用的实用解决方案。
更新时间: 2025-07-04 15:10:21
领域: cs.LG,cs.CR
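The local-differential-privacy ingredient can be sketched independently of the RDSN itself: each client clips and noises its update before sharing, and the server only averages the sanitized updates (model watermarking omitted; the clip and noise parameters below are illustrative, not the paper's):

```python
import numpy as np

def clip_and_noise(update, clip=1.0, sigma=0.5, rng=None):
    # Local DP-style sanitization: clip the update's norm, then add
    # Gaussian noise before it ever leaves the client device.
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    update = update * min(1.0, clip / (norm + 1e-12))
    return update + rng.normal(scale=sigma * clip, size=update.shape)

rng = np.random.default_rng(0)
client_updates = [rng.normal(size=1000) for _ in range(8)]  # RDSN weight deltas

# Clients sanitize locally; the server only ever sees noisy updates and
# averages them (plain FedAvg) into the global reconstruction model.
sanitized = [clip_and_noise(u, rng=rng) for u in client_updates]
global_update = np.mean(sanitized, axis=0)
print(global_update[:5])
```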
Large Language Models for Combinatorial Optimization: A Systematic Review
This systematic review explores the application of Large Language Models (LLMs) in Combinatorial Optimization (CO). We report our findings using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We conduct a literature search via Scopus and Google Scholar, examining over 2,000 publications. We assess publications against four inclusion and four exclusion criteria related to their language, research focus, publication year, and type. Eventually, we select 103 studies. We classify these studies into semantic categories and topics to provide a comprehensive overview of the field, including the tasks performed by LLMs, the architectures of LLMs, the existing datasets specifically designed for evaluating LLMs in CO, and the field of application. Finally, we identify future directions for leveraging LLMs in this field.
Updated: 2025-07-04 15:08:10
标题: 大型语言模型用于组合优化:系统性综述
摘要: 这篇系统性综述探讨了大型语言模型(LLMs)在组合优化(CO)中的应用。我们使用“系统性综述和Meta分析的首选报告项目”(PRISMA)指南报告了我们的发现。我们通过Scopus和谷歌学术进行文献搜索,检查了超过2000篇出版物。我们根据与语言、研究重点、出版年份和类型相关的四项包含标准和四项排除标准评估出版物。最终,我们选择了103项研究。我们将这些研究分类为语义类别和主题,以提供该领域的综合概述,包括LLMs执行的任务、LLMs的架构、专门设计用于评估LLMs在CO中的现有数据集以及应用领域。最后,我们确定了在该领域利用LLMs的未来方向。
更新时间: 2025-07-04 15:08:10
领域: cs.AI
SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts
Text-guided image manipulation with diffusion models enables flexible and precise editing based on prompts, but raises ethical and copyright concerns due to potential unauthorized modifications. To address this, we propose SecureT2I, a secure framework designed to prevent unauthorized editing in diffusion-based generative models. SecureT2I is compatible with both general-purpose and domain-specific models and can be integrated via lightweight fine-tuning without architectural changes. We categorize images into a permit set and a forbid set based on editing permissions. For the permit set, the model learns to perform high-quality manipulations as usual. For the forbid set, we introduce training objectives that encourage vague or semantically ambiguous outputs (e.g., blurred images), thereby suppressing meaningful edits. The core challenge is to block unauthorized editing while preserving editing quality for permitted inputs. To this end, we design separate loss functions that guide selective editing behavior. Extensive experiments across multiple datasets and models show that SecureT2I effectively degrades manipulation quality on forbidden images while maintaining performance on permitted ones. We also evaluate generalization to unseen inputs and find that SecureT2I consistently outperforms baselines. Additionally, we analyze different vagueness strategies and find that resize-based degradation offers the best trade-off for secure manipulation control.
Updated: 2025-07-04 15:05:55
标题: SecureT2I:AI生成的图像不再受未经授权的操纵
摘要: 使用扩散模型进行文本引导的图像处理可以基于提示实现灵活和精确的编辑,但由于潜在的未经授权修改,引发了伦理和版权问题。为了解决这一问题,我们提出了SecureT2I,这是一个安全框架,旨在防止扩散式生成模型中的未经授权编辑。SecureT2I与通用和特定领域模型兼容,并且可以通过轻量级微调集成,无需进行架构更改。我们根据编辑权限将图像分为允许集和禁止集。对于允许集,模型学习以通常方式执行高质量的操作。对于禁止集,我们引入了鼓励模糊或语义模糊输出(例如,模糊图像)的训练目标,从而抑制有意义的编辑。核心挑战在于阻止未经授权的编辑,同时保持对允许输入的编辑质量。为此,我们设计了不同的损失函数,以引导选择性的编辑行为。跨多个数据集和模型的广泛实验表明,SecureT2I在禁止图像上有效降低了操控质量,同时在允许图像上保持了性能。我们还评估了对未见输入的泛化能力,并发现SecureT2I始终优于基线。此外,我们分析了不同的模糊策略,并发现基于调整大小的退化提供了安全操控的最佳权衡。
更新时间: 2025-07-04 15:05:55
领域: cs.CR,cs.CV
A Novel Four-Stage Synchronized Chaotic Map: Design and Statistical Characterization
Digital implementations of chaotic systems often suffer from inherent degradation, limiting their long-term performance and statistical quality. To address this challenge, we propose a novel four-stage synchronized piecewise linear chaotic map. This new map is meticulously designed with four independent segments, each possessing its own control parameters, specifically engineered to mitigate the natural degradation observed in digitally realized dynamical systems. We characterize its behavior using established tools from nonlinear dynamics, including bifurcation diagrams and graphical analysis, which provide a comprehensive qualitative understanding of its complex dynamics. To rigorously validate the statistical features of the generated sequences, we employed the National Institute of Standards and Technology (NIST) statistical testing suite. A substantial 100 MB dataset, comprising sequences produced by the proposed map, was generated via a Matlab script and subjected to this rigorous battery of tests. Our results demonstrate that the proposed map exhibits superior statistical properties compared to the classic Bernoulli map, successfully passing all NIST tests where the traditional map did not. This research confirms the proposed map's potential as a robust and high-quality source for chaotic sequence generation.
Updated: 2025-07-04 15:02:53
Fields: nlin.CD,cs.CR,37D45,F.2.1
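To make the construction concrete, the sketch below iterates a generic four-segment piecewise linear map in Python; the segment boundaries, slopes, and offsets are illustrative placeholders, not the parameters or synchronization scheme proposed in the paper.

```python
import numpy as np

# Illustrative four-segment piecewise linear map on [0, 1). Each quarter
# [i/4, (i+1)/4) has its own slope S[i] and offset C[i]; any |S[i]| > 1
# keeps the dynamics expanding. Values are placeholders, not the paper's.
S = np.array([4.3, 4.7, 4.1, 4.9])
C = np.array([0.11, 0.37, 0.59, 0.83])

def step(x):
    i = min(int(x * 4), 3)                    # segment index for x
    return (C[i] + S[i] * (x - 0.25 * i)) % 1.0

x, orbit = 0.123456789, []
for _ in range(10_000):
    x = step(x)
    orbit.append(x)
bits = (np.array(orbit) > 0.5).astype(int)    # crude binarization, NIST-style input
```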
From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis
EEG signals capture brain activity with high temporal and low spatial resolution, supporting applications such as neurological diagnosis, cognitive monitoring, and brain-computer interfaces. However, effective analysis is hindered by limited labeled data, high dimensionality, and the absence of scalable models that fully capture spatiotemporal dependencies. Existing self-supervised learning (SSL) methods often focus on either spatial or temporal features, leading to suboptimal representations. To address this, we propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive Architecture (V-JEPA) for EEG classification. By treating EEG as video-like sequences, EEG-VJEPA learns semantically meaningful spatiotemporal representations using joint embeddings and adaptive masking. To our knowledge, this is the first work that exploits V-JEPA for EEG classification and explores the visual concepts learned by the model. Evaluations on the publicly available Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA outperforms existing state-of-the-art models in classification accuracy. Beyond classification accuracy, EEG-VJEPA captures physiologically relevant spatial and temporal signal patterns, offering interpretable embeddings that may support human-AI collaboration in diagnostic workflows. These findings position EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in real-world clinical settings.
Updated: 2025-07-04 15:01:34
Fields: cs.CV,cs.AI,cs.LG
Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations
Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rossler system and reconstructing dynamics from noise-corrupted electrical circuit data, where the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method is capable of recovering the correct dynamics, whereas direct symbolic regression methods, such as SINDy, fail to do so with the given amount of data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity.
Updated: 2025-07-04 14:57:58
Fields: cs.LG,math-ph,math.MP,nlin.CD,q-bio.NC
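The central trick - feeding the prediction error back into the simulated state so that chaos stops amplifying parameter error - fits in a few lines. The toy below recovers one Rossler parameter by grid search; it is a sketch of the prediction-error idea under simplifying assumptions (known model form, noise-free data), not the authors' PEM-UDE implementation.

```python
import numpy as np

def rossler_step(s, a, b, c, dt):
    x, y, z = s
    return s + dt * np.array([-y - z, x + a * y, b + z * (x - c)])

# Observed trajectory from the "true" system (c = 5.7)
dt, T = 0.01, 2000
true = np.zeros((T, 3)); true[0] = [1.0, 1.0, 1.0]
for t in range(T - 1):
    true[t + 1] = rossler_step(true[t], 0.2, 0.2, 5.7, dt)

def pem_loss(c_hat, K=0.5):
    # One-step predictions with an observer correction K * (obs - pred):
    # the feedback suppresses chaotic divergence, smoothing the loss surface.
    s, err = true[0].copy(), 0.0
    for t in range(T - 1):
        pred = rossler_step(s, 0.2, 0.2, c_hat, dt)
        err += np.sum((true[t + 1] - pred) ** 2)
        s = pred + K * (true[t + 1] - pred)   # pull state back toward data
    return err

cs = np.linspace(4.0, 7.0, 31)
print(cs[np.argmin([pem_loss(c) for c in cs])])   # ~5.7
```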
Playing with Transformer at 30+ FPS via Next-Frame Diffusion
Autoregressive video models offer distinct advantages over bidirectional diffusion models in creating interactive video content and supporting streaming applications with arbitrary duration. In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that incorporates block-wise causal attention, enabling iterative sampling and efficient inference via parallel token generation within each frame. Nonetheless, achieving real-time video generation remains a significant challenge for such models, primarily due to the high computational cost associated with diffusion sampling and the hardware inefficiencies inherent to autoregressive generation. To address this, we introduce two innovations: (1) We extend consistency distillation to the video domain and adapt it specifically for video models, enabling efficient inference with few sampling steps; (2) To fully leverage parallel computation, motivated by the observation that adjacent frames often share the identical action input, we propose speculative sampling. In this approach, the model generates the next few frames using the current action input, and discards the speculatively generated frames if the input action differs. Experiments on a large-scale action-conditioned video generation benchmark demonstrate that NFD beats autoregressive baselines in terms of both visual quality and sampling efficiency. We achieve, for the first time, autoregressive video generation at over 30 Frames Per Second (FPS) on an A100 GPU using a 310M model.
Updated: 2025-07-04 14:56:46
Fields: cs.CV,cs.AI
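A minimal sketch of the speculative sampling loop described above, where `next_frame(history, action)` is a hypothetical stand-in for one (distilled) diffusion sampling call; the refill shown sequentially here would be batched in parallel on real hardware.

```python
def play(next_frame, first_frame, action_stream, k=3):
    # next_frame(history, action) -> frame   (hypothetical model interface)
    history, speculated = [first_frame], []   # speculated: (frame, assumed_action)
    for action in action_stream:
        if speculated and speculated[0][1] == action:
            frame, _ = speculated.pop(0)      # speculation hit: frame is free
        else:
            speculated = []                   # action changed: discard guesses
            frame = next_frame(history, action)
        history.append(frame)
        yield frame
        while len(speculated) < k:            # guess ahead assuming action repeats
            ctx = history + [f for f, _ in speculated]
            speculated.append((next_frame(ctx, action), action))
```

The refill loop models the work that would overlap with playback on an accelerator: as long as the player holds the same action, every speculated frame is served at zero marginal cost.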
Disentangling Doubt in Deep Causal AI
Accurate individual treatment-effect estimation in high-stakes applications demands both reliable point predictions and interpretable uncertainty quantification. We propose a factorized Monte Carlo Dropout framework for deep twin-network models that splits total predictive variance into representation uncertainty (sigma_rep) in the shared encoder and prediction uncertainty (sigma_pred) in the outcome heads. Across three synthetic covariate-shift regimes, our intervals are well-calibrated (ECE < 0.03) and satisfy sigma_rep^2 + sigma_pred^2 ~ sigma_tot^2. Additionally, we observe a crossover: head uncertainty leads on in-distribution data, but representation uncertainty dominates under shift. Finally, on a real-world twins cohort with induced multivariate shifts, only sigma_rep spikes on out-of-distribution samples (delta sigma ~ 0.0002) and becomes the primary error predictor (rho_rep <= 0.89), while sigma_pred remains flat. This module-level decomposition offers a practical diagnostic for detecting and interpreting uncertainty sources in deep causal-effect models.
Updated: 2025-07-04 14:48:51
Fields: cs.LG,cs.AI,stat.ML
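One plausible way to realize this factorization with ordinary Monte Carlo Dropout in PyTorch is to estimate total variance with dropout active everywhere, estimate head-only variance on the mean representation, and attribute the remainder to the encoder. This is a sketch of the general recipe, not necessarily the paper's exact estimator.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.1))
head = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Dropout(0.1), nn.Linear(16, 1))

@torch.no_grad()
def mc_variances(x, n=200):
    enc.train(); head.train()                 # keep dropout stochastic at inference
    reps = torch.stack([enc(x) for _ in range(n)])
    preds = torch.stack([head(r) for r in reps])
    sigma_tot = preds.var(dim=0)              # dropout in encoder AND head
    preds_head = torch.stack([head(reps.mean(dim=0)) for _ in range(n)])
    sigma_pred = preds_head.var(dim=0)        # dropout in head only
    sigma_rep = (sigma_tot - sigma_pred).clamp(min=0)   # residual -> encoder
    return sigma_rep, sigma_pred, sigma_tot

print([v.mean().item() for v in mc_variances(torch.randn(8, 10))])
```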
Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy
Although prompt engineering is central to unlocking the full potential of Large Language Models (LLMs), crafting effective prompts remains a time-consuming trial-and-error process that relies on human intuition. This study investigates Declarative Self-improving Python (DSPy), an optimization framework that programmatically creates and refines prompts, applied to five use cases: guardrail enforcement, hallucination detection in code, code generation, routing agents, and prompt evaluation. Each use case explores how prompt optimization via DSPy influences performance. While some cases demonstrated modest improvements - such as minor gains in the guardrails use case and selective enhancements in hallucination detection - others showed notable benefits. The prompt evaluation criterion task demonstrated a substantial performance increase, raising accuracy from 46.2% to 64.0%. In the router agent case, the possibility of improving a poorly performing prompt and of a smaller model matching a stronger one through optimized prompting was explored. Although prompt refinement increased accuracy from 85.0% to 90.0%, using the optimized prompt with a cheaper model did not improve performance. Overall, this study's findings suggest that DSPy's systematic prompt optimization can enhance LLM performance, particularly when instruction tuning and example selection are optimized together. However, the impact varies by task, highlighting the importance of evaluating specific use cases in prompt optimization research.
Updated: 2025-07-04 14:46:56
Fields: cs.SE,cs.AI,cs.CL,cs.LG,68T50,I.2.7; D.2.3
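For readers new to the framework, a compiled DSPy program for something like the prompt-evaluation use case might look roughly as follows. This assumes a DSPy 2.x-style API; the model choice, signature, metric, and training example are invented for illustration and are not the study's code.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))   # model choice is illustrative

# Hypothetical signature: judge an answer against one evaluation criterion
judge = dspy.ChainOfThought("criterion, answer -> verdict")

def exact_match(example, pred, trace=None):        # toy metric, invented here
    return example.verdict.lower() == pred.verdict.lower()

trainset = [
    dspy.Example(criterion="answer cites a source", answer="See RFC 2616.",
                 verdict="yes").with_inputs("criterion", "answer"),
]
compiled_judge = BootstrapFewShot(metric=exact_match).compile(judge,
                                                              trainset=trainset)
```

The optimizer rewrites the instructions and bootstraps few-shot demonstrations for `judge`, which is the "instruction tuning and example selection optimized together" regime the study found most effective.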
Blackbox Dataset Inference for LLM
Today, the training of large language models (LLMs) can involve personally identifiable information and copyrighted material, incurring dataset misuse. To mitigate the problem of dataset misuse, this paper explores \textit{dataset inference}, which aims to detect if a suspect model $\mathcal{M}$ used a victim dataset $\mathcal{D}$ in training. Previous research tackles dataset inference by aggregating results of membership inference attacks (MIAs) -- methods to determine whether individual samples are a part of the training dataset. However, restricted by the low accuracy of MIAs, previous research mandates grey-box access to $\mathcal{M}$ to get intermediate outputs (probabilities, loss, perplexity, etc.) for obtaining satisfactory results. This leads to reduced practicality, as LLMs, especially those deployed for profits, have limited incentives to return the intermediate outputs. In this paper, we propose a new method of dataset inference with only black-box access to the target model (i.e., assuming only the text-based responses of the target model are available). Our method is enabled by two sets of locally built reference models, one set involving $\mathcal{D}$ in training and the other not. By measuring which set of reference model $\mathcal{M}$ is closer to, we determine if $\mathcal{M}$ used $\mathcal{D}$ for training. Evaluations of real-world LLMs in the wild show that our method offers high accuracy in all settings and presents robustness against bypassing attempts.
Updated: 2025-07-04 14:45:41
Fields: cs.CR
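The decision rule reduces to a nearest-set comparison over response profiles. A schematic sketch, where `embed` is any off-the-shelf text embedder and all interfaces are hypothetical:

```python
import numpy as np

def used_dataset(suspect_answers, refs_with_D, refs_without_D, embed):
    # Each refs_* entry is the answer list of one locally trained reference model;
    # a model's "profile" is the mean embedding of its text-only responses.
    def profile(answers):
        return np.mean([embed(a) for a in answers], axis=0)

    s = profile(suspect_answers)
    close_with = np.mean([np.dot(s, profile(r)) for r in refs_with_D])
    close_without = np.mean([np.dot(s, profile(r)) for r in refs_without_D])
    return close_with > close_without      # True => victim dataset likely used
```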
RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning
The integration of external knowledge through Retrieval-Augmented Generation (RAG) has become foundational in enhancing large language models (LLMs) for knowledge-intensive tasks. However, existing RAG paradigms often overlook the cognitive step of applying knowledge, leaving a gap between retrieved facts and task-specific reasoning. In this work, we introduce RAG+, a principled and modular extension that explicitly incorporates application-aware reasoning into the RAG pipeline. RAG+ constructs a dual corpus consisting of knowledge and aligned application examples, created either manually or automatically, and retrieves both jointly during inference. This design enables LLMs not only to access relevant information but also to apply it within structured, goal-oriented reasoning processes. Experiments across mathematical, legal, and medical domains, conducted on multiple models, demonstrate that RAG+ consistently outperforms standard RAG variants, achieving average improvements of 3-5%, and peak gains up to 7.5% in complex scenarios. By bridging retrieval with actionable application, RAG+ advances a more cognitively grounded framework for knowledge integration, representing a step toward more interpretable and capable LLMs.
Updated: 2025-07-04 14:43:14
Fields: cs.AI,cs.CL
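The dual-corpus retrieval step can be pictured as below; the retriever interface and prompt template are invented for illustration, not taken from the paper.

```python
def rag_plus_prompt(query, knowledge_index, application_index, k=3):
    # Both .search(query, k) calls are a hypothetical retriever API; the
    # application corpus stores worked examples aligned with knowledge entries.
    facts = knowledge_index.search(query, k)
    demos = application_index.search(query, k)
    blocks = [f"Fact: {f}\nApplied example: {d}" for f, d in zip(facts, demos)]
    return ("\n\n".join(blocks)
            + f"\n\nQuestion: {query}\n"
            + "Answer by applying the facts as in the examples.")
```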
EvoAgentX: An Automated Framework for Evolving Agentic Workflows
Multi-agent systems (MAS) have emerged as a powerful paradigm for orchestrating large language models (LLMs) and specialized tools to collaboratively address complex tasks. However, existing MAS frameworks often require manual workflow configuration and lack native support for dynamic evolution and performance optimization. In addition, many MAS optimization algorithms are not integrated into a unified framework. In this paper, we present EvoAgentX, an open-source platform that automates the generation, execution, and evolutionary optimization of multi-agent workflows. EvoAgentX employs a modular architecture consisting of five core layers: the basic components, agent, workflow, evolving, and evaluation layers. Specifically, within the evolving layer, EvoAgentX integrates three MAS optimization algorithms, TextGrad, AFlow, and MIPRO, to iteratively refine agent prompts, tool configurations, and workflow topologies. We evaluate EvoAgentX on HotPotQA, MBPP, and MATH for multi-hop reasoning, code generation, and mathematical problem solving, respectively, and further assess it on real-world tasks using GAIA. Experimental results show that EvoAgentX consistently achieves significant performance improvements, including a 7.44% increase in HotPotQA F1, a 10.00% improvement in MBPP pass@1, a 10.00% gain in MATH solve accuracy, and an overall accuracy improvement of up to 20.00% on GAIA. The source code is available at: https://github.com/EvoAgentX/EvoAgentX
Updated: 2025-07-04 14:43:10
Fields: cs.AI
Multi-Hop Reasoning for Question Answering with Hyperbolic Representations
Hyperbolic representations are effective in modeling knowledge graph data which is prevalently used to facilitate multi-hop reasoning. However, a rigorous and detailed comparison of the two spaces for this task is lacking. In this paper, through a simple integration of hyperbolic representations with an encoder-decoder model, we perform a controlled and comprehensive set of experiments to compare the capacity of hyperbolic space versus Euclidean space in multi-hop reasoning. Our results show that the former consistently outperforms the latter across a diverse set of datasets. In addition, through an ablation study, we show that a learnable curvature initialized with the delta hyperbolicity of the utilized data yields superior results to random initializations. Furthermore, our findings suggest that hyperbolic representations can be significantly more advantageous when the datasets exhibit a more hierarchical structure.
Updated: 2025-07-04 14:39:01
Fields: cs.CL,cs.AI
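The delta in question is the standard Gromov delta-hyperbolicity, computable directly from a pairwise distance matrix. A minimal numpy version using the four-point condition with a fixed base point (how the paper maps delta to an initial curvature value is not reproduced here):

```python
import numpy as np

def gromov_delta(D, base=0):
    # Gromov product relative to base w: (x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2
    row, col = D[base][:, None], D[base][None, :]
    G = 0.5 * (row + col - D)
    # 4-point condition: delta = max_{x,y} [ max_z min((x|z)_w, (z|y)_w) - (x|y)_w ]
    minmax = np.max(np.minimum(G[:, :, None], G[None, :, :]), axis=1)
    return float(np.max(minmax - G))

# A path graph (a tree) has delta = 0 under its shortest-path metric:
n = 6
D = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
print(gromov_delta(D))   # 0.0
```

Smaller delta means the data is closer to a tree metric, which is exactly the hierarchical regime where the abstract reports hyperbolic space paying off most.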
DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation
Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis across various tasks in MTL, we have observed an interesting divergence phenomenon that the same feature can have significantly different importance across different tasks in MTL. To address these issues, we propose Deep Multiple Task-specific Feature Interactions Network (DTN) with a novel model structure design. DTN introduces multiple diversified task-specific feature interaction methods and task-sensitive network in MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup. We applied DTN to our company's real-world E-commerce recommendation dataset, which consisted of over 6.3 billion samples; the results demonstrated that DTN significantly outperformed state-of-the-art MTL models. Moreover, during online evaluation of DTN in a large-scale E-commerce recommender system, we observed a 3.28% increase in clicks, a 3.10% increase in orders and a 2.70% increase in GMV (Gross Merchandise Value) compared to the state-of-the-art MTL models. Finally, extensive offline experiments conducted on public benchmark datasets demonstrate that DTN can be applied to various scenarios beyond recommendations, enhancing the performance of ranking models.
Updated: 2025-07-04 14:36:47
Fields: cs.IR,cs.LG
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
This work introduces Robots Imitating Generated Videos (RIGVid), a system that enables robots to perform complex manipulation tasks--such as pouring, wiping, and mixing--purely by imitating AI-generated videos, without requiring any physical demonstrations or robot-specific training. Given a language command and an initial scene image, a video diffusion model generates potential demonstration videos, and a vision-language model (VLM) automatically filters out results that do not follow the command. A 6D pose tracker then extracts object trajectories from the video, and the trajectories are retargeted to the robot in an embodiment-agnostic fashion. Through extensive real-world evaluations, we show that filtered generated videos are as effective as real demonstrations, and that performance improves with generation quality. We also show that relying on generated videos outperforms more compact alternatives such as keypoint prediction using VLMs, and that strong 6D pose tracking outperforms other ways to extract trajectories, such as dense feature point tracking. These findings suggest that videos produced by a state-of-the-art off-the-shelf model can offer an effective source of supervision for robotic manipulation.
Updated: 2025-07-04 14:35:12
Fields: cs.RO,cs.AI,cs.CV
Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN)
Generative AI (GenAI) is expected to play a pivotal role in enabling autonomous optimization in future wireless networks. Within the ORAN architecture, Large Language Models (LLMs) can be specialized to generate xApps and rApps by leveraging specifications and API definitions from the RAN Intelligent Controller (RIC) platform. However, fine-tuning base LLMs for telecom-specific tasks remains expensive and resource-intensive. Retrieval-Augmented Generation (RAG) offers a practical alternative through in-context learning, enabling domain adaptation without full retraining. While traditional RAG systems rely on vector-based retrieval, emerging variants such as GraphRAG and Hybrid GraphRAG incorporate knowledge graphs or dual retrieval strategies to support multi-hop reasoning and improve factual grounding. Despite their promise, these methods lack systematic, metric-driven evaluations, particularly in high-stakes domains such as ORAN. In this study, we conduct a comparative evaluation of Vector RAG, GraphRAG, and Hybrid GraphRAG using ORAN specifications. We assess performance across varying question complexities using established generation metrics: faithfulness, answer relevance, context relevance, and factual correctness. Results show that both GraphRAG and Hybrid GraphRAG outperform traditional RAG. Hybrid GraphRAG improves factual correctness by 8%, while GraphRAG improves context relevance by 7%.
Updated: 2025-07-04 14:31:30
Fields: cs.AI,cs.DC,cs.ET,cs.NI
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities
AI agents have experienced a paradigm shift, from early dominance by reinforcement learning (RL) to the rise of agents powered by large language models (LLMs), and now further advancing towards a synergistic fusion of RL and LLM capabilities. This progression has endowed AI agents with increasingly strong abilities. Despite these advances, to accomplish complex real-world tasks, agents are required to plan and execute effectively, maintain reliable memory, and coordinate smoothly with other agents. Achieving these capabilities involves contending with ever-present intricate information, operations, and interactions. In light of this challenge, data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms that agents can more effectively understand and process. In this context, graphs, with their natural advantage in organizing, managing, and harnessing intricate data relationships, present a powerful data paradigm for structurization to support the capabilities demanded by advanced AI agents. To this end, this survey presents a first systematic review of how graphs can empower AI agents. Specifically, we explore the integration of graph techniques with core agent functionalities, highlight notable applications, and identify prospective avenues for future research. By comprehensively surveying this burgeoning intersection, we hope to inspire the development of next-generation AI agents equipped to tackle increasingly sophisticated challenges with graphs. Related resources are collected and continuously updated for the community in the Github link.
Updated: 2025-07-04 14:29:40
Fields: cs.AI
VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification
This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
Updated: 2025-07-04 14:28:14
Fields: cs.CR
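Since VLAI is a standard RoBERTa sequence classifier, inference reduces to a few lines with Hugging Face transformers; the model identifier and label set below are placeholders, as the exact repository name is whatever the Vulnerability-Lookup project publishes.

```python
from transformers import pipeline

# Model id is a placeholder; the released VLAI weights live wherever the
# Vulnerability-Lookup project publishes them on the Hugging Face Hub.
clf = pipeline("text-classification", model="org/vlai-roberta-severity")
desc = ("Buffer overflow in the TIFF parser allows remote attackers to "
        "execute arbitrary code via a crafted image file.")
print(clf(desc))   # e.g. [{'label': 'Critical', 'score': 0.91}] (labels assumed)
```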
Non-negative matrix factorization algorithms generally improve topic model fits
In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not been exploited previously to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often resulting in much better fits and in less time than existing algorithms for topic models. We also formally make the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and using this result we show that the expectation maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF (the only difference being that the operations are performed in a different order). Our methods are implemented in the R package fastTopics.
Updated: 2025-07-04 14:26:14
Fields: stat.ML,cs.LG,stat.CO
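The connection is easiest to see in code: maximum-likelihood topic modeling corresponds to NMF under the generalized Kullback-Leibler divergence, whose classic multiplicative updates take a handful of numpy lines. A minimal sketch (not the fastTopics implementation, which uses more recent NMF optimizers):

```python
import numpy as np

def nmf_kl(V, k, iters=200, eps=1e-10):
    # Lee-Seung multiplicative updates for NMF with generalized KL divergence.
    n, m = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((n, k)) + 0.1, rng.random((k, m)) + 0.1
    for _ in range(iters):
        WH = W @ H + eps
        W *= (V / WH) @ H.T / (H.sum(axis=1) + eps)          # update loadings
        WH = W @ H + eps
        H *= W.T @ (V / WH) / (W.sum(axis=0)[:, None] + eps)  # update factors
    return W, H

V = np.random.default_rng(1).poisson(2.0, size=(100, 50)).astype(float)
W, H = nmf_kl(V, k=5)   # V is a word-count matrix in the topic-model reading
```

Up to the order in which W and H are refreshed, each sweep matches one EM iteration for the topic model, which is exactly the equivalence the paper formalizes.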
Behaviour Space Analysis of LLM-driven Meta-heuristic Discovery
We investigate the behaviour space of meta-heuristic optimisation algorithms automatically generated by Large Language Model driven algorithm discovery methods. Using the Large Language Evolutionary Algorithm (LLaMEA) framework with a GPT o4-mini LLM, we iteratively evolve black-box optimisation heuristics, evaluated on 10 functions from the BBOB benchmark suite. Six LLaMEA variants, featuring different mutation prompt strategies, are compared and analysed. We log dynamic behavioural metrics including exploration, exploitation, convergence and stagnation measures, for each run, and analyse these via visual projections and network-based representations. Our analysis combines behaviour-based projections, Code Evolution Graphs built from static code features, performance convergence curves, and behaviour-based Search Trajectory Networks. The results reveal clear differences in search dynamics and algorithm structures across LLaMEA configurations. Notably, the variant that employs both a code simplification prompt and a random perturbation prompt in a 1+1 elitist evolution strategy, achieved the best performance, with the highest Area Over the Convergence Curve. Behaviour-space visualisations show that higher-performing algorithms exhibit more intensive exploitation behaviour and faster convergence with less stagnation. Our findings demonstrate how behaviour-space analysis can explain why certain LLM-designed heuristics outperform others and how LLM-driven algorithm discovery navigates the open-ended and complex search space of algorithms. These findings provide insights to guide the future design of adaptive LLM-driven algorithm generators.
Updated: 2025-07-04 14:19:39
Fields: cs.NE,cs.AI
Kinetic Langevin Diffusion for Crystalline Materials Generation
Generative modeling of crystalline materials using diffusion models presents a series of challenges: the data distribution is characterized by inherent symmetries and involves multiple modalities, with some defined on specific manifolds. Notably, the treatment of fractional coordinates representing atomic positions in the unit cell requires careful consideration, as they lie on a hypertorus. In this work, we introduce Kinetic Langevin Diffusion for Materials (KLDM), a novel diffusion model for crystalline materials generation, where the key innovation resides in the modeling of the coordinates. Instead of resorting to Riemannian diffusion on the hypertorus directly, we generalize Trivialized Diffusion Model (TDM) to account for the symmetries inherent to crystals. By coupling coordinates with auxiliary Euclidean variables representing velocities, the diffusion process is now offset to a flat space. This allows us to effectively perform diffusion on the hypertorus while providing a training objective that accounts for the periodic translation symmetry of the true data distribution. We evaluate KLDM on both Crystal Structure Prediction (CSP) and De-novo Generation (DNG) tasks, demonstrating its competitive performance with current state-of-the-art models.
Updated: 2025-07-04 14:18:26
Fields: cs.LG
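For reference, kinetic (underdamped) Langevin dynamics couples each position with an auxiliary velocity; in its textbook form, with potential $U$ and friction $\gamma$,

$$
\mathrm{d}x_t = v_t\,\mathrm{d}t, \qquad
\mathrm{d}v_t = -\nabla U(x_t)\,\mathrm{d}t - \gamma v_t\,\mathrm{d}t + \sqrt{2\gamma}\,\mathrm{d}B_t .
$$

For fractional coordinates, $x_t$ is read modulo the unit cell while $v_t$ lives in flat Euclidean space, which is what lets the noise act in a flat space rather than requiring Riemannian diffusion on the hypertorus. The paper's exact drift and noise schedule may differ from this generic form.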
Many-Task Federated Fine-Tuning via Unified Task Vectors
Federated Learning (FL) traditionally assumes homogeneous client tasks; however, in real-world scenarios, clients often specialize in diverse tasks, introducing task heterogeneity. To address this challenge, Many-Task FL (MaT-FL) has emerged, enabling clients to collaborate effectively despite task diversity. Existing MaT-FL approaches rely on client grouping or personalized layers, requiring the server to manage individual models and failing to account for clients handling multiple tasks. We propose MaTU, a MaT-FL approach that enables joint learning of task vectors across clients, eliminating the need for clustering or client-specific weight storage at the server. Our method introduces a novel aggregation mechanism that determines task similarity based on the direction of clients' task vectors and constructs a unified task vector encapsulating all tasks. To address task-specific requirements, we augment the unified task vector with lightweight modulators that facilitate knowledge transfer among related tasks while disentangling dissimilar ones. Evaluated across 30 datasets, MaTU achieves superior performance over state-of-the-art MaT-FL approaches, with results comparable to per-task fine-tuning, while delivering significant communication savings.
Updated: 2025-07-04 14:14:34
Fields: cs.LG,cs.CV
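One plausible reading of the direction-based aggregation, sketched in numpy; the similarity threshold is invented, and the paper's lightweight modulators are omitted.

```python
import numpy as np

def aggregate_task_vectors(base, client_weights, tau=0.5):
    # Task vector = client's fine-tuned weights minus the shared base weights.
    tvs = [w - base for w in client_weights]
    unit = [t / (np.linalg.norm(t) + 1e-12) for t in tvs]
    n = len(tvs)
    sim = np.array([[u @ v for v in unit] for u in unit])   # directional similarity
    # Blend each task vector with its directional neighbours (sim > tau) ...
    merged = [np.mean([tvs[j] for j in range(n) if sim[i, j] > tau], axis=0)
              for i in range(n)]
    # ... and keep a single unified vector at the server (no per-client storage).
    return base + np.mean(merged, axis=0)

base = np.zeros(1000)
clients = [base + np.random.default_rng(i).normal(0, 0.01, 1000) for i in range(5)]
unified = aggregate_task_vectors(base, clients)
```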
MusGO: A Community-Driven Framework For Assessing Openness in Music-Generative AI
Since 2023, generative AI has rapidly advanced in the music domain. Despite significant technological advancements, music-generative models raise critical ethical challenges, including a lack of transparency and accountability, along with risks such as the replication of artists' works, which highlights the importance of fostering openness. With upcoming regulations such as the EU AI Act encouraging open models, many generative models are being released labelled as 'open'. However, the definition of an open model remains widely debated. In this article, we adapt a recently proposed evidence-based framework for assessing openness in LLMs to the music domain. Using feedback from a survey of 110 participants from the Music Information Retrieval (MIR) community, we refine the framework into MusGO (Music-Generative Open AI), which comprises 13 openness categories: 8 essential and 5 desirable. We evaluate 16 state-of-the-art generative models and provide an openness leaderboard that is fully open to public scrutiny and community contributions. Through this work, we aim to clarify the concept of openness in music-generative AI and promote its transparent and responsible development.
Updated: 2025-07-04 14:12:19
Fields: cs.SD,cs.AI,cs.CY,eess.AS
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Multimodal transformers integrate diverse data types like images, audio, and text, advancing tasks such as audio-visual understanding and image-text retrieval; yet their high parameterization limits deployment on resource-constrained edge devices. Split Learning (SL), which partitions models at a designated cut-layer to offload compute-intensive operations to the server, offers a promising approach for distributed training of multimodal transformers, though its application remains underexplored. We present MPSL, a parallel SL approach for computationally efficient fine-tuning of multimodal transformers in a distributed manner, while eliminating label sharing, client synchronization, and per-client sub-model management. MPSL employs lightweight client-side tokenizers and a unified modality-agnostic encoder, allowing flexible adaptation to task-specific needs. Our evaluation across 7 multimodal datasets demonstrates that MPSL matches or outperforms Federated Learning, reduces client-side computations by 250x, and achieves superior scalability in communication cost with model growth. Through extensive analysis, we highlight task suitability, trade-offs, and scenarios where MPSL excels, inspiring further exploration.
Updated: 2025-07-04 14:11:17
Fields: cs.DC,cs.LG
RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification
Parkinson's Disease (PD) affects over 10 million people globally, with speech impairments often preceding motor symptoms by years, making speech a valuable modality for early, non-invasive detection. While recent deep-learning models achieve high accuracy, they typically lack the explainability required for clinical use. To address this, we propose RECA-PD, a novel, robust, and explainable cross-attention architecture that combines interpretable speech features with self-supervised representations. RECA-PD matches state-of-the-art performance in Speech-based PD detection while providing explanations that are more consistent and more clinically meaningful. Additionally, we demonstrate that performance degradation in certain speech tasks (e.g., monologue) can be mitigated by segmenting long recordings. Our findings indicate that performance and explainability are not necessarily mutually exclusive. Future work will enhance the usability of explanations for non-experts and explore severity estimation to increase the real-world clinical relevance.
Updated: 2025-07-04 14:05:47
Fields: cs.SD,cs.AI,cs.CL,eess.AS
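The architectural core - interpretable speech features attending over self-supervised representations - can be sketched in PyTorch as below; the dimensions, head count, and pooling are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InterpretableCrossAttention(nn.Module):
    """Queries from interpretable speech features; keys/values from SSL frames."""
    def __init__(self, d_interp=64, d_ssl=768, d=128, heads=4):
        super().__init__()
        self.q = nn.Linear(d_interp, d)
        self.kv = nn.Linear(d_ssl, d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cls = nn.Linear(d, 2)                    # PD vs. healthy control

    def forward(self, interp_feats, ssl_feats):
        q = self.q(interp_feats)                      # (B, n_features, d)
        kv = self.kv(ssl_feats)                       # (B, n_frames, d)
        out, attn_w = self.attn(q, kv, kv)            # attn_w: per-feature evidence
        return self.cls(out.mean(dim=1)), attn_w

logits, attn_w = InterpretableCrossAttention()(torch.randn(2, 12, 64),
                                               torch.randn(2, 300, 768))
```

Because each query row is one named interpretable feature, the attention map `attn_w` doubles as a clinically readable explanation of which feature attended where.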
Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation
The clinical utility of deep learning models for medical image segmentation is severely constrained by their inability to generalize to unseen domains. This failure is often rooted in the models learning spurious correlations between anatomical content and domain-specific imaging styles. To overcome this fundamental challenge, we introduce Causal-SAM-LLM, a novel framework that elevates Large Language Models (LLMs) to the role of causal reasoners. Our framework, built upon a frozen Segment Anything Model (SAM) encoder, incorporates two synergistic innovations. First, Linguistic Adversarial Disentanglement (LAD) employs a Vision-Language Model to generate rich, textual descriptions of confounding image styles. By training the segmentation model's features to be contrastively dissimilar to these style descriptions, it learns a representation robustly purged of non-causal information. Second, Test-Time Causal Intervention (TCI) provides an interactive mechanism where an LLM interprets a clinician's natural language command to modulate the segmentation decoder's features in real-time, enabling targeted error correction. We conduct an extensive empirical evaluation on a composite benchmark from four public datasets (BTCV, CHAOS, AMOS, BraTS), assessing generalization under cross-scanner, cross-modality, and cross-anatomy settings. Causal-SAM-LLM establishes a new state of the art in out-of-distribution (OOD) robustness, improving the average Dice score by up to 6.2 points and reducing the Hausdorff Distance by 15.8 mm over the strongest baseline, all while using less than 9% of the full model's trainable parameters. Our work charts a new course for building robust, efficient, and interactively controllable medical AI systems.
Updated: 2025-07-04 13:52:16
Fields: cs.CV,cs.AI,cs.CL
Exploring Privacy and Security as Drivers for Environmental Sustainability in Cloud-Based Office Solutions
In this paper, we explore the intersection of privacy, security, and environmental sustainability in cloud-based office solutions, focusing on quantifying user- and network-side energy use and associated carbon emissions. We hypothesise that privacy-focused services are typically more energy-efficient than those funded through data collection and advertising. To evaluate this, we propose a framework that systematically measures environmental costs based on energy usage and network data traffic during well-defined, automated usage scenarios. To test our hypothesis, we first analyse how underlying architectures and business models, such as monetisation through personalised advertising, contribute to the environmental footprint of these services. We then explore existing methodologies and tools for software environmental impact assessment. We apply our framework to three mainstream email services selected to reflect different privacy policies, from ad-supported tracking-intensive models to privacy-focused designs: Microsoft Outlook, Google Mail (Gmail), and Proton Mail. We extend this comparison to a self-hosted email solution, evaluated with and without end-to-end encryption. We show that the self-hosted solution, even with 14% of device energy and 15% of emissions overheads from PGP encryption, remains the most energy-efficient, saving up to 33% of emissions per session compared to Gmail. Among commercial providers, Proton Mail is the most efficient, saving up to 0.1 gCO2 e per session compared to Outlook, whose emissions can be further reduced by 2% through ad-blocking.
Updated: 2025-07-04 13:50:03
Fields: cs.CR,cs.CY,cs.SE
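The bottom-line arithmetic of such a framework is simple: measured per-session energy on the device and network sides is converted to emissions through a grid carbon-intensity factor. The figures below are illustrative only, not the paper's measurements.

```python
device_wh, network_wh = 0.8, 0.4   # per-session energy, device and network (toy)
grid_gco2e_per_wh = 0.35           # carbon intensity of the assumed grid mix
session_emissions = (device_wh + network_wh) * grid_gco2e_per_wh
print(f"{session_emissions:.2f} gCO2e per session")
```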
A Universal Approach to Feature Representation in Dynamic Task Assignment Problems
Dynamic task assignment concerns the optimal assignment of resources to tasks in a business process. Recently, Deep Reinforcement Learning (DRL) has been proposed as the state of the art for solving assignment problems. DRL methods usually employ a neural network (NN) as an approximator for the policy function, which ingests the state of the process and outputs a valuation of the possible assignments. However, representing the state and the possible assignments so that they can serve as inputs and outputs for a policy NN remains an open challenge, especially when tasks or resources have features with an infinite number of possible values. To solve this problem, this paper proposes a method for representing and solving assignment problems with infinite state and action spaces. In doing so, it provides three contributions: (I) A graph-based feature representation of assignment problems, which we call assignment graph; (II) A mapping from marked Colored Petri Nets to assignment graphs; (III) An adaptation of the Proximal Policy Optimization algorithm that can learn to solve assignment problems represented through assignment graphs. To evaluate the proposed representation method, we model three archetypal assignment problems ranging from finite to infinite state and action space dimensionalities. The experiments show that the method is suitable for representing and learning close-to-optimal task assignment policies regardless of the state and action space dimensionalities.
Updated: 2025-07-04 13:48:28
Fields: cs.AI
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
In recent years, there has been a proliferation of spatiotemporal foundation models in different scientific disciplines. While promising, these models are often domain-specific and are only assessed within the particular applications for which they are designed. Given that many tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise as general-purpose domain-agnostic approaches. However, it is not known whether the knowledge acquired on large-scale but potentially out-of-domain data can be effectively transferred across diverse scientific disciplines, and if a single, pretrained ViFM can be competitive with domain-specific baselines. To address this, we introduce SciVid, a comprehensive benchmark comprising five *Sci*entific *Vid*eo tasks, across medical computer vision, animal behavior, and weather forecasting. We adapt six leading ViFMs to SciVid using simple trainable readout modules, establishing strong baselines and demonstrating the potential for effective transfer learning. Specifically, we show that state-of-the-art results can be obtained in several applications by leveraging the general-purpose representations from ViFM backbones. Furthermore, our results reveal the limitations of existing ViFMs, and highlight opportunities for the development of generalizable models for high-impact scientific applications. We release our code at https://github.com/google-deepmind/scivid to facilitate further research in the development of ViFMs.
Updated: 2025-07-04 13:48:12
Fields: cs.CV,cs.AI,cs.LG
Follow the STARs: Dynamic $ω$-Regular Shielding of Learned Policies
This paper presents a novel dynamic post-shielding framework that enforces the full class of $\omega$-regular correctness properties over pre-computed probabilistic policies. This constitutes a paradigm shift from the predominant setting of safety-shielding -- i.e., ensuring that nothing bad ever happens -- to a shielding process that additionally enforces liveness -- i.e., ensures that something good eventually happens. At the core, our method uses Strategy-Template-based Adaptive Runtime Shields (STARs), which leverage permissive strategy templates to enable post-shielding with minimal interference. As its main feature, STARs introduce a mechanism to dynamically control interference, allowing a tunable enforcement parameter to balance formal obligations and task-specific behavior at runtime. This allows to trigger more aggressive enforcement when needed, while allowing for optimized policy choices otherwise. In addition, STARs support runtime adaptation to changing specifications or actuator failures, making them especially suited for cyber-physical applications. We evaluate STARs on a mobile robot benchmark to demonstrate their controllable interference when enforcing (incrementally updated) $\omega$-regular correctness properties over learned probabilistic policies.
Updated: 2025-07-04 13:40:51
Fields: cs.AI,cs.LO
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model
Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types, including genetic sequences. However, adapting LLMs to genomics presents significant challenges. Capturing complex genomic interactions requires modeling long-range dependencies within DNA sequences, where interactions often span over 10,000 base pairs, even within a single gene, posing substantial computational burdens under conventional model architectures and training paradigms. Moreover, standard LLM training approaches are suboptimal for DNA: autoregressive training, while efficient, supports only unidirectional understanding. However, DNA is inherently bidirectional, e.g., bidirectional promoters regulate transcription in both directions and account for nearly 11% of human gene expression. Masked language models (MLMs) allow bidirectional understanding but are inefficient, as only masked tokens contribute to the loss per step. To address these limitations, we introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm that combines the optimization efficiency of autoregressive modeling with the bidirectional comprehension of masked modeling. JanusDNA adopts a hybrid Mamba, Attention and Mixture of Experts (MoE) architecture, combining long-range modeling of Attention with efficient sequential learning of Mamba. MoE layers further scale model capacity via sparse activation while keeping computational cost low. Notably, JanusDNA processes up to 1 million base pairs at single nucleotide resolution on a single 80GB GPU. Extensive experiments and ablations show JanusDNA achieves new SOTA results on three genomic representation benchmarks, outperforming models with 250x more activated parameters. Code: https://github.com/Qihao-Duan/JanusDNA
Updated: 2025-07-04 13:40:34
Fields: cs.LG,q-bio.GN
Learning unitaries with quantum statistical queries
We propose several algorithms for learning unitary operators from quantum statistical queries with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our approach leverages quantum statistical queries to estimate the Fourier mass of a unitary on a subset of Pauli strings, generalizing previous techniques developed for uniform quantum examples. Specifically, we show that the celebrated quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. As an application, we prove that quantum Boolean functions with constant total influence or with constant degree are efficiently learnable in our model. Moreover, we prove that $\mathcal{O}(\log n)$-juntas are efficiently learnable and constant-depth circuits are learnable query-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks demand significantly greater resources, such as oracle access to the unitary or direct access to the Choi-Jamiolkowski state. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger query complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels. Taken together, our results indicate that quantum statistical queries offer a unified framework for various unitary learning tasks, with potential applications in quantum machine learning, many-body physics and benchmarking of near-term devices.
Updated: 2025-07-04 13:40:24
Fields: quant-ph,cs.CC,cs.LG
Dilution, Diffusion and Symbiosis in Spatial Prisoner's Dilemma with Reinforcement Learning
Recent studies of spatial prisoner's dilemma games with reinforcement learning have shown that static agents can learn to cooperate through a diverse set of mechanisms, including noise injection, different types of learning algorithms and knowledge of neighbours' payoffs. In this work, using an independent multi-agent Q-learning algorithm, we study the effects of dilution and mobility in the spatial version of the prisoner's dilemma. Within this setting, different possible actions for the algorithm are defined, connecting with previous results on the classical, non-reinforcement learning spatial prisoner's dilemma, showcasing the versatility of the algorithm in modeling different game-theoretical scenarios and its potential as a benchmarking tool. As a result, a range of effects is observed, including evidence that games with fixed update rules can be qualitatively equivalent to those with learned ones, as well as the emergence of a symbiotic mutualistic effect between populations that forms when multiple actions are defined.
Updated: 2025-07-04 13:32:01
标题: 强化学习下空间囚徒困境中的稀释、扩散与共生
摘要: 最近关于具有强化学习的空间囚徒困境博弈的研究表明,静态智能体可以通过多种机制学会合作,包括噪声注入、不同类型的学习算法以及对邻居收益的了解。在这项工作中,我们使用独立的多智能体Q-learning算法,研究了稀释和流动性在空间囚徒困境中的影响。在这一设置下,我们为算法定义了不同的可能行动,并与此前关于经典的、非强化学习空间囚徒困境的结果相衔接,展示了该算法在建模不同博弈论情景方面的通用性以及这一方法作为基准测试工具的潜力。结果观察到一系列效应,包括采用固定更新规则的博弈可以在定性上等价于采用习得规则的博弈的证据,以及在定义多个行动时种群之间涌现出的共生互利效应。
更新时间: 2025-07-04 13:32:01
领域: cs.AI,cs.NE,physics.comp-ph
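A toy version of the setting in the entry above, under stated simplifications: independent tabular Q-learning agents on a diluted square lattice play a weak prisoner's dilemma (R=1, S=P=0, T>1), with each agent's own previous action as its state. The payoff convention, state definition, and absence of mobility are simplifying assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T_pd, rounds = 20, 1.4, 2000            # lattice size, temptation payoff, steps
alpha, gamma, eps, dilution = 0.1, 0.9, 0.05, 0.2
payoff = {(0, 0): 1.0, (0, 1): 0.0,        # weak PD: R=1, S=P=0, T>1
          (1, 0): T_pd, (1, 1): 0.0}       # action 0 = cooperate, 1 = defect

occupied = rng.random((L, L)) > dilution   # diluted lattice: some sites empty
Q = np.zeros((L, L, 2, 2))                 # one Q-table per site: state x action
action = rng.integers(0, 2, (L, L))

for _ in range(rounds):
    prev = action.copy()
    for i, j in np.argwhere(occupied):
        s = prev[i, j]                     # state: own previous action
        a = rng.integers(0, 2) if rng.random() < eps else Q[i, j, s].argmax()
        r = sum(payoff[(a, prev[(i + di) % L, (j + dj) % L])]
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if occupied[(i + di) % L, (j + dj) % L])   # occupied neighbors only
        Q[i, j, s, a] += alpha * (r + gamma * Q[i, j, a].max() - Q[i, j, s, a])
        action[i, j] = a

print(f"final cooperation fraction: {1 - action[occupied].mean():.2f}")
```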
From Street Form to Spatial Justice: Explaining Urban Exercise Inequality via a Triadic SHAP-Informed Framework
Urban streets are essential public spaces that facilitate everyday physical activity and promote health equity. Drawing on Henri Lefebvre's spatial triad, this study proposes a conceptual and methodological framework to quantify street-level exercise deprivation through the dimensions of conceived (planning and structure), perceived (visual and sensory), and lived (practice and experiential) urban spaces. We integrate multi-source spatial data-including street networks, street-view imagery, and social media-using explainable machine learning (SHAP analysis) to classify streets by their dominant deprivation modes, forming a novel typology of spatial inequity. Results highlight significant differences across urban contexts: older city cores predominantly experience infrastructural constraints (conceived space), whereas new development areas suffer from experiential disengagement (lived space). Furthermore, by identifying spatial mismatches between population distribution and exercise intensity, our study reveals localized clusters of latent deprivation. Simulation experiments demonstrate that targeted improvements across spatial dimensions can yield up to 14% increases in exercise supportiveness. This research not only operationalizes Lefebvre's spatial theory at the street scale but also provides actionable insights and intervention guidelines, contributing to the broader goals of spatial justice and urban health equity.
Updated: 2025-07-04 13:28:30
标题: 从街道形式到空间正义:通过三元SHAP信息框架解释城市运动不平等
摘要: 城市街道是促进日常体育活动、推动健康公平的重要公共空间。本研究借鉴亨利·列斐伏尔(Henri Lefebvre)的空间三元论,提出了一个概念与方法框架,通过构想空间(规划与结构)、感知空间(视觉与感官)和生活空间(实践与体验)三个维度来量化街道尺度的运动剥夺。我们整合了街道网络、街景影像和社交媒体等多源空间数据,利用可解释机器学习(SHAP分析)按主导剥夺模式对街道进行分类,形成一种新的空间不公类型学。结果凸显了不同城市语境间的显著差异:老城核心区主要受基础设施约束(构想空间),而新开发区域则存在体验性疏离(生活空间)。此外,通过识别人口分布与运动强度之间的空间错配,本研究揭示了潜在剥夺的局部聚集。模拟实验表明,针对各空间维度的定向改善可使运动支持度提升最多14%。本研究不仅在街道尺度上将列斐伏尔的空间理论付诸操作,还提供了可行的见解与干预指南,服务于空间正义与城市健康公平的更广泛目标。
更新时间: 2025-07-04 13:28:30
领域: cs.CY,cs.IT,cs.LG,math.IT,62H30, 91D10, 68T05,I.2.6; I.5.2; H.2.8; J.4
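A hedged sketch of the attribution step described in the entry above: fit a model on street-level features grouped into the three triad dimensions, compute SHAP values, and label each street by the dimension with the largest total attribution. The feature names and synthetic data below are invented for illustration.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Illustrative street-level features grouped by Lefebvre's triad (hypothetical names)
groups = {
    "conceived": ["intersection_density", "sidewalk_width"],
    "perceived": ["green_view_index", "sky_view_factor"],
    "lived":     ["checkin_count", "exercise_posts"],
}
cols = [c for cs in groups.values() for c in cs]
X = pd.DataFrame(rng.normal(size=(500, len(cols))), columns=cols)
y = X["green_view_index"] * 0.8 + X["exercise_posts"] * 0.5 + rng.normal(0, 0.3, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)       # (n_streets, n_features)

# Dominant deprivation mode = triad dimension with largest total |SHAP| per street
per_dim = {d: np.abs(sv[:, [cols.index(c) for c in cs]]).sum(axis=1)
           for d, cs in groups.items()}
dominant = pd.DataFrame(per_dim).idxmax(axis=1)
print(dominant.value_counts())
```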
Consistency of augmentation graph and network approximability in contrastive learning
Contrastive learning leverages data augmentation to develop feature representation without relying on large labeled datasets. However, despite its empirical success, the theoretical foundations of contrastive learning remain incomplete, with many essential guarantees left unaddressed, particularly the realizability assumption concerning neural approximability of an optimal spectral contrastive loss solution. In this work, we overcome these limitations by analyzing pointwise and spectral consistency of the augmentation graph Laplacian. We establish that, under specific conditions for data generation and graph connectivity, as the augmented dataset size increases, the augmentation graph Laplacian converges to a weighted Laplace-Beltrami operator on the natural data manifold. These consistency results ensure that the graph Laplacian spectrum effectively captures the manifold geometry. Consequently, they give rise to a robust framework for establishing neural approximability, directly resolving the realizability assumption in a current paradigm.
Updated: 2025-07-04 13:24:32
标题: 增强图和对比学习中网络可近似性的一致性
摘要: 对比学习利用数据增强来学习特征表示,而不依赖于大型标注数据集。然而,尽管它在经验上取得成功,对比学习的理论基础仍不完整,许多关键保证尚未得到解决,特别是关于最优谱对比损失解的神经网络可逼近性的可实现性假设。在这项工作中,我们通过分析增强图拉普拉斯算子的逐点一致性和谱一致性克服了这些限制。我们证明,在特定的数据生成和图连通性条件下,随着增强数据集规模的增加,增强图拉普拉斯算子收敛到自然数据流形上的加权拉普拉斯-贝尔特拉米算子。这些一致性结果确保图拉普拉斯谱能有效捕捉流形几何。因此,它们催生了一个用于建立神经网络可逼近性的稳健框架,直接解决了当前范式中的可实现性假设。
更新时间: 2025-07-04 13:24:32
领域: cs.LG,math.AP,math.SP
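Schematically, the kind of pointwise consistency statement described in the entry above can be written as follows. Constants, scalings, and the precise weight depend on the normalization; this is one common form for the unnormalized graph Laplacian, not the paper's exact theorem.

```latex
% L_{n,\varepsilon} is the augmentation graph Laplacian built from n augmented
% samples with kernel bandwidth \varepsilon; \rho is the data density on the
% d-dimensional manifold M.
\[
  \frac{1}{n\,\varepsilon^{d+2}}\,\bigl(L_{n,\varepsilon} f\bigr)(x)
  \;\xrightarrow[\substack{n\to\infty \\ \varepsilon\to 0}]{}\;
  c_d\,\Delta_{\rho} f(x),
  \qquad
  \Delta_{\rho} f \;=\; \frac{1}{\rho}\,\operatorname{div}\!\bigl(\rho^{2}\,\nabla f\bigr),
\]
% i.e. the graph Laplacian converges to a density-weighted Laplace--Beltrami
% operator, whose spectrum encodes the geometry of the data manifold M.
```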
2.5D Object Detection for Intelligent Roadside Infrastructure
On-board sensors of autonomous vehicles can be obstructed, occluded, or limited by restricted fields of view, complicating downstream driving decisions. Intelligent roadside infrastructure perception systems, installed at elevated vantage points, can provide wide, unobstructed intersection coverage, supplying a complementary information stream to autonomous vehicles via vehicle-to-everything (V2X) communication. However, conventional 3D object-detection algorithms struggle to generalize under the domain shift introduced by top-down perspectives and steep camera angles. We introduce a 2.5D object detection framework, tailored specifically for infrastructure roadside-mounted cameras. Unlike conventional 2D or 3D object detection, we employ a prediction approach to detect ground planes of vehicles as parallelograms in the image frame. The parallelogram preserves the planar position, size, and orientation of objects while omitting their height, which is unnecessary for most downstream applications. For training, a mix of real-world and synthetically generated scenes is leveraged. We evaluate generalizability on a held-out camera viewpoint and in adverse-weather scenarios absent from the training set. Our results show high detection accuracy, strong cross-viewpoint generalization, and robustness to diverse lighting and weather conditions. Model weights and inference code are provided at: https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection
Updated: 2025-07-04 13:16:59
标题: 2.5D物体检测用于智能路边基础设施
摘要: 自动驾驶车辆的车载传感器可能被遮挡、受阻或受限于有限的视野,使下游驾驶决策变得复杂。安装在高处的智能路边基础设施感知系统可以提供宽广、无遮挡的交叉口覆盖,并通过车联网(V2X)通信向自动驾驶车辆提供互补的信息流。然而,传统的3D目标检测算法在俯视视角和陡峭摄像头角度引入的域偏移下难以泛化。我们提出了一个专门针对路边基础设施安装摄像头的2.5D目标检测框架。与传统的2D或3D目标检测不同,我们采用一种预测方法,将车辆的地面平面检测为图像中的平行四边形。平行四边形保留了物体的平面位置、大小和朝向,同时省略了其高度,而高度对大多数下游应用并非必需。训练中混合使用了真实世界场景与合成生成的场景。我们在一个留出的摄像机视角以及训练集中未出现的恶劣天气场景下评估了泛化能力。结果显示出较高的检测准确率、强大的跨视角泛化能力,以及对多样光照和天气条件的鲁棒性。模型权重和推理代码见:https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection
更新时间: 2025-07-04 13:16:59
领域: cs.CV,cs.LG
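The parallelogram representation above has a convenient property: three corners determine the fourth, so a detector only needs three image points per vehicle. A small geometric sketch (not the paper's prediction head):

```python
import numpy as np

def complete_parallelogram(a, b, c):
    """Given three consecutive image-plane corners A, B, C of a vehicle's
    ground plane, the fourth parallelogram corner is D = A + C - B."""
    return a + c - b

def parallelogram_area(a, b, c):
    """Area from the two edge vectors AB and BC via the 2D cross product."""
    u, v = b - a, c - b
    return abs(u[0] * v[1] - u[1] * v[0])

A = np.array([100.0, 220.0])
B = np.array([180.0, 210.0])
C = np.array([190.0, 260.0])
D = complete_parallelogram(A, B, C)
print(D)                                  # [110. 270.]
print(parallelogram_area(A, B, C))        # 4100.0 (in px^2)
```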
Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention
Out-of-Distribution (OOD) detection is critical for safely deploying deep models in open-world environments, where inputs may lie outside the training distribution. During inference on a model trained exclusively with In-Distribution (ID) data, we observe a salient gradient phenomenon: around an ID sample, the local gradient directions for "enhancing" that sample's predicted class remain relatively consistent, whereas OOD samples--unseen in training--exhibit disorganized or conflicting gradient directions in the same neighborhood. Motivated by this observation, we propose an inference-stage technique to short-circuit those feature coordinates that spurious gradients exploit to inflate OOD confidence, while leaving ID classification largely intact. To circumvent the expense of recomputing the logits after this gradient short-circuit, we further introduce a local first-order approximation that accurately captures the post-modification outputs without a second forward pass. Experiments on standard OOD benchmarks show our approach yields substantial improvements. Moreover, the method is lightweight and requires minimal changes to the standard inference pipeline, offering a practical path toward robust OOD detection in real-world applications.
Updated: 2025-07-04 13:12:24
标题: 梯度短路:通过特征干预实现高效的外分布检测
摘要: 分布外(OOD)检测对于在开放世界环境中安全部署深度模型至关重要,因为输入可能位于训练分布之外。在仅使用分布内(ID)数据训练的模型上进行推理时,我们观察到一个显著的梯度现象:在ID样本周围,用于“增强”该样本预测类别的局部梯度方向保持相对一致,而训练中未见过的OOD样本在同一邻域中表现出无序或相互冲突的梯度方向。受这一观察启发,我们提出了一种推理阶段的技术,对那些被虚假梯度利用来抬高OOD置信度的特征坐标进行短路,同时基本保持ID分类不变。为避免在梯度短路后重新计算logits的开销,我们进一步引入一个局部一阶近似,无需第二次前向传播即可准确刻画修改后的输出。在标准OOD基准上的实验表明,我们的方法带来了显著改进。此外,该方法轻量且只需对标准推理流程做极小改动,为现实应用中鲁棒的OOD检测提供了一条实用途径。
更新时间: 2025-07-04 13:12:24
领域: cs.CV,cs.LG
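A simplified sketch of the two mechanics described in the entry above: feature intervention plus a first-order logit update, shown for a linear classification head (where the first-order update is exact). The top-|gradient| selection rule below is a stand-in assumption, not the paper's criterion.

```python
import torch

torch.manual_seed(0)
feat_dim, n_cls = 128, 10
backbone_out = torch.randn(4, feat_dim)        # penultimate features h (4 inputs)
W = torch.randn(n_cls, feat_dim)
b = torch.randn(n_cls)

h = backbone_out.clone().requires_grad_(True)
logits = h @ W.T + b
score = logits.max(dim=1).values.sum()         # "enhance predicted class" score
grad = torch.autograd.grad(score, h)[0]        # d score / d h, per sample

# Short-circuit: zero the coordinates with the largest confidence-inflating
# gradients (top-r fraction; a simplification of the paper's selection rule).
r = 0.1
k = int(r * feat_dim)
idx = grad.abs().topk(k, dim=1).indices
delta = torch.zeros_like(h).scatter(1, idx, -h.detach().gather(1, idx))

# First-order update of the logits, avoiding a second forward pass
# (exact here because the head is linear in the features).
logits_after = logits.detach() + delta @ W.T
ood_score = torch.logsumexp(logits_after, dim=1)   # e.g. an energy-style score
print(ood_score)
```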
Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure
The Graph Neural Tangent Kernel (GNTK) successfully bridges the gap between kernel methods and Graph Neural Networks (GNNs), addressing key challenges such as the difficulty of training deep networks and the limitations of traditional kernel methods. However, the existing layer-stacking strategy in GNTK introduces redundant computations, significantly increasing computational complexity and limiting scalability for practical applications. To address these issues, this paper proposes the Simplified Graph Neural Tangent Kernel (SGTK), which replaces the traditional multi-layer stacking mechanism with a continuous $K$-step aggregation operation. This novel approach streamlines the iterative kernel computation process, effectively eliminating redundant calculations while preserving the kernel's expressiveness. By reducing the dependency on layer stacking, SGTK achieves both computational simplicity and efficiency. Furthermore, we introduce the Simplified Graph Neural Kernel (SGNK), which models infinitely wide Graph Neural Networks as Gaussian Processes. This allows kernel values to be directly determined from the expected outputs of activation functions in the infinite-width regime, bypassing the need for explicit layer-by-layer computation. SGNK further reduces computational complexity while maintaining the capacity to capture intricate structural patterns in graphs. Extensive experiments on node and graph classification tasks demonstrate that the proposed SGTK and SGNK achieve performance comparable to existing approaches while improving computational efficiency. Implementation details are available at https://anonymous.4open.science/r/SGNK-1CE4/.
Updated: 2025-07-04 13:12:09
标题: 简化图神经核:从堆叠层到折叠结构
摘要: 图神经切向核(GNTK)成功地弥合了核方法和图神经网络(GNN)之间的差距,解决了训练深度网络的困难和传统核方法的局限性等关键挑战。然而,GNTK中现有的层叠策略引入了冗余计算,显著增加了计算复杂性,并限制了实际应用的可扩展性。为了解决这些问题,本文提出了简化图神经切向核(SGTK),它用连续的K步聚合操作代替传统的多层叠加机制。这种新颖方法简化了迭代核计算过程,有效地消除了冗余计算,同时保持了核的表达能力。通过减少对层叠的依赖,SGTK实现了计算上的简单性和效率。此外,我们引入了简化图神经核(SGNK),将无限宽的图神经网络建模为高斯过程。这使得核值可以直接从无限宽度区域中激活函数的期望输出确定,绕过了显式的逐层计算的需要。SGNK进一步降低了计算复杂性,同时保持了捕捉图中复杂结构模式的能力。对节点和图分类任务的广泛实验表明,所提出的SGTK和SGNK在提高计算效率的同时实现了与现有方法可比的性能。实现细节可在https://anonymous.4open.science/r/SGNK-1CE4/找到。
更新时间: 2025-07-04 13:12:09
领域: cs.LG
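A caricature of the collapsed-structure idea from the SGTK entry above: replace stacked layers by a single K-step aggregation and take a linear kernel on the aggregated features. The full SGTK/SGNK recursions with activation expectations are omitted.

```python
import numpy as np

def k_step_node_kernel(A, X, K=3):
    """Linear node-level kernel on K-step aggregated features:
    Phi = (D^-1/2 (A + I) D^-1/2)^K X, kernel = Phi Phi^T."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))     # symmetric normalization
    Phi = np.linalg.matrix_power(A_norm, K) @ X  # one K-step aggregation,
    return Phi @ Phi.T                           # not K stacked layers

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)        # a 4-node path graph
X = np.eye(4)                                    # one-hot node features
print(np.round(k_step_node_kernel(A, X, K=2), 3))
```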
An Advanced Deep Learning Framework for Ischemic and Hemorrhagic Brain Stroke Diagnosis Using Computed Tomography (CT) Images
Brain stroke is one of the leading causes of mortality and long-term disability worldwide, highlighting the need for precise and fast prediction techniques. Computed Tomography (CT) is considered one of the most effective modalities for diagnosing brain strokes. The majority of stroke classification techniques rely on a single slice-level prediction mechanism, requiring the radiologist to manually select the most critical CT slice from the original CT volume. Although clinical evaluations are often used in traditional diagnostic procedures, machine learning (ML) has opened up new avenues for improving stroke diagnosis. To supplement traditional diagnostic techniques, this study investigates machine learning models for early-stage brain stroke prediction from CT scan images. We propose a novel approach to brain stroke detection that leverages machine learning techniques, focusing on optimizing classification performance with pre-trained deep learning models and advanced optimization strategies. Pre-trained models, including DenseNet201, InceptionV3, MobileNetV2, ResNet50, and Xception, are utilized for feature extraction. Additionally, we employ feature engineering techniques, including BFO, PCA, and LDA, to further enhance model performance. These features are subsequently classified using machine learning algorithms such as SVC, RF, XGB, DT, LR, KNN, and GNB. Our experiments demonstrate that the combination of MobileNetV2, LDA, and SVC achieves the highest classification accuracy of 97.93%, significantly outperforming other model-optimizer-classifier combinations. The results underline the effectiveness of integrating lightweight pre-trained models with robust optimization and classification techniques for brain stroke diagnosis.
Updated: 2025-07-04 13:11:29
标题: 一种用于计算机断层扫描(CT)图像的缺血性和出血性脑卒中诊断的先进深度学习框架
摘要: 脑卒中是全球死亡率和长期残疾的主要原因之一,突显了对精确和快速预测技术的需求。计算机断层扫描(CT扫描)被认为是诊断脑卒中的最有效方法之一。大多数脑卒中分类技术依赖于单个切片级别的预测机制,允许放射科医师手动选择原始CT体积中最关键的CT切片。尽管传统诊断程序通常使用临床评估,但机器学习(ML)为改善脑卒中诊断开辟了新的途径。为了补充传统诊断技术,本研究调查了利用机器学习模型,特别是利用CT扫描图像在早期预测脑卒中的方法。在这项研究中,我们提出了一种利用机器学习技术的脑卒中检测新方法,重点是通过使用预先训练的深度学习模型和先进的优化策略来优化分类性能。预先训练的模型,包括DenseNet201、InceptionV3、MobileNetV2、ResNet50和Xception,用于特征提取。此外,我们采用BFO、PCA和LDA等特征工程技术,进一步增强模型的性能。这些特征随后使用机器学习算法进行分类,如SVC、RF、XGB、DT、LR、KNN和GNB。我们的实验表明,MobileNetV2、LDA和SVC的组合实现了最高的分类准确性达到了97.93%,明显优于其他模型-优化器-分类器组合。这些结果强调了将轻量级预先训练模型与强大的优化和分类技术相结合,用于脑卒中诊断的有效性。
更新时间: 2025-07-04 13:11:29
领域: cs.CV,cs.AI
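The winning combination reported above (pretrained features, then LDA, then SVC) maps directly onto a scikit-learn pipeline. In the sketch below the deep features are synthetic stand-ins; in practice they would be, e.g., global-pooled MobileNetV2 activations.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 600, 1280                               # 1280 = MobileNetV2 feature width
y = rng.integers(0, 3, n)                      # e.g. normal / ischemic / hemorrhagic
feats = rng.normal(size=(n, d)) + y[:, None] * 0.15   # synthetic class structure

Xtr, Xte, ytr, yte = train_test_split(feats, y, test_size=0.2,
                                      stratify=y, random_state=0)
# LDA projects to at most (n_classes - 1) dimensions before the RBF SVC
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=2), SVC(kernel="rbf"))
clf.fit(Xtr, ytr)
print(f"test accuracy: {clf.score(Xte, yte):.3f}")
```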
Communication Efficient, Differentially Private Distributed Optimization using Correlation-Aware Sketching
Federated learning with differential privacy suffers from two major costs: each client must transmit $d$-dimensional gradients every round, and the magnitude of DP noise grows with $d$. Yet empirical studies show that gradient updates exhibit strong temporal correlations and lie in a $k$-dimensional subspace with $k \ll d$. Motivated by this, we introduce DOME, a decentralized DP optimization framework in which each client maintains a compact sketch to project gradients into $\mathbb{R}^k$ before privatization and Secure Aggregation. This reduces per-round communication from order $d$ to order $k$ and drives the gradient-approximation mean-squared error toward $\sigma^2 k$. To allow the sketch to span new directions and prevent it from collapsing onto historical gradients, we augment it with random probes orthogonal to historical directions. We prove that our overall protocol satisfies $(\epsilon,\delta)$-Differential Privacy.
Updated: 2025-07-04 12:54:21
标题: 高效通信、具有差分隐私的分布式优化,利用考虑相关性的草图技术
摘要: 带差分隐私的联邦学习存在两大成本:每个客户端每轮必须传输$d$维梯度,且差分隐私噪声的幅度随$d$增长。然而,经验研究表明,梯度更新具有强烈的时间相关性,并位于一个$k$维子空间中,其中$k \ll d$。受此启发,我们提出了DOME,一个去中心化的差分隐私优化框架:每个客户端维护一个紧凑的草图,在隐私化和安全聚合之前将梯度投影到$\mathbb{R}^k$中。这将每轮通信量从$d$阶降低到$k$阶,并使梯度近似的均方误差趋向$\sigma^2 k$。为了让草图能够覆盖新的方向并防止其坍缩到历史梯度上,我们用与历史方向正交的随机探针对其进行扩充。我们证明了整体协议满足$(\epsilon,\delta)$-差分隐私。
更新时间: 2025-07-04 12:54:21
领域: cs.LG
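A numeric sketch of the communication/noise trade described above: clip a d-dimensional gradient, project it with a k-row sketch, and privatize in R^k, so only k numbers travel per round and the injected noise has mean-squared error on the order of sigma^2 k. The fixed orthonormal sketch below stands in for the paper's correlation-aware, probe-augmented one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4096, 64                 # ambient gradient dim, sketch dim (k << d)
sigma, clip = 0.1, 1.0          # DP noise scale and clipping norm

# Shared sketch with orthonormal rows (stand-in for the learned sketch)
S = np.linalg.qr(rng.normal(size=(d, k)))[0].T      # (k, d), S S^T = I_k

def privatize(grad):
    grad = grad * min(1.0, clip / np.linalg.norm(grad))   # clip in R^d
    z = S @ grad                                          # send k numbers, not d
    return z + rng.normal(0.0, sigma, size=k)             # Gaussian noise in R^k
                                                          # (noise MSE ~ sigma^2 k)
g = rng.normal(size=d)
g /= np.linalg.norm(g)
g_hat = S.T @ privatize(g)                 # server-side lift back to R^d
print(np.linalg.norm(g - g_hat))           # residual: subspace error + noise
```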
ReviewInstruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models
The effectiveness of large language models (LLMs) in conversational AI is hindered by their reliance on single-turn supervised fine-tuning (SFT) data, which limits contextual coherence in multi-turn dialogues. Existing methods for generating multi-turn dialogue data struggle to ensure both diversity and quality in instructions. To address this, we propose Review-Instruct, a novel framework that synthesizes multi-turn conversations through an iterative "Ask-Respond-Review" process involving three agent roles: a Candidate, multiple Reviewers, and a Chairman. The framework iteratively refines instructions by incorporating Reviewer feedback, enhancing dialogue diversity and difficulty. We construct a multi-turn dataset using the Alpaca dataset and fine-tune the LLaMA2-13B model. Evaluations on MT-Bench, MMLU-Pro, and Auto-Arena demonstrate significant improvements, achieving absolute gains of 2.9% on MMLU-Pro and 2% on MT-Bench compared to prior state-of-the-art models based on LLaMA2-13B. Ablation studies confirm the critical role of the Review stage and the use of multiple Reviewers in boosting instruction diversity and difficulty. Our work highlights the potential of review-driven, multi-agent frameworks for generating high-quality conversational data at scale.
Updated: 2025-07-04 12:51:51
标题: 审阅指导:一种面向大型语言模型的基于审阅的多轮对话生成方法
摘要: 大型语言模型(LLMs)在对话人工智能中的有效性受制于其对单轮监督微调(SFT)数据的依赖,这限制了多轮对话中的上下文连贯性。现有的多轮对话数据生成方法难以同时保证指令的多样性和质量。为了解决这个问题,我们提出了Review-Instruct,一个通过迭代的“询问-回应-审查”过程合成多轮对话的新框架,涉及三种代理角色:一名候选人、多名审查者和一名主席。该框架通过整合审查者的反馈迭代地改进指令,增强对话的多样性和难度。我们使用Alpaca数据集构建了一个多轮数据集,并对LLaMA2-13B模型进行微调。在MT-Bench、MMLU-Pro和Auto-Arena上的评估显示出显著改进:与基于LLaMA2-13B的先前最先进模型相比,在MMLU-Pro上取得2.9%的绝对增益,在MT-Bench上取得2%的绝对增益。消融研究证实了审查阶段以及使用多名审查者对提升指令多样性和难度的关键作用。我们的工作凸显了以审查为驱动的多代理框架在大规模生成高质量对话数据方面的潜力。
更新时间: 2025-07-04 12:51:51
领域: cs.CL,cs.AI
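The Ask-Respond-Review loop described above reduces to a small control flow once the model calls are abstracted. The sketch below uses a placeholder llm() that any chat-completion API could back; prompts and roles are illustrative, not the paper's exact templates.

```python
def llm(role: str, prompt: str) -> str:
    """Placeholder for a chat-model call; any LLM API could back this."""
    raise NotImplementedError

def review_instruct_turns(seed_instruction, n_turns=3, n_reviewers=3):
    """Sketch of the Ask-Respond-Review loop: a Candidate answers, several
    Reviewers critique, a Chairman merges critiques into the next instruction."""
    dialogue, instruction = [], seed_instruction
    for _ in range(n_turns):
        answer = llm("candidate", instruction)                 # Respond
        reviews = [llm(f"reviewer-{i}",
                       "Critique this answer for depth and diversity:\n" + answer)
                   for i in range(n_reviewers)]                # Review
        dialogue.append((instruction, answer))
        instruction = llm("chairman",                          # Ask (refined)
                          "Based on these reviews, write a harder, more diverse "
                          "follow-up instruction:\n" + "\n".join(reviews))
    return dialogue
```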
H2HTalk: Evaluating Large Language Models as Emotional Companion
As digital emotional support needs grow, Large Language Model companions offer promising authentic, always-available empathy, though rigorous evaluation lags behind model advancement. We present Heart-to-Heart Talk (H2HTalk), a benchmark assessing companions across personality development and empathetic interaction, balancing emotional intelligence with linguistic fluency. H2HTalk features 4,650 curated scenarios spanning dialogue, recollection, and itinerary planning that mirror real-world support conversations, substantially exceeding previous datasets in scale and diversity. We incorporate a Secure Attachment Persona (SAP) module implementing attachment-theory principles for safer interactions. Benchmarking 50 LLMs with our unified protocol reveals that long-horizon planning and memory retention remain key challenges, with models struggling when user needs are implicit or evolve mid-conversation. H2HTalk establishes the first comprehensive benchmark for emotionally intelligent companions. We release all materials to advance development of LLMs capable of providing meaningful and safe psychological support.
Updated: 2025-07-04 12:50:43
标题: H2HTalk: 评估大型语言模型作为情感伴侣
摘要: 随着数字情感支持需求的增长,大型语言模型伴侣提供了有前途的真实、始终可用的共情,尽管严格的评估落后于模型的进步。我们提出了Heart-to-Heart Talk(H2HTalk),一个评估伴侣在个性发展和共情交互方面的基准,平衡情感智力和语言流利度。H2HTalk包括4,650个经过策划的情景,涵盖对话、回忆和行程规划,反映了现实世界中的支持对话,大大超过了先前数据集的规模和多样性。我们结合了一个实施依恋理论原则的安全依恋人格(SAP)模块,以实现更安全的交互。通过我们的统一协议对50个LLM进行基准测试,结果显示,长期规划和记忆保持仍然是关键挑战,模型在用户需求隐含或在对话中演变时会遇到困难。H2HTalk建立了第一个情感智能伴侣的全面基准。我们发布所有材料,以推动能够提供有意义和安全心理支持的LLM的发展。
更新时间: 2025-07-04 12:50:43
领域: cs.CL,cs.AI
Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition
In this paper, we address the following question: How do generic foundation models (e.g., CLIP, BLIP, LLaVa, DINO) compare against a domain-specific face recognition model (viz., AdaFace or ArcFace) on the face recognition task? Through a series of experiments involving several foundation models and benchmark datasets, we are able to report the following findings: (a) In all datasets considered, domain-specific models outperformed zero-shot foundation models. (b) The performance of zero-shot generic foundation models improves on over-segmented face images than tightly cropped faces thereby suggesting the importance of contextual clues. For example, at a False Match Rate (FMR) of 0.01%, the True Match Rate (TMR) of OpenCLIP improved from 64.97% to 81.73% on the LFW dataset as the face crop increased from 112x112 to 250x250 while the TMR of domain-specific AdaFace dropped from 99.09% to 77.31%. (c) A simple score-level fusion of a foundation model with a domain-specific FR model improved the accuracy at low FMRs. For example, the TMR of AdaFace when fused with BLIP improved from 72.64% to 83.31% at an FMR of 0.0001% on the IJB-B dataset and from 73.17% to 85.81% on the IJB-C dataset. (d) Foundation models, such as ChatGPT, can be used to impart explainability to the FR pipeline (e.g., ``Despite minor lighting and head tilt differences, the two left-profile images show high consistency in forehead slope, nose shape, chin contour...''). In some instances, foundation models are even able to resolve low-confidence decisions made by AdaFace (e.g., ``Although AdaFace assigns a low similarity score of 0.21, both images exhibit visual similarity...and the pair is likely of the same person''), thereby reiterating the importance of combining domain-specific FR models with generic foundation models in a judicious manner.
Updated: 2025-07-04 12:46:45
标题: 基于基础模型和领域特定模型的人脸识别性能比较、融合和可解释性
摘要: 在这篇论文中,我们探讨以下问题:通用基础模型(例如CLIP、BLIP、LLaVa、DINO)在人脸识别任务上与特定领域的人脸识别模型(如AdaFace或ArcFace)相比表现如何?通过一系列涉及多个基础模型和基准数据集的实验,我们报告以下发现:(a)在所有考虑的数据集中,特定领域模型均优于零样本基础模型。(b)零样本通用基础模型在过度分割的人脸图像上的性能优于紧密裁剪的人脸,表明上下文线索的重要性。例如,在错误匹配率(FMR)为0.01%时,随着人脸裁剪从112x112增大到250x250,OpenCLIP在LFW数据集上的正确匹配率(TMR)从64.97%提高到81.73%,而特定领域的AdaFace的TMR则从99.09%下降到77.31%。(c)将基础模型与特定领域人脸识别模型进行简单的分数级融合,可提高低FMR下的准确率。例如,在IJB-B数据集上,当AdaFace与BLIP融合时,FMR为0.0001%时的TMR从72.64%提高到83.31%;在IJB-C数据集上从73.17%提高到85.81%。(d)基础模型(例如ChatGPT)可用于为人脸识别流程提供可解释性(例如,“尽管存在轻微的光照和头部倾斜差异,两幅左侧轮廓图像在额头坡度、鼻形、下巴轮廓上显示出高度一致……”)。在某些情况下,基础模型甚至能够修正AdaFace做出的低置信度判断(例如,“虽然AdaFace给出的相似度分数仅为0.21,但两幅图像呈现出视觉相似性……该图像对很可能是同一个人”),从而再次强调了以审慎的方式将特定领域人脸识别模型与通用基础模型相结合的重要性。
更新时间: 2025-07-04 12:46:45
领域: cs.CV,cs.AI
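Two of the quantities above, TMR at a fixed FMR and score-level fusion, are easy to make precise. The sketch below uses synthetic genuine/impostor score distributions standing in for real matcher outputs, and min-max normalizes each matcher before summing.

```python
import numpy as np

def tmr_at_fmr(genuine, impostor, fmr=1e-4):
    """Operating threshold = (1 - fmr) quantile of impostor scores;
    TMR = fraction of genuine scores above that threshold."""
    thr = np.quantile(impostor, 1.0 - fmr)
    return (genuine > thr).mean()

rng = np.random.default_rng(0)
n = 20000   # synthetic score distributions standing in for real matchers
gen_fr, imp_fr = rng.normal(0.70, 0.12, n), rng.normal(0.20, 0.12, n)  # domain FR
gen_fm, imp_fm = rng.normal(0.60, 0.15, n), rng.normal(0.30, 0.15, n)  # foundation

def norm(x, ref):                            # min-max over all of a matcher's scores
    return (x - ref.min()) / (ref.max() - ref.min())

ref_fr = np.concatenate([gen_fr, imp_fr])
ref_fm = np.concatenate([gen_fm, imp_fm])
gen_fused = norm(gen_fr, ref_fr) + norm(gen_fm, ref_fm)   # simple sum fusion
imp_fused = norm(imp_fr, ref_fr) + norm(imp_fm, ref_fm)

for name, g, i in [("FR only", gen_fr, imp_fr),
                   ("foundation only", gen_fm, imp_fm),
                   ("score fusion", gen_fused, imp_fused)]:
    print(f"{name:16s} TMR@FMR=0.01%: {tmr_at_fmr(g, i):.3f}")
```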
Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding
Fine-grained video classification requires understanding complex spatio-temporal and semantic cues that often exceed the capacity of a single modality. In this paper, we propose a multimodal framework that fuses video, image, and text representations using GRU-based sequence encoders and cross-modal attention mechanisms. The model is trained using a combination of classification or regression loss, depending on the task, and is further regularized through feature-level augmentation and autoencoding techniques. To evaluate the generality of our framework, we conduct experiments on two challenging benchmarks: the DVD dataset for real-world violence detection and the Aff-Wild2 dataset for valence-arousal estimation. Our results demonstrate that the proposed fusion strategy significantly outperforms unimodal baselines, with cross-attention and feature augmentation contributing notably to robustness and performance.
Updated: 2025-07-04 12:35:52
标题: 多模态对齐与跨注意力GRUs用于细粒度视频理解
摘要: 细粒度视频分类需要理解复杂的时空与语义线索,这通常超出单一模态的能力范围。在本文中,我们提出了一个多模态框架,使用基于GRU的序列编码器和跨模态注意力机制融合视频、图像和文本表示。该模型依据任务采用分类或回归损失进行训练,并通过特征级增强和自编码技术进一步正则化。为评估框架的通用性,我们在两个具有挑战性的基准上进行了实验:用于真实世界暴力检测的DVD数据集和用于效价-唤醒(valence-arousal)估计的Aff-Wild2数据集。结果表明,所提出的融合策略明显优于单模态基线,其中跨模态注意力和特征增强对鲁棒性和性能的提升贡献显著。
更新时间: 2025-07-04 12:35:52
领域: cs.CV,cs.AI
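A minimal sketch of the fusion pattern described above: one GRU encoder per modality plus cross-modal attention, here with the video stream querying the text stream. Dimensions and the regression readout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentiveGRUFusion(nn.Module):
    """Per-modality GRU encoders + cross-modal attention (video queries text)."""
    def __init__(self, d_vid=512, d_txt=300, d=256, heads=4):
        super().__init__()
        self.vid_gru = nn.GRU(d_vid, d, batch_first=True)
        self.txt_gru = nn.GRU(d_txt, d, batch_first=True)
        self.cross = nn.MultiheadAttention(d, heads, batch_first=True)
        self.head = nn.Linear(2 * d, 1)            # e.g. valence regression

    def forward(self, vid, txt):                   # (B, Tv, d_vid), (B, Tt, d_txt)
        hv, _ = self.vid_gru(vid)
        ht, _ = self.txt_gru(txt)
        attended, _ = self.cross(query=hv, key=ht, value=ht)
        pooled = torch.cat([hv.mean(1), attended.mean(1)], dim=-1)
        return self.head(pooled).squeeze(-1)

model = CrossAttentiveGRUFusion()
out = model(torch.randn(2, 32, 512), torch.randn(2, 20, 300))
print(out.shape)    # torch.Size([2])
```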
Generating Synthetic Relational Tabular Data via Structural Causal Models
Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al.,2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables - a structure not adequately addressed by current generation methods. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data including causal relationships across tables. Our experiments confirm that this framework is able to construct relational datasets with complex inter-table dependencies mimicking real-world scenarios.
Updated: 2025-07-04 12:27:23
标题: 通过结构因果模型生成合成关系表格数据
摘要: 合成表格数据生成在最近几年越来越受到关注,特别是随着基于表格数据的基础模型的出现。TabPFN(Hollmann等,2025年)的突破性成功利用了大量从结构因果模型(SCMs)派生的合成表格数据集,证明了合成数据在开发强大的表格基础模型中所起的关键作用。然而,大多数现实世界的表格数据存在于跨越多个相互连接的表格的关系格式中,这种结构目前的生成方法并未得到充分解决。在这项工作中,我们通过开发一个新颖的框架来扩展基于SCM的方法,该框架能够生成包括跨表格因果关系在内的逼真的合成关系表格数据。我们的实验证实,该框架能够构建具有复杂跨表格依赖关系的关系数据集,模拟现实世界的场景。
更新时间: 2025-07-04 12:27:23
领域: cs.LG,cs.AI,stat.AP
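A toy illustration of the idea above: sample a parent table from a small SCM, then let a child table's mechanism depend causally on parent attributes through the foreign key, so the cross-table dependence survives a join. Table schemas and coefficients are invented; this is not the paper's generator.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Parent table: customers, with a simple within-table SCM (age -> income)
n_cust = 1000
age = rng.normal(40, 10, n_cust)
income = 1.2 * age + rng.normal(0, 5, n_cust)
customers = pd.DataFrame({"customer_id": np.arange(n_cust),
                          "age": age, "income": income})

# Child table: orders, whose mechanism depends on parent attributes
n_orders = 5000
cust_of = rng.integers(0, n_cust, n_orders)            # foreign key
amount = (0.05 * customers["income"].values[cust_of]   # income -> amount
          + rng.normal(0, 1, n_orders))
orders = pd.DataFrame({"order_id": np.arange(n_orders),
                       "customer_id": cust_of, "amount": amount})

# The cross-table causal dependence is visible after a join
joined = orders.merge(customers, on="customer_id")
print(joined[["income", "amount"]].corr().round(2))
```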
Decoupled Relative Learning Rate Schedules
In this work, we introduce a novel approach for optimizing LLM training by adjusting learning rates across weights of different components in Transformer models. Traditional methods often apply a uniform learning rate across all network layers, potentially overlooking the unique dynamics of each part. Remarkably, our introduced relative learning rates, RLRS, method accelerates the training process by up to $23\%$, particularly in complex models such as Mixture of Experts (MoE). Hyperparameters of RLRS can be efficiently tuned on smaller models and then effectively reused on models up to $27\times$ larger. This simple and effective method results in a substantial reduction in training time and computational resources, offering a practical and scalable solution for optimizing large-scale neural networks.
Updated: 2025-07-04 12:23:45
标题: 解耦的相对学习率调度
摘要: 在这项工作中,我们介绍了一种通过为Transformer模型中不同组件的权重调整学习率来优化LLM训练的新方法。传统方法通常对所有网络层应用统一的学习率,可能忽略了各部分独特的动态。值得注意的是,我们提出的相对学习率(RLRS)方法可将训练过程加速高达$23\%$,在诸如专家混合(MoE)等复杂模型中尤为明显。RLRS的超参数可以在较小的模型上高效调优,然后有效地复用到规模高达$27\times$的更大模型上。这种简单有效的方法显著减少了训练时间和计算资源,为优化大规模神经网络提供了一个实用且可扩展的解决方案。
更新时间: 2025-07-04 12:23:45
领域: cs.LG
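Mechanically, decoupled relative learning rates amount to optimizer parameter groups whose rates are a base rate times a per-component multiplier. The multipliers below are hypothetical; the paper tunes its own values on small models before reusing them at scale.

```python
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)

# Hypothetical relative multipliers per component type, for illustration only
relative = {"self_attn": 0.5, "linear1": 1.0, "linear2": 1.0, "norm": 2.0}
base_lr = 3e-4

# One optimizer parameter group per component, lr = base_lr * multiplier
groups = [{"params": [p for name, p in model.named_parameters()
                      if name.startswith(comp)],
           "lr": base_lr * mult}
          for comp, mult in relative.items()]

opt = torch.optim.AdamW(groups)
for g in opt.param_groups:
    print(len(g["params"]), "params at lr", g["lr"])
```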
Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation
Recently, pre-trained models with phonetic supervision have demonstrated their advantages for crosslingual speech recognition in data efficiency and information sharing across languages. However, a limitation is that a pronunciation lexicon is needed for such phoneme-based crosslingual speech recognition. In this study, we aim to eliminate the need for pronunciation lexicons and propose a latent variable model based method, with phonemes being treated as discrete latent variables. The new method consists of a speech-to-phoneme (S2P) model and a phoneme-to-grapheme (P2G) model, and a grapheme-to-phoneme (G2P) model is introduced as an auxiliary inference model. To jointly train the three models, we utilize the joint stochastic approximation (JSA) algorithm, which is a stochastic extension of the EM (expectation-maximization) algorithm and has demonstrated superior performance particularly in estimating discrete latent variable models. Based on the Whistle multilingual pre-trained S2P model, crosslingual experiments are conducted in Polish (130 h) and Indonesian (20 h). With only 10 minutes of phoneme supervision, the new method, JSA-SPG, achieves 5\% error rate reductions compared to the best crosslingual fine-tuning approach using subword or full phoneme supervision. Furthermore, it is found that in language domain adaptation (i.e., utilizing cross-domain text-only data), JSA-SPG outperforms the standard practice of language model fusion via the auxiliary support of the G2P model by 9% error rate reductions. To facilitate reproducibility and encourage further exploration in this field, we open-source the JSA-SPG training code and complete pipeline.
Updated: 2025-07-04 12:23:22
标题: 通过联合随机逼近实现基于音素的跨语言ASR的无发音词典训练
摘要: 最近,带有音素监督的预训练模型在跨语言语音识别中展示了其在数据效率和跨语言信息共享方面的优势。然而,一个局限是这种基于音素的跨语言语音识别需要发音词典。在这项研究中,我们旨在消除对发音词典的需求,提出了一种基于隐变量模型的方法,将音素视为离散隐变量。新方法由一个语音到音素(S2P)模型和一个音素到字形(P2G)模型组成,并引入一个字形到音素(G2P)模型作为辅助推断模型。为了联合训练这三个模型,我们采用联合随机逼近(JSA)算法,它是EM(期望最大化)算法的随机扩展,特别是在估计离散隐变量模型方面表现出优越的性能。基于Whistle多语言预训练S2P模型,我们在波兰语(130小时)和印尼语(20小时)上进行了跨语言实验。仅使用10分钟的音素监督,新方法JSA-SPG相对于使用子词或完整音素监督的最佳跨语言微调方法实现了5%的错误率降低。此外,在语言领域自适应(即利用跨领域纯文本数据)中,JSA-SPG凭借G2P模型的辅助支持,较语言模型融合的标准做法取得了9%的错误率降低。为了促进可复现性并鼓励该领域的进一步探索,我们开源了JSA-SPG的训练代码和完整流程。
更新时间: 2025-07-04 12:23:22
领域: eess.AS,cs.AI,cs.CL
Limits of Safe AI Deployment: Differentiating Oversight and Control
Oversight and control (collectively, supervision) are often invoked as key levers for ensuring that AI systems are accountable, reliable, and able to fulfill governance and management requirements. However, the concepts are frequently conflated or insufficiently distinguished in academic and policy discourse, undermining efforts to design or evaluate systems that should remain under meaningful human supervision. This paper undertakes a targeted critical review of literature on supervision outside of AI, along with a brief summary of past work on the topic related to AI. We then differentiate control as being ex-ante or real-time, and operational rather than policy or governance. In contrast, oversight is either a policy and governance function, or is ex-post. We suggest that control aims to prevent failures. In contrast, oversight often focuses on detection, remediation, or incentives for future prevention; all preventative oversight strategies nonetheless necessitate control. Building on this foundation, we make three contributions. First, we propose a theoretically-informed yet policy-grounded framework that articulates the conditions under which each mechanism is possible, where they fall short, and what is required to make them meaningful in practice. Second, we outline how supervision methods should be documented and integrated into risk management, and drawing on the Microsoft Responsible AI Maturity Model, we outline a maturity model for AI supervision. Third, we explicitly highlight some boundaries of these mechanisms, including where they apply, where they fail, and where it is clear that no existing methods suffice. This foregrounds the question of whether meaningful supervision is possible in a given deployment context, and can support regulators, auditors, and practitioners in identifying both present limitations and the need for new conceptual and technical advances.
Updated: 2025-07-04 12:22:35
标题: 安全AI部署的限制:区分监督和控制
摘要: 监督和控制(统称为监管)通常被视为确保人工智能系统具备问责性、可靠性并能满足治理和管理要求的关键杠杆。然而,在学术和政策讨论中,这两个概念经常被混淆或区分不足,削弱了设计或评估那些应当处于有意义的人类监管之下的系统的努力。本文对人工智能之外的监管文献进行了有针对性的批判性综述,并简要总结了以往与人工智能相关的研究。随后,我们将控制区分为事前的或实时的,并且属于操作层面而非政策或治理层面;相比之下,监督要么是一种政策与治理职能,要么是事后进行的。我们认为,控制旨在防止故障;而监督通常侧重于检测、补救或为未来的预防提供激励。尽管如此,所有预防性的监督策略都离不开控制。在此基础上,我们做出三点贡献。首先,我们提出了一个既有理论依据又立足政策的框架,阐明了每种机制可能成立的条件、它们的不足之处,以及在实践中使其有意义所需的条件。其次,我们概述了监管方法应如何被记录并纳入风险管理,并借鉴微软负责任人工智能成熟度模型,给出了一个人工智能监管成熟度模型。第三,我们明确指出了这些机制的一些边界:它们适用于何处、在何处失效,以及在哪些方面显然没有任何现有方法足以胜任。这凸显了在给定部署环境中是否可能实现有意义的监管这一问题,并能帮助监管机构、审计人员和从业者识别当前的局限以及对新的概念和技术进展的需求。
更新时间: 2025-07-04 12:22:35
领域: cs.AI,cs.SY,eess.SY,I.2; K.6; D.2.9
UWB TDoA Error Correction using Transformers: Patching and Positional Encoding Strategies
Despite their high accuracy, UWB-based localization systems suffer inaccuracies when deployed in industrial locations with many obstacles due to multipath effects and non-line-of-sight (NLOS) conditions. In such environments, current error mitigation approaches for time difference of arrival (TDoA) localization typically exclude NLOS links. However, this exclusion approach leads to geometric dilution of precision problems and this approach is infeasible when the majority of links are NLOS. To address these limitations, we propose a transformer-based TDoA position correction method that uses raw channel impulse responses (CIRs) from all available anchor nodes to compute position corrections. We introduce different CIR ordering, patching and positional encoding strategies for the transformer, and analyze each proposed technique's scalability and performance gains. Based on experiments on real-world UWB measurements, our approach can provide accuracies of up to 0.39 m in a complex environment consisting of (almost) only NLOS signals, which is an improvement of 73.6 % compared to the TDoA baseline.
Updated: 2025-07-04 12:19:54
标题: 基于Transformer的UWB TDoA误差校正:分块(patching)与位置编码策略
摘要: 尽管基于超宽带(UWB)的定位系统具有很高的准确性,但在障碍物众多的工业环境中部署时,多径效应和非视距(NLOS)条件会造成定位误差。在此类环境中,目前针对到达时间差(TDoA)定位的误差缓解方法通常会排除NLOS链路。然而,这种排除方法会导致几何精度稀释问题,并且当大多数链路为NLOS时不可行。为了解决这些局限,我们提出了一种基于Transformer的TDoA位置校正方法,利用所有可用锚节点的原始信道冲激响应(CIR)来计算位置校正量。我们为Transformer引入了不同的CIR排序、分块(patching)和位置编码策略,并分析了每种技术的可扩展性和性能增益。基于真实UWB测量数据的实验表明,在一个(几乎)全部为NLOS信号的复杂环境中,我们的方法可以达到0.39米的精度,相比TDoA基线提升了73.6%。
更新时间: 2025-07-04 12:19:54
领域: eess.SP,cs.LG
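A minimal sketch of the ingredients named in the title above: raw CIRs from all anchors are cut into fixed-length patches, given a learned positional encoding, and passed through a Transformer encoder that regresses a 2D position correction. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CIRPatchEncoder(nn.Module):
    """Patch per-anchor CIRs into tokens, add a learned positional encoding,
    and regress a 2D TDoA position correction."""
    def __init__(self, cir_len=128, patch=16, d=64, anchors=8):
        super().__init__()
        self.patch = patch
        n_tokens = anchors * cir_len // patch
        self.embed = nn.Linear(patch, d)
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, d))   # learned PE
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, 2)                            # dx, dy correction

    def forward(self, cir):                  # cir: (B, anchors, cir_len)
        B = cir.shape[0]
        tokens = cir.reshape(B, -1, self.patch)                # patching step
        z = self.encoder(self.embed(tokens) + self.pos)
        return self.head(z.mean(dim=1))

model = CIRPatchEncoder()
print(model(torch.randn(4, 8, 128)).shape)   # torch.Size([4, 2])
```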
A Flexible Instruction Set Architecture for Efficient GEMMs
GEneral Matrix Multiplications (GEMMs) are recurrent in high-performance computing and deep learning workloads. Typically, high-end CPUs accelerate GEMM workloads with Single-Instruction Multiple Data (SIMD) or vector Instruction Set Architectures (ISAs). Since these ISAs face significant issues when running GEMM workloads, particularly when dealing with small, tall, or skinny matrices, matrix ISAs have been proposed and implemented by major hardware vendors in the last years. Although these matrix ISAs deliver larger throughput when running GEMMs than their SIMD/vector counterparts, they are rigid solutions unable to dynamically adapt themselves to application-specific aspects like the data format. This paper demonstrates that the state-of-the-art matrix ISAs deliver suboptimal performance when running the most commonly used convolution and transformer models. This paper proposes the Matrix Tile Extension (MTE), the first matrix ISA that completely decouples the instruction set architecture from the microarchitecture and seamlessly interacts with existing vector ISAs. MTE incurs minimal implementation overhead since it only requires a few additional instructions and a 64-bit Control Status Register (CSR) to keep its state. Specifically, MTE can i) vectorize GEMMs across the three dimensions M, N, and K; ii) leverage the capacity of the existing vector register file; and iii) decouple the tile shape from the underlying microarchitecture. MTE achieves speed-ups of 1.35x over the best state-of-the-art matrix ISA.
Updated: 2025-07-04 12:17:00
标题: 一种灵活的指令集架构,用于高效的GEMMs
摘要: 通用矩阵乘法(GEMM)在高性能计算和深度学习工作负载中频繁出现。通常,高端CPU使用单指令多数据(SIMD)或向量指令集架构(ISA)来加速GEMM工作负载。由于这些ISA在运行GEMM工作负载时面临显著问题,尤其是在处理小型、瘦高(tall)或细长(skinny)矩阵时,主要硬件厂商近年来提出并实现了矩阵ISA。尽管这些矩阵ISA在运行GEMM时能提供比SIMD/向量对应方案更高的吞吐量,但它们是刚性方案,无法动态适应数据格式等应用相关的方面。本文表明,在运行最常用的卷积和Transformer模型时,最先进的矩阵ISA只能提供次优性能。本文提出了矩阵瓦片扩展(MTE),这是第一个将指令集架构与微架构完全解耦、并能与现有向量ISA无缝交互的矩阵ISA。MTE的实现开销极小,只需要少量附加指令和一个64位控制状态寄存器(CSR)来保存其状态。具体而言,MTE可以:i)在M、N和K三个维度上对GEMM进行向量化;ii)利用现有向量寄存器文件的容量;iii)将瓦片形状与底层微架构解耦。MTE相对最先进的矩阵ISA实现了1.35倍的加速。
更新时间: 2025-07-04 12:17:00
领域: cs.AR,cs.LG,C.1.0
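The ISA itself cannot be run here, but the tiling vocabulary above can be illustrated in plain Python: a GEMM blocked across all three dimensions M, N, and K, with the tile shape a free parameter independent of the loop structure. This is an analogy for the decoupling MTE provides, not MTE code.

```python
import numpy as np

def tiled_gemm(A, B, tm=4, tn=8, tk=16):
    """Reference GEMM tiled across all three dimensions M, N, K.
    The tile shape (tm, tn, tk) is a free parameter, loosely analogous to
    decoupling the tile shape from the microarchitecture."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tm):
        for j in range(0, N, tn):
            for k in range(0, K, tk):       # accumulate over the K dimension
                C[i:i+tm, j:j+tn] += A[i:i+tm, k:k+tk] @ B[k:k+tk, j:j+tn]
    return C

A, B = np.random.rand(12, 40), np.random.rand(40, 20)
print(np.allclose(tiled_gemm(A, B), A @ B))   # True
```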
High-Dimensional Learning in Finance
Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible no matter how sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices, e.g., 12,000 features, 12 monthly observations, and R-square 2-3%, the required sample size to escape the bound exceeds 25-30 years of data--well beyond any rolling-window actually used. Thus, observed out-of-sample success must originate from lower-complexity artefacts rather than from the intended high-dimensional mechanism.
Updated: 2025-07-04 11:59:32
标题: 金融中的高维学习
摘要: 最近机器学习的进展显示出使用大型、过参数化模型进行金融预测的前景。本文为理解这些方法何时以及如何取得预测成功提供了理论基础和经验证据。我考察了金融领域高维学习的两个关键方面。首先,我证明了在随机傅里叶特征实现中的样本内标准化从根本上改变了基础的高斯核逼近,用训练集相关的替代方案取代了平移不变核。其次,我建立了信息论下界,确定了在何种情况下,可靠的学习是不可能的,无论估计器有多么复杂。对多项式下界的详细定量校准显示,对于典型的参数选择,例如12,000个特征,12个月观测和R平方2-3%,要摆脱下界所需的样本量超过25-30年的数据——远远超过实际使用的任何滚动窗口。因此,观察到的样本外成功必须源于较低复杂度的人为因素,而不是预期的高维机制。
更新时间: 2025-07-04 11:59:32
领域: q-fin.ST,cs.LG,econ.EM,stat.ML
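The first claim above is easy to reproduce numerically: plain random Fourier features approximate the Gaussian kernel, while standardizing the features within the sample yields a training-set-dependent kernel that no longer matches it. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D, gamma = 300, 5, 5000, 0.5
X = rng.normal(size=(n, d))

# Random Fourier features for the Gaussian kernel k(x,y) = exp(-gamma ||x-y||^2)
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0.0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

def sq_dists(A):                       # pairwise squared Euclidean distances
    s = (A * A).sum(1)
    return s[:, None] + s[None, :] - 2 * A @ A.T

K_true = np.exp(-gamma * sq_dists(X))
K_rff = Z @ Z.T                        # the shift-invariant kernel RFF targets
Zs = (Z - Z.mean(0)) / Z.std(0)        # within-sample standardization
K_std = (Zs @ Zs.T) / D                # kernel implied by standardized features

print(np.abs(K_rff - K_true).max())    # small: RFF tracks the Gaussian kernel
print(np.abs(K_std - K_true).max())    # large: standardization changes the kernel
```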
Reinforcement Learning-based Feature Generation Algorithm for Scientific Data
Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Secondly, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor costs. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by that, this paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multi-agents will construct mathematical transformation equations collaboratively, synthesize and identify feature combinations exhibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates the large language models (LLMs) to interpretatively evaluate the generated features of each significant model performance breakthrough. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.
Updated: 2025-07-04 11:52:09
标题: 基于强化学习的科学数据特征生成算法
摘要: 特征生成(FG)旨在通过构建高阶特征组合并去除冗余特征来增强原始数据的预测潜力。它是表格类科学数据提升下游机器学习模型性能的关键预处理步骤。传统方法在处理科学数据的特征生成时面临两大挑战:其一,在科学数据中有效构建高阶特征组合需要深厚而广泛的领域专业知识;其二,随着特征组合阶数的增加,搜索空间呈指数级扩张,带来难以承受的人力成本。数据中心人工智能(DCAI)范式的进展为自动化特征生成过程开辟了新途径。受此启发,本文重新审视传统特征生成工作流程,提出了多智能体特征生成(MAFG)框架。具体而言,在迭代探索阶段,多个智能体协作构建数学变换方程,综合并识别具有高信息含量的特征组合,并利用强化学习机制演化其策略。探索阶段完成后,MAFG整合大语言模型(LLMs),对每次显著的模型性能突破所生成的特征进行解释性评估。实验结果和案例研究一致表明,MAFG框架有效地自动化了特征生成过程,并显著提升了各类下游科学数据挖掘任务的性能。
更新时间: 2025-07-04 11:52:09
领域: cs.LG,cs.AI
Registered Attribute-Based Encryption with Reliable Outsourced Decryption Based on Blockchain
Decentralized data sovereignty and secure data exchange are regarded as foundational pillars of the new era. Attribute-based encryption (ABE) is a promising solution that enables fine-grained access control in data sharing. Recently, Hohenberger et al. (Eurocrypt 2023) introduced registered ABE (RABE) to eliminate trusted authority and gain decentralization. Users generate their own public and secret keys and then register their keys and attributes with a transparent key curator. However, RABE still suffers from heavy decryption overhead. A natural approach to address this issue is to outsource decryption to a decryption cloud server (DCS). In this work, we propose the first auditable RABE scheme with reliable outsourced decryption (ORABE) based on blockchain. First, we achieve verifiability of transform ciphertext via a verifiable tag mechanism. Then, the exemptibility, which ensures that the DCS escapes false accusations, is guaranteed by zero knowledge fraud proof under the optimistic assumption. Additionally, our system achieves fairness and auditability to protect the interests of all parties through blockchain. Finally, we give concrete security and theoretical analysis and evaluate our scheme on Ethereum to demonstrate feasibility and efficiency.
Updated: 2025-07-04 11:42:13
标题: 基于区块链的可靠外包解密的注册属性基加密
摘要: 去中心化的数据主权和安全数据交换被视为新时代的基石。基于属性的加密(ABE)是一种有前途的解决方案,可在数据共享中实现细粒度访问控制。最近,Hohenberger等人(Eurocrypt 2023)提出了注册ABE(RABE),以消除可信机构并实现去中心化:用户生成自己的公钥和私钥,然后将其密钥和属性注册到一个透明的密钥管理者处。然而,RABE仍然存在沉重的解密开销。解决该问题的一种自然方法是将解密外包给解密云服务器(DCS)。在这项工作中,我们提出了第一个基于区块链、具有可靠外包解密的可审计RABE方案(ORABE)。首先,我们通过可验证标签机制实现了转换密文的可验证性。其次,在乐观假设下,通过零知识欺诈证明保证豁免性,确保DCS免受虚假指控。此外,我们的系统通过区块链实现公平性和可审计性,以保护各方利益。最后,我们给出了具体的安全性和理论分析,并在以太坊上对方案进行评估,以展示其可行性和效率。
更新时间: 2025-07-04 11:42:13
领域: cs.CR
ObjectRL: An Object-Oriented Reinforcement Learning Codebase
ObjectRL is an open-source Python codebase for deep reinforcement learning (RL), designed for research-oriented prototyping with minimal programming effort. Unlike existing codebases, ObjectRL is built on Object-Oriented Programming (OOP) principles, providing a clear structure that simplifies the implementation, modification, and evaluation of new algorithms. ObjectRL lowers the entry barrier for deep RL research by organizing best practices into explicit, clearly separated components, making them easier to understand and adapt. Each algorithmic component is a class with attributes that describe key RL concepts and methods that intuitively reflect their interactions. The class hierarchy closely follows common ontological relationships, enabling data encapsulation, inheritance, and polymorphism, which are core features of OOP. We demonstrate the efficiency of ObjectRL's design through representative use cases that highlight its flexibility and suitability for rapid prototyping. The documentation and source code are available at https://objectrl.readthedocs.io and https://github.com/adinlab/objectrl .
Updated: 2025-07-04 11:27:52
标题: ObjectRL:一个面向对象的强化学习代码库
摘要: ObjectRL是一个用于深度强化学习(RL)的开源Python代码库,旨在通过最小的编程工作量进行研究性原型设计。与现有代码库不同,ObjectRL建立在面向对象编程(OOP)原则上,提供了一个清晰的结构,简化了新算法的实现、修改和评估。ObjectRL通过将最佳实践组织成明确、清晰分离的组件,降低了深度RL研究的门槛,使其更易于理解和适应。每个算法组件都是一个类,具有描述关键RL概念的属性和直观反映其相互作用的方法。类层次结构紧密遵循常见的本体关系,实现了数据封装、继承和多态性,这是OOP的核心特征。我们通过代表性用例展示了ObjectRL设计的高效性,突显了其灵活性和适用性,以进行快速原型设计。文档和源代码可在https://objectrl.readthedocs.io和https://github.com/adinlab/objectrl上找到。
更新时间: 2025-07-04 11:27:52
领域: cs.LG
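The OOP organization described above, shared plumbing in a base class with algorithms as small overrides, looks roughly like the following independent sketch (it is not ObjectRL's actual API):

```python
import random
from abc import ABC, abstractmethod

class Agent(ABC):
    """Base class: shared plumbing lives here; algorithms override small,
    clearly separated pieces via inheritance and polymorphism."""
    def __init__(self, n_actions: int):
        self.n_actions = n_actions

    @abstractmethod
    def act(self, state): ...

    @abstractmethod
    def learn(self, state, action, reward, next_state): ...

class RandomAgent(Agent):
    def act(self, state):
        return random.randrange(self.n_actions)
    def learn(self, *transition):
        pass  # nothing to learn

class EpsilonGreedyQAgent(RandomAgent):       # inheritance: reuse exploration
    def __init__(self, n_actions, eps=0.1, alpha=0.5, gamma=0.9):
        super().__init__(n_actions)
        self.eps, self.alpha, self.gamma, self.q = eps, alpha, gamma, {}
    def act(self, state):
        if random.random() < self.eps:
            return super().act(state)         # polymorphic random fallback
        return max(range(self.n_actions),
                   key=lambda a: self.q.get((state, a), 0.0))
    def learn(self, s, a, r, s2):
        best = max(self.q.get((s2, a2), 0.0) for a2 in range(self.n_actions))
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best - old)
```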
A bound on the quantum value of all compiled nonlocal games
A cryptographic compiler introduced by Kalai et al. (STOC'23) converts any nonlocal game into an interactive protocol with a single computationally bounded prover. Although the compiler is known to be sound in the case of classical provers and complete in the quantum case, quantum soundness has so far only been established for special classes of games. In this work, we establish a quantum soundness result for all compiled two-player nonlocal games. In particular, we prove that the quantum commuting operator value of the underlying nonlocal game is an upper bound on the quantum value of the compiled game. Our result employs techniques from operator algebras in a computational and cryptographic setting to establish information-theoretic objects in the asymptotic limit of the security parameter. It further relies on a sequential characterization of quantum commuting operator correlations which may be of independent interest.
Updated: 2025-07-04 11:24:12
标题: 一个关于所有编译的非局域博弈量子价值的界限
摘要: Kalai等人(STOC'23)提出的密码学编译器可以将任何非局域博弈转换为只有单个计算受限证明者的交互式协议。尽管已知该编译器对经典证明者是可靠的,并且在量子情形下是完备的,但量子可靠性此前仅对特定类别的博弈得到证明。在这项工作中,我们为所有编译后的双人非局域博弈建立了量子可靠性结果。特别地,我们证明了底层非局域博弈的量子交换算子值是编译后博弈量子值的上界。我们的结果在计算与密码学设定中运用算子代数技术,在安全参数的渐近极限下构造信息论对象。它还依赖于量子交换算子关联的一种序贯刻画,这一结果本身或许也具有独立的价值。
更新时间: 2025-07-04 11:24:12
领域: quant-ph,cs.CR,math-ph,math.MP
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
In this paper, we introduce BMMR, a large-scale bilingual, multimodal, multi-disciplinary reasoning dataset for the community to develop and evaluate large multimodal models (LMMs). BMMR comprises 110k college-level questions spanning 300 UNESCO-defined subjects, in diverse formats (multiple-choice, fill-in-the-blank, and open-ended QA), sourced from both print and digital media such as books, exams, and quizzes. All data are curated and filtered via a human-in-the-loop and scalable framework, and each instance is paired with a high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval that comprises 20,458 high-quality instances to comprehensively assess LMMs' knowledge and reasoning across multiple disciplines in both Chinese and English; and BMMR-Train that contains 88,991 instances to support further research and development, extending the current focus on mathematical reasoning to diverse disciplines and domains. In addition, we propose the process-based multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained evaluation of reasoning paths. Extensive experiments on 24 models reveal that (i) even SOTA models (e.g., o3 and Gemini-2.5-Pro) leave substantial headroom on BMMR-Eval; (ii) reasoning models exhibit discipline bias and outperform LMMs only on specific subjects; (iii) open-source models still trail their proprietary counterparts; and (iv) fine-tuning on BMMR-Train narrows this gap. Additionally, we conduct reasoning-chain analyses using BMMR-Verifier and other in-depth studies, uncovering the challenges LMMs currently face in multidisciplinary reasoning. We will release the data, and we hope our work can offer insights and contributions to the community.
Updated: 2025-07-04 11:20:09
标题: BMMR:一个大规模的双语多模态多学科推理数据集
摘要: 在这篇论文中,我们介绍了BMMR,这是一个大规模的双语、多模态、多学科推理数据集,供社区开发和评估大型多模态模型(LMMs)。BMMR包括11万个涵盖300个联合国教科文组织定义的学科的大学水平问题,涵盖多种格式-多项选择题、填空题和开放式问答-并且来自于印刷和数字媒体,如书籍、考试和测验。所有数据都经过人为筛选和过滤,并且每个实例都配有高质量的推理路径。数据集分为两部分:BMMR-Eval包括20,458个高质量实例,全面评估LMMs在中文和英文多个学科中的知识和推理能力;而BMMR-Train包含88,991个实例,支持进一步的研究和发展,将当前对数学推理的关注扩展到各种学科和领域。此外,我们提出了基于过程的多学科验证器(即BMMR-Verifier),用于准确和细致地评估推理路径。对24个模型的广泛实验表明:(i)即使是SOTA模型(如o3和Gemini-2.5-Pro)在BMMR-Eval上仍存在很大的改进空间;(ii)推理模型表现出学科偏见,并且仅在特定学科上胜过LMMs;(iii)开源模型仍落后于专有模型;(iv)在BMMR-Train上进行微调可以缩小这一差距。此外,我们使用BMMR-Verifier和其他深入研究进行推理链分析,揭示了当前LMMs在多学科推理中面临的挑战。我们将发布数据,并希望我们的工作可以为社区提供见解和贡献。
更新时间: 2025-07-04 11:20:09
领域: cs.CL,cs.AI
Chat2SPaT: A Large Language Model Based Tool for Automating Traffic Signal Control Plan Management
Pre-timed traffic signal control, commonly used for operating signalized intersections and coordinated arterials, requires tedious manual work for signaling plan creating and updating. When the time-of-day or day-of-week plans are utilized, one intersection is often associated with multiple plans, leading to further repetitive manual plan parameter inputting. To enable a user-friendly traffic signal control plan management process, this study proposes Chat2SPaT, a method to convert users' semi-structured and ambiguous descriptions on the signal control plan to exact signal phase and timing (SPaT) results, which could further be transformed into structured stage-based or ring-based plans to interact with intelligent transportation system (ITS) software and traffic signal controllers. With curated prompts, Chat2SPaT first leverages large language models' (LLMs) capability of understanding users' plan descriptions and reformulate the plan as a combination of phase sequence and phase attribute results in the json format. Based on LLM outputs, python scripts are designed to locate phases in a cycle, address nuances of traffic signal control, and finally assemble the complete traffic signal control plan. Within a chat, the pipeline can be utilized iteratively to conduct further plan editing. Experiments show that Chat2SPaT can generate plans with an accuracy of over 94% for both English and Chinese cases, using a test dataset with over 300 plan descriptions. As the first benchmark for evaluating LLMs' capability of understanding traffic signal control plan descriptions, Chat2SPaT provides an easy-to-use plan management pipeline for traffic practitioners and researchers, serving as a potential new building block for a more accurate and versatile application of LLMs in the field of ITS. The source codes, prompts and test dataset are openly accessible at https://github.com/yuewangits/Chat2SPaT.
Updated: 2025-07-04 11:10:24
标题: Chat2SPaT:基于大型语言模型的自动化交通信号控制方案管理工具
摘要: 预定时交通信号控制通常用于操作信号控制交叉口和协调的干道,需要繁琐的手动工作来创建和更新信号计划。当利用时间或每周计划时,一个交叉口通常与多个计划相关联,导致进一步重复的手动计划参数输入。为了实现用户友好的交通信号控制计划管理流程,本研究提出了Chat2SPaT,一种将用户的半结构化和模糊描述转换为准确信号相位和时序(SPaT)结果的方法,这些结果进一步可以转换为结构化的基于阶段或基于环的计划,与智能交通系统(ITS)软件和交通信号控制器进行交互。通过精心设计的提示,Chat2SPaT首先利用大型语言模型(LLMs)理解用户计划描述的能力,并将计划重新构造为json格式的相位序列和相位属性结果的组合。根据LLM的输出,设计了Python脚本来定位循环中的相位,解决交通信号控制的细微差别,并最终组装完成交通信号控制计划。在对话中,可以迭代地利用该流水线进行进一步的计划编辑。实验证明,Chat2SPaT可以为英文和中文案例生成准确度超过94%的计划,使用了包含300多个计划描述的测试数据集。作为评估LLMs理解交通信号控制计划描述能力的第一个基准,Chat2SPaT为交通从业者和研究人员提供了一个易于使用的计划管理流水线,为ITS领域中更准确和多功能应用LLMs提供了一个潜在的新构建模块。源代码、提示和测试数据集可以在https://github.com/yuewangits/Chat2SPaT上公开访问。
更新时间: 2025-07-04 11:10:24
领域: cs.AI,cs.CL
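A sketch of the intermediate-representation idea described in the Chat2SPaT entry above: the LLM emits a phase sequence with per-phase attributes as JSON, and a small script locates each phase in the cycle. The field names and timing rules here are invented for illustration; the actual schema is defined in the Chat2SPaT repository.

```python
# Hypothetical shape of the LLM's intermediate output
llm_output = {
    "cycle_length_s": 120,
    "phases": [
        {"id": 1, "movement": "NS-through", "green_s": 30, "yellow_s": 3},
        {"id": 2, "movement": "NS-left",    "green_s": 15, "yellow_s": 3},
        {"id": 3, "movement": "EW-through", "green_s": 35, "yellow_s": 3},
        {"id": 4, "movement": "EW-left",    "green_s": 15, "yellow_s": 3},
    ],
}

def to_spat(plan, all_red_s=2):
    """Locate each phase in the cycle as cumulative start/end times."""
    t, rows = 0, []
    for p in plan["phases"]:
        dur = p["green_s"] + p["yellow_s"] + all_red_s
        rows.append({"phase": p["id"], "start_s": t, "end_s": t + dur})
        t += dur
    assert t <= plan["cycle_length_s"], "phases exceed the cycle length"
    return rows

for row in to_spat(llm_output):
    print(row)
```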
REAL: Benchmarking Abilities of Large Language Models for Housing Transactions and Services
The development of large language models (LLMs) has greatly promoted the progress of chatbots in multiple fields. There is an urgent need to evaluate whether LLMs can play the role of an agent in housing transactions and services as well as humans do. We present Real Estate Agent Large Language Model Evaluation (REAL), the first evaluation suite designed to assess the abilities of LLMs in the field of housing transactions and services. REAL comprises 5,316 high-quality evaluation entries across 4 topics: memory, comprehension, reasoning and hallucination. All these entries are organized into 14 categories to assess whether LLMs have the knowledge and ability required in housing transaction and service scenarios. Additionally, REAL is used to evaluate the performance of the most advanced LLMs. The experimental results indicate that LLMs still have significant room for improvement before they can be applied in the real estate field.
Updated: 2025-07-04 11:05:44
标题: REAL:面向房屋交易与服务的大型语言模型能力基准
摘要: 大型语言模型(LLMs)的发展极大地推动了聊天机器人在多个领域的进展。迫切需要评估LLMs是否可以在房地产交易和服务中扮演代理人的角色,就像人类一样。我们提出了房地产代理大型语言模型评估(REAL),这是第一个旨在评估LLMs在房地产交易和服务领域能力的评估套件。REAL包括5,316个高质量的评估条目,涵盖4个主题:记忆、理解、推理和幻觉。所有这些条目都被组织为14个类别,以评估LLMs在房地产交易和服务场景中是否具有知识和能力。此外,REAL还用于评估最先进的LLMs的性能。实验结果表明,LLMs在应用于房地产领域仍有显著的改进空间。
更新时间: 2025-07-04 11:05:44
领域: cs.AI
Molecular Machine Learning Using Euler Characteristic Transforms
The shape of a molecule determines its physicochemical and biological properties. However, it is often underrepresented in standard molecular representation learning approaches. Here, we propose using the Euler Characteristic Transform (ECT) as a geometrical-topological descriptor. Computed directly on a molecular graph derived from handcrafted atomic features, the ECT enables the extraction of multiscale structural features, offering a novel way to represent and encode molecular shape in the feature space. We assess the predictive performance of this representation across nine benchmark regression datasets, all centered around predicting the inhibition constant $K_i$. In addition, we compare our proposed ECT-based representation against traditional molecular representations and methods, such as molecular fingerprints/descriptors and graph neural networks (GNNs). Our results show that our ECT-based representation achieves competitive performance, ranking among the best-performing methods on several datasets. More importantly, its combination with traditional representations, particularly with the AVALON fingerprint, significantly \emph{enhances predictive performance}, outperforming other methods on most datasets. These findings highlight the complementary value of multiscale topological information and its potential for being combined with established techniques. Our study suggests that hybrid approaches incorporating explicit shape information can lead to more informative and robust molecular representations, enhancing and opening new avenues in molecular machine learning tasks. To support reproducibility and foster open biomedical research, we provide open access to all experiments and code used in this work.
Updated: 2025-07-04 10:57:40
标题: 使用欧拉特征变换的分子机器学习
摘要: 分子的形状决定了其物理化学和生物学性质。然而,在标准的分子表示学习方法中,形状往往未得到充分表示。在这里,我们提出使用欧拉特征变换(ECT)作为一种几何-拓扑描述符。ECT直接在由手工原子特征导出的分子图上计算,能够提取多尺度结构特征,为在特征空间中表示和编码分子形状提供了一种新颖方式。我们在九个基准回归数据集上评估了这种表示的预测性能,这些数据集均围绕抑制常数$K_i$的预测。此外,我们将所提出的基于ECT的表示与传统分子表示及方法(例如分子指纹/描述符和图神经网络(GNN))进行了比较。结果显示,基于ECT的表示取得了有竞争力的性能,在多个数据集上跻身表现最佳的方法之列。更重要的是,它与传统表示的结合,特别是与AVALON指纹的结合,显著增强了预测性能,在大多数数据集上优于其他方法。这些发现凸显了多尺度拓扑信息的互补价值及其与成熟技术结合的潜力。我们的研究表明,融合显式形状信息的混合方法可以带来更具信息量和更鲁棒的分子表示,为分子机器学习任务增强并开辟新途径。为支持可复现性并促进开放的生物医学研究,我们公开了本工作中使用的所有实验和代码。
更新时间: 2025-07-04 10:57:40
领域: cs.LG,math.AT,q-bio.BM
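For a molecular graph, the ECT named above is cheap to compute: for each direction and threshold, the Euler characteristic of the sublevel set is simply vertices minus edges (a graph has no higher-dimensional cells). A self-contained sketch on a toy embedded graph; the coordinates and bonds are illustrative, not real chemistry.

```python
import numpy as np

def ect(points, edges, directions, thresholds):
    """Euler Characteristic Transform of an embedded graph: for each
    direction v and threshold t, chi = #vertices - #edges in the sublevel
    set {x : <x, v> <= t}."""
    out = np.zeros((len(directions), len(thresholds)), dtype=int)
    for i, v in enumerate(directions):
        h = points @ v                                # vertex heights along v
        for j, t in enumerate(thresholds):
            n_v = int((h <= t).sum())
            n_e = sum(h[a] <= t and h[b] <= t for a, b in edges)
            out[i, j] = n_v - n_e
    return out

pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.5, 0.9], [0.5, 1.6]])
bonds = [(0, 1), (1, 2), (2, 3)]
dirs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ts = np.linspace(-0.5, 2.0, 6)
print(ect(pts, bonds, dirs, ts))    # one ECT curve per direction
```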
Beyond Weaponization: NLP Security for Medium and Lower-Resourced Languages in Their Own Right
Despite mounting evidence that multilinguality can be easily weaponized against language models (LMs), works across NLP Security remain overwhelmingly English-centric. In terms of securing LMs, the NLP norm of "English first" collides with standard procedure in cybersecurity, whereby practitioners are expected to anticipate and prepare for worst-case outcomes. To mitigate worst-case outcomes in NLP Security, researchers must be willing to engage with the weakest links in LM security: lower-resourced languages. Accordingly, this work examines the security of LMs for lower- and medium-resourced languages. We extend existing adversarial attacks for up to 70 languages to evaluate the security of monolingual and multilingual LMs for these languages. Through our analysis, we find that monolingual models are often too small in total number of parameters to ensure sound security, and that while multilinguality is helpful, it does not always guarantee improved security either. Ultimately, these findings highlight important considerations for more secure deployment of LMs, for communities of lower-resourced languages.
Updated: 2025-07-04 10:54:04
标题: 超越武器化:NLP安全性对于中等和低资源语言而言是其本身的需求
摘要: 尽管越来越多的证据表明,多语言能够轻松地被用作武器对抗语言模型(LMs),但自然语言处理安全领域的研究仍然以英语为中心。在保护LMs方面,自然语言处理的规范“先英语”与网络安全中的标准程序相冲突,网络安全从业者被期望能够预测和准备最坏的结果。为了减轻自然语言处理安全中的最坏情况,研究人员必须愿意与LM安全中最薄弱的环节进行交流:低资源语言。因此,本文探讨了低资源和中等资源语言的LMs安全性。我们扩展了现有的对抗性攻击方法,涵盖了70种语言,评估了针对这些语言的单语和多语言LMs的安全性。通过我们的分析,我们发现单语模型的参数总数通常太小,无法确保安全性,而多语言性虽有帮助,但也并不总是能够保证提高安全性。最终,这些发现突出了更安全地部署LMs的重要考虑因素,特别是对于低资源语言社区。
更新时间: 2025-07-04 10:54:04
领域: cs.CL,cs.AI
Exploring LLM Capabilities in Extracting DCAT-Compatible Metadata for Data Cataloging
Efficient data exploration is crucial as data becomes increasingly important for accelerating processes, improving forecasts and developing new business models. Data consumers often spend 25-98 % of their time searching for suitable data due to the exponential growth, heterogeneity and distribution of data. Data catalogs can support and accelerate data exploration by using metadata to answer user queries. However, as metadata creation and maintenance is often a manual process, it is time-consuming and requires expertise. This study investigates whether LLMs can automate metadata maintenance of text-based data and generate high-quality DCAT-compatible metadata. We tested zero-shot and few-shot prompting strategies with LLMs from different vendors for generating metadata such as titles and keywords, along with a fine-tuned model for classification. Our results show that LLMs can generate metadata comparable to human-created content, particularly on tasks that require advanced semantic understanding. Larger models outperformed smaller ones, and fine-tuning significantly improves classification accuracy, while few-shot prompting yields better results in most cases. Although LLMs offer a faster and reliable way to create metadata, a successful application requires careful consideration of task-specific criteria and domain context.
Updated: 2025-07-04 10:49:37
标题: 探索LLM在提取与DCAT兼容的元数据以进行数据目录编制中的能力
摘要: 随着数据在加速流程、改进预测和开发新业务模式方面变得日益重要,高效的数据探索至关重要。由于数据的指数增长、异构性和分布性,数据消费者通常会花费 25-98% 的时间寻找合适的数据。数据目录可以通过使用元数据回答用户查询来支持和加速数据探索。然而,由于元数据的创建和维护通常是一个手动过程,所以这是耗时的并且需要专业知识。本研究调查了LLMs是否能够自动化基于文本数据的元数据维护,并生成高质量的与DCAT兼容的元数据。我们测试了来自不同供应商的LLMs的零样本和少样本提示策略,用于生成标题和关键词等元数据,以及用于分类的经过精细调整的模型。我们的结果显示,LLMs可以生成与人类创建内容相媲美的元数据,尤其是在需要高级语义理解的任务上。较大的模型胜过较小的模型,并且精细调整显著提高了分类准确性,而少样本提示在大多数情况下产生更好的结果。尽管LLMs提供了一种更快速和可靠的创建元数据的方式,但成功的应用需要仔细考虑特定任务标准和领域背景。
更新时间: 2025-07-04 10:49:37
领域: cs.IR,cs.AI
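To make the prompting setup above concrete, here is a minimal sketch of few-shot metadata generation. The DCAT fields, the example document, and the `complete` callable (a stand-in for any LLM completion API) are illustrative assumptions, not the study's exact configuration:

    FEW_SHOT = """\
    Document: Monthly air-quality measurements for Berlin, 2019-2023, in CSV format.
    dct:title: Berlin Air Quality Measurements 2019-2023
    dcat:keyword: air quality, Berlin, time series, environment
    """

    def build_prompt(document_text):
        return ("Generate DCAT-compatible metadata (dct:title and dcat:keyword) "
                "for the following document.\n\n" + FEW_SHOT +
                "\nDocument: " + document_text + "\n")

    def extract_metadata(document_text, complete):
        # `complete` maps a prompt string to the model's text output (any LLM API).
        raw = complete(build_prompt(document_text))
        metadata = {}
        for line in raw.splitlines():
            for field in ("dct:title", "dcat:keyword"):
                if line.startswith(field + ":"):
                    metadata[field] = line[len(field) + 1:].strip()
        return metadata

Few-shot examples of this kind are the prompting strategy the study found to outperform zero-shot prompting in most cases.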
Multi-Agent Reasoning for Cardiovascular Imaging Phenotype Analysis
Identifying the associations between imaging phenotypes and disease risk factors and outcomes is essential for understanding disease mechanisms and improving diagnosis and prognosis models. However, traditional approaches rely on human-driven hypothesis testing and selection of association factors, often overlooking complex, non-linear dependencies among imaging phenotypes and other multi-modal data. To address this, we introduce a Multi-agent Exploratory Synergy for the Heart (MESHAgents) framework that leverages large language models as agents to dynamically elicit, surface, and decide confounders and phenotypes in association studies, using cardiovascular imaging as a proof of concept. Specifically, we orchestrate a multi-disciplinary team of AI agents -- spanning cardiology, biomechanics, statistics, and clinical research -- which spontaneously generate and converge on insights through iterative, self-organizing reasoning. The framework dynamically synthesizes statistical correlations with multi-expert consensus, providing an automated pipeline for phenome-wide association studies (PheWAS). We demonstrate the system's capabilities through a population-based study of imaging phenotypes of the heart and aorta. MESHAgents autonomously uncovered correlations between imaging phenotypes and a wide range of non-imaging factors, identifying additional confounder variables beyond standard demographic factors. Validation on diagnosis tasks reveals that MESHAgents-discovered phenotypes achieve performance comparable to expert-selected phenotypes, with mean AUC differences as small as -0.004 on disease classification tasks. Notably, the recall score improves for 6 out of 9 disease types. Our framework provides clinically relevant imaging phenotypes with transparent reasoning, offering a scalable alternative to expert-driven methods.
Updated: 2025-07-04 10:30:32
标题: 多智能体推理用于心血管影像表型分析
摘要: 识别影像表型与疾病风险因素和结果之间的关联,对于理解疾病机制、改进诊断和预后模型至关重要。然而,传统方法依赖于人为驱动的假设检验和关联因素选择,通常忽视影像表型与其他多模态数据之间复杂的非线性依赖关系。为了解决这个问题,我们引入了一个名为心脏多智能体探索协同(MESHAgents)的框架,利用大型语言模型作为智能体,在关联研究中动态引出、呈现并确定混杂因素和表型,并以心血管影像作为概念验证。具体而言,我们编排了一个跨学科的AI智能体团队--涵盖心脏病学、生物力学、统计学和临床研究--通过迭代、自组织的推理,自发地生成见解并收敛于共识。该框架将统计相关性与多专家共识动态结合,为全表型组关联研究(PheWAS)提供自动化流程。我们通过一项基于人群的心脏和主动脉影像表型研究展示了系统的能力。MESHAgents自主地发现了影像表型与各种非影像因素之间的相关性,识别出标准人口统计因素之外的额外混杂变量。在诊断任务上的验证显示,MESHAgents发现的表型在疾病分类任务上达到了与专家选择的表型相当的性能,平均AUC差异小至-0.004。值得注意的是,9种疾病类型中有6种的召回率得到提高。我们的框架提供具有透明推理过程的临床相关影像表型,为专家驱动的方法提供了可扩展的替代方案。
更新时间: 2025-07-04 10:30:32
领域: cs.AI
Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach
Vision-Language Models (VLMs) like CLIP achieve cross-modal semantic alignment through contrastive learning, exhibiting robust zero-shot generalization. Traditional prompt engineering, however, predominantly relies on coarse-grained category labels, neglecting fine-grained local semantics. Existing approaches assume that VLMs inherently recognize localized visual details and attempt to enhance classification by augmenting text prompts with attribute descriptors generated by large language models. However, our systematic experiments reveal critical limitations: CLIP's strong bias toward global image patterns hinders its ability to process localized visual descriptors. To address this fundamental constraint, we propose a simple, effective, and plug-and-play solution that enables CLIP to "See Both the Forest and the Trees." Specifically, we employ stochastic multi-crop augmentation to activate CLIP's latent capacity for localized feature analysis. By cropping only partial regions, the approach effectively constrains the model's receptive field and recalibrates its attention mechanism, thereby mitigating its inherent bias. We evaluate the proposed method, D&D (Decomposition and Description), under zero-shot, few-shot, and test-time adaptation settings, and extensive experiments demonstrate that D&D achieves promising performance.
Updated: 2025-07-04 10:24:26
标题: 帮助CLIP同时看到整体和细节:一种分解和描述方法
摘要: 视觉语言模型(VLMs)如CLIP通过对比学习实现了跨模态语义对齐,表现出强大的零样本泛化能力。然而,传统的提示工程主要依赖粗粒度的类别标签,忽略了细粒度的局部语义。现有方法假定VLMs本质上能够识别局部视觉细节,并试图通过在文本提示中加入大型语言模型生成的属性描述符来增强分类。然而,我们的系统实验揭示了关键局限性:CLIP对全局图像模式的强烈偏向阻碍了其处理局部视觉描述符的能力。为了解决这一基本限制,我们提出了一个简单、有效且即插即用的解决方案,使CLIP能够"同时看到森林和树木"。具体来说,我们采用随机多裁剪增强来激活CLIP进行局部特征分析的潜在能力。通过仅裁剪部分区域,该方法有效地限制了模型的感受野,并重新校准其注意力机制,从而减轻其固有偏向。我们在零样本、少样本和测试时适应设置下评估了所提出的方法,大量实验表明D&D取得了令人满意的性能。
更新时间: 2025-07-04 10:24:26
领域: cs.CV,cs.AI
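As a rough illustration of the multi-crop idea, the sketch below scores an image against text prompts using both the full view and several random partial crops, then averages the similarities. It uses OpenAI's `clip` package; the crop scale, crop count, and mean aggregation are illustrative guesses rather than the authors' settings:

    import torch
    import clip  # OpenAI CLIP package
    from torchvision import transforms
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Random partial crops constrain the receptive field to local regions.
    random_crop = transforms.RandomResizedCrop(224, scale=(0.2, 0.5))

    def multi_crop_logits(image: Image.Image, prompts, n_crops: int = 8):
        texts = clip.tokenize(prompts).to(device)
        views = [preprocess(image)] + [preprocess(random_crop(image)) for _ in range(n_crops)]
        batch = torch.stack(views).to(device)
        with torch.no_grad():
            img_feat = model.encode_image(batch)
            txt_feat = model.encode_text(texts)
            img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
            txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
            sims = img_feat @ txt_feat.T          # (n_views, n_classes)
        # Aggregate the global view and the local views; mean pooling is one simple choice.
        return sims.mean(dim=0)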
Evaluating the Evaluators: Trust in Adversarial Robustness Tests
Despite significant progress in designing powerful adversarial evasion attacks for robustness verification, the evaluation of these methods often remains inconsistent and unreliable. Many assessments rely on mismatched models, unverified implementations, and uneven computational budgets, which can lead to biased results. Consequently, robustness claims built on such flawed testing protocols may be misleading and give a false sense of security. As a concrete step toward improving evaluation reliability, we present AttackBench, a benchmark framework developed to assess the effectiveness of gradient-based attacks under standardized and reproducible conditions. AttackBench serves as an evaluation tool that ranks existing attack implementations based on a novel optimality metric, which enables researchers and practitioners to identify the most reliable and effective attack for use in subsequent robustness evaluations. The framework enforces consistent testing conditions and enables continuous updates, making it a reliable foundation for robustness verification.
Updated: 2025-07-04 10:07:26
标题: 评估评估者:对对抗性鲁棒性测试的信任
摘要: 尽管在为鲁棒性验证设计强大的对抗性规避攻击方面取得了显著进展,但这些方法的评估往往仍然不一致且不可靠。许多评估依赖于不匹配的模型、未经验证的实现和不均衡的计算预算,这可能导致有偏的结果。因此,基于这种有缺陷的测试协议建立的鲁棒性声明可能具有误导性,并给人一种虚假的安全感。作为切实提高评估可靠性的一步,我们提出了AttackBench,这是一个旨在在标准化和可重复条件下评估基于梯度的攻击有效性的基准框架。AttackBench作为一个评估工具,根据一种新颖的最优性度量对现有攻击实现进行排名,使研究人员和从业者能够识别在后续鲁棒性评估中使用的最可靠和最有效的攻击。该框架强制执行一致的测试条件并支持持续更新,使其成为鲁棒性验证的可靠基础。
更新时间: 2025-07-04 10:07:26
领域: cs.CR,cs.AI,cs.CV,cs.LG
Do You Trust Your Model? Emerging Malware Threats in the Deep Learning Ecosystem
Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively just data in tensor form and considered safe. In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, self-executing malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads in the parameters of deep neural networks. MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and distributed learning settings such as Federated Learning, and demonstrate that it is effective even when a reduced number of bits is used for the model parameters. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework. Our aim with this work is to raise awareness against these new, dangerous attacks both in the research community and industry, and we hope to encourage further research in mitigation techniques against such threats.
Updated: 2025-07-04 09:59:48
标题: 你信任你的模型吗?深度学习生态系统中新兴的恶意软件威胁
摘要: 训练高质量的深度学习模型是一项具有挑战性的任务,这是由于计算和技术要求所致。越来越多的个人、机构和公司越来越依赖于公共存储库中提供的预训练的第三方模型。这些模型通常直接使用或集成在产品管道中,而不采取特殊预防措施,因为它们实际上只是以张量形式存在的数据,被认为是安全的。在本文中,我们提高了对针对神经网络的新型机器学习供应链威胁的认识。我们介绍了MaleficNet 2.0,一种将自提取、自执行恶意软件嵌入神经网络的新技术。MaleficNet 2.0采用了扩频信道编码与纠错技术相结合的方式,在深度神经网络的参数中注入恶意载荷。MaleficNet 2.0注入技术具有隐蔽性,不会降低模型的性能,并且对移除技术具有鲁棒性。我们设计我们的方法可以在传统和分布式学习环境中(如联邦学习)都能工作,并且演示了即使在模型参数中使用了减少的比特时也是有效的。最后,我们使用MaleficNet 2.0实现了一个概念验证的自提取神经网络恶意软件,展示了这种攻击对广泛采用的机器学习框架的实用性。我们的目的是在研究界和工业界提高对这些新型危险攻击的警惕,并希望鼓励进一步研究防范此类威胁的技术。
更新时间: 2025-07-04 09:59:48
领域: cs.CR,cs.AI
Improving Social Determinants of Health Documentation in French EHRs Using Large Language Models
Social determinants of health (SDoH) significantly influence health outcomes, shaping disease progression, treatment adherence, and health disparities. However, their documentation in structured electronic health records (EHRs) is often incomplete or missing. This study presents an approach based on large language models (LLMs) for extracting 13 SDoH categories from French clinical notes. We trained Flan-T5-Large on annotated social history sections from clinical notes at Nantes University Hospital, France. We evaluated the model at two levels: (i) identification of SDoH categories and associated values, and (ii) extraction of detailed SDoH with associated temporal and quantitative information. The model performance was assessed across four datasets, including two that we publicly release as open resources. The model achieved strong performance for identifying well-documented categories such as living condition, marital status, descendants, job, tobacco, and alcohol use (F1 score > 0.80). Performance was lower for categories with limited training data or highly variable expressions, such as employment status, housing, physical activity, income, and education. Our model identified 95.8% of patients with at least one SDoH, compared to 2.8% for ICD-10 codes from structured EHR data. Our error analysis showed that performance limitations were linked to annotation inconsistencies, reliance on English-centric tokenizer, and reduced generalizability due to the model being trained on social history sections only. These results demonstrate the effectiveness of NLP in improving the completeness of real-world SDoH data in a non-English EHR system.
Updated: 2025-07-04 09:41:33
标题: 使用大型语言模型改进法国电子健康记录中的社会健康决定因素文档化
摘要: 健康的社会决定因素(SDoH)显著影响健康结果,塑造疾病进展、治疗依从性和健康差距。然而,它们在结构化电子健康记录(EHRs)中的记录通常不完整或缺失。本研究提出了一种基于大型语言模型(LLMs)的方法,用于从法语临床笔记中提取13个SDoH类别。我们使用法国南特大学医院临床笔记中带注释的社会史部分训练了Flan-T5-Large。我们在两个层面上评估了模型:(i)识别SDoH类别及其相关值,和(ii)提取详细的SDoH以及相关的时间和数量信息。模型性能在四个数据集上进行了评估,其中两个已由我们作为开放资源公开发布。该模型在识别生活条件、婚姻状况、后代、工作、吸烟和饮酒等记录良好的类别方面表现出色(F1分数 > 0.80)。对于训练数据有限或表达高度多变的类别,如就业状况、住房、体育活动、收入和教育,性能较低。我们的模型识别出95.8%的患者至少有一个SDoH,而结构化EHR数据中的ICD-10代码仅识别出2.8%。错误分析显示,性能限制与标注不一致、依赖以英语为中心的分词器,以及模型仅在社会史部分上训练导致的泛化能力下降有关。这些结果证明了自然语言处理在改善非英语EHR系统中真实世界SDoH数据完整性方面的有效性。
更新时间: 2025-07-04 09:41:33
领域: cs.CL,cs.AI
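A minimal sketch of the extraction step, assuming a fine-tuned Flan-T5 checkpoint and a linearized "category: value" output format; both the prompt wording and the output scheme are our own placeholders, not the study's exact setup:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    MODEL = "google/flan-t5-large"  # in practice, the fine-tuned checkpoint path
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

    def extract_sdoh(social_history: str) -> list[tuple[str, str]]:
        prompt = "Extract social determinants of health: " + social_history
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        output_ids = model.generate(**inputs, max_new_tokens=128)
        decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        # Expected linearized form, e.g. "tobacco: former smoker; alcohol: none"
        pairs = []
        for chunk in decoded.split(";"):
            if ":" in chunk:
                category, _, value = chunk.partition(":")
                pairs.append((category.strip(), value.strip()))
        return pairs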
Multi-Level Fusion Graph Neural Network for Molecule Property Prediction
Accurate molecular property prediction is essential in drug discovery and related fields. However, existing graph neural networks (GNNs) often struggle to simultaneously capture both local and global molecular structures. In this work, we propose a Multi-Level Fusion Graph Neural Network (MLFGNN) that integrates Graph Attention Networks and a novel Graph Transformer to jointly model local and global dependencies. In addition, we incorporate molecular fingerprints as a complementary modality and introduce an attention-interaction mechanism to adaptively fuse information across representations. Extensive experiments on multiple benchmark datasets demonstrate that MLFGNN consistently outperforms state-of-the-art methods in both classification and regression tasks. Interpretability analysis further reveals that the model effectively captures task-relevant chemical patterns, supporting the usefulness of multi-level and multi-modal fusion in molecular representation learning.
Updated: 2025-07-04 09:38:19
标题: 多级融合图神经网络用于分子属性预测
摘要: 准确的分子属性预测在药物发现和相关领域中至关重要。然而,现有的图神经网络(GNNs)往往难以同时捕捉局部和全局分子结构。在这项工作中,我们提出了一种多级融合图神经网络(MLFGNN),它整合了图注意力网络和一种新颖的图变换器,共同建模局部和全局依赖关系。此外,我们将分子指纹作为一种补充模态,并引入一种注意力交互机制,以自适应地融合跨表示的信息。在多个基准数据集上进行的大量实验表明,MLFGNN在分类和回归任务中始终优于最先进的方法。解释性分析进一步揭示了该模型有效地捕捉了与任务相关的化学模式,支持了多级和多模态融合在分子表示学习中的有效性。
更新时间: 2025-07-04 09:38:19
领域: cs.LG,cs.AI,68T07,I.2.6
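As a sketch of what an attention-based interaction between a graph embedding and a fingerprint embedding might look like; the dimensions, head count, and one-token-per-modality design are our assumptions, not MLFGNN's actual architecture:

    import torch
    import torch.nn as nn

    class AttentiveFusion(nn.Module):
        def __init__(self, d_graph=256, d_fp=2048, d_model=256):
            super().__init__()
            self.proj_graph = nn.Linear(d_graph, d_model)
            self.proj_fp = nn.Linear(d_fp, d_model)
            self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            self.head = nn.Linear(d_model, 1)  # e.g. one regression target

        def forward(self, h_graph, h_fp):
            # Treat each modality as a one-token sequence and let the tokens attend
            # to each other, so the fusion weights adapt per molecule.
            tokens = torch.stack([self.proj_graph(h_graph), self.proj_fp(h_fp)], dim=1)
            fused, _ = self.cross_attn(tokens, tokens, tokens)
            return self.head(fused.mean(dim=1))

    model = AttentiveFusion()
    pred = model(torch.randn(8, 256), torch.randn(8, 2048))  # batch of 8 molecules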
Generating realistic patient data
Developing algorithms for real-life problems that perform well in practice highly depends on the availability of realistic data for testing. Obtaining real-life data for optimization problems in health care, however, is often difficult. This is especially true for any patient related optimization problems, e.g., for patient-to-room assignment, due to data privacy policies. Furthermore, obtained real-life data usually cannot be published which prohibits reproducibility of results by other researchers. Therefore, often artificially generated instances are used. In this paper, we present combinatorial insights about the feasibility of instances for the patient-to-room assignment problem (PRA). We use these insights to develop a configurable instance generator for PRA with an easy-to-use graphical user interface. Configurability is in this case especially important as we observed in an extensive analysis of real-life data that, e.g., the probability distribution for patients' age and length of stay depends on the respective ward.
Updated: 2025-07-04 09:29:30
标题: 生成逼真的患者数据
摘要: 要使解决实际问题的算法在实践中表现良好,高度依赖于可用于测试的真实数据。然而,在医疗保健领域获取优化问题的真实数据通常很困难。由于数据隐私政策,这对任何与患者相关的优化问题(例如患者与病房的分配)尤其如此。此外,获取的真实数据通常不能公开发布,这使其他研究人员无法复现结果。因此,通常会使用人工生成的实例。在本文中,我们提出了关于患者与病房分配问题(PRA)实例可行性的组合性洞察。我们利用这些洞察开发了一个可配置的PRA实例生成器,并配有易于使用的图形用户界面。在这种情况下,可配置性尤为重要,因为我们在对真实数据的广泛分析中观察到,例如,患者年龄和住院时长的概率分布取决于相应的病房。
更新时间: 2025-07-04 09:29:30
领域: math.OC,cs.DM,cs.LG
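The ward-dependent distributions mentioned above are easy to picture with a small sketch; all names and numbers below are invented for illustration, not taken from the paper's generator:

    import random

    WARD_CONFIG = {
        "geriatrics": {"age_mean": 78, "age_sd": 8, "los_mean": 9.0},
        "maternity":  {"age_mean": 30, "age_sd": 5, "los_mean": 3.0},
    }

    def generate_patients(ward: str, n: int, seed: int = 0):
        cfg = WARD_CONFIG[ward]
        rng = random.Random(seed)
        patients = []
        for i in range(n):
            age = max(0, round(rng.gauss(cfg["age_mean"], cfg["age_sd"])))
            # Exponential length of stay keeps values positive and right-skewed.
            los = 1 + int(rng.expovariate(1.0 / cfg["los_mean"]))
            patients.append({"id": f"{ward}-{i}", "age": age,
                             "length_of_stay": los, "sex": rng.choice("FM")})
        return patients

    print(generate_patients("geriatrics", n=3))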
A Hybrid Game-Theory and Deep Learning Framework for Predicting Tourist Arrivals via Big Data Analytics and Opinion Leader Detection
In the era of Industry 5.0, data-driven decision-making has become indispensable for optimizing systems across Industrial Engineering. This paper addresses the value of big data analytics by proposing a novel non-linear hybrid approach for forecasting international tourist arrivals in two different contexts: (i) arrivals to Hong Kong from five major source nations (pre-COVID-19), and (ii) arrivals to Sanya in Hainan province, China (post-COVID-19). The method integrates multiple sources of Internet big data and employs an innovative game theory-based algorithm to identify opinion leaders on social media platforms. Subsequently, nonstationary attributes in tourism demand data are managed through Empirical Wavelet Transform (EWT), ensuring refined time-frequency analysis. Finally, a memory-aware Stacked Bi-directional Long Short-Term Memory (Stacked BiLSTM) network is used to generate accurate demand forecasts. Experimental results demonstrate that this approach outperforms existing state-of-the-art techniques and remains robust under dynamic and volatile conditions, highlighting its applicability to broader Industrial Engineering domains, such as logistics, supply chain management, and production planning, where forecasting and resource allocation are key challenges. By merging advanced Deep Learning (DL), time-frequency analysis, and social media insights, the proposed framework showcases how large-scale data can elevate the quality and efficiency of decision-making processes.
Updated: 2025-07-04 09:17:17
标题: 一个混合博弈论和深度学习框架用于通过大数据分析和意见领袖检测预测游客到达量
摘要: 在工业5.0时代,数据驱动的决策已成为优化工业工程各类系统不可或缺的部分。本文通过提出一种新颖的非线性混合方法来阐明大数据分析的价值,用于预测两种不同情境下的国际游客到达量:(i)来自五个主要客源国抵达香港的游客量(COVID-19之前),以及(ii)抵达中国海南省三亚的游客量(COVID-19之后)。该方法整合了多个互联网大数据源,并采用一种基于博弈论的创新算法来识别社交媒体平台上的意见领袖。随后,通过经验小波变换(EWT)处理旅游需求数据中的非平稳属性,确保精细的时频分析。最后,使用一种具备记忆能力的堆叠双向长短期记忆网络(Stacked BiLSTM)生成准确的需求预测。实验结果表明,该方法优于现有的最先进技术,并在动态和波动条件下保持稳健,突显了其在更广泛的工业工程领域(如物流、供应链管理和生产计划)中的适用性,这些领域的关键挑战正是预测与资源分配。通过融合先进的深度学习(DL)、时频分析和社交媒体洞察,所提出的框架展示了大规模数据如何提升决策过程的质量和效率。
更新时间: 2025-07-04 09:17:17
领域: cs.LG,cs.GT,eess.SP
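A minimal sketch of the forecasting head described above: a stacked bidirectional LSTM mapping a window of fused features (EWT sub-bands, opinion-leader signals) to a next-step forecast. Layer sizes and the window length are illustrative; the feature pipeline is assumed to be computed elsewhere:

    import torch
    import torch.nn as nn

    class StackedBiLSTM(nn.Module):
        def __init__(self, n_features, hidden=64, layers=2):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                                bidirectional=True, batch_first=True)
            self.head = nn.Linear(2 * hidden, 1)

        def forward(self, x):             # x: (batch, window, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])  # forecast from the last time step

    model = StackedBiLSTM(n_features=6)
    y_hat = model(torch.randn(32, 30, 6))  # 30-day window, 6 fused features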
Lessons from a Chimp: AI "Scheming" and the Quest for Ape Language
We examine recent research that asks whether current AI systems may be developing a capacity for "scheming" (covertly and strategically pursuing misaligned goals). We compare current research practices in this field to those adopted in the 1970s to test whether non-human primates could master natural language. We argue that there are lessons to be learned from that historical research endeavour, which was characterised by an overattribution of human traits to other agents, an excessive reliance on anecdote and descriptive analysis, and a failure to articulate a strong theoretical framework for the research. We recommend that research into AI scheming actively seeks to avoid these pitfalls. We outline some concrete steps that can be taken for this research programme to advance in a productive and scientifically rigorous fashion.
Updated: 2025-07-04 09:16:11
标题: 从黑猩猩身上学到的教训:人工智能的“策划”与猿类语言的追求
摘要: 我们检查了最近的研究,询问当前的AI系统是否可能正在发展出一种“阴谋”的能力(暗中和策略性地追求不一致的目标)。我们将当前在这一领域的研究实践与上世纪70年代为了测试非人类灵长类动物是否能掌握自然语言而采用的实践进行比较。我们认为从那次历史研究努力中可以学到一些教训,那次研究以过度将人类特征归因于其他代理人、过度依赖轶事和描述性分析以及在研究中未能明确阐述强有力的理论框架为特征。我们建议对AI阴谋的研究积极寻求避免这些陷阱。我们概述了一些可以采取的具体步骤,以便这一研究项目以一种富有成效且科学严谨的方式前进。
更新时间: 2025-07-04 09:16:11
领域: cs.AI
Artificial intelligence in drug discovery: A comprehensive review with a case study on hyperuricemia, gout arthritis, and hyperuricemic nephropathy
This paper systematically reviews recent advances in artificial intelligence (AI), with a particular focus on machine learning (ML), across the entire drug discovery pipeline. Due to the inherent complexity, escalating costs, prolonged timelines, and high failure rates of traditional drug discovery methods, there is a critical need to comprehensively understand how AI/ML can be effectively integrated throughout the full process. Currently available literature reviews often narrowly focus on specific phases or methodologies, neglecting the dependence between key stages such as target identification, hit screening, and lead optimization. To bridge this gap, our review provides a detailed and holistic analysis of AI/ML applications across these core phases, highlighting significant methodological advances and their impacts at each stage. We further illustrate the practical impact of these techniques through an in-depth case study focused on hyperuricemia, gout arthritis, and hyperuricemic nephropathy, highlighting real-world successes in molecular target identification and therapeutic candidate discovery. Additionally, we discuss significant challenges facing AI/ML in drug discovery and outline promising future research directions. Ultimately, this review serves as an essential orientation for researchers aiming to leverage AI/ML to overcome existing bottlenecks and accelerate drug discovery.
Updated: 2025-07-04 09:14:56
标题: 药物发现中的人工智能:综合评述及高尿酸血症、痛风性关节炎和高尿酸血症肾病的案例研究
摘要: 本文系统地回顾了人工智能(AI)在整个药物发现流程中的最新进展,特别关注机器学习(ML)。由于传统药物发现方法固有的复杂性、不断上升的成本、漫长的周期以及高失败率,迫切需要全面了解AI/ML如何有效地整合到整个流程中。目前可获得的文献综述通常狭窄地关注特定阶段或方法论,忽略了靶点识别、苗头化合物筛选和先导化合物优化等关键阶段之间的依赖关系。为填补这一空白,我们的综述对这些核心阶段中的AI/ML应用进行了详细而整体的分析,突出了每个阶段的重大方法进展及其影响。我们进一步通过针对高尿酸血症、痛风性关节炎和高尿酸血症肾病的深入案例研究来说明这些技术的实际影响,突出了在分子靶点识别和候选治疗药物发现方面的现实成功。此外,我们讨论了药物发现中AI/ML面临的重大挑战,并概述了有前景的未来研究方向。最终,这篇综述为希望利用AI/ML克服现有瓶颈、加速药物发现的研究人员提供了重要的指引。
更新时间: 2025-07-04 09:14:56
领域: cs.AI,q-bio.QM
Hungary and AI: efforts and opportunities in comparison with Singapore
The study assesses Hungary's National AI Strategy and its implementation through the analysis of strategic documents, publicly available financial records, and expert interviews with the Hungarian AI Coalition President and Chief Strategic Advisor to the Government Commissioner for AI. 22 goals from Hungary's strategy were evaluated through conceptual, governance, temporal, and financial dimensions before being benchmarked against Singapore's National AI Strategies (NAIS 1.0 and NAIS 2.0). Key findings include an estimated total of EUR 4.65 billion in AI-related public investment in Hungary. Openly available financial data was found for only half of the evaluated goals, and just three projects made up 98% of all documented funding. The research also reveals Hungary's implementation challenges, including fragmented execution following ministerial reorganizations and the absence of designated biennial reviews since 2020. Furthermore, the paper provides targeted recommendations for Hungary's forthcoming AI strategy, drawing on Singapore's framework as a reference point. These include adapting to the era of large language models, restructuring the existing triple helix network to foster more effective dialogue and advocacy, and positioning the country as an East-West bridge for automotive AI experimentation.
Updated: 2025-07-04 09:12:47
标题: 匈牙利和人工智能:与新加坡相比的努力和机遇
摘要: 这项研究通过分析战略文件、公开可获得的财务记录以及对匈牙利人工智能联盟主席和人工智能委员会政府专员首席战略顾问的专家访谈,评估了匈牙利国家人工智能战略及其实施情况。对匈牙利战略中的22个目标进行了概念、治理、时间和财务维度的评估,然后与新加坡国家人工智能战略(NAIS 1.0和NAIS 2.0)进行了对比。主要发现包括匈牙利在人工智能相关公共投资方面估计总额达46.5亿欧元。仅有一半的评估目标有公开可获得的财务数据,而仅三个项目就占据了所有记录资金的98%。研究还揭示了匈牙利的实施挑战,包括部长重组后分散的执行以及自2020年以来缺乏指定的两年一次审查。此外,本文针对匈牙利即将出台的人工智能战略提出了有针对性的建议,借鉴新加坡的框架作为参考点。这些建议包括适应大型语言模型时代,重组现有的三螺旋网络以促进更有效的对话和倡导,并将该国定位为东西方汽车人工智能实验的桥梁。
更新时间: 2025-07-04 09:12:47
领域: cs.CY,cs.AI
On the Effectiveness of the $z$-Transform Method in Quadratic Optimization
The $z$-transform of a sequence is a classical tool used within signal processing, control theory, computer science, and electrical engineering. It allows for studying sequences from their generating functions, with many operations that can be equivalently defined on the original sequence and its $z$-transform. In particular, the $z$-transform method focuses on asymptotic behaviors and allows the use of Taylor expansions. We present a sequence of results of increasing significance and difficulty for linear models and optimization algorithms, demonstrating the effectiveness and versatility of the $z$-transform method in deriving new asymptotic results. Starting from the simplest gradient descent iterations in an infinite-dimensional Hilbert space, we show how the spectral dimension characterizes the convergence behavior. We then extend the analysis to Nesterov acceleration, averaging techniques, and stochastic gradient descent.
Updated: 2025-07-04 09:12:23
标题: 关于$z$-变换方法在二次优化中的有效性
摘要: 序列的$z$-变换是信号处理、控制理论、计算机科学和电气工程中常用的经典工具。它允许从生成函数的角度研究序列,同时可以在原始序列及其$z$-变换上等效定义许多操作。特别是,$z$-变换方法专注于渐近行为,并允许使用泰勒展开。我们提出了一系列逐渐增加重要性和难度的结果,涉及线性模型和优化算法,展示了$z$-变换方法在推导新的渐近结果中的有效性和多样性。我们从无限维希尔伯特空间中最简单的梯度下降迭代开始,展示了谱维度是收敛行为的特征。然后我们将分析扩展到Nesterov加速、平均技术和随机梯度下降。
更新时间: 2025-07-04 09:12:23
领域: cs.LG,math.OC
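To make the abstract's starting point concrete, here is a minimal sketch, in our own notation (not necessarily the paper's), of the $z$-transform method applied to gradient descent on a quadratic $f(x) = \frac{1}{2}\langle x - x_\star, H (x - x_\star)\rangle$ with step size $\gamma$:

    $$ e_k := x_k - x_\star, \qquad e_{k+1} = (I - \gamma H)\, e_k, $$
    $$ E(z) := \sum_{k \ge 0} e_k z^{-k} = \big(I - z^{-1}(I - \gamma H)\big)^{-1} e_0, \qquad |z| > \rho(I - \gamma H). $$

The behavior of $E(z)$ near $z = 1$, governed by the spectrum of $H$ near zero, is what links the asymptotic decay of $\|e_k\|$ to the spectral dimension via Taylor expansions of the transform.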
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
To advance real-world fashion image editing, we analyze existing two-stage pipelines (mask generation followed by diffusion-based editing), which overly prioritize generator optimization while neglecting mask controllability. This results in two critical limitations: I) poor user-defined flexibility (coarse-grained human masks restrict edits to predefined regions like the upper torso; fine-grained clothes masks preserve poses but forbid style/length customization). II) weak pose robustness (mask generators fail on articulated poses and miss rare regions like the waist, while human parsers remain limited by predefined categories). To address these gaps, we propose Pose-Star, a framework that dynamically recomposes body structures (e.g., neck, chest, etc.) into anatomy-aware masks (e.g., chest-length) for user-defined edits. In Pose-Star, we calibrate diffusion-derived attention (Star tokens) via skeletal keypoints to enhance rare-structure localization in complex poses, suppress noise through phase-aware analysis of attention dynamics (Convergence, Stabilization, Divergence) with threshold masking and sliding-window fusion, and refine edges via cross-self attention merging and Canny alignment. This work bridges controlled benchmarks and open-world demands, pioneering anatomy-aware, pose-robust editing and laying the foundation for industrial fashion image editing.
Updated: 2025-07-04 09:09:11
标题: Pose-Star: 适用于开放世界时尚图像的解剖感知编辑
摘要: 为了推进真实世界的时尚图像编辑,我们分析了现有的两阶段流程(先生成蒙版,再进行基于扩散的编辑),它们过分侧重生成器优化,而忽视了蒙版的可控性。这导致两个关键限制:I)用户自定义灵活性差(粗粒度的人体蒙版将编辑限制在上半身等预定义区域;细粒度的服装蒙版保留了姿势,却无法定制款式/长度)。II)姿势鲁棒性弱(蒙版生成器在复杂关节姿势下失效,并遗漏腰部等罕见区域,而人体解析器仍受限于预定义类别)。为了弥补这些差距,我们提出了Pose-Star,该框架可以将身体结构(如颈部、胸部等)动态重组为具有解剖学意识的蒙版(如齐胸长度),以支持用户自定义编辑。在Pose-Star中,我们通过骨架关键点校准扩散模型导出的注意力(Star tokens),以增强复杂姿势下罕见结构的定位;通过对注意力动态(收敛、稳定、发散)的阶段感知分析,结合阈值掩码和滑动窗口融合来抑制噪声;并通过交叉-自注意力融合和Canny对齐来细化边缘。这项工作在受控基准与开放世界需求之间架起了桥梁,开创了具有解剖学意识、姿势鲁棒的编辑方法,并为工业级时尚图像编辑奠定了基础。
更新时间: 2025-07-04 09:09:11
领域: cs.CV,cs.AI
TerraMind: Large-Scale Generative Multimodality for Earth Observation
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) -- the capability of generating additional artificial data during finetuning and inference to improve the model output -- and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.
Updated: 2025-07-04 09:02:02
标题: TerraMind:面向地球观测的大规模生成式多模态技术
摘要: 我们提出了 TerraMind,这是第一个用于地球观测(EO)的任意到任意生成式多模态基础模型。与其他多模态模型不同,TerraMind 在双尺度表示上进行预训练,结合了跨模态的令牌级和像素级数据。在令牌级别上,TerraMind 对高级上下文信息进行编码,以学习跨模态关系,而在像素级别上,TerraMind 利用细粒度表示来捕捉关键的空间细微差别。我们在一个全球大规模数据集的九个地理空间模态上预训练了 TerraMind。在本文中,我们展示了(i)TerraMind 的双尺度早期融合方法解锁了一系列的零样本和少样本应用于地球观测,(ii)TerraMind 引入了“思考模态”(TiM)--在微调和推断过程中生成额外的人工数据以改进模型输出的能力--以及(iii)TerraMind 在 EO 的社区标准基准测试中实现了超越最新技术的性能,如 PANGAEA。预训练数据集、模型权重和我们的代码都在一项宽松许可下开源。
更新时间: 2025-07-04 09:02:02
领域: cs.CV,cs.AI
Absolute Evaluation Measures for Machine Learning: A Survey
Machine Learning is a diverse field applied across various domains such as computer science, social sciences, medicine, chemistry, and finance. This diversity results in varied evaluation approaches, making it difficult to compare models effectively. Absolute evaluation measures offer a practical solution by assessing a model's performance on a fixed scale, independent of reference models and data ranges, enabling explicit comparisons. However, many commonly used measures are not universally applicable, leading to a lack of comprehensive guidance on their appropriate use. This survey addresses this gap by providing an overview of absolute evaluation metrics in ML, organized by the type of learning problem. While classification metrics have been extensively studied, this work also covers clustering, regression, and ranking metrics. By grouping these measures according to the specific ML challenges they address, this survey aims to equip practitioners with the tools necessary to select appropriate metrics for their models. The provided overview thus improves individual model evaluation and facilitates meaningful comparisons across different models and applications.
Updated: 2025-07-04 08:53:08
标题: 机器学习的绝对评估措施:一项调查
摘要: 机器学习是一个广泛应用于计算机科学、社会科学、医学、化学和金融等领域的多样化领域。这种多样性导致了各种不同的评估方法,使得有效比较模型变得困难。绝对评估指标通过在固定尺度上评估模型的性能,独立于参考模型和数据范围,提供了一个实用的解决方案,使得可以进行明确的比较。然而,许多常用的评估指标并不普遍适用,导致缺乏对其适当使用的全面指导。本调查通过按照学习问题的类型组织,提供了机器学习中绝对评估指标的概述。虽然分类指标得到了广泛研究,但这项工作还涵盖了聚类、回归和排序度量。通过根据它们解决的具体机器学习挑战对这些度量进行分类,本调查旨在为从业者提供选择适当度量的工具。提供的概述改进了个别模型评估,并促进了在不同模型和应用之间进行有意义比较。
更新时间: 2025-07-04 08:53:08
领域: cs.LG
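As a small concrete illustration of what "absolute" means here, each score below lives on a fixed scale and requires no reference model, one per problem family covered by the survey (the data is toy):

    import numpy as np
    from sklearn.metrics import (balanced_accuracy_score, r2_score,
                                 silhouette_score, ndcg_score)

    # classification: 0.5 = chance, 1.0 = perfect, regardless of class imbalance
    print(balanced_accuracy_score([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))

    # regression: 1.0 = perfect, 0.0 = no better than predicting the mean
    print(r2_score([3.0, 5.0, 7.0], [2.8, 5.3, 6.6]))

    # clustering: silhouette in [-1, 1], computed from data and labels alone
    X = np.random.RandomState(0).rand(20, 2)
    labels = (X[:, 0] > 0.5).astype(int)
    print(silhouette_score(X, labels))

    # ranking: NDCG in [0, 1]
    print(ndcg_score([[3, 2, 1, 0]], [[0.9, 0.8, 0.1, 0.2]]))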
ReservoirChat: Interactive Documentation Enhanced with LLM and Knowledge Graph for ReservoirPy
We introduce a tool designed to improve the capabilities of Large Language Models (LLMs) in assisting with code development using the ReservoirPy library, as well as in answering complex questions in the field of Reservoir Computing. By incorporating external knowledge through Retrieval-Augmented Generation (RAG) and knowledge graphs, our approach aims to reduce hallucinations and increase the factual accuracy of generated responses. The system provides an interactive experience similar to ChatGPT, tailored specifically for ReservoirPy, enabling users to write, debug, and understand Python code while accessing reliable domain-specific insights. In our evaluation, while proprietary models such as ChatGPT-4o and NotebookLM performed slightly better on general knowledge questions, our model outperformed them on coding tasks and showed a significant improvement over its base model, Codestral-22B.
Updated: 2025-07-04 08:48:15
标题: ReservoirChat: 使用LLM和知识图增强的ReservoirPy交互式文档
摘要: 我们介绍了一种旨在提高大型语言模型(LLMs)在使用ReservoirPy库协助代码开发和回答复杂问题方面的能力的工具。通过通过检索增强生成(RAG)和知识图谱将外部知识整合进来,我们的方法旨在减少幻觉并提高生成的响应的事实准确性。该系统提供了类似于ChatGPT的交互体验,专门为ReservoirPy定制,使用户能够在访问可靠的领域特定见解的同时编写、调试和理解Python代码。在我们的评估中,尽管专有模型如ChatGPT-4o和NotebookLM在一般知识问题上表现稍好,但我们的模型在编码任务上表现更好,并且在其基本模型Codestral-22B上显示出显著改进。
更新时间: 2025-07-04 08:48:15
领域: cs.SE,cs.AI,cs.CL,cs.NE
Breaking the Bulkhead: Demystifying Cross-Namespace Reference Vulnerabilities in Kubernetes Operators
Kubernetes Operators, automated tools designed to manage application lifecycles within Kubernetes clusters, extend the functionalities of Kubernetes, and reduce the operational burden on human engineers. While Operators significantly simplify DevOps workflows, they introduce new security risks. In particular, Kubernetes enforces namespace isolation to separate workloads and limit user access, ensuring that users can only interact with resources within their authorized namespaces. However, Kubernetes Operators often demand elevated privileges and may interact with resources across multiple namespaces. This introduces a new class of vulnerabilities, the Cross-Namespace Reference Vulnerability. The root cause lies in the mismatch between the declared scope of resources and the implemented scope of the Operator logic, resulting in Kubernetes being unable to properly isolate the namespace. Leveraging such vulnerability, an adversary with limited access to a single authorized namespace may exploit the Operator to perform operations affecting other unauthorized namespaces, causing Privilege Escalation and further impacts. To the best of our knowledge, this paper is the first to systematically investigate the security vulnerability of Kubernetes Operators. We present Cross-Namespace Reference Vulnerability with two strategies, demonstrating how an attacker can bypass namespace isolation. Through large-scale measurements, we found that over 14% of Operators in the wild are potentially vulnerable. Our findings have been reported to the relevant developers, resulting in 7 confirmations and 6 CVEs by the time of submission, affecting vendors including ****** and ******, highlighting the critical need for enhanced security practices in Kubernetes Operators. To mitigate it, we also open-source the static analysis suite to benefit the ecosystem.
Updated: 2025-07-04 08:44:24
标题: 突破舱壁:解密Kubernetes Operator中的跨命名空间引用漏洞
摘要: Kubernetes Operator是一类自动化工具,旨在管理Kubernetes集群中的应用程序生命周期,它扩展了Kubernetes的功能,并减轻了人类工程师的运维负担。虽然Operator显著简化了DevOps工作流程,但也引入了新的安全风险。具体而言,Kubernetes通过命名空间隔离来分离工作负载并限制用户访问,确保用户只能与其被授权命名空间内的资源交互。然而,Kubernetes Operator通常需要较高的权限,并可能与多个命名空间中的资源交互。这引入了一类新的漏洞,即跨命名空间引用漏洞。其根本原因在于资源声明的作用范围与Operator逻辑实现的作用范围不匹配,导致Kubernetes无法正确隔离命名空间。利用这种漏洞,仅对单个授权命名空间具有有限访问权限的攻击者可能借助Operator执行影响其他未授权命名空间的操作,造成特权提升及进一步影响。据我们所知,本文是首次系统地研究Kubernetes Operator的这类安全漏洞。我们提出了跨命名空间引用漏洞及两种攻击策略,演示了攻击者如何绕过命名空间隔离。通过大规模测量,我们发现现实环境中超过14%的Operator可能存在此漏洞。我们已将发现报告给相关开发人员,截至投稿时获得了7个确认和6个CVE,受影响厂商包括******和******,凸显了加强Kubernetes Operator安全实践的迫切需要。为缓解该问题,我们还开源了静态分析套件,以惠及整个生态系统。
更新时间: 2025-07-04 08:44:24
领域: cs.CR
On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages
Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a set of various regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates a convex combination of dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. We then proceed to evaluate variants of diagonal selective SSMs by considering their empirical performance on commutative and non-commutative automata. We explain the experimental results with theoretical considerations. Our code is available at https://github.com/IBM/selective-dense-state-space-model.
Updated: 2025-07-04 08:39:27
标题: 关于正规语言上选择性状态空间模型的表达能力和长度泛化
摘要: 选择性状态空间模型(SSMs)是一种新兴的替代方案,与Transformer相比,它们具有并行训练和顺序推理的独特优势。尽管这些模型在各种任务上表现出有希望的性能,但它们的形式表达能力和长度泛化性质仍未得到充分探讨。在这项工作中,我们通过分析选择性SSMs在正则语言任务上的表达能力和长度泛化性能,即有限状态自动机(FSA)模拟,来揭示选择性SSMs的工作原理。我们通过引入选择性密集状态空间模型(SD-SSM),解决了现代基于SSM的架构的某些局限性,这是第一个在一层上展现出对各种正则语言任务完美长度泛化的选择性SSM。它利用了一个密集转移矩阵字典,一个softmax选择机制,在每个时间步创建字典矩阵的凸组合,以及一个由层归一化和线性映射组成的读出。然后,我们通过考虑它们在可交换和不可交换自动机上的实证表现来评估对角选择性SSMs的变体。我们用理论考虑解释实验结果。我们的代码可在https://github.com/IBM/selective-dense-state-space-model上找到。
更新时间: 2025-07-04 08:39:27
领域: cs.LG,cs.AI,cs.CL
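The abstract describes the SD-SSM layer precisely enough to sketch: a dictionary of dense transition matrices, a per-step softmax selection forming a convex combination, and a readout of layer normalization followed by a linear map. The input encoding, initialization, and training details below are simplified guesses, not the paper's exact implementation:

    import torch
    import torch.nn as nn

    class SDSSM(nn.Module):
        def __init__(self, n_tokens, d_state=64, n_mats=8):
            super().__init__()
            self.embed = nn.Embedding(n_tokens, d_state)
            self.select = nn.Linear(d_state, n_mats)  # per-step selection logits
            self.A = nn.Parameter(torch.randn(n_mats, d_state, d_state) / d_state**0.5)
            self.norm = nn.LayerNorm(d_state)
            self.readout = nn.Linear(d_state, n_tokens)

        def forward(self, tokens):                    # tokens: (batch, length)
            x = self.embed(tokens)
            h = torch.zeros(tokens.size(0), x.size(-1), device=tokens.device)
            for t in range(tokens.size(1)):
                w = torch.softmax(self.select(x[:, t]), dim=-1)  # convex weights
                A_t = torch.einsum("bm,mij->bij", w, self.A)     # input-dependent dense A
                h = torch.einsum("bij,bj->bi", A_t, h) + x[:, t]
            return self.readout(self.norm(h))  # e.g. predict the final FSA state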
CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition
Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. Although deep learning methods have been utilized to accelerate feature extraction, issues such as multimodal data mixing, activity heterogeneity, and complex model deployment remain largely unresolved. This paper aims to address these issues in sensor-based human activity recognition. We propose a spatiotemporal attention modal decomposition alignment fusion strategy to tackle the problem of the mixed distribution of sensor data. Key discriminative features of activities are captured through cross-modal spatio-temporal disentangled representation, and gradient modulation is combined to alleviate data heterogeneity. In addition, a wearable deployment simulation system is constructed. We conducted experiments on a large number of public datasets, demonstrating the effectiveness of the model.
Updated: 2025-07-04 08:36:03
标题: CMD-HAR:可穿戴式人体活动识别的跨模态解缠
摘要: 人类活动识别(HAR)是许多以人为中心的智能应用的基本技术。尽管深度学习方法已被用于加速特征提取,但诸如多模态数据混合、活动异质性和复杂模型部署等问题仍未得到解决。本文旨在解决传感器基础人类活动识别中的多模态数据混合、活动异质性和复杂模型部署等问题。我们提出了一种时空注意模态分解对齐融合策略,以解决传感器数据混合分布的问题。通过跨模态时空解耦表示捕获活动的关键判别特征,并结合梯度调节减轻数据异质性。此外,构建了一个可穿戴部署模拟系统。我们在大量公共数据集上进行了实验,证明了模型的有效性。
更新时间: 2025-07-04 08:36:03
领域: cs.CV,cs.AI
Learning Traffic Anomalies from Generative Models on Real-Time Observations
Accurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow.
Updated: 2025-07-04 08:35:52
标题: 基于生成模型从实时观测中学习交通异常
摘要: 准确检测交通异常对于有效的城市交通管理和缓解拥堵至关重要。我们使用空间时间生成对抗网络(STGAN)框架,结合图神经网络和长短期记忆网络,捕捉交通数据中复杂的空间和时间依赖关系。我们将STGAN应用于2020年在瑞典哥德堡收集的42个交通摄像头的实时、每分钟观测数据。图像经过处理,计算出代表车辆密度的流量指标,作为模型的输入。训练使用了2020年4月至11月的数据,验证则使用了2020年11月14日至23日的另一个数据集。我们的结果表明,该模型能够高精度地检测交通异常,假阳性率低。检测到的异常包括摄像头信号中断、视觉伪影以及影响交通流量的极端天气条件。
更新时间: 2025-07-04 08:35:52
领域: cs.LG,cs.AI,cs.CV
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer single-head attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
Updated: 2025-07-04 08:35:38
标题: 泛化还是幻觉?理解Transformer中的脱离上下文推理
摘要: 大型语言模型(LLMs)可以通过微调获得新知识,但这个过程表现出令人困惑的二重性:模型可以从新事实中表现出非凡的泛化能力,但也容易产生错误信息的幻觉。然而,对于这一现象的原因仍然知之甚少。在这项工作中,我们认为这两种行为都源自一种称为“脱离上下文推理”(OCR)的单一机制:通过关联概念来推断含义,即使这些概念之间没有因果关系。我们在五个著名的LLMs上进行的实验证实,OCR确实驱动了泛化和幻觉,这取决于关联的概念是否具有因果关系。为了建立对这一现象的严格理论理解,我们将OCR正式化为一个合成的事实回忆任务。我们在实验证明,一个仅具有一层单头注意力的变压器,且具有因式分解的输出和值矩阵,可以学会解决这个任务,而具有组合权重的模型则不能,突出了矩阵因式分解的关键作用。我们的理论分析表明,OCR能力可以归因于梯度下降的隐式偏差,这有利于最小化组合输出值矩阵的核范数的解。这种数学结构解释了为什么模型能够高效地学会将事实和含义相关联,无论相关性是因果关系还是仅仅是虚假的。最终,我们的工作为理解OCR现象提供了理论基础,为分析和减轻知识注入带来的不良行为提供了一个新的视角。
更新时间: 2025-07-04 08:35:38
领域: cs.CL,cs.LG
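A schematic rendering of the implicit-bias claim, again in our notation rather than the paper's exact theorem: with factorized value and output matrices, gradient descent on the factors behaves like a nuclear-norm-regularized fit of their product,

    $$ W_{OV} = W_O W_V, \qquad \text{GD on } (W_O, W_V) \;\leadsto\; \min_{W_{OV}\ \text{fitting the data}} \|W_{OV}\|_*, \qquad \|W\|_* = \sum_i \sigma_i(W). $$

Because the nuclear norm favors low-rank solutions whose singular directions are shared across associated concepts, the same bias can yield generalization when an association is causal and hallucination when it is merely spurious.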
LLM4Hint: Leveraging Large Language Models for Hint Recommendation in Offline Query Optimization
Query optimization is essential for efficient SQL query execution in DBMS, and remains attractive over time due to the growth of data volumes and advances in hardware. Existing traditional optimizers struggle with the cumbersome hand-tuning required for complex workloads, and the learning-based methods face limitations in ensuring generalization. With the great success of Large Language Model (LLM) across diverse downstream tasks, this paper explores how LLMs can be incorporated to enhance the generalization of learned optimizers. Though promising, such an incorporation still presents challenges, mainly including high model inference latency, and the substantial fine-tuning cost and suboptimal performance due to inherent discrepancy between the token sequences in LLM and structured SQL execution plans with rich numerical features. In this paper, we focus on recurring queries in offline optimization to alleviate the issue of high inference latency, and propose LLM4Hint, which leverages moderate-sized backbone LLMs to recommend query optimization hints. LLM4Hint achieves the goals through: (i) integrating a lightweight model to produce a soft prompt, which captures the data distribution in DBMS and the SQL predicates to provide sufficient optimization features while simultaneously reducing the context length fed to the LLM, (ii) devising a query rewriting strategy using a larger commercial LLM, so as to simplify SQL semantics for the backbone LLM and reduce fine-tuning costs, and (iii) introducing an explicit matching prompt to facilitate alignment between the LLM and the lightweight model, which can accelerate convergence of the combined model. Experiments show that LLM4Hint, by leveraging the LLM's stronger capability to understand the query statement, can outperform the state-of-the-art learned optimizers in terms of both effectiveness and generalization.
Updated: 2025-07-04 08:32:17
标题: LLM4Hint:利用大型语言模型为离线查询优化中的提示推荐提供支持
摘要: 查询优化对于数据库管理系统中高效的SQL查询执行至关重要,并随着数据量的增长和硬件的进步而变得越来越吸引人。现有的传统优化器在处理复杂工作负载时需要繁琐的手动调优,而基于学习的方法在确保泛化性方面存在限制。鉴于大型语言模型(LLM)在各种下游任务中取得的巨大成功,本文探讨了如何将LLM整合到学习优化器中以增强泛化能力。尽管具有潜力,这种整合仍面临挑战,主要包括模型推理延迟高、基于LLM的token序列与具有丰富数值特征的结构化SQL执行计划之间的固有差异导致的大量微调成本和次优性能。 本文着重于离线优化中的重复查询,以缓解推理延迟问题,并提出了\textbf{LLM4Hint},利用中等规模的主干LLM推荐查询优化提示。LLM4Hint通过以下方式实现目标:(i)集成轻量级模型生成软提示,捕捉DBMS中的数据分布和SQL谓词,提供足够的优化特征同时减少输入给LLM的上下文长度;(ii)设计使用更大型商用LLM的查询重写策略,简化主干LLM的SQL语义并降低微调成本;(iii)引入显式匹配提示以促进LLM和轻量级模型之间的对齐,加速组合模型的收敛。实验证明,通过利用LLM更强大的理解查询语句能力,LLM4Hint在效果和泛化性方面均能超越现有学习优化器的最新水平。
更新时间: 2025-07-04 08:32:17
领域: cs.DB,cs.AI
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. Through rigorous mathematical derivation, we demonstrate that both SFT and preference learning methods like Direct Preference Optimization (DPO) operate within the same optimal policy-reward subspace, with SFT representing a special case of implicit reward learning. Our analysis reveals a critical limitation in conventional SFT: the KL divergence term in distribution matching becomes constant with respect to the policy during optimization, failing to constrain model updates. To address this, we propose a simple yet effective learning rate reduction approach that yields significant performance improvements (up to 25% relative gain and a 6% absolute win-rate increase on instruction-following tasks). Additionally, we derive alternative SFT objectives from various f-divergence functions that preserve the KL term during optimization, further enhancing post-DPO model performance. Finally, we extend the theoretical relationship between LLM logits and Q-functions from preference learning to the SFT context, providing mathematical derivations and experimental validation.
Updated: 2025-07-04 08:16:16
标题: 隐式奖励作为桥梁:对SFT和DPO连接的统一视角
摘要: 后训练过程是使预训练语言模型落地于现实任务的关键阶段,其中从演示或偏好信号中学习在这一适配过程中发挥关键作用。我们提出了一个统一的理论框架,将大型语言模型(LLM)后训练中的监督微调(SFT)与偏好学习联系起来。通过严格的数学推导,我们证明了SFT和直接偏好优化(DPO)等偏好学习方法在同一最优策略-奖励子空间内运行,其中SFT是隐式奖励学习的一个特例。我们的分析揭示了传统SFT的一个关键限制:在优化过程中,分布匹配中的KL散度项相对于策略是常数,未能约束模型更新。为了解决这个问题,我们提出了一种简单而有效的学习率降低方法,可以显著提升性能(在指令遵循任务中相对增益高达25%,绝对胜率提高6%)。此外,我们从各种f-散度函数推导出在优化过程中保留KL项的替代SFT目标,进一步提升DPO后模型的性能。最后,我们将LLM logits与Q函数之间的理论关系从偏好学习扩展到SFT场景,并给出数学推导和实验验证。
更新时间: 2025-07-04 08:16:16
领域: cs.LG,cs.AI,cs.CL
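The abstract does not reproduce the derivation, but a standard identity in the same spirit shows how a distribution-matching objective carries a term that is constant in the policy parameters $\theta$ (the paper's precise statement may differ in detail):

    $$ \mathrm{KL}\big(p_{\text{data}} \,\|\, \pi_\theta\big) = \underbrace{-\,\mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\log \pi_\theta(y \mid x)\big]}_{\text{SFT cross-entropy loss}} \; - \; \underbrace{H\big(p_{\text{data}}\big)}_{\text{constant in } \theta}, $$

so minimizing the forward KL reduces exactly to the SFT loss, and the data-entropy term places no constraint on how the policy is updated.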
MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.
Updated: 2025-07-04 08:13:58
标题: MENTOR:具有任务导向扰动的混合专家网络用于视觉强化学习
摘要: 视觉深度强化学习(RL)使机器人能够从视觉输入中获得无结构任务的技能。然而,当前的算法存在样本效率低的问题,限制了它们的实际应用。在这项工作中,我们提出了MENTOR,一种改进RL代理的架构和优化的方法。具体来说,MENTOR用混合专家(MoE)骨干替换了标准的多层感知器(MLP),并引入了一个面向任务的扰动机制。MENTOR在三个仿真基准测试中优于最先进的方法,并在三个具有挑战性的真实世界机器人操作任务中取得了平均83%的成功率,明显超过最强的现有无模型视觉RL算法的32%的成功率。这些结果突显了样本效率在推动视觉RL在真实世界机器人技术中的重要性。实验视频可在https://suninghuang19.github.io/mentor_page/上找到。
更新时间: 2025-07-04 08:13:58
领域: cs.RO,cs.LG
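A minimal sketch of swapping an MLP trunk for a mixture-of-experts backbone, the architectural change named above; the gating style, expert count, and sizes are illustrative, and the task-oriented perturbation mechanism is omitted:

    import torch
    import torch.nn as nn

    class MoEBackbone(nn.Module):
        def __init__(self, d_in, d_out, n_experts=4, d_hidden=256):
            super().__init__()
            self.gate = nn.Linear(d_in, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                              nn.Linear(d_hidden, d_out))
                for _ in range(n_experts)
            )

        def forward(self, x):
            weights = torch.softmax(self.gate(x), dim=-1)             # (batch, E)
            outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, E, d_out)
            return (weights.unsqueeze(-1) * outs).sum(dim=1)

    policy_trunk = MoEBackbone(d_in=128, d_out=64)  # e.g. on encoded visual features
    feat = policy_trunk(torch.randn(16, 128))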
Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization
Reinforcement Learning (RL) has achieved remarkable success in sequential decision tasks. However, recent studies have revealed the vulnerability of RL policies to different perturbations, raising concerns about their effectiveness and safety in real-world applications. In this work, we focus on the robustness of RL policies against action perturbations and introduce a novel framework called Optimal Adversary-aware Policy Iteration (OA-PI). Our framework enhances action robustness under various perturbations by evaluating and improving policy performance against the corresponding optimal adversaries. Besides, our approach can be integrated into mainstream DRL algorithms such as Twin Delayed DDPG (TD3) and Proximal Policy Optimization (PPO), improving action robustness effectively while maintaining nominal performance and sample efficiency. Experimental results across various environments demonstrate that our method enhances robustness of DRL policies against different action adversaries effectively.
Updated: 2025-07-04 08:11:15
标题: 基于最优对手感知策略优化的动作鲁棒强化学习
摘要: 强化学习(RL)在顺序决策任务中取得了显著成功。然而,最近的研究揭示了RL策略对不同干扰的脆弱性,引发了关于它们在实际应用中的有效性和安全性的担忧。在这项工作中,我们专注于RL策略对行动干扰的鲁棒性,并引入了一个名为Optimal Adversary-aware Policy Iteration(OA-PI)的新框架。我们的框架通过评估和改进策略性能来增强对各种干扰的行动鲁棒性,以应对相应的最优对手。此外,我们的方法可以集成到主流的DRL算法中,如双延迟DDPG(TD3)和Proximal Policy Optimization(PPO),有效提高行动鲁棒性,同时保持名义性能和样本效率。在各种环境中的实验结果表明,我们的方法有效地增强了DRL策略对不同行动对手的鲁棒性。
更新时间: 2025-07-04 08:11:15
领域: cs.LG
Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education
Students disengaging from their tasks can have serious long-term consequences, including academic drop-out. This is particularly relevant for students in distance education. One way to measure the level of disengagement in distance education is to observe participation in non-mandatory exercises in different online courses. In this paper, we detect student disengagement in the non-mandatory quizzes of 42 courses in four semesters from a distance-based university. We carefully identified the most informative student log data that could be extracted and processed from Moodle. Then, eight machine learning algorithms were trained and compared to obtain the highest possible prediction accuracy. Using the SHAP method, we developed an explainable machine learning framework that allows practitioners to better understand the decisions of the trained algorithm. The experimental results show a balanced accuracy of 91%, where about 85% of disengaged students were correctly detected. On top of the highly predictive performance and explainable framework, we provide a discussion on how to design a timely intervention to minimise disengagement from voluntary tasks in online learning.
Updated: 2025-07-04 08:10:49
标题: 检测自愿测验中的脱离:高等远程教育中的可解释机器学习方法
摘要: 学生从任务中脱离可能会带来严重的长期后果,包括辍学。这对远程教育中的学生特别相关。衡量远程教育中脱离程度的一种方式是观察不同在线课程中非强制性练习的参与情况。在本文中,我们检测了一所远程大学四个学期中42门课程的非强制性测验中学生的脱离情况。我们仔细确定了可以从Moodle中提取和处理的最具信息量的学生日志数据。然后,我们训练并比较了八种机器学习算法,以获得尽可能高的预测准确性。使用SHAP方法,我们开发了一个可解释的机器学习框架,使从业者能够更好地理解训练算法的决策。实验结果显示平衡准确率为91\%,大约85\%的脱离学生被正确检测到。除了高度预测性能和可解释性框架,我们还讨论了如何设计及时干预措施,以最小化在线学习中对自愿任务的脱离。
更新时间: 2025-07-04 08:10:49
领域: cs.AI,cs.LG
A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference
Accurately estimating parameters of physiological models is essential to achieving reliable digital twins. For Type 1 Diabetes, this is particularly challenging due to the complexity of glucose-insulin interactions. Traditional methods based on Markov Chain Monte Carlo struggle with high-dimensional parameter spaces and fit parameters from scratch at inference time, making them slow and computationally expensive. In this study, we propose a Simulation-Based Inference (SBI) approach based on Neural Posterior Estimation to efficiently capture the complex relationships between meal intake, insulin, and glucose level, providing faster, amortized inference. Our experiments demonstrate that SBI not only outperforms traditional methods in parameter estimation but also generalizes better to unseen conditions, offering real-time posterior inference with reliable uncertainty quantification.
Updated: 2025-07-04 08:09:31
标题: 一个基于模拟推断的用于1型糖尿病的实时数字孪生系统
摘要: 准确估计生理模型参数对于构建可靠的数字孪生至关重要。对于1型糖尿病而言,由于葡萄糖-胰岛素相互作用的复杂性,这尤其具有挑战性。基于马尔可夫链蒙特卡洛的传统方法难以处理高维参数空间,且在推断时需要从零开始拟合参数,速度缓慢且计算成本高昂。在这项研究中,我们提出了一种基于神经后验估计的模拟推断(SBI)方法,以高效捕捉进餐、胰岛素和血糖水平之间的复杂关系,提供更快的摊销推断。我们的实验证明,SBI不仅在参数估计方面优于传统方法,而且对未见条件具有更好的泛化能力,能够提供带有可靠不确定性量化的实时后验推断。
更新时间: 2025-07-04 08:09:31
领域: cs.LG,q-bio.QM
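A minimal sketch of amortized Neural Posterior Estimation using the `sbi` package (API as of roughly v0.22); the two-parameter toy simulator stands in for the real glucose-insulin model and is entirely invented:

    import torch
    from sbi.inference import SNPE
    from sbi.utils import BoxUniform

    def simulator(theta):
        # toy "glucose response" with two parameters: decay rate and meal sensitivity
        t = torch.linspace(0, 1, 24)
        decay, meal = theta[:, :1], theta[:, 1:]
        traj = 5.0 + meal * torch.exp(-decay * 10 * t)   # (batch, 24)
        return traj + 0.1 * torch.randn_like(traj)

    prior = BoxUniform(low=torch.tensor([0.1, 0.5]), high=torch.tensor([2.0, 5.0]))
    theta = prior.sample((5000,))
    x = simulator(theta)

    inference = SNPE(prior=prior)
    density_estimator = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior(density_estimator)

    # Amortized: conditioning on a new observation requires no re-training.
    x_obs = simulator(prior.sample((1,)))
    samples = posterior.sample((1000,), x=x_obs)
    print(samples.mean(dim=0), samples.std(dim=0))  # point estimate + uncertainty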
Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking
We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition, which focuses on retrieving relevant scientific literature for given social media posts. To address this task, we propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking, enabling robust retrieval that bridges the informal-to-formal language gap. Specifically, we combine BM25-based keyword matching with a FAISS vector store using a fine-tuned INF-Retriever-v1 model for dense semantic retrieval. BM25 returns the top 30 candidates, and semantic search yields 100 candidates, which are then merged and re-ranked via a large language model (LLM)-based cross-encoder. Our approach achieves a mean reciprocal rank at 5 (MRR@5) of 76.46% on the development set and 66.43% on the hidden test set, securing the 1st position on the development leaderboard and ranking 3rd on the test leaderboard (out of 31 teams), with a relative performance gap of only 2 percentage points compared to the top-ranked system. We achieve this strong performance by running open-source models locally and without external training data, highlighting the effectiveness of a carefully designed and fine-tuned retrieval pipeline.
Updated: 2025-07-04 08:06:20
标题: Deep Retrieval 参加 CheckThat! 2025:通过混合检索与重排序从隐式社交媒体提及中识别科学论文
摘要: 我们展示了Deep Retrieval团队在CLEF CheckThat! 2025竞赛子任务4b中的方法与结果,该任务旨在为给定的社交媒体帖子检索相关的科学文献。为了解决这一任务,我们提出了一个混合检索流程,结合词汇精确性、语义泛化能力和深层上下文重排序,实现了能够弥合非正式与正式语言差距的稳健检索。具体来说,我们将基于BM25的关键词匹配与使用微调INF-Retriever-v1模型进行稠密语义检索的FAISS向量库相结合。BM25返回前30个候选,语义搜索产生100个候选,然后通过基于大型语言模型(LLM)的交叉编码器进行合并和重排序。我们的方法在开发集上取得了76.46%的前5平均倒数排名(MRR@5),在隐藏测试集上为66.43%,在开发榜单上位列第一,在测试榜单上位列第三(共31支队伍),与排名第一的系统相比性能差距仅为2个百分点。我们通过在本地运行开源模型且不使用外部训练数据实现了这一强劲表现,凸显了精心设计和微调的检索流程的有效性。
更新时间: 2025-07-04 08:06:20
领域: cs.IR,cs.AI
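A minimal sketch of the hybrid pipeline described above: BM25 keyword candidates merged with dense candidates from a FAISS index, then re-ranked. The `encode` (any sentence embedding model) and `cross_score` (an LLM-based cross-encoder) callables are assumed to be provided, and the candidate counts mirror the abstract (30 lexical + 100 semantic):

    import numpy as np
    import faiss
    from rank_bm25 import BM25Okapi

    def build_indexes(papers, encode):
        bm25 = BM25Okapi([p.lower().split() for p in papers])
        embs = encode(papers).astype("float32")
        faiss.normalize_L2(embs)
        dense = faiss.IndexFlatIP(embs.shape[1])  # cosine via normalized inner product
        dense.add(embs)
        return bm25, dense

    def retrieve(post, papers, bm25, dense, encode, cross_score, k=5):
        lex = np.argsort(bm25.get_scores(post.lower().split()))[::-1][:30]
        q = encode([post]).astype("float32")
        faiss.normalize_L2(q)
        _, sem = dense.search(q, 100)
        candidates = list(dict.fromkeys(list(lex) + list(sem[0])))  # merged, deduped
        ranked = sorted(candidates, key=lambda i: cross_score(post, papers[i]),
                        reverse=True)
        return ranked[:k]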
Adaptive Gate-Aware Mamba Networks for Magnetic Resonance Fingerprinting
Magnetic Resonance Fingerprinting (MRF) enables fast quantitative imaging by matching signal evolutions to a predefined dictionary. However, conventional dictionary matching (DM) suffers from exponential growth in computational cost and memory usage as the number of parameters increases, limiting its scalability to multi-parametric mapping. To address this, recent work has explored deep learning-based approaches as alternatives to DM. We propose GAST-Mamba, an end-to-end framework that combines a dual Mamba-based encoder with a Gate-Aware Spatial-Temporal (GAST) processor. Built on structured state-space models, our architecture efficiently captures long-range spatial dependencies with linear complexity. On 5 times accelerated simulated MRF data (200 frames), GAST-Mamba achieved a T1 PSNR of 33.12 dB, outperforming SCQ (31.69 dB). For T2 mapping, it reached a PSNR of 30.62 dB and SSIM of 0.9124. In vivo experiments further demonstrated improved anatomical detail and reduced artifacts. Ablation studies confirmed that each component contributes to performance, with the GAST module being particularly important under strong undersampling. These results demonstrate the effectiveness of GAST-Mamba for accurate and robust reconstruction from highly undersampled MRF acquisitions, offering a scalable alternative to traditional DM-based methods.
Updated: 2025-07-04 08:04:01
标题: 用于磁共振指纹成像的自适应门感知Mamba网络
摘要: 磁共振指纹技术(MRF)通过将信号演变与预定义字典匹配,实现快速定量成像。然而,传统的字典匹配在参数数量增加时会导致计算成本和内存使用呈指数增长,限制了其在多参数映射中的可扩展性。为解决这一问题,最近的研究探索了基于深度学习的方法作为字典匹配的替代方案。我们提出了GAST-Mamba,这是一个端到端框架,结合了基于双Mamba的编码器和Gate-Aware空间-时间(GAST)处理器。基于结构化状态空间模型,我们的架构能够高效地捕获长距离空间依赖关系,具有线性复杂度。在5倍加速的模拟MRF数据(200帧)上,GAST-Mamba实现了33.12 dB的T1 PSNR,优于SCQ(31.69 dB)。对于T2映射,它达到了30.62 dB的PSNR和0.9124的SSIM。体内实验进一步展示了改善的解剖细节和减少的伪影。消融研究证实了每个组件对性能的贡献,其中GAST模块在严重欠采样情况下特别重要。这些结果表明了GAST-Mamba在高度欠采样的MRF采集中实现准确和稳健重建的有效性,为传统基于DM的方法提供了可扩展的替代方案。
更新时间: 2025-07-04 08:04:01
领域: eess.IV,cs.LG,eess.SP
Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices
Remote sensing change detection aims to localize semantic changes between images of the same location captured at different times. In the past few years, newer methods have attributed enhanced performance to the addition of new and complex components to existing architectures. Most fail to measure the performance contribution of fundamental design choices such as backbone selection, pre-training strategies, and training configurations. We claim that such fundamental design choices often improve performance even more significantly than the addition of new architectural components. Motivated by this, we systematically revisit the design space of change detection models and analyse the full potential of a well-optimised baseline. We identify a set of fundamental design choices that benefit both new and existing architectures. Leveraging this insight, we demonstrate that when carefully designed, even an architecturally simple model can match or surpass state-of-the-art performance on six challenging change detection datasets. Our best practices generalise beyond our architecture and also offer performance improvements when applied to related methods, indicating that the space of fundamental design choices has been underexplored. Our guidelines and architecture provide a strong foundation for future methods, emphasizing that optimizing core components is just as important as architectural novelty in advancing change detection performance. Code: https://github.com/blaz-r/BTC-change-detection
Updated: 2025-07-04 08:01:28
标题: 成为你想要看到的改变:重新审视遥感变化检测实践
摘要: 遥感变化检测旨在定位在不同时间捕获的同一位置的图像之间的语义变化。在过去几年中,较新的方法已经将性能提升归因于向现有架构添加新的复杂组件。大多数方法未能衡量基本设计选择(如骨干选择、预训练策略和训练配置)对性能的贡献。我们认为,这些基本设计选择通常比添加新的架构组件更显著地提高性能。因此,我们系统地重新审视变化检测模型的设计空间,并分析良好优化的基线的全部潜力。我们确定了一组有利于新旧架构的基本设计选择。利用这一洞察力,我们证明,经过精心设计,即使是架构简单的模型也可以在六个具有挑战性的变化检测数据集上达到或超越最先进的性能。我们的最佳实践适用于我们的架构以外,并在应用于相关方法时也提供性能改进,表明基本设计选择的空间尚未得到充分探索。我们的指南和架构为未来方法提供了坚实的基础,强调优化核心组件与架构创新同样重要,以提升变化检测性能。 代码:https://github.com/blaz-r/BTC-change-detection
更新时间: 2025-07-04 08:01:28
领域: cs.CV,cs.AI
Accelerating Private Heavy Hitter Detection on Continual Observation Streams
Differentially private frequency estimation and heavy hitter detection are core problems in the private analysis of data streams. Two models are typically considered: the one-pass model, which outputs results only at the end of the stream, and the continual observation model, which requires releasing private summaries at every time step. While the one-pass model allows more efficient solutions, continual observation better reflects scenarios where timely and ongoing insights are critical. In the one-pass setting, sketches have proven to be an effective tool for differentially private frequency analysis, as they can be privatized by a single injection of calibrated noise. In contrast, existing methods in the continual observation model add fresh noise to the entire sketch at every step, incurring high computational costs. This challenge is particularly acute for heavy hitter detection, where current approaches often require querying every item in the universe at each step, resulting in untenable per-update costs for large domains. To overcome these limitations, we introduce a new differentially private sketching technique based on lazy updates, which perturbs and updates only a small, rotating part of the output sketch at each time step. This significantly reduces computational overhead while maintaining strong privacy and utility guarantees. In comparison to prior art, for frequency estimation, our method improves the update time by a factor of $O(w)$ for sketches of dimension $d \times w$; for heavy hitter detection, it reduces per-update complexity from $\Omega(|U|)$ to $O(d \log w)$, where $U$ is the input domain. Experiments show an increase in throughput by a factor of $250$, making differential privacy more practical for real-time, continual observation applications.
Updated: 2025-07-04 07:49:00
标题: 在持续观察流上加速隐私保护的频繁项(heavy hitter)检测
摘要: 差分隐私频率估计和频繁项(heavy hitter)检测是数据流隐私分析中的核心问题。通常考虑两种模型:一次遍历模型,仅在流的末尾输出结果;持续观察模型,需要在每个时间步释放私有摘要。一次遍历模型允许更高效的解决方案,而持续观察模型更好地反映了及时和持续洞察至关重要的情景。 在一次遍历设置中,草图已被证明是差分隐私频率分析的有效工具,因为它们可以通过一次注入校准噪声来实现隐私化。相比之下,持续观察模型中的现有方法在每一步都向整个草图添加新的噪声,导致高计算成本。这一挑战在频繁项检测中尤为严重,当前方法通常需要在每一步查询全域中的每个元素,导致在大域上每次更新的成本不可行。 为了克服这些限制,我们引入了一种基于惰性更新的新差分隐私草图技术,它在每个时间步仅扰动和更新输出草图中轮换的一小部分。这显著降低了计算开销,同时保持了强隐私性和实用性保证。与先前的技术相比,对于频率估计,在维度为$d \times w$的草图上,我们的方法将更新时间改进了$O(w)$倍;对于频繁项检测,它将每次更新的复杂度从$\Omega(|U|)$降低到$O(d \log w)$,其中$U$是输入域。实验显示吞吐量提升了$250$倍,使差分隐私在实时、持续观察的应用中更加实用。
更新时间: 2025-07-04 07:49:00
领域: cs.CR
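A toy numpy sketch of the lazy-update idea above follows. The rotation schedule and Laplace noise scale are illustrative assumptions, not the paper's exact mechanism or privacy calibration; the point is that only one column of the d x w sketch is re-noised per step.

```python
# Illustrative toy (not the paper's exact mechanism or noise calibration):
# a d x w count sketch under continual observation where, instead of
# re-noising all d*w counters each step, only one rotating column gets
# fresh noise -- the lazy-update idea that cuts per-step work by ~w.
import numpy as np

rng = np.random.default_rng(0)
d, w, eps = 4, 16, 1.0
counts = np.zeros((d, w))          # true (un-noised) sketch counters
noisy = np.zeros((d, w))           # lazily refreshed private view
hash_a = rng.integers(1, 2**31, size=d)  # toy hash parameters

def update(item: int, t: int) -> np.ndarray:
    """Ingest one item at time step t and return the private sketch."""
    cols = (hash_a * item) % w               # one bucket per row
    counts[np.arange(d), cols] += 1
    col = t % w                              # rotating column index
    # Refresh noise only for this column (O(d) work, not O(d*w)).
    noisy[:, col] = counts[:, col] + rng.laplace(scale=1.0 / eps, size=d)
    return noisy

for t, item in enumerate(rng.integers(0, 100, size=1000)):
    sketch = update(int(item), t)

est = sketch[np.arange(d), (hash_a * 42) % w].min()  # CM-style point query
print(f"estimated frequency of item 42: {est:.1f}")
```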
Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting
Recent research has demonstrated the effectiveness of Artificial Intelligence (AI), and more specifically, Large Language Models (LLMs), in supporting network configuration synthesis and automating network diagnosis tasks, among others. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which to build and evaluate AI agents with low operational effort.
Updated: 2025-07-04 07:39:58
标题: 走向一个用于民主化AI代理在网络故障排除中的实验和基准测试的游乐场
摘要: 最近的研究已经证明了人工智能(AI)特别是大型语言模型(LLMs)在支持网络配置综合和自动化网络诊断任务等方面的有效性。在这项初步工作中,我们将重点放在将AI代理应用于网络故障排除,并详细阐述了需要一个标准化、可重现和开放的基准平台,用于构建和评估具有低运行工作量的AI代理。
更新时间: 2025-07-04 07:39:58
领域: cs.NI,cs.AI,cs.MA
Backtesting Sentiment Signals for Trading: Evaluating the Viability of Alpha Generation from Sentiment Analysis
Sentiment analysis, widely used in product reviews, also impacts financial markets by influencing asset prices through microblogs and news articles. Despite research in sentiment-driven finance, many studies focus on sentence-level classification, overlooking its practical application in trading. This study bridges that gap by evaluating sentiment-based trading strategies for generating positive alpha. We conduct a backtesting analysis using sentiment predictions from three models (two classification and one regression) applied to news articles on Dow Jones 30 stocks, comparing them to the benchmark Buy&Hold strategy. Results show all models produced positive returns, with the regression model achieving the highest return of 50.63% over 28 months, outperforming the benchmark Buy&Hold strategy. This highlights the potential of sentiment in enhancing investment strategies and financial decision-making.
Updated: 2025-07-04 07:32:59
标题: 回测情绪信号用于交易:评估从情绪分析中生成Alpha的可行性
摘要: 情感分析在产品评论中被广泛使用,也会通过微博和新闻文章影响资产价格,进而影响金融市场。尽管有关情感驱动金融的研究很多,但许多研究集中在句子级别的分类上,忽视了其在交易中的实际应用。本研究通过评估基于情感的交易策略来弥补这一差距,以产生积极的阿尔法。我们使用来自三个模型(两个分类和一个回归)的情感预测进行回测分析,应用于道琼斯30只股票的新闻文章,将它们与基准的买入持有策略进行比较。结果显示,所有模型都产生了正收益,其中回归模型在28个月内实现了50.63%的最高回报,表现优于基准的买入持有策略。这突显了情感在增强投资策略和金融决策中的潜力。
更新时间: 2025-07-04 07:32:59
领域: cs.CL,cs.AI
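For readers unfamiliar with backtesting, a minimal long/flat loop like the one the study performs can be sketched as follows. The price series, sentiment scores, and threshold below are stand-ins, not the paper's data or models.

```python
# A minimal long/flat backtest sketch under assumed inputs: `prices` are daily
# closes and `sentiment` holds model scores in [-1, 1] aligned to the prior
# day's news. Thresholds and the Buy&Hold comparison are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
prices = 100 * np.cumprod(1 + rng.normal(0.0004, 0.01, 600))  # stand-in series
sentiment = rng.uniform(-1, 1, 600)                            # stand-in scores

daily_ret = np.diff(prices) / prices[:-1]
# Hold the stock on days following positive sentiment, stay in cash otherwise.
position = (sentiment[:-1] > 0.1).astype(float)
strat_ret = position * daily_ret

strategy_total = np.prod(1 + strat_ret) - 1
buy_hold_total = prices[-1] / prices[0] - 1
alpha = strategy_total - buy_hold_total
print(f"strategy {strategy_total:+.2%} vs buy&hold {buy_hold_total:+.2%} "
      f"(alpha {alpha:+.2%})")
```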
LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment
Reinforcement learning (RL) has become a key technique for enhancing LLMs' reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a novel gradient-alignment-based method, named LearnAlign, which intelligently selects the learnable and representative training reasoning data for RL post-training. To overcome the issue of response-length bias in gradient norms, we introduce the data learnability based on the success rate, which can indicate the learning potential of each data point. Experiments across three mathematical reasoning benchmarks demonstrate that our method significantly reduces training data requirements while incurring only minor performance degradation, or even improving performance, compared to full-data training. For example, it reduces data requirements by up to 1,000 data points while achieving better performance (77.53%) than training on the full dataset (77.04%) on the GSM8K benchmark. Furthermore, we show its effectiveness in the staged RL setting. This work provides valuable insights into data-efficient RL post-training and establishes a foundation for future research in optimizing reasoning data selection. To facilitate future work, we will release code.
Updated: 2025-07-04 07:31:49
标题: LearnAlign:基于改进梯度对齐的大型语言模型强化学习推理数据选择
摘要: 强化学习(RL)已成为增强LLMs推理能力的关键技术,然而其数据效率仍然是一个主要瓶颈。为了解决这个关键但具有挑战性的问题,我们提出了一种新颖的基于梯度对齐的方法,命名为LearnAlign,该方法智能地选择RL后训练的可学习和代表性训练推理数据。为了克服梯度范数中响应长度偏差的问题,我们引入基于成功率的数据可学习性,该指标可以表明每个数据点的学习潜力。在三个数学推理基准测试中的实验证明,我们的方法显著减少了训练数据需求,同时实现了与完整数据训练相比的轻微性能下降甚至改善性能。例如,在GSM8K基准测试中,我们将数据需求减少了最多1,000个数据点,并且在性能上表现更好(77.53%),而在完整数据集上的性能为77.04%。此外,我们展示了其在分阶段RL设置中的有效性。这项工作为数据高效的RL后训练提供了宝贵的见解,并为未来优化推理数据选择的研究奠定了基础。为了促进未来工作,我们将发布代码。
更新时间: 2025-07-04 07:31:49
领域: cs.LG,cs.AI
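A rough sketch of the two ingredients described above follows. The exact scoring rule is our assumption: gradient alignment is measured with cosine similarity (length-bias-free), and "learnability" is a success-rate term that peaks at intermediate success rates, where there is still something to learn.

```python
# Sketch of gradient-alignment data selection with success-rate learnability;
# shapes, the reference gradient, and the score combination are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 512, 1024
grads = rng.normal(size=(n, p))       # per-prompt gradient estimates
g_ref = grads.mean(axis=0)            # reference direction (e.g., dev-set grad)
success = rng.uniform(0, 1, size=n)   # per-prompt rollout success rate

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)

alignment = cosine(grads, g_ref)            # length-bias-free direction score
learnability = 4 * success * (1 - success)  # 0 when always solved/failed
score = alignment * learnability

budget = 128
selected = np.argsort(score)[-budget:]      # keep the most useful prompts
print("selected prompt ids:", selected[:10])
```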
Effects of structure on reasoning in instance-level Self-Discover
The drive for predictable LLM reasoning when integrating LLMs into compound systems has popularized structured outputs, yet concerns remain about performance trade-offs compared to unconstrained natural language. At the same time, training on unconstrained Chain of Thought (CoT) traces has brought about a new class of strong reasoning models that nevertheless present novel compute budget and faithfulness challenges. This paper introduces iSelf-Discover, an instance-level adaptation of the Self-Discover framework, and uses it to compare dynamically generated structured JSON reasoning with its unstructured counterpart. Our empirical evaluation across diverse benchmarks using state-of-the-art open-source models supports a consistent advantage for unstructured reasoning. Notably, on the complex MATH benchmark, unstructured plans achieved relative performance improvements of up to 18.90% over structured approaches. Zero-shot unstructured iSelf-Discover variants are also shown to outperform their five-shot structured counterparts, underscoring the significance of this gap, even when structured plans are dynamically generated to ensure reasoning precedes the final answer. We further demonstrate that the optimal granularity of plan generation (instance-level vs. task-level) is context-dependent. These findings invite re-evaluation of the reliance on structured formats for complex problem-solving and how compound systems should be organized.
Updated: 2025-07-04 07:28:42
标题: 结构对实例级自我发现中推理的影响
摘要: 对于在集成化复合系统中实现可预测的逻辑推理,对结构化输出的需求不断增加,然而与无约束的自然语言相比,仍存在性能权衡的担忧。同时,在无约束的Chain of Thought (CoT)追踪训练上,出现了一类新的强大推理模型,但同时也带来了新的计算预算和忠实度挑战。本文介绍了iSelf-Discover,这是Self-Discover框架的一个实例级适应,通过它比较了动态生成的结构化JSON推理与其非结构化对应物。我们在各种基准测试中进行了实证评估,使用最先进的开源模型支持了无结构推理的一致优势。值得注意的是,在复杂的MATH基准测试中,无结构计划相对于结构化方法实现的性能提升高达18.90%。零-shot无结构iSelf-Discover变体也表现出优于其五-shot结构化对应物的性能,突出了这一差距的重要性,即使结构化计划是动态生成的,以确保推理在最终答案之前进行。我们进一步证明了计划生成的最佳粒度(实例级别与任务级别)是依赖于上下文的。这些发现引发了对于复杂问题解决依赖结构化格式的重新评估,以及复合系统应该如何组织的讨论。
更新时间: 2025-07-04 07:28:42
领域: cs.AI
Securing Mixed Rust with Hardware Capabilities
The Rust programming language enforces three basic Rust principles, namely ownership, borrowing, and AXM (Aliasing Xor Mutability) to prevent security bugs such as memory safety violations and data races. However, Rust projects often have mixed code, i.e., code that also uses unsafe Rust, FFI (Foreign Function Interfaces), and inline assembly for low-level control. The Rust compiler is unable to statically enforce Rust principles in mixed Rust code, which can lead to many security vulnerabilities. In this paper, we propose CapsLock, a security enforcement mechanism that can run at the level of machine code and detect Rust principle violations at run-time in mixed code. CapsLock is kept simple enough to be implemented on recent capability-based hardware abstractions that provide low-cost spatial memory safety. CapsLock introduces a novel revoke-on-use abstraction for capability-based designs, wherein accessing a memory object via a capability implicitly invalidates certain other capabilities pointing to it, thereby also providing temporal memory safety automatically, without requiring software to explicitly specify such invalidation. Thus, CapsLock is the first mechanism capable of providing cross-language enforcement of Rust principles. We implemented a prototype of CapsLock on QEMU. Evaluation results show that CapsLock is highly compatible with existing Rust code (passing 99.7% of the built-in test cases of the 100 most popular crates) and flags Rust principle violations in real-world Rust projects that use FFI or inline assembly. We discovered 8 previously unknown bugs in such crates in our experiments.
Updated: 2025-07-04 07:12:43
标题: 使用硬件功能确保混合式Rust
摘要: Rust编程语言强制执行三个基本的Rust原则,即所有权、借用和AXM(别名异或可变性),以防止内存安全违规和数据竞争等安全漏洞。然而,Rust项目通常具有混合代码,即还使用不安全的Rust、FFI(外部函数接口)和内联汇编进行底层控制的代码。Rust编译器无法在混合Rust代码中静态强制执行Rust原则,这可能导致许多安全漏洞。本文提出了一种名为CapsLock的安全执行机制,可以在机器码级别运行,并在混合代码中检测运行时的Rust原则违规。CapsLock足够简单,可以实现到提供低成本空间内存安全的最新基于能力的硬件抽象中。CapsLock引入了一种新颖的基于能力设计的使用时撤销(revoke-on-use)抽象,在此设计中,通过能力访问内存对象会隐式使指向该对象的某些其他能力无效,从而自动提供时序(temporal)内存安全,无需软件显式指定此类失效。因此,CapsLock是第一个能够提供Rust原则跨语言执行的机制。我们在QEMU上实现了CapsLock的原型。评估结果显示,CapsLock与现有的Rust代码高度兼容(通过了100个最流行的crate中99.7%的内置测试用例),并且能够标记使用FFI或内联汇编的真实世界Rust项目中的Rust原则违规。在实验中,我们发现了8个此类crate中之前未知的错误。
更新时间: 2025-07-04 07:12:43
领域: cs.CR,cs.SE,C.1.3; D.2.5
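The revoke-on-use semantics described above can be made concrete with a toy software model. The real mechanism lives in capability hardware, not Python; the sketch below only illustrates how a use of one capability implicitly invalidates aliases, which is what enforces AXM-style exclusivity at run time.

```python
# Toy software model of revoke-on-use (the real mechanism is in hardware):
# accessing an object through one capability implicitly invalidates other
# capabilities that point to it, so a later use of a stale alias is caught.
class Capability:
    def __init__(self, obj_id: int, table: dict):
        self.obj_id, self.table, self.valid = obj_id, table, True
        table.setdefault(obj_id, set()).add(self)

    def access(self):
        if not self.valid:
            raise RuntimeError("use of revoked capability (aliasing violation)")
        # Revoke-on-use: every *other* capability to this object is invalidated.
        for cap in self.table[self.obj_id]:
            if cap is not self:
                cap.valid = False
        return f"object {self.obj_id} accessed"

caps: dict[int, set] = {}
a = Capability(7, caps)   # e.g., a Rust mutable reference crossing FFI
b = Capability(7, caps)   # an alias created in unsafe/foreign code
print(a.access())         # ok; silently revokes b
try:
    b.access()            # flagged: would violate Aliasing Xor Mutability
except RuntimeError as e:
    print("detected:", e)
```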
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Linear attention has attracted interest as a computationally efficient approximation to softmax attention, especially for long sequences. Recent studies have explored distilling softmax attention in pre-trained Transformers into linear attention. However, a critical challenge remains: how to choose the feature dimension that governs the approximation quality. Existing methods fix this dimension uniformly across all attention layers, overlooking their diverse roles and complexities. In this paper, we propose a principled method to automatically determine the feature dimension in linear attention using the concept of statistical degrees of freedom, which represent the effective dimensionality of the inputs. We provide a theoretical bound on the approximation error and show that the dimension chosen by our method achieves smaller error under a fixed computational budget. Furthermore, we introduce an efficient layerwise training strategy to learn nonlinear features tailored to each layer. Experiments on multiple pre-trained transformers demonstrate that our method improves the performance of distilled models compared to baselines without increasing the inference cost. Our findings also provide insight into how the complexity of the attention mechanism evolves across layers.
Updated: 2025-07-04 06:59:17
标题: 线性注意力的自由度:通过最佳特征效率提炼Softmax注意力
摘要: 线性注意力作为一种计算效率高的softmax注意力的近似方法,尤其适用于长序列,引起了人们的兴趣。最近的研究探索了将预训练的Transformer中的softmax注意力提炼为线性注意力。然而,一个关键挑战仍然存在:如何选择控制近似质量的特征维度。现有方法在所有注意力层中固定这个维度,忽视了它们的多样化角色和复杂性。在本文中,我们提出了一种基于统计自由度概念的方法,以自动确定线性注意力中的特征维度,这些自由度代表输入的有效维度。我们提供了一个理论上的近似误差界限,并展示了我们方法选择的维度在固定计算预算下实现了更小的误差。此外,我们引入了一种高效的逐层训练策略,以学习针对每一层定制的非线性特征。在多个预训练Transformer上的实验表明,我们的方法提高了蒸馏模型的性能,而不增加推理成本。我们的研究结果还揭示了注意力机制的复杂性如何在不同层次之间演变。
更新时间: 2025-07-04 06:59:17
领域: cs.LG,stat.ML
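The classical notion of statistical degrees of freedom referenced above has a standard closed form, dof(lam) = tr(K(K + lam I)^{-1}) for a Gram matrix K. The per-layer selection rule in the sketch below (feature dimension = ceiling of the dof) is our simplified reading, not the paper's exact procedure.

```python
# Toy reading of the selection rule: compute the statistical degrees of
# freedom of each layer's attention-feature Gram matrix and give layers with
# higher effective dimensionality a larger linear-attention feature dimension.
import numpy as np

rng = np.random.default_rng(0)

def degrees_of_freedom(K: np.ndarray, lam: float) -> float:
    n = K.shape[0]
    return float(np.trace(K @ np.linalg.inv(K + lam * np.eye(n))))

def layer_gram(rank: int, n: int = 256) -> np.ndarray:
    X = rng.normal(size=(n, rank))       # stand-in for a layer's key features
    return X @ X.T / n

for layer, rank in enumerate([4, 16, 64]):   # layers of growing complexity
    K = layer_gram(rank)
    dof = degrees_of_freedom(K, lam=0.1)
    feat_dim = int(np.ceil(dof))             # per-layer feature dimension
    print(f"layer {layer}: dof={dof:5.1f} -> feature dim {feat_dim}")
```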
DESign: Dynamic Context-Aware Convolution and Efficient Subnet Regularization for Continuous Sign Language Recognition
Current continuous sign language recognition (CSLR) methods struggle with handling diverse samples. Although dynamic convolutions are ideal for this task, they mainly focus on spatial modeling and fail to capture the temporal dynamics and contextual dependencies. To address this, we propose DESign, a novel framework that incorporates Dynamic Context-Aware Convolution (DCAC) and Subnet Regularization Connectionist Temporal Classification (SR-CTC). DCAC dynamically captures the inter-frame motion cues that constitute signs and uniquely adapts convolutional weights in a fine-grained manner based on contextual information, enabling the model to better generalize across diverse signing behaviors and boost recognition accuracy. Furthermore, we observe that existing methods still rely on only a limited number of frames for parameter updates during training, indicating that CTC learning overfits to a dominant path. To address this, SR-CTC regularizes training by applying supervision to subnetworks, encouraging the model to explore diverse CTC alignment paths and effectively preventing overfitting. A classifier-sharing strategy in SR-CTC further strengthens multi-scale consistency. Notably, SR-CTC introduces no inference overhead and can be seamlessly integrated into existing CSLR models to boost performance. Extensive ablations and visualizations further validate the effectiveness of the proposed methods. Results on mainstream CSLR datasets (i.e., PHOENIX14, PHOENIX14-T, CSL-Daily) demonstrate that DESign achieves state-of-the-art performance.
Updated: 2025-07-04 06:56:28
标题: DESign:连续手语识别的动态上下文感知卷积和高效子网络正则化
摘要: 目前的连续手语识别(CSLR)方法在处理多样化样本方面存在困难。尽管动态卷积非常适合这项任务,但它们主要侧重于空间建模,无法捕捉时间动态和上下文依赖关系。为了解决这一问题,我们提出了DESign,这是一个新颖的框架,它结合了动态上下文感知卷积(DCAC)和子网络正则化连接主义时间分类(SR-CTC)。DCAC动态捕捉构成手语的帧间运动线索,并根据上下文信息以细粒度的方式独特地调整卷积权重,使模型能更好地在不同手语行为之间进行泛化,并提升识别准确性。此外,我们观察到现有方法在训练过程中仍然仅依赖有限数量的帧进行参数更新,表明CTC学习过度拟合到一个主导路径。为了解决这个问题,SR-CTC通过向子网络应用监督来规范训练,鼓励模型探索多样化的CTC对齐路径,并有效防止过拟合。SR-CTC中的分类器共享策略进一步加强了多尺度的一致性。值得注意的是,SR-CTC不会引入推理开销,并可以无缝集成到现有的CSLR模型中以提升性能。广泛的消融实验和可视化进一步验证了提出方法的有效性。在主流的CSLR数据集(即PHOENIX14、PHOENIX14-T、CSL-Daily)上的结果表明,DESign实现了最先进的性能。
更新时间: 2025-07-04 06:56:28
领域: cs.CV,cs.AI
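A hedged sketch of the SR-CTC idea described above: intermediate feature stages (subnetworks) are decoded by one shared classifier, and each receives its own CTC loss so that training explores diverse alignment paths. The stage modules, taps, and auxiliary loss weights below are our assumptions.

```python
# Sketch of subnet-regularized CTC with a shared classifier; stand-in stages.
import torch
import torch.nn as nn

T, N, C, H = 50, 2, 30, 64          # time steps, batch, classes (incl. blank), dim
stages = [nn.GRU(H, H) for _ in range(3)]    # stand-in network stages
shared_classifier = nn.Linear(H, C)          # classifier sharing across subnets
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

x = torch.randn(T, N, H)
targets = torch.randint(1, C, (N, 12))
in_lens = torch.full((N,), T, dtype=torch.long)
tgt_lens = torch.full((N,), 12, dtype=torch.long)

total_loss, feats = 0.0, x
for depth, stage in enumerate(stages):
    feats, _ = stage(feats)
    log_probs = shared_classifier(feats).log_softmax(dim=-1)
    # Supervise every subnetwork (prefix of the full network) with CTC.
    weight = 1.0 if depth == len(stages) - 1 else 0.3  # assumed aux weight
    total_loss = total_loss + weight * ctc(log_probs, targets, in_lens, tgt_lens)

print(float(total_loss))
```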
Evaluating Disassembly Errors With Only Binaries
Disassemblers are crucial in the analysis and modification of binaries. Existing works that expose disassembler errors largely rely on practical implementations without specific guarantees, and assume access to source code and compiler toolchains to establish ground truth. However, the assumption of source code is contrary to typical binary scenarios where only the binary is available. In this work, we investigate a sound approach to disassembly error evaluation that makes minimal assumptions and does not require source code. Approaches that require source code do not address the fundamental problem of binary disassembly and fail when only the binary exists. As far as we know, this is the first work to evaluate disassembly errors using only the binary. We propose TraceBin, which uses dynamic execution to find disassembly errors. TraceBin targets the use case where the disassembly is used in an automated fashion for security tasks on a target binary, such as static binary instrumentation, binary hardening, automated code repair, and so on, which may be affected by disassembly errors. Discovering disassembly errors in the target binary aids in reducing problems caused by such errors. Furthermore, we are not aware of existing approaches that can evaluate errors given only a target binary, as they require source code. Our evaluation shows TraceBin finds: (i) errors consistent with existing studies even without source; (ii) disassembly errors due to control flow; (iii) new interesting errors; (iv) errors in non-C/C++ binaries; (v) errors in closed-source binaries; and (vi) that disassembly errors can have significant security implications. Overall, our experimental results show that TraceBin finds many errors in existing popular disassemblers. It is also helpful in automated security tasks on (closed-source) binaries relying on disassemblers.
Updated: 2025-07-04 06:52:35
标题: 仅用二进制文件评估反汇编错误
摘要: 反汇编器在二进制分析和修改中至关重要。现有的研究显示,反汇编器错误主要依赖于实际实现,没有具体的保证,并假定需要源代码和编译器工具链来评估基本事实。然而,假设需要源代码与典型的二进制场景相矛盾,在典型情况下只有二进制可用。在这项工作中,我们研究了一种最小假设和一种不需要源代码即可进行反汇编错误评估的可靠方法。任何源代码都无法解决二进制反汇编的根本问题,并且当仅有二进制时会失败。据我们所知,这是首个仅使用二进制评估反汇编错误的工作。我们提出了TraceBin,它利用动态执行来发现反汇编错误。TraceBin的目标是在目标二进制上以自动化方式进行反汇编,用于安全任务,如静态二进制仪器化、二进制加固、自动代码修复等,这些任务可能会受到反汇编错误的影响。在目标二进制中发现反汇编错误有助于减少由这些错误引起的问题。此外,我们不知道现有方法能在仅有目标二进制的情况下评估错误,因为它们需要源代码。我们的评估表明TraceBin发现:(i)即使没有源代码也能发现与现有研究一致的错误;(ii)由于控制流而导致的反汇编错误;(iii)新的有趣错误;(iv)非C/C++二进制中的错误;(v)闭源二进制中的错误;以及(vi)反汇编错误可能具有重要的安全影响。总的来说,我们的实验结果表明TraceBin在现有流行的反汇编器中发现了许多错误。它还对依赖反汇编器的自动化安全任务(封闭源)二进制有所帮助。
更新时间: 2025-07-04 06:52:35
领域: cs.CR
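The core comparison behind this kind of evaluation is simple to state: instruction start addresses observed during real execution are ground truth, so any traced address the static disassembly never marks as an instruction start exposes an error. The sketch below is a simplified illustration; TraceBin itself drives actual dynamic execution.

```python
# Simplified illustration of trace-vs-disassembly comparison (toy addresses).
def find_disassembly_errors(static_starts: set[int],
                            traced_starts: set[int]) -> set[int]:
    """Addresses executed as instruction starts but missed/misplaced statically."""
    return traced_starts - static_starts

# Toy example: data-in-code or control flow confused the disassembler, which
# decoded a spurious instruction at 0x1003 and missed the real one at 0x1004.
static_disasm = {0x1000, 0x1003, 0x1007}
dynamic_trace = {0x1000, 0x1004, 0x1007}

errors = find_disassembly_errors(static_disasm, dynamic_trace)
print([hex(a) for a in sorted(errors)])  # ['0x1004']
```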
MAGIC: Mask-Guided Diffusion Inpainting with Multi-Level Perturbations and Context-Aware Alignment for Few-Shot Anomaly Generation
Few-shot anomaly generation is emerging as a practical solution for augmenting the scarce anomaly data in industrial quality control settings. An ideal generator would meet three demands at once, namely (i) keep the normal background intact, (ii) inpaint anomalous regions to tightly overlap with the corresponding anomaly masks, and (iii) generate anomalous regions in a semantically valid location, while still producing realistic, diverse appearances from only a handful of real examples. Existing diffusion-based methods usually satisfy at most two of these requirements: global anomaly generators corrupt the background, whereas mask-guided ones often falter when the mask is imprecise or misplaced. We propose MAGIC--Mask-guided inpainting with multi-level perturbations and Context-aware alignment--to resolve all three issues. At its core, MAGIC fine-tunes a Stable Diffusion inpainting backbone that preserves normal regions and ensures strict adherence of the synthesized anomaly to the supplied mask, directly addressing background corruption and misalignment. To offset the diversity loss that fine-tuning can cause, MAGIC adds two complementary perturbation strategies: (i) Gaussian prompt-level perturbation applied during fine-tuning and inference that broadens the global appearance of anomalies while avoiding low-fidelity textual appearances, and (ii) mask-guided spatial noise injection that enriches local texture variations. Additionally, the context-aware mask alignment module forms semantic correspondences and relocates masks so that every anomaly remains plausibly contained within the host object, eliminating out-of-boundary artifacts. Under a consistent evaluation protocol on the MVTec-AD dataset, MAGIC outperforms previous state-of-the-art methods in downstream anomaly tasks.
Updated: 2025-07-04 06:51:57
标题: MAGIC:具有多级扰动和上下文感知对齐的面具引导扩散修复,用于少样本异常生成
摘要: Few-shot异常生成作为一种实际解决方案,用于增加工业质量控制环境中稀缺的异常数据。理想的生成器应同时满足三个需求,即(i)保持正常背景完整,(ii)修复异常区域以与相应的异常蒙版紧密重叠,以及(iii)在语义有效位置生成异常区域,同时仅从少数真实示例中产生逼真、多样化的外观。现有基于扩散的方法通常最多满足这些要求中的两个:全局异常生成器会破坏背景,而基于蒙版引导的方法在蒙版不精确或错误放置时经常出现问题。我们提出了MAGIC——带有多级扰动和上下文感知对齐的蒙版引导修复,以解决所有三个问题。在其核心,MAGIC微调了一个保留正常区域并确保合成异常与提供的蒙版严格一致的Stable Diffusion修复骨干,直接解决了背景破坏和不对齐问题。为了抵消微调可能引起的多样性损失,MAGIC添加了两种互补的扰动策略:(i)在微调和推断期间应用的高斯提示级扰动,扩大异常的全局外观,同时避免低保真度的文本外观,以及(ii)基于蒙版引导的空间噪声注入,丰富局部纹理变化。此外,上下文感知蒙版对齐模块形成语义对应关系并重新定位蒙版,使每个异常保持在主机对象内,消除超出边界的人工制品。在MVTec-AD数据集上一致相同的评估协议下,MAGIC在下游异常任务中胜过了先前的最新技术。
更新时间: 2025-07-04 06:51:57
领域: cs.CV,cs.AI
CAOTE: KV Cache Eviction for LLMs via Attention Output Error-Based Token Selection
While long-context support has extended the abilities of large language models, it also incurs memory and compute challenges that become crucial bottlenecks on resource-restricted devices. Token eviction, a widely adopted post-training methodology designed to alleviate these bottlenecks by evicting less important tokens from the cache, typically uses attention scores as proxy metrics for token importance. However, one major limitation of the attention score as a token-wise importance metric is that it lacks information about the contribution of tokens to the attention output. In this paper, we propose a simple eviction criterion based on the contribution of cached tokens to attention outputs. Our method, CAOTE, optimizes for the eviction error due to token eviction by seamlessly integrating attention scores and value vectors. This is the first method to use value-vector information on top of attention-based eviction scores. Additionally, CAOTE can act as a meta-heuristic that can be used flexibly with any token eviction method. We show that CAOTE, when combined with state-of-the-art attention-score-based methods, consistently improves accuracy on downstream tasks, indicating the importance of leveraging information from values during the token eviction process.
Updated: 2025-07-04 06:49:31
标题: CAOTE:通过基于注意力输出误差的令牌选择实现LLMs的KV缓存驱逐
摘要: 大型语言模型的长期上下文支持扩展了它们的能力,但也带来了在内存和计算方面的挑战,这在资源受限设备中变得至关重要。标记驱逐是一种广泛采用的后训练方法,旨在通过从缓存中驱逐不太重要的标记来缓解瓶颈,通常使用注意力分数作为标记重要性的代理度量。然而,注意力分数作为标记重要性度量的一个主要局限性是缺乏关于标记对注意力输出的贡献信息。在本文中,我们提出了一个简单的基于缓存标记对注意力输出贡献的驱逐标准。我们的方法CAOTE通过无缝集成注意力分数和值向量来优化由于标记驱逐而产生的驱逐错误。这是第一个在基于注意力的驱逐分数之上使用值向量信息的方法。此外,CAOTE可以作为一种元启发方法,可以灵活地与任何标记驱逐方法一起使用。我们展示了当CAOTE与最先进的基于注意力分数的方法结合时,总是在下游任务的准确性上取得改善,表明在标记驱逐过程中利用值信息的重要性。
更新时间: 2025-07-04 06:49:31
领域: cs.LG
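One closed form consistent with the idea above (our derivation, not necessarily the paper's exact estimator): with attention weights a and value vectors V, the output is o = sum_i a_i v_i; evicting token i and renormalizing changes the output by (a_i / (1 - a_i)) * ||v_i - o||, so attention scores alone miss the value-vector term.

```python
# Hedged instantiation of attention-output-error-based eviction scoring.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 16, 8
scores = rng.normal(size=n_tokens)
a = np.exp(scores) / np.exp(scores).sum()     # attention weights
V = rng.normal(size=(n_tokens, d))            # cached value vectors
o = a @ V                                     # attention output

# Output change caused by evicting each token (see derivation above).
eviction_error = a / (1 - a) * np.linalg.norm(V - o, axis=1)

keep = 12                                     # cache budget
evict = np.argsort(eviction_error)[: n_tokens - keep]
print("evict tokens:", sorted(evict.tolist()))
```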
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
Large language models (LLMs) are increasingly tasked with invoking enterprise APIs, yet they routinely falter when near-duplicate tools vie for the same user intent or when required arguments are left underspecified. We introduce DiaFORGE (Dialogue Framework for Organic Response Generation & Evaluation), a disambiguation-centric, three-stage pipeline that (i) synthesizes persona-driven, multi-turn dialogues in which the assistant must distinguish among highly similar tools, (ii) performs supervised fine-tuning of open-source models with reasoning traces across 3B - 70B parameters, and (iii) evaluates real-world readiness via a dynamic suite that redeploys each model in a live agentic loop and reports end-to-end goal completion alongside conventional static metrics. On our dynamic benchmark DiaBENCH, models trained with DiaFORGE raise tool-invocation success by 27 pp over GPT-4o and by 49 pp over Claude-3.5-Sonnet, both under optimized prompting. To spur further research, we release an open corpus of 5000 production-grade enterprise API specifications paired with rigorously validated, disambiguation-focused dialogues, offering a practical blueprint for building reliable, enterprise-ready tool-calling agents.
Updated: 2025-07-04 06:49:02
标题: 消除歧义为中心的微调使企业工具调用LLM更真实和更少风险
摘要: 大型语言模型(LLMs)越来越经常被用于调用企业API,然而当近似重复的工具竞争同一个用户意图或者必要参数未明确指定时,它们往往会失败。我们引入了DiaFORGE(用于有机响应生成和评估的对话框架),这是一个以消除歧义为中心的三阶段流水线,其中(i)合成了以人物驱动的多轮对话,助手必须区分高度相似的工具,(ii)通过跨3B-70B参数的推理轨迹执行监督微调开源模型,(iii)通过一个动态套件评估现实世界的准备情况,将每个模型重新部署到一个实时代理(agentic)循环中,并报告端到端目标完成情况以及传统的静态指标。在我们的动态基准DiaBENCH上,使用DiaFORGE训练的模型在优化提示下将工具调用成功率比GPT-4o提高了27个百分点,比Claude-3.5-Sonnet提高了49个百分点。为了推动进一步研究,我们发布了一个包含5000个生产级企业API规范的开放语料库,配对了经过严格验证的、以消歧为重点的对话,为构建可靠的、企业就绪的工具调用代理提供了实用的蓝图。
更新时间: 2025-07-04 06:49:02
领域: cs.AI,cs.CL,cs.LG
De-Fake: Style based Anomaly Deepfake Detection
Detecting deepfakes involving face-swaps presents a significant challenge, particularly in real-world scenarios where anyone can perform face-swapping with freely available tools and apps without any technical knowledge. Existing deepfake detection methods rely on facial landmarks or inconsistencies in pixel-level features and often struggle with face-swap deepfakes, where the source face is seamlessly blended into the target image or video. The prevalence of face-swapping is evident in everyday life, where it is used to spread false information, damage reputations, manipulate political opinions, create non-consensual intimate deepfakes (NCID), and exploit children by enabling the creation of child sexual abuse material (CSAM). Even prominent public figures are not immune to its impact, with numerous deepfakes of them circulating widely across social media platforms. Another challenge faced by deepfake detection methods is the creation of datasets that encompass a wide range of variations, as training models require substantial amounts of data. This raises privacy concerns, particularly regarding the processing and storage of personal facial data, which could lead to unauthorized access or misuse. Our key idea is to identify the style discrepancies introduced by face-swapping to detect face-swapped images effectively without accessing the real facial image. We perform comprehensive evaluations using multiple datasets and face-swapping methods, which showcases the effectiveness of SafeVision in detecting face-swap deepfakes across diverse scenarios. SafeVision offers a reliable and scalable solution for detecting face-swaps in a privacy-preserving manner, making it particularly effective in challenging real-world applications. To the best of our knowledge, SafeVision is the first deepfake detection method to use style features while providing inherent privacy protection.
Updated: 2025-07-04 06:42:51
标题: 去伪:基于样式的异常深度伪造检测
摘要: 检测涉及面部交换的深度伪造在现实世界情境中具有重大挑战,特别是在任何人都可以使用免费工具和应用程序进行面部交换而无需任何技术知识的情况下。现有的深度伪造检测方法依赖于面部标志或像素级特征的不一致性,通常在面部交换深度伪造方面表现出困难,其中源面部被无缝融合到目标图像或视频中。面部交换的普遍性在日常生活中是显而易见的,它被用来传播假信息、损害声誉、操纵政治观点、创建非自愿的亲密深度伪造(NCID)以及通过使儿童易受侵害而利用儿童性虐待材料(CSAM)。 即使是著名的公众人物也无法免受其影响,他们的许多深度伪造视频在社交媒体平台上广泛传播。深度伪造检测方法面临的另一个挑战是创建涵盖各种变化的数据集,因为训练模型需要大量数据。这引起了隐私问题,特别是涉及个人面部数据的处理和存储,这可能导致未经授权的访问或滥用。我们的关键想法是识别这些风格差异,以有效检测面部交换图像,而无需访问真实面部图像。我们使用多个数据集和面部交换方法进行全面评估,展示了SafeVision在各种情景下检测面部交换深度伪造的有效性。SafeVision提供了一种可靠且可扩展的解决方案,以隐私保护方式检测面部交换,使其在具有挑战性的现实世界应用中特别有效。据我们所知,SafeVision是第一个利用风格特征进行深度伪造检测的同时提供固有隐私保护的方法。
更新时间: 2025-07-04 06:42:51
领域: cs.CV,cs.AI
Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling
To alleviate the reliance of deep neural networks on large-scale datasets, dataset distillation aims to generate compact, high-quality synthetic datasets that can achieve comparable performance to the original dataset. The integration of generative models has significantly advanced this field. However, existing approaches primarily focus on aligning the distilled dataset with the original one, often overlooking task-specific information that can be critical for optimal downstream performance. In this paper, focusing on the downstream task of classification, we propose a task-specific sampling strategy for generative dataset distillation that incorporates the concept of difficulty to consider the requirements of the target task better. The final dataset is sampled from a larger image pool with a sampling distribution obtained by matching the difficulty distribution of the original dataset. A logarithmic transformation is applied as a pre-processing step to correct for distributional bias. The results of extensive experiments demonstrate the effectiveness of our method and suggest its potential for enhancing performance on other downstream tasks.
Updated: 2025-07-04 06:38:02
标题: 任务特定的生成式数据集精炼与困难引导采样
摘要: 为了减少深度神经网络对大规模数据集的依赖,数据集蒸馏旨在生成紧凑、高质量的合成数据集,能够达到与原始数据集相当的性能。将生成模型整合到这一领域显著推动了其发展。然而,现有方法主要集中在将蒸馏数据集与原始数据集对齐,往往忽视了任务特定信息,这些信息对于下游性能的优化至关重要。本文针对分类的下游任务,提出了一种基于任务的采样策略,用于生成数据集的蒸馏,该策略整合了困难度概念,更好地考虑目标任务的需求。最终数据集是从一个更大的图像池中进行采样,采样分布是通过匹配原始数据集的困难度分布获得的。对数变换被应用作为一个预处理步骤,以校正分布偏差。广泛实验的结果表明了我们方法的有效性,并表明其在其他下游任务上提升性能的潜力。
更新时间: 2025-07-04 06:38:02
领域: cs.CV,cs.AI,cs.LG
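A sketch of the difficulty-guided sampling step described above, under stated assumptions: difficulty is a per-image scalar (e.g., a classifier loss), log-transformed to correct the distributional bias, and the generated pool is sampled so its difficulty histogram matches the original dataset's. The bin count and difficulty proxy are our choices.

```python
# Difficulty-guided sampling via histogram matching (illustrative settings).
import numpy as np

rng = np.random.default_rng(0)
orig_diff = rng.lognormal(mean=0.0, sigma=0.7, size=5000)   # original dataset
pool_diff = rng.lognormal(mean=0.3, sigma=1.0, size=20000)  # generated pool

orig_log, pool_log = np.log(orig_diff), np.log(pool_diff)   # pre-processing
bins = np.histogram_bin_edges(np.concatenate([orig_log, pool_log]), bins=20)
target_hist, _ = np.histogram(orig_log, bins=bins, density=True)

# Sampling weight of each pool image = target density of its difficulty bin.
pool_bin = np.clip(np.digitize(pool_log, bins) - 1, 0, len(target_hist) - 1)
weights = target_hist[pool_bin]
weights = weights / weights.sum()

n_distilled = 500
chosen = rng.choice(len(pool_diff), size=n_distilled, replace=False, p=weights)
print("matched mean log-difficulty:",
      round(pool_log[chosen].mean(), 3), "vs", round(orig_log.mean(), 3))
```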
Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking
Cooking plays a vital role in everyday independence and well-being, yet remains challenging for people with vision impairments due to limited support for tracking progress and receiving contextual feedback. Object status - the condition or transformation of ingredients and tools - offers a promising but underexplored foundation for context-aware cooking support. In this paper, we present OSCAR (Object Status Context Awareness for Recipes), a technical pipeline that explores the use of object status recognition to enable recipe progress tracking in non-visual cooking. OSCAR integrates recipe parsing, object status extraction, visual alignment with cooking steps, and time-causal modeling to support real-time step tracking. We evaluate OSCAR on 173 instructional videos and a real-world dataset of 12 non-visual cooking sessions recorded by BLV individuals in their homes. Our results show that object status consistently improves step prediction accuracy across vision-language models, and reveal key factors that impact performance in real-world conditions, such as implicit tasks, camera placement, and lighting. We contribute the pipeline of context-aware recipe progress tracking, an annotated real-world non-visual cooking dataset, and design insights to guide future context-aware assistive cooking systems.
Updated: 2025-07-04 06:30:50
标题: 在非视觉烹饪中探索用于菜谱进度跟踪的对象状态识别
摘要: 烹饪在日常独立生活和幸福感中起着至关重要的作用,然而对于视力受损的人来说仍然具有挑战性,因为缺乏追踪进度和接收上下文反馈的支持。物体状态 - 食材和工具的条件或转化 - 为上下文感知的烹饪支持提供了一个有前景但未被充分探索的基础。在本文中,我们提出了OSCAR(对象状态上下文感知食谱),这是一个技术管道,通过探索物体状态识别来实现非视觉烹饪中的食谱进度跟踪。OSCAR集成了食谱解析、物体状态提取、与烹饪步骤的视觉对齐,以及时间因果建模,以支持实时步骤跟踪。我们在173个教学视频和一个由视力受损个体在家中记录的12个真实世界非视觉烹饪会话的数据集上评估了OSCAR。我们的结果表明,物体状态在视觉语言模型中始终提高了步骤预测准确性,并揭示了影响实际条件下性能的关键因素,如隐含任务、摄像机位置和光照。我们提供了上下文感知食谱进度跟踪管道,一个标注的真实世界非视觉烹饪数据集,以及设计见解,以指导未来上下文感知的辅助烹饪系统。
更新时间: 2025-07-04 06:30:50
领域: cs.AI,cs.CV,cs.HC
NDAI-NeuroMAP: A Neuroscience-Specific Embedding Model for Domain-Specific Retrieval
We present NDAI-NeuroMAP, the first neuroscience-domain-specific dense vector embedding model engineered for high-precision information retrieval tasks. Our methodology encompasses the curation of an extensive domain-specific training corpus comprising 500,000 carefully constructed triplets (query-positive-negative configurations), augmented with 250,000 neuroscience-specific definitional entries and 250,000 structured knowledge-graph triplets derived from authoritative neurological ontologies. We employ a sophisticated fine-tuning approach utilizing the FremyCompany/BioLORD-2023 foundation model, implementing a multi-objective optimization framework combining contrastive learning with triplet-based metric learning paradigms. Comprehensive evaluation on a held-out test dataset comprising approximately 24,000 neuroscience-specific queries demonstrates substantial performance improvements over state-of-the-art general-purpose and biomedical embedding models. These empirical findings underscore the critical importance of domain-specific embedding architectures for neuroscience-oriented RAG systems and related clinical natural language processing applications.
Updated: 2025-07-04 06:28:53
标题: NDAI-NeuroMAP:用于领域特定检索的神经科学特定嵌入模型
摘要: 我们提出了NDAI-NeuroMAP,这是第一个专为高精度信息检索任务设计的神经科学领域特定的密集向量嵌入模型。我们的方法包括策划一个包含50万个精心构建的三元组(查询-正面-负面配置)的广泛领域特定训练语料库,增加了25万个神经科学特定的定义条目和25万个源自权威神经学本体的结构化知识图三元组。我们采用了一种精细调整方法,利用FremyCompany/BioLORD-2023基础模型,实施了一个结合对比学习和基于三元组的度量学习范式的多目标优化框架。对一个包含约24,000个神经科学特定查询的留存测试数据集的全面评估表明,与最先进的通用和生物医学嵌入模型相比,性能得到了显著提升。这些实证结果强调了对神经科学导向的RAG系统和相关临床自然语言处理应用而言,领域特定嵌入架构的关键重要性。
更新时间: 2025-07-04 06:28:53
领域: cs.AI
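A hedged sketch of the triplet-based fine-tuning objective on (query, positive, negative) configurations follows. The encoder below is a stand-in for the BioLORD-2023 foundation model, and the margin and loss mix are our assumptions (the paper combines contrastive learning with triplet-based metric learning).

```python
# Triplet fine-tuning sketch with a stand-in encoder and assumed margin.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Embedding(5000, 128), nn.Flatten(1), nn.LazyLinear(256))

def embed(token_ids: torch.Tensor) -> torch.Tensor:
    z = encoder(token_ids)
    return nn.functional.normalize(z, dim=-1)   # unit-norm dense embeddings

triplet = nn.TripletMarginLoss(margin=0.3)

# Toy batch of tokenized (query, positive, negative) triplets.
q = torch.randint(0, 5000, (8, 32))
pos = torch.randint(0, 5000, (8, 32))
neg = torch.randint(0, 5000, (8, 32))

loss = triplet(embed(q), embed(pos), embed(neg))
loss.backward()
print(float(loss))
```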
Read Quietly, Think Aloud: Decoupling Comprehension and Reasoning in LLMs
Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding text and generating high-quality responses. However, a critical distinction from human cognition is their typical lack of a distinct internal 'reading' or deliberation phase before 'speaking' (i.e., generating text). Humans often engage in silent reading to comprehend context and formulate thoughts prior to articulation. This paper investigates methods to imbue LLMs with a similar capacity for internal processing. We introduce and evaluate techniques that encourage LLMs to 'read silently.' Our findings indicate that even a straightforward approach, such as providing the model with an initial contextual prompt or 'reading space' before it begins predicting subsequent tokens for the final output, can yield significant performance improvements. We further enhance this concept by developing a 'reading buddy' architecture, where an auxiliary component silently processes the input and provides refined contextual insights to the primary generation model. These approaches aim to foster deeper understanding from LLMs so that they can produce better reasoned responses, moving them one step closer to more human-like text processing. Our results indicate that these simple techniques can have a surprisingly strong impact on accuracy, yielding multi-point accuracy gains.
Updated: 2025-07-04 06:23:06
标题: 悄悄阅读,大声思考:在LLMs中解耦理解与推理
摘要: 大型语言模型(LLMs)已经展示出在理解文本和生成高质量回应方面的卓越能力。然而,与人类认知的一个关键区别是它们通常缺乏在“说话”(即生成文本)之前的明显内部“阅读”或思考阶段。人类经常进行默读来理解上下文并在表达之前构思思路。本文研究了赋予LLMs类似内部处理能力的方法。 我们提出并评估了鼓励LLMs进行“默读”的技术。我们的研究结果表明,即使是简单的方法,比如在模型开始预测最终输出的后续标记之前提供一个初始的上下文提示或“阅读空间”,也可以显著提高性能。我们进一步通过开发一个“阅读伙伴”架构来增强这一概念,其中一个辅助组件默默处理输入并为主要生成模型提供精细的上下文洞察力。这些方法旨在促进LLMs更深入地理解,以便它们能够产生更合理的回应,使它们更接近更类似人类的文本处理过程。我们的研究结果表明,这些简单的技术可以在多点准确性提升方面产生惊人的强大影响。
更新时间: 2025-07-04 06:23:06
领域: cs.CL,cs.AI
A Note on Single-Cut Full-Open Protocols
Card-based cryptography is a research area that realizes cryptographic protocols such as secure computation by applying shuffles to sequences of cards that encode input values. A single-cut full-open protocol is one that obtains an output value by applying a random cut to an input sequence of cards, after which all cards are opened. In this paper, we propose three single-cut full-open protocols: two protocols for three-variable functions and one protocol for a four-variable function.
Updated: 2025-07-04 06:17:52
标题: 单切全开协议简述
摘要: 基于卡片的密码学是一个研究领域,通过对编码输入值的卡片序列应用洗牌来实现安全计算等加密协议。单切全开协议是指通过对输入卡片序列进行随机切割来获取输出值,随后所有卡片都被打开。本文提出了三种单切全开协议:两种用于三变量函数的协议和一种用于四变量函数的协议。
更新时间: 2025-07-04 06:17:52
领域: cs.CR
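The one primitive these protocols rely on, the random cut, is easy to simulate: a uniformly random cyclic rotation of the card sequence, after which all cards are opened. The toy below illustrates only the shuffle, not any specific three- or four-variable protocol from the paper.

```python
# Toy simulation of a random cut (uniform cyclic rotation) followed by opening.
import random

def random_cut(cards: list[str]) -> list[str]:
    """Apply a uniformly random cyclic rotation (a 'random cut')."""
    k = random.randrange(len(cards))
    return cards[k:] + cards[:k]

# A bit is commonly encoded by card order: [club, heart] = 0, [heart, club] = 1.
sequence = ["C", "H", "H", "C", "C", "H"]   # some input encoding
opened = random_cut(sequence)                # single cut, then open everything
print("opened cards:", opened)
# Only the cyclic rotation class of the sequence is revealed; a protocol is
# correct when that class determines exactly the intended output value.
```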
Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism
Deep neural network (DNN) training continues to scale rapidly in terms of model size, data volume, and sequence length, to the point where multiple machines are required to fit large models for training. Different distributed and parallel training strategies have been developed to support large-scale DNN training by partitioning the training state across GPUs. However, existing DNN training systems provide very limited support for reconfiguring parallelism strategies in the middle of the training via checkpointing. This limitation arises because distributed checkpoints are tightly coupled to specific model parallelism and hardware configurations, preventing large-scale training jobs from efficiently adapting to hardware failures or resource elasticity. This paper presents Universal Checkpointing (UCP), a novel checkpointing system that enables flexible and efficient DNN training with reconfigurable parallelism. UCP overcomes challenges in existing systems by decoupling checkpoint structure from parallel training strategies and hardware configurations. In addition, we present a pattern-based reconfiguration pipeline that enables automatic, flexible, and efficient mapping of checkpoint state to various parallelism strategies. Evaluation on a range of DNN models, including state-of-the-art dense and sparse LLMs, shows that UCP enables reconfiguration for a broader set of widely used parallelism strategies than existing solutions while adding negligible reconfiguration cost. UCP has been successfully employed in real LLM training workloads, greatly enhancing their flexibility and resilience to dynamic hardware environments.
Updated: 2025-07-04 06:16:30
标题: 通用检查点:一种灵活高效的分布式检查点系统,用于可重新配置并行大规模DNN训练
摘要: 深度神经网络(DNN)训练在模型大小、数据量和序列长度方面持续快速扩展,以至于需要多台机器来适应训练大型模型。已经开发了不同的分布式和并行训练策略,通过在GPU上对训练状态进行分区来支持大规模DNN训练。然而,现有的DNN训练系统在训练过程中通过检查点提供的重新配置并行策略的支持非常有限。这种限制是因为分布式检查点与特定模型的并行性和硬件配置紧密耦合,阻碍了大规模训练工作负载有效地适应硬件故障或资源弹性。 本文介绍了一种名为Universal Checkpointing(UCP)的新型检查点系统,它可以实现灵活高效的DNN训练和可重新配置的并行性。UCP通过将检查点结构与并行训练策略和硬件配置解耦来克服现有系统中的挑战。此外,我们提出了一种基于模式的重新配置管道,可以自动、灵活、高效地将检查点状态映射到各种并行性策略。对包括最先进的稠密和稀疏LLM在内的一系列DNN模型进行评估表明,UCP能够为更广泛使用的并行性策略提供重新配置支持,而且增加的重新配置成本微乎其微。UCP已成功应用于真实的LLM训练工作负载中,极大地增强了它们对动态硬件环境的灵活性和韧性。
更新时间: 2025-07-04 06:16:30
领域: cs.DC,cs.LG
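The decoupling idea above can be shown in miniature: persist each parameter in an atomic, parallelism-agnostic form plus metadata, then re-partition on load for whatever world size comes next. UCP's actual format and reconfiguration pipeline are far richer; this is only the shape of the idea.

```python
# Toy decoupled checkpoint: save atomic tensors, re-shard on load.
import numpy as np

def save_universal(params: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    # Atomic representation: full tensors, no trace of the old GPU layout.
    return {name: p.copy() for name, p in params.items()}

def load_for_world_size(ckpt: dict[str, np.ndarray],
                        world_size: int) -> list[dict[str, np.ndarray]]:
    """Re-shard the universal checkpoint for a new tensor-parallel degree."""
    ranks = [dict() for _ in range(world_size)]
    for name, p in ckpt.items():
        for rank, shard in enumerate(np.array_split(p, world_size, axis=0)):
            ranks[rank][name] = shard
    return ranks

params = {"w1": np.arange(24.0).reshape(8, 3), "b1": np.arange(8.0)}
ckpt = save_universal(params)            # written while training on 4 GPUs
new_shards = load_for_world_size(ckpt, world_size=2)   # resume on 2 GPUs
print(new_shards[0]["w1"].shape, new_shards[1]["b1"].shape)  # (4, 3) (4,)
```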
Source-Free Domain Adaptation via Multi-view Contrastive Learning
Domain adaptation has become a widely adopted approach in machine learning due to the high costs associated with labeling data. It is typically applied when access to a labeled source domain is available. However, in real-world scenarios, privacy concerns often restrict access to sensitive information, such as fingerprints, bank account details, and facial images. A promising solution to this issue is Source-Free Unsupervised Domain Adaptation (SFUDA), which enables domain adaptation without requiring access to labeled target domain data. Recent research demonstrates that SFUDA can effectively address domain discrepancies; however, two key challenges remain: (1) the low quality of prototype samples, and (2) the incorrect assignment of pseudo-labels. To tackle these challenges, we propose a method consisting of three main phases. In the first phase, we introduce a Reliable Sample Memory (RSM) module to improve the quality of prototypes by selecting more representative samples. In the second phase, we employ a Multi-View Contrastive Learning (MVCL) approach to enhance pseudo-label quality by leveraging multiple data augmentations. In the final phase, we apply a noisy label filtering technique to further refine the pseudo-labels. Our experiments on three benchmark datasets - VisDA 2017, Office-Home, and Office-31 - demonstrate that our method achieves approximately 2 percent and 6 percent improvements in classification accuracy over the second-best method and the average of 13 well-known state-of-the-art approaches, respectively.
Updated: 2025-07-04 06:15:23
标题: 无源域自适应的多视角对比学习
摘要: 域自适应已经成为机器学习中广泛采用的方法,因为标记数据的成本较高。通常在可以访问有标记的源域时应用该方法。然而,在现实世界中,隐私问题通常限制了对敏感信息(如指纹、银行账户详情和面部图像)的访问。针对这个问题的一个有前途的解决方案是无源无监督域自适应(SFUDA),它可以实现在无需访问有标记的目标域数据的情况下进行域自适应。最近的研究表明,SFUDA可以有效地解决域差异问题;然而,仍然存在两个关键挑战:(1)原型样本的质量较低,以及(2)伪标签的不正确分配。为了解决这些挑战,我们提出了一个由三个主要阶段组成的方法。在第一阶段,我们引入了一个可靠样本记忆(RSM)模块,通过选择更具代表性的样本来改善原型的质量。在第二阶段,我们采用多视图对比学习(MVCL)方法,通过利用多种数据增强技术来提高伪标签的质量。在最后阶段,我们应用一种噪声标签过滤技术来进一步改进伪标签。我们在三个基准数据集 - VisDA 2017、Office-Home 和 Office-31 上的实验表明,我们的方法在分类准确率上分别比第二好的方法和13种知名最先进方法的平均准确率分别提高了约2%和6%。
更新时间: 2025-07-04 06:15:23
领域: cs.CV,cs.AI
Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization
Explainable artificial intelligence (XAI) approaches have been increasingly applied in drug discovery to learn molecular representations and identify substructures driving property predictions. However, building end-to-end explainable machine learning models for structure-activity relationship (SAR) modeling for compound property prediction faces many challenges, such as limited activity data per target and the sensitivity of properties to subtle molecular changes. To address this, we leveraged activity-cliff molecule pairs, i.e., compounds sharing a common scaffold but differing sharply in potency, targeting three proto-oncogene tyrosine-protein kinase Src proteins (i.e., PDB IDs 1O42, 2H8H, and 4MXO). We implemented graph neural network (GNN) methods to obtain atom-level feature information and predict compound-protein affinity (i.e., half maximal inhibitory concentration, IC50). In addition, we trained GNN models with different structure-aware loss functions to adequately leverage molecular property and structure information. We also utilized group lasso and sparse group lasso to prune and highlight molecular subgraphs and enhance the structure-specific model explainability for the predicted property difference in molecular activity-cliff pairs. We improved drug property prediction by integrating common and uncommon node information and using sparse group lasso, reducing the average root mean squared error (RMSE) by 12.70%, and achieving the lowest averaged RMSE=0.2551 and the highest PCC=0.9572. Furthermore, applying regularization enhances feature attribution methods that estimate the contribution of each atom in the molecular graphs by boosting global direction scores and atom-level accuracy in atom coloring accuracy, which improves model interpretability in drug discovery pipelines, particularly in investigating important molecular substructures in lead optimization.
Updated: 2025-07-04 06:12:18
标题: 通过具有组Lasso正则化的图神经网络的结构感知化合物-蛋白亲和力预测
摘要: 可解释的人工智能(XAI)方法越来越多地应用于药物发现,用于学习分子表示并识别驱动属性预测的亚结构。然而,为化合物属性预测建立端到端可解释的机器学习模型面临许多挑战,例如每个靶标的活性数据有限以及属性对微小分子变化的敏感性。为了解决这个问题,我们利用了活性悬崖分子对,即具有共同骨架但在效力上明显不同的化合物,针对三种原癌基因酪氨酸蛋白激酶Src蛋白(即PDB ID为1O42、2H8H和4MXO)。我们实现了图神经网络(GNN)方法,以获取原子级特征信息并预测化合物-蛋白亲和力(即半最大抑制浓度,IC50)。此外,我们训练了具有不同结构感知损失函数的GNN模型,以充分利用分子属性和结构信息。我们还利用组套索和稀疏组套索来修剪和突出分子亚图,并增强预测的属性差异中的分子活性悬崖对的结构特异性模型可解释性。通过整合常见和不常见的节点信息,并使用稀疏组套索,我们改善了药物属性预测,将平均均方根误差(RMSE)降低了12.70%,并实现了最低平均RMSE = 0.2551和最高PCC = 0.9572。此外,应用正则化增强了特征归因方法,通过提高全局方向分数和原子级准确性来估计分子图中每个原子的贡献,从而提高了药物发现管道中的模型可解释性,特别是在研究铅优化中的重要分子亚结构时。
更新时间: 2025-07-04 06:12:18
领域: cs.LG,cs.AI
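A minimal sparse-group-lasso regularizer in the spirit described above: an L2 penalty over each group's weights plus an elementwise L1 term, so whole molecular subgraphs can be pruned while a few atoms within surviving groups stay highlighted. The group definitions and penalty strengths below are placeholders.

```python
# Minimal sparse group lasso penalty over atom-level attribution weights.
import torch

def sparse_group_lasso(weights: torch.Tensor,
                       groups: list[list[int]],
                       lam_group: float = 1e-2,
                       lam_l1: float = 1e-3) -> torch.Tensor:
    group_term = sum(torch.linalg.vector_norm(weights[idx]) for idx in groups)
    l1_term = weights.abs().sum()
    return lam_group * group_term + lam_l1 * l1_term

# Example: atom-level weights grouped by candidate molecular subgraphs.
w = torch.randn(12, requires_grad=True)
groups = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

pred_loss = (w.sum() - 1.0) ** 2              # stand-in for the affinity loss
loss = pred_loss + sparse_group_lasso(w, groups)
loss.backward()
print(float(loss), w.grad.shape)
```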
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
The performance of large language models (LLMs) in program synthesis and mathematical reasoning is fundamentally limited by the quality of their pre-training corpora. We introduce two openly licensed datasets, released under the Llama 3.3 Community License, that significantly enhance LLM performance by systematically rewriting public data. SwallowCode (approximately 16.1 billion tokens) refines Python snippets from The-Stack-v2 through a novel four-stage pipeline: syntax validation, pylint-based style filtering, and a two-stage LLM rewriting process that enforces style conformity and transforms snippets into self-contained, algorithmically efficient examples. Unlike prior methods that rely on exclusionary filtering or limited transformations, our transform-and-retain approach upgrades low-quality code, maximizing data utility. SwallowMath (approximately 2.3 billion tokens) enhances Finemath-4+ by removing boilerplate, restoring context, and reformatting solutions into concise, step-by-step explanations. Within a fixed 50 billion token training budget, continual pre-training of Llama-3.1-8B with SwallowCode boosts pass@1 by +17.0 on HumanEval and +17.7 on HumanEval+ compared to Stack-Edu, surpassing the baseline model's code generation capabilities. Similarly, substituting SwallowMath yields +12.4 accuracy on GSM8K and +7.6 on MATH. Ablation studies confirm that each pipeline stage contributes incrementally, with rewriting delivering the largest gains. All datasets, prompts, and checkpoints are publicly available, enabling reproducible research and advancing LLM pre-training for specialized domains.
Updated: 2025-07-04 06:10:53
标题: 重写预训练数据提升数学和编程语言模型性能
摘要: 大型语言模型(LLMs)在程序合成和数学推理中的性能受其预训练语料库的质量限制。我们引入了两个在Llama 3.3社区许可证下发布的开放许可数据集,通过系统地重写公共数据显著提高LLM的性能。SwallowCode(约161亿令牌)通过一个新颖的四阶段流程:语法验证、基于pylint的样式过滤,以及一个两阶段LLM重写过程,强制执行样式一致性并将代码片段转换为自包含、算法高效的示例,对The-Stack-v2中的Python代码片段进行了优化。与以往依赖排除性过滤或有限转换的方法不同,我们的转换和保留方法升级了低质量代码,最大化了数据的效用。SwallowMath(约23亿令牌)通过去除样板代码、恢复上下文,并将解决方案重新格式化为简洁的、逐步解释,增强了Finemath-4+。在固定的500亿令牌训练预算内,使用SwallowCode持续预训练Llama-3.1-8B可以将通过率@1在HumanEval和HumanEval+上比Stack-Edu提高17.0和17.7,超过了基线模型的代码生成能力。类似地,替换SwallowMath可以在GSM8K上提高12.4的准确性,在MATH上提高7.6。消融研究证实每个流程阶段都会逐渐做出贡献,重写带来了最大的收益。所有数据集、提示和检查点都是公开可用的,可以实现可重现的研究,推动LLM在专业领域的预训练。
更新时间: 2025-07-04 06:10:53
领域: cs.LG,cs.AI
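The first two pipeline stages above can be sketched with Python's ast module as a stand-in: syntax validation plus a lightweight style gate. The paper's pipeline uses pylint scoring and two LLM rewriting stages on top of this kind of filter; the style heuristics below are crude stand-ins.

```python
# Sketch of syntax-validation and style-filter stages for code snippets.
import ast

def passes_stage1_syntax(code: str) -> bool:
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def passes_stage2_style(code: str, max_len: int = 120) -> bool:
    # Crude stand-ins for style filtering: line length and a docstring check.
    if any(len(line) > max_len for line in code.splitlines()):
        return False
    tree = ast.parse(code)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return all(ast.get_docstring(f) for f in funcs) if funcs else True

snippets = [
    "def add(a, b):\n    return a +",                      # broken: filtered out
    "def add(a, b):\n    return a + b",                    # no docstring
    'def add(a, b):\n    """Add two numbers."""\n    return a + b',
]
kept = [s for s in snippets if passes_stage1_syntax(s) and passes_stage2_style(s)]
print(f"{len(kept)}/{len(snippets)} snippets survive the filters")
```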
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency with speedups ranging from 1.22x to 1.68x across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF, and has already been adopted by leading institutions to accelerate RLHF research and learning.
Updated: 2025-07-04 06:10:22
标题: OpenRLHF:一个易于使用、可扩展和高性能的RLHF框架
摘要: 大型语言模型(LLMs)通过强化学习从人类反馈(RLHF)和具有可验证奖励的强化学习(RLVR)进行微调,显著提高了人工智能价值的对齐,并进一步提高了人工智能能力的上限,特别是在理性密集型、长篇连续思维(long-CoT)任务中。然而,现有的RLHF(或RLVR)框架通常面临推理瓶颈和复杂性障碍等挑战,限制了新手的使用。为了弥合这一差距,我们介绍了OpenRLHF,这是一个用户友好、可扩展且易于学习的开源RLHF框架,建立在Ray、vLLM、DeepSpeed和HuggingFace Transformers之上,具有简化的设计、清晰的代码结构和全面的文档,以促进研究人员和从业人员的进入。实验结果表明,与最先进的框架相比,OpenRLHF的训练效率更高,速度提高了1.22倍至1.68倍,同时实现了更少的代码行数来实现。OpenRLHF已经公开在https://github.com/OpenRLHF/OpenRLHF 上提供,并已被领先机构采用,以加速RLHF研究和学习。
更新时间: 2025-07-04 06:10:22
领域: cs.AI,cs.CL,cs.LG
Deep Autoregressive Models as Causal Inference Engines
Existing causal inference (CI) models are often restricted to data with low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions commonly found in modern applications. Our approach accomplishes this using sequencification, which transforms data from an underlying causal diagram into a sequence of tokens. Sequencification not only accommodates training with data generated from a large class of DAGs, but also extends existing CI capabilities to estimate multiple causal quantities using a single model. We can directly compute probabilities from interventional distributions, simplifying inference and improving outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates, where we consider causal queries beyond standard reinforcement learning-type questions.
Updated: 2025-07-04 06:09:35
标题: 深度自回归模型作为因果推断引擎
摘要: 现有的因果推断(CI)模型通常限于具有低维混淆因素和单一动作的数据。我们提出了一种自回归(AR)CI框架,能够处理现代应用程序中常见的复杂混淆因素和顺序动作。我们的方法通过“序列化”实现这一点,将数据从潜在的因果图转换为一系列令牌。序列化不仅适用于使用大类DAGs生成的数据进行训练,还扩展了现有的CI功能,以使用“单一”模型估计多个因果数量。我们可以直接计算干预分布的概率,简化推断并提高结果预测的准确性。我们证明,为CI调整的AR模型在各种复杂应用中是高效且有效的,例如导航迷宫、下棋终局和评估某些关键字对论文被接受率的影响,我们考虑超出标准强化学习类型问题的因果查询。
更新时间: 2025-07-04 06:09:35
领域: cs.LG,stat.ML
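A minimal rendering of sequencification as we understand it: each sample from a causal DAG is flattened into a token sequence ordered consistently with the graph (confounders before actions before outcomes), so a vanilla autoregressive model can be trained on it and queried under interventions. The token format is our assumption.

```python
# Toy sequencification of DAG samples into token sequences.
def sequencify(sample: dict, topo_order: list[str]) -> list[str]:
    """Turn one DAG sample into tokens like 'Z=1', following a topological order."""
    return [f"{var}={sample[var]}" for var in topo_order]

# Toy DAG: confounder Z -> action A -> outcome Y (and Z -> Y).
topo = ["Z", "A", "Y"]
data = [{"Z": 0, "A": 1, "Y": 1}, {"Z": 1, "A": 0, "Y": 0}]
for s in data:
    print(" ".join(sequencify(s, topo)))
# An intervention do(A=1) then corresponds to conditioning the autoregressive
# model on the prefix up to 'A=1' and reading off probabilities over Y tokens.
```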
Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure
Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging especially in high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the challenges of the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function--a key component in diffusion models--using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at $O(d^{5/2} n^{-2/(k+5)})$ and generated distribution at $O(d^{5/4} n^{-1/2(k+5)})$, primarily driven by the intrinsic factor dimension $k$ rather than the number of assets $d$, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data. Our code is available at https://github.com/xymmmm00/diffusion_factor_model.
Updated: 2025-07-04 06:02:46
标题: 扩散因子模型:利用因子结构生成高维回报
摘要: 金融情景模拟对于风险管理和投资组合优化至关重要,然而在金融领域常见的高维度和小数据场景中仍然具有挑战性。我们提出了一种扩散因子模型,将潜在因子结构整合到生成性扩散过程中,将计量经济学与现代生成性人工智能相结合,以解决金融模拟中维度灾难和数据稀缺的挑战。通过利用资产回报中固有的低维度因子结构,我们使用时间变化的正交投影对得分函数进行分解,这种分解被纳入神经网络架构的设计中。我们推导了严格的统计保证,建立了非渐近误差边界,对得分估计为O(d^{5/2} n^{-2/(k+5)}),对生成分布为O(d^{5/4} n^{-1/2(k+5)}),主要由内在因子维度 k 驱动,而不是资产数量 d,超越了经典非参数统计文献中的维度相关限制,并使框架适用于拥有数千种资产的市场。数值研究证实了在小数据情况下在潜在子空间恢复方面的卓越表现。实证分析展示了我们框架在构建均值-方差最优投资组合和因子组合中的经济意义。这项工作首次将因子结构与扩散模型进行了理论整合,为具有有限数据的高维度金融模拟提供了原则性方法。我们的代码可在 https://github.com/xymmmm00/diffusion_factor_model 上找到。
更新时间: 2025-07-04 06:02:46
领域: q-fin.ST,cs.LG,q-fin.MF
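A sketch of the factor-based score decomposition as we read it (the exact time-varying architecture is the paper's): with loadings B spanning a k-dimensional factor subspace, split any score estimate into its projection onto that subspace and an orthogonal residual via the projector P = B(B^T B)^{-1}B^T.

```python
# Orthogonal decomposition of a score estimate into factor + idiosyncratic parts.
import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 5                       # assets, latent factors
B = rng.normal(size=(d, k))         # factor loadings (assumed known/estimated)
P = B @ np.linalg.solve(B.T @ B, B.T)   # orthogonal projector onto span(B)

s = rng.normal(size=d)              # stand-in score estimate at some (x, t)
s_factor = P @ s                    # low-dimensional, factor-driven component
s_resid = s - s_factor              # idiosyncratic component

assert np.allclose(s_factor + s_resid, s)
print("factor share of score energy:",
      round(float(s_factor @ s_factor / (s @ s)), 3))
```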
Partial Label Learning for Automated Theorem Proving
We formulate learning guided Automated Theorem Proving as Partial Label Learning, building the first bridge across these fields of research and providing a theoretical framework for dealing with alternative proofs during learning. We use the plCoP theorem prover to demonstrate that methods from the Partial Label Learning literature tend to increase the performance of learning assisted theorem provers.
Updated: 2025-07-04 05:54:27
标题: 部分标签学习用于自动定理证明
摘要: 我们将学习引导的自动定理证明形式化为部分标签学习,构建了这两个研究领域之间的第一座桥梁,并提供了一个理论框架来处理学习过程中的替代证明。我们使用plCoP定理证明器来展示,部分标签学习文献中的方法往往会提高学习辅助定理证明器的性能。
更新时间: 2025-07-04 05:54:27
领域: cs.LO,cs.AI
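The partial-label connection above can be made concrete: when several alternative proofs (candidate labels) are admissible for a proof state, a standard partial-label objective maximizes the probability mass the model assigns to the set of candidates rather than to one arbitrary proof. The loss below is the classical form; its use here as a stand-in for plCoP's learner is our assumption.

```python
# Classical partial-label loss: -log sum over candidate labels of p(y | x).
import torch

def partial_label_loss(logits: torch.Tensor,
                       candidate_mask: torch.Tensor) -> torch.Tensor:
    """-log sum_{y in candidates} p(y | x), averaged over the batch."""
    log_probs = logits.log_softmax(dim=-1)
    masked = log_probs.masked_fill(~candidate_mask, float("-inf"))
    return -torch.logsumexp(masked, dim=-1).mean()

# Toy batch: 3 proof states, 5 possible inference steps; True marks the steps
# that appear in *some* known proof of the goal.
logits = torch.randn(3, 5, requires_grad=True)
candidates = torch.tensor([[True, False, True, False, False],
                           [False, True, False, False, False],
                           [True, True, True, False, False]])
loss = partial_label_loss(logits, candidates)
loss.backward()
print(float(loss))
```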
Personalized Image Generation from an Author Writing Style
Translating nuanced, textually-defined authorial writing styles into compelling visual representations presents a novel challenge in generative AI. This paper introduces a pipeline that leverages Author Writing Sheets (AWS) - structured summaries of an author's literary characteristics - as input to a Large Language Model (LLM, Claude 3.7 Sonnet). The LLM interprets the AWS to generate three distinct, descriptive text-to-image prompts, which are then rendered by a diffusion model (Stable Diffusion 3.5 Medium). We evaluated our approach using 49 author styles from Reddit data, with human evaluators assessing the stylistic match and visual distinctiveness of the generated images. Results indicate a good perceived alignment between the generated visuals and the textual authorial profiles (mean style match: $4.08/5$), with images rated as moderately distinctive. Qualitative analysis further highlighted the pipeline's ability to capture mood and atmosphere, while also identifying challenges in representing highly abstract narrative elements. This work contributes a novel end-to-end methodology for visual authorial style personalization and provides an initial empirical validation, opening avenues for applications in creative assistance and cross-modal understanding.
Updated: 2025-07-04 05:53:48
标题: 根据作者写作风格生成个性化图像
摘要: 将细致、以文本为基础的作者写作风格翻译成引人入胜的视觉表现,在生成式人工智能中提出了一个新的挑战。本文介绍了一个利用作者写作表(AWS)-结构化总结作者文学特征-作为大型语言模型(LLM,Claude 3.7 Sonnet)输入的流程。LLM解释AWS以生成三个不同的、描述性的文本到图像提示,然后由扩散模型(Stable Diffusion 3.5 Medium)呈现。我们使用Reddit数据中的49种作者风格评估了我们的方法,人类评估员评估了生成的图像的风格匹配和视觉独特性。结果显示生成的视觉与文本作者资料之间有良好的感知对齐(平均风格匹配:4.08/5),图像被评为中等独特。定性分析进一步突出了该流程捕捉情绪和氛围的能力,同时也指出了在表现高度抽象叙事元素方面的挑战。这项工作为视觉作者风格个性化提供了一种新颖的端到端方法,并提供了初步的实证验证,为创意协助和跨模态理解的应用开辟了途径。
更新时间: 2025-07-04 05:53:48
领域: cs.CV,cs.AI
KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction
Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking their valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.
Updated: 2025-07-04 05:48:34
标题: KEPLA:一种用于准确蛋白质-配体结合亲和力预测的知识增强深度学习框架
摘要: 准确预测蛋白质-配体结合亲和力对于药物发现至关重要。虽然最近的深度学习方法展示了有希望的结果,但它们通常仅依赖于蛋白质和配体的结构特征,忽视了与结合亲和力相关的宝贵生化知识。为了解决这一局限性,我们提出了KEPLA,一种新颖的深度学习框架,明确地整合了来自基因本体和配体性质的先验知识,以增强预测性能。KEPLA将蛋白质序列和配体分子图作为输入,并优化两个互补的目标:(1)将全局表示与知识图关系对齐,捕捉特定领域的生化见解,(2)利用局部表示之间的交叉注意力构建细粒度的联合嵌入以进行预测。在两个基准数据集上的实验,无论是在领域内还是跨领域的情况下,都表明KEPLA始终优于最先进的基线。此外,基于知识图关系和交叉注意力映射的可解释性分析提供了有价值的洞察力,揭示了潜在的预测机制。
更新时间: 2025-07-04 05:48:34
领域: cs.LG
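The fine-grained joint embedding via cross attention described above can be sketched with stand-in encoders: protein residue embeddings attend over ligand atom embeddings, and a small head regresses the affinity. The shapes, pooling, and head are our assumptions, not the exact KEPLA architecture.

```python
# Cross-attention joint embedding for compound-protein affinity (sketch).
import torch
import torch.nn as nn

d = 64
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

protein = torch.randn(2, 200, d)   # per-residue embeddings (stand-in encoder)
ligand = torch.randn(2, 40, d)     # per-atom graph embeddings (stand-in GNN)

# Queries come from the protein, keys/values from the ligand atoms.
joint, attn_weights = cross_attn(query=protein, key=ligand, value=ligand)
affinity = head(joint.mean(dim=1)).squeeze(-1)   # pooled joint embedding
print(affinity.shape, attn_weights.shape)        # (2,), (2, 200, 40)
```

The attention map itself is what the interpretability analysis inspects: each entry weights a residue-atom pair.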
MPX: Mixed Precision Training for JAX
Mixed-precision training has emerged as an indispensable tool for enhancing the efficiency of neural network training in recent years. Concurrently, JAX has grown in popularity as a versatile machine learning toolbox. However, it currently lacks robust support for mixed-precision training. We propose MPX, a mixed-precision training toolbox for JAX that simplifies and accelerates the training of large-scale neural networks while preserving model accuracy. MPX seamlessly integrates with popular toolboxes such as Equinox and Flax, allowing users to convert full-precision pipelines to mixed-precision versions with minimal modifications. By casting both inputs and outputs to half precision, and introducing a dynamic loss-scaling mechanism, MPX alleviates issues like gradient underflow and overflow that commonly arise in half precision computations. Its design inherits critical features from JAX's type-promotion behavior, ensuring that operations take place in the correct precision and allowing for selective enforcement of full precision where needed (e.g., sums, means, or softmax). MPX further provides wrappers for automatic creation and management of mixed-precision gradients and optimizers, enabling straightforward integration into existing JAX training pipelines. MPX's source code, documentation, and usage examples are available at github.com/Data-Science-in-Mechanical-Engineering/mixed_precision_for_JAX.
Updated: 2025-07-04 05:47:04
标题: MPX:JAX的混合精度训练
摘要: 混合精度训练近年来已成为增强神经网络训练效率的不可或缺的工具。同时,JAX作为一个多才多艺的机器学习工具包正在日渐流行。然而,目前它缺乏对混合精度训练的强大支持。我们提出了MPX,这是一个为JAX设计的混合精度训练工具包,简化并加速了大规模神经网络的训练,同时保持模型准确性。MPX能够与Equinox和Flax等流行工具包无缝集成,允许用户将全精度流程转换为混合精度版本,只需进行最少的修改。通过将输入和输出都转换为半精度,并引入动态损失缩放机制,MPX缓解了半精度计算中常见的梯度下溢和上溢等问题。其设计继承了JAX的类型提升行为的关键特性,确保操作在正确的精度下进行,并允许在需要时进行全精度的选择性执行(例如求和、平均值或softmax)。MPX还提供了用于自动创建和管理混合精度梯度和优化器的包装器,实现了简单地集成到现有的JAX训练流程中。MPX的源代码、文档和使用示例可在github.com/Data-Science-in-Mechanical-Engineering/mixed_precision_for_JAX上找到。
更新时间: 2025-07-04 05:47:04
领域: cs.LG
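The dynamic loss-scaling mechanism MPX packages up can be shown in plain JAX (MPX's own API additionally wraps type-promotion handling and optimizer management): scale the loss before differentiation, unscale the gradients, and skip the step while shrinking the scale whenever non-finite gradients appear. The growth/shrink factors below are illustrative.

```python
# Minimal dynamic loss scaling in plain JAX (not MPX's actual API).
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    pred = jnp.dot(x, w).astype(jnp.float16)     # half-precision compute
    return jnp.mean((pred.astype(jnp.float32) - y) ** 2)

def scaled_step(w, x, y, scale, lr=1e-2):
    grads = jax.grad(lambda w_: loss_fn(w_, x, y) * scale)(w)
    grads = jax.tree_util.tree_map(lambda g: g / scale, grads)
    finite = jnp.all(jnp.isfinite(grads))
    # On overflow: keep weights, halve the scale; otherwise step and grow it.
    new_w = jnp.where(finite, w - lr * grads, w)
    new_scale = jnp.where(finite, scale * 1.001, scale * 0.5)
    return new_w, new_scale

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
w = jnp.zeros(4)
y = x @ jnp.array([1.0, -2.0, 0.5, 3.0])

scale = jnp.float32(2.0 ** 15)
for _ in range(100):
    w, scale = scaled_step(w, x, y, scale)
print(w, float(scale))
```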
GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation
Document level Machine Translation (DocMT) approaches often struggle with effectively capturing discourse level phenomena. Existing approaches rely on heuristic rules to segment documents into discourse units, which rarely align with the true discourse structure required for accurate translation. Moreover, they often fail to maintain consistency throughout the document during translation. To address these challenges, we propose Graph Augmented Agentic Framework for Document Level Translation (GRAFT), a novel graph based DocMT system that leverages Large Language Model (LLM) agents for document translation. Our approach integrates segmentation, directed acyclic graph (DAG) based dependency modelling, and discourse aware translation into a cohesive framework. Experiments conducted across eight translation directions and six diverse domains demonstrate that GRAFT achieves significant performance gains over state of the art DocMT systems. Specifically, GRAFT delivers an average improvement of 2.8 d-BLEU on the TED test sets from IWSLT2017 over strong baselines and 2.3 d-BLEU for domain specific translation from English to Chinese. Moreover, our analyses highlight the consistent ability of GRAFT to address discourse level phenomena, yielding coherent and contextually accurate translations.
Updated: 2025-07-04 05:45:55
标题: GRAFT:面向文档级机器翻译的基于图的流感知代理框架
摘要: 文件级机器翻译(DocMT)方法通常难以有效地捕捉话语级现象。现有方法依赖启发式规则将文档分割为话语单元,这些规则很少与准确翻译所需的真实话语结构相一致。否则,在翻译过程中无法保持整个文档的一致性。为了解决这些挑战,我们提出了基于图增强代理框架的文件级翻译(GRAFT),这是一种新颖的基于图的DocMT系统,利用大型语言模型(LLM)代理进行文档翻译。我们的方法将分割、基于有向无环图(DAG)的依赖建模和话语感知翻译集成到一个统一的框架中。在八个翻译方向和六个不同领域进行的实验表明,GRAFT在最先进的DocMT系统上取得了显著的性能提升。具体而言,GRAFT在IWSLT2017的TED测试集上比强基线提高了平均2.8个BLEU分,并且在从英文到中文的领域特定翻译中增加了2.3个BLEU分。此外,我们的分析突出了GRAFT持续解决话语级现象的能力,产生连贯且语境准确的翻译。
更新时间: 2025-07-04 05:45:55
领域: cs.CL,cs.AI
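The graph-based control flow described above (segmentation, DAG dependencies, then discourse-aware translation) is schematically the following, with a stand-in translate function; the real system delegates each node to LLM agents with the translations of its dependencies as context.

```python
# Schematic of DAG-ordered, dependency-conditioned document translation.
from graphlib import TopologicalSorter

segments = {
    "s1": "An old clock stood in the hall.",
    "s2": "It had not chimed for years.",        # depends on s1 ("It")
    "s3": "Tonight, it finally struck twelve.",  # depends on s1 and s2
}
deps = {"s1": set(), "s2": {"s1"}, "s3": {"s1", "s2"}}  # discourse DAG

def translate(text: str, context: list[str]) -> str:
    # Placeholder for an LLM-agent call conditioned on translated dependencies.
    return f"<tgt:{text}|ctx={len(context)}>"

translated: dict[str, str] = {}
for seg in TopologicalSorter(deps).static_order():   # respects dependencies
    context = [translated[d] for d in sorted(deps[seg])]
    translated[seg] = translate(segments[seg], context)

print("\n".join(translated[s] for s in ["s1", "s2", "s3"]))
```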
ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series
This paper studies causal discovery in irregularly sampled time series-a pivotal challenge in high-stakes domains like finance, healthcare, and climate science, where missing data and inconsistent sampling frequencies distort causal mechanisms. Traditional methods (e.g., Granger causality, PCMCI) fail to reconcile multi-scale interactions (e.g., hourly storms vs. decadal climate shifts), while neural approaches (e.g., CUTS+) lack interpretability, stemming from a critical gap: existing frameworks either rigidly assume temporal regularity or aggregate dynamics into opaque representations, neglecting real-world granularity and auditable logic. To bridge this gap, we propose ReTimeCausal, a novel integration of Additive Noise Models (ANM) and Expectation-Maximization (EM) that unifies physics-guided data imputation with sparse causal inference. Through kernelized sparse regression and structural constraints, ReTimeCausal iteratively refines missing values (E-step) and causal graphs (M-step), resolving cross-frequency dependencies and missing data issues. Extensive experiments on synthetic and real-world datasets demonstrate that ReTimeCausal outperforms existing state-of-the-art methods under challenging irregular sampling and missing data conditions.
Updated: 2025-07-04 05:39:50
标题: ReTimeCausal:用于不规则时间序列中可解释因果发现的EM增强加性噪声模型
摘要: 本文研究了在不规则采样时间序列中的因果发现-这是金融、医疗保健和气候科学等高风险领域的关键挑战,缺失数据和不一致的采样频率扭曲了因果机制。传统方法(例如Granger因果关系、PCMCI)无法调和多尺度交互作用(例如每小时的风暴与十年的气候变化),而神经方法(例如CUTS+)缺乏可解释性,源于一个关键差距:现有框架要么刚性地假定时间规律,要么将动态聚合成不透明的表示,忽视了现实世界的细粒度和可审计逻辑。为了弥合这一差距,我们提出了ReTimeCausal,这是对加性噪声模型(ANM)和最大期望(EM)的创新整合,将物理引导的数据插补与稀疏因果推断统一起来。通过核化稀疏回归和结构约束,ReTimeCausal迭代地完善缺失值(E步骤)和因果图(M步骤),解决了跨频率依赖性和缺失数据问题。对合成和真实世界数据集的大量实验表明,ReTimeCausal在具有挑战性的不规则采样和缺失数据条件下优于现有的最先进方法。
更新时间: 2025-07-04 05:39:50
领域: cs.LG,cs.AI
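A minimal EM skeleton in the spirit of the ReTimeCausal abstract, with plain Lasso standing in for the paper's kernelized sparse regression and a one-lag linear ANM assumed throughout (all names and defaults below are illustrative): the M-step refits a sparse lagged graph, and the E-step re-imputes missing entries from the current parent predictions.

import numpy as np
from sklearn.linear_model import Lasso

def em_causal_discovery(X, mask, n_iter=10, alpha=0.05):
    """X: (T, d) time series with arbitrary values at missing points;
    mask: True where observed. Returns imputed X and a (d, d) lagged
    coefficient matrix W, where W[i, j] is the effect of variable i
    at time t-1 on variable j at time t."""
    col_means = np.nanmean(np.where(mask, X, np.nan), axis=0)
    X = np.where(mask, X, col_means)          # crude initial imputation
    d = X.shape[1]
    W = np.zeros((d, d))
    for _ in range(n_iter):
        # M-step: sparse regression of each variable on all lagged variables.
        for j in range(d):
            model = Lasso(alpha=alpha).fit(X[:-1], X[1:, j])
            W[:, j] = model.coef_
        # E-step: re-impute missing entries from lagged parent predictions.
        pred = X[:-1] @ W
        X[1:] = np.where(mask[1:], X[1:], pred)
    return X, W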
Scaffolding Recursive Divergence and Convergence in Story Ideation
Human creative ideation involves both exploration of diverse ideas (divergence) and selective synthesis of explored ideas into coherent combinations (convergence). While processes of divergence and convergence are often interleaved and nested, existing AI-powered creativity support tools (CSTs) lack support for sophisticated orchestration of divergence and convergence. We present Reverger, an AI-powered CST that helps users ideate variations of conceptual directions for modifying a story by scaffolding flexible iteration between divergence and convergence. For divergence, our tool enables recursive exploration of alternative high-level directions for modifying a specific part of the original story. For convergence, it allows users to collect explored high-level directions and synthesize them into concrete variations. Users can then iterate between divergence and convergence until they find a satisfactory outcome. A within-subject study revealed that Reverger permitted participants to explore more unexpected and diverse high-level directions than a comparable baseline. Reverger users also felt that they had more fine-grained control and discovered more effort-worthy outcomes.
Updated: 2025-07-04 05:25:19
标题: 在故事构思中为递归分歧与收敛搭建支架
摘要: 人类创造性构思涉及对多样化想法的探索(分歧)以及将探索的想法选择性地综合为连贯的组合(收敛)。虽然分歧和收敛的过程经常交织在一起并嵌套在一起,但现有的人工智能创造性支持工具(CSTs)缺乏对分歧和收敛的复杂协调支持。我们提出了Reverger,一种人工智能驱动的CST,可帮助用户通过支撑分歧和收敛之间的灵活迭代来构思修改故事的概念方向的变化。对于分歧,我们的工具允许对原始故事的特定部分进行替代高层方向的递归探索。对于收敛,它允许用户收集探索的高层方向并将其综合为具体的变体。用户可以在分歧和收敛之间进行迭代,直到找到令人满意的结果。一项被试内研究表明,与可比较基准相比,Reverger允许参与者探索更多意想不到的和多样化的高层方向。Reverger用户还感到他们拥有更加精细的控制,并发现了更多值得投入努力的结果。
更新时间: 2025-07-04 05:25:19
领域: cs.HC,cs.AI
Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model
In semi-supervised semantic segmentation, existing studies have shown promising results in academic settings with controlled splits of benchmark datasets. However, the potential benefits of leveraging significantly larger sets of unlabeled images remain unexplored. In real-world scenarios, abundant unlabeled images are often available from online sources (web-scraped images) or large-scale datasets. However, these images may have different distributions from those of the target dataset, a situation known as out-of-distribution (OOD). Using these images as unlabeled data in semi-supervised learning can lead to inaccurate pseudo-labels, potentially misguiding network training. In this paper, we propose a new semi-supervised semantic segmentation framework with an open-vocabulary segmentation model (SemiOVS) to effectively utilize unlabeled OOD images. Extensive experiments on Pascal VOC and Context datasets demonstrate two key findings: (1) using additional unlabeled images improves the performance of semi-supervised learners in scenarios with few labels, and (2) using the open-vocabulary segmentation (OVS) model to pseudo-label OOD images leads to substantial performance gains. In particular, SemiOVS outperforms existing PrevMatch and SemiVL methods by +3.5 and +3.0 mIoU, respectively, on Pascal VOC with a 92-label setting, achieving state-of-the-art performance. These findings demonstrate that our approach effectively utilizes abundant unlabeled OOD images for semantic segmentation tasks. We hope this work can inspire future research and real-world applications. The code is available at https://github.com/wooseok-shin/SemiOVS
Updated: 2025-07-04 05:12:37
标题: 利用分布之外的无标签图像:利用开放词汇模型的半监督语义分割
摘要: 在半监督语义分割中,现有研究已经在基准数据集受控划分的学术设定下展示了有希望的结果。然而,利用规模大得多的未标记图像集可能带来的潜在好处尚未被探索。在现实场景中,大量未标记图像通常可以从在线来源(网络抓取图像)或大规模数据集中获得。然而,这些图像可能具有与目标数据集不同的分布,这种情况称为分布外(OOD)。将这些图像作为半监督学习中的未标记数据可能会导致不准确的伪标签,从而潜在地误导网络训练。在本文中,我们提出了一个新的半监督语义分割框架,其中包括一个开放词汇分割模型(SemiOVS),以有效利用未标记的OOD图像。在Pascal VOC和Context数据集上进行了大量实验,得出了两个关键发现:(1)在只有少量标签的情况下,使用额外的未标记图像可以提高半监督学习器的性能;(2)使用开放词汇分割(OVS)模型对OOD图像进行伪标记可以带来显著的性能提升。特别是,在Pascal VOC的92个标签设置中,SemiOVS比现有的PrevMatch和SemiVL方法分别提高了+3.5和+3.0 mIoU,实现了最先进的性能。这些发现表明我们的方法有效地利用了大量未标记的OOD图像用于语义分割任务。我们希望这项工作可以激发未来的研究和实际应用。代码可在https://github.com/wooseok-shin/SemiOVS获取。
更新时间: 2025-07-04 05:12:37
领域: cs.CV,cs.AI
LRM-1B: Towards Large Routing Model
Vehicle routing problems (VRPs) are central to combinatorial optimization with significant practical implications. Recent advancements in neural combinatorial optimization (NCO) have demonstrated promising results by leveraging neural networks to solve VRPs, yet the exploration of model scaling within this domain remains underexplored. Inspired by the success of model scaling in large language models (LLMs), this study introduces a Large Routing Model with 1 billion parameters (LRM-1B), designed to address diverse VRP scenarios. We present a comprehensive evaluation of LRM-1B across multiple problem variants, distributions, and sizes, establishing state-of-the-art results. Our findings reveal that LRM-1B not only adapts to different VRP challenges but also showcases superior performance, outperforming existing models. Additionally, we explore the scaling behavior of neural routing models from 1M to 1B parameters. Our analysis confirms power-law between multiple model factors and performance, offering critical insights into the optimal configurations for foundation neural routing solvers.
Updated: 2025-07-04 05:10:20
标题: LRM-1B: 迈向大型路由模型
摘要: 车辆路径问题(VRP)是组合优化的核心问题,具有重要的实际意义。最近神经组合优化(NCO)领域的进展表明,通过利用神经网络来解决VRP问题取得了令人期待的结果,然而在该领域内模型扩展的探索仍未被充分开发。受大型语言模型(LLMs)模型扩展成功的启发,本研究引入了一个具有10亿参数的大规模路径模型(LRM-1B),旨在解决各种VRP场景。我们对LRM-1B在多个问题变体、分布和规模上进行了全面评估,并取得了最新技术成果。我们的研究结果显示,LRM-1B不仅适应不同的VRP挑战,而且表现出优越的性能,超越了现有模型。此外,我们探索了神经路径模型从1M到1B参数的扩展行为。我们的分析证实了多个模型因素和性能之间的幂律关系,为基础神经路径求解器的最佳配置提供了关键见解。
更新时间: 2025-07-04 05:10:20
领域: cs.LG
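The power law between model factors and performance that the LRM-1B abstract reports can be recovered by a straight-line fit in log-log space. A tiny sketch with made-up numbers (placeholders only, not the paper's measurements):

import numpy as np

# Hypothetical (parameter count, optimality gap) pairs -- placeholders.
params = np.array([1e6, 1e7, 1e8, 1e9])
gap    = np.array([0.080, 0.041, 0.022, 0.012])

# Fit gap ~ a * params**b  <=>  log gap = log a + b * log params.
slope, log_a = np.polyfit(np.log(params), np.log(gap), 1)
print(f"gap(N) ≈ {np.exp(log_a):.3f} * N^({slope:.2f})")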
Dyn-O: Building Structured World Models with Object-Centric Representations
World models aim to capture the dynamics of the environment, enabling agents to predict and plan for future states. In most scenarios of interest, the dynamics are highly centered on interactions among objects within the environment. This motivates the development of world models that operate on object-centric rather than monolithic representations, with the goal of more effectively capturing environment dynamics and enhancing compositional generalization. However, the development of object-centric world models has largely been explored in environments with limited visual complexity (such as basic geometries). It remains underexplored whether such models can generalize to more complex settings with diverse textures and cluttered scenes. In this paper, we fill this gap by introducing Dyn-O, an enhanced structured world model built upon object-centric representations. Compared to prior work in object-centric representations, Dyn-O improves in both learning representations and modeling dynamics. On the challenging Procgen games, we find that our method can learn object-centric world models directly from pixel observations, outperforming DreamerV3 in rollout prediction accuracy. Furthermore, by decoupling object-centric features into dynamics-agnostic and dynamics-aware components, we enable finer-grained manipulation of these features and generate more diverse imagined trajectories.
Updated: 2025-07-04 05:06:15
标题: Dyn-O:使用以对象为中心的表示构建结构化世界模型
摘要: 世界模型旨在捕捉环境的动态,使代理能够预测和规划未来状态。在大多数感兴趣的场景中,动态高度集中在环境中的物体之间的交互上。这促使开发以物体为中心而不是单一表示的世界模型,其目标是更有效地捕捉环境动态并增强组合泛化能力。然而,物体为中心的世界模型的发展主要在视觉复杂性有限的环境中进行探索(如基本几何形状)。目前尚未探讨这些模型是否可以推广到具有多样纹理和杂乱场景的更复杂环境中。在本文中,我们通过引入Dyn-O,一个基于物体为中心表示的增强结构化世界模型,填补了这一空白。与以往的物体为中心表示的工作相比,Dyn-O在学习表示和建模动态方面都有所改进。在具有挑战性的Procgen游戏中,我们发现我们的方法可以直接从像素观察中学习物体为中心的世界模型,在推演(rollout)预测准确率上优于DreamerV3。此外,通过将物体为中心的特征解耦为动态不可知和动态感知两个组件,我们实现了对这些特征的更精细操纵,并生成更多样化的想象轨迹。
更新时间: 2025-07-04 05:06:15
领域: cs.LG
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information and robot videos at robo-rl.github.io
Updated: 2025-07-04 04:56:34
标题: SLAC:用于全身真实世界强化学习的模拟预训练潜在行动空间
摘要: 建造能力强的家用和工业机器人需要掌握对多功能、高自由度(DoF)系统(如移动操作机器人)的控制。虽然强化学习(RL)有望自主获取机器人控制策略,但将其扩展到高自由度的实体仍然具有挑战性。在现实世界中进行直接的RL需要安全探索和高样本效率,这在实践中很难实现。另一方面,基于模拟到真实的RL常常因为现实差距而变得脆弱。本文介绍了SLAC,这是一种方法,通过利用低保真度模拟器来预训练一个任务无关的潜在动作空间,从而使复杂实体的现实世界RL变得可行。SLAC通过定制的无监督技能发现方法训练这个潜在动作空间,旨在促进时间抽象、解缠和安全性,从而促进高效的下游学习。一旦学习了潜在动作空间,SLAC将其用作一种新颖的离策略(off-policy)RL算法的动作接口,通过现实世界的互动自主学习下游任务。我们在一系列双手移动操作任务上评估了SLAC,结果表明其实现了最先进的性能。值得注意的是,SLAC在不到一小时的现实世界互动中学会了富有接触的全身任务,而没有依赖任何演示或手工行为先验。更多信息和机器人视频请访问robo-rl.github.io。
更新时间: 2025-07-04 04:56:34
领域: cs.RO,cs.AI,cs.LG
MGAA: Multi-Granular Adaptive Allocation for Low-Rank Compression of LLMs
The enormous parameter scale of large language models (LLMs) has made model compression a research hotspot, which aims to alleviate computational resource demands during deployment and inference. As a promising direction, the low-rank approximation technique has made remarkable achievements. Unfortunately, the vast majority of studies on low-rank approximation compression apply uniform compression ratios across all weight matrices, disregarding their inherently differentiated impacts on the model's performance. Although a few recent works attempt to employ heuristic search strategies to achieve the optimal parameter allocation, such strategies are computationally inefficient and lose generalization ability in the era of LLMs. In this study, we propose a novel parameter Multi-Granular Adaptive Allocation (MGAA) method, which can adaptively allocate parameters between and within sublayers without task-specific evaluations in the compression process. MGAA consists of two components: 1) Among different sublayers, it assigns compression ratios based on the cosine similarity between their inputs and outputs, allowing for a more tailored compression in sublayers with varying degrees of importance, and 2) Within each sublayer, it allocates different compression ratios to weight matrices based on their energy distribution characteristics, ensuring a consistent energy retention ratio while optimizing compression efficiency. Comprehensive evaluations of MGAA across multiple LLM backbone models and benchmark datasets demonstrate its superior performance. Additionally, we apply our MGAA to the multimodal model LLaVA, exhibiting remarkable performance improvements.
Updated: 2025-07-04 04:54:01
标题: MGAA:用于LLMs低秩压缩的多粒度自适应分配
摘要: 大型语言模型(LLMs)的巨大参数规模使得模型压缩成为研究热点,旨在在部署和推断过程中减轻计算资源需求。作为一种有前途的方向,低秩近似技术取得了显著成就。然而,绝大多数低秩近似压缩研究通常对所有权重矩阵应用统一的压缩比,而忽略了它们对模型性能的固有差异影响。尽管最近一些工作尝试使用启发式搜索策略来实现最佳参数分配,但这样的策略在LLMs时代计算效率低下且失去了泛化能力。在本研究中,我们提出了一种新颖的参数多粒度自适应分配(MGAA)方法,可以在压缩过程中在子层之间和内部自适应分配参数,而无需特定任务评估。MGAA由两个组成部分组成:1)在不同的子层之间,它基于它们的输入和输出之间的余弦相似性分配压缩比,从而在具有不同重要程度的子层中实现更加个性化的压缩;2)在每个子层内部,它根据它们的能量分布特征为权重矩阵分配不同的压缩比,确保在优化压缩效率的同时保持一致的能量保留比。对多个LLMs骨干模型和基准数据集上的MGAA进行全面评估表明其优越性能。此外,我们将MGAA应用于多模态模型LLaVA,展示出显著的性能改进。
更新时间: 2025-07-04 04:54:01
领域: cs.LG,cs.AI
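MGAA's within-sublayer rule, keeping a consistent energy retention ratio per weight matrix, reduces to choosing the smallest SVD rank whose singular values retain a target share of sum(s**2). A minimal sketch (the 0.9 target is an illustrative value, not the paper's):

import numpy as np

def rank_for_energy(W, energy_ratio=0.9):
    """Smallest rank whose singular values retain `energy_ratio` of the
    total energy sum(s**2) of weight matrix W."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy_ratio) + 1)

def low_rank_factors(W, r):
    # Truncated SVD gives the rank-r approximation W ≈ (U * s) @ Vt.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]

W = np.random.randn(512, 512)
r = rank_for_energy(W, 0.9)
A, B = low_rank_factors(W, r)
print(r, np.linalg.norm(W - A @ B) / np.linalg.norm(W))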
LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents
Large language models (LLMs) have demonstrated promise in reasoning tasks and general decision-making in static environments. In long-term planning tasks, however, errors tend to accumulate, often leading to unsafe or inefficient behavior, limiting their use in general-purpose settings. We propose a modular actor-critic architecture in which an LLM actor is guided by LTLCrit, a trajectory-level LLM critic that communicates via linear temporal logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. The architecture supports both fixed, hand-specified safety constraints and adaptive, learned soft constraints that promote long-term efficiency. Our architecture is model-agnostic: any LLM-based planner can serve as the actor, and LTLCrit serves as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LTLCrit to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. We evaluate our system on the Minecraft diamond-mining benchmark, achieving 100% completion rates and improving efficiency compared to baseline LLM planners. Our results suggest that enabling LLMs to supervise each other through logic is a powerful and flexible paradigm for safe, generalizable decision making.
Updated: 2025-07-04 04:53:53
标题: LTLCrit:基于时间逻辑的LLM批评家,用于安全高效的具身代理
摘要: 大型语言模型(LLMs)已经在推理任务和静态环境中的一般决策中展现出潜力。然而,在长期规划任务中,错误往往会积累,经常导致不安全或低效的行为,从而限制了它们在通用环境中的使用。我们提出了一个模块化的演员-评论家架构,其中LLM演员由LTLCrit引导,LTLCrit是一个轨迹级LLM评论家,通过线性时间逻辑(LTL)进行通信。我们的设置结合了语言模型的推理优势和形式逻辑的保证。演员从自然语言观察中选择高级别行动,而评论家分析完整的轨迹并提出新的LTL约束,以保护演员免受未来不安全或低效行为的影响。该架构支持固定的、手动指定的安全约束和自适应的、学习的软约束,促进长期效率。我们的架构是模型不可知的:任何基于LLM的规划器都可以充当演员,而LTLCrit充当逻辑生成包装器。我们将规划形式化为在符号约束下的图遍历,允许LTLCrit分析失败或次优的轨迹,并生成新的时间逻辑规则,改善未来的行为。我们在Minecraft钻石采矿基准上评估了我们的系统,实现了100%的完成率,并提高了效率,与基线LLM规划器相比。我们的结果表明,通过逻辑使LLMs相互监督是进行安全、通用决策制定的强大且灵活的范式。
更新时间: 2025-07-04 04:53:53
领域: cs.AI,cs.CL,cs.LG,cs.SY,eess.SY
Lion Cub: Minimizing Communication Overhead in Distributed Lion
Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techniques have been explored for SGD and Adam, the Lion optimizer has the distinct advantage that its update vectors are the output of a sign operation, enabling straightforward quantization. However, simply compressing updates for communication and using techniques like majority voting fails to lead to end-to-end speedups due to inefficient communication algorithms and reduced convergence. We analyze three factors critical to distributed learning with Lion: optimizing communication methods, identifying effective quantization methods, and assessing the necessity of momentum synchronization. Our findings show that quantization techniques adapted to Lion and selective momentum synchronization can significantly reduce communication costs while maintaining convergence. We combine these into Lion Cub, which enables up to 5x speedups in end-to-end training compared to Lion. This highlights Lion's potential as a communication-efficient solution for distributed training.
Updated: 2025-07-04 04:44:35
标题: Lion Cub:在分布式Lion中最小化通信开销
摘要: 通信开销是分布式深度学习中的一个关键挑战,尤其是在较慢的以太网互连上,鉴于当前硬件趋势,通信很可能成为一个主要瓶颈。虽然梯度压缩技术已经被用于SGD和Adam,但Lion优化器具有独特的优势,即其更新向量是符号运算的输出,从而可以直接进行量化。然而,仅仅压缩更新用于通信并使用类似多数投票的技术未能实现端到端的加速,这是由于通信算法效率低下和收敛速度降低。我们分析了使用Lion进行分布式学习的三个关键因素:优化通信方法,识别有效的量化方法,评估动量同步的必要性。我们的研究结果表明,适配Lion的量化技术和选择性动量同步可以显著降低通信成本同时保持收敛性。我们将这些技术结合成Lion Cub,相比Lion能实现最多5倍的端到端训练加速。这凸显了Lion作为分布式训练中通信高效解决方案的潜力。
更新时间: 2025-07-04 04:44:35
领域: cs.LG,cs.DC
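Lion's update is the sign of a momentum interpolation, so each coordinate can cross the wire as a single bit. A minimal single-worker sketch of the step and the 1-bit packing it enables (function names are illustrative; zero signs are mapped to -1 in this sketch):

import numpy as np

def lion_step(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    update = np.sign(beta1 * m + (1 - beta1) * grad)  # entries in {-1, 0, +1}
    new_m = beta2 * m + (1 - beta2) * grad            # momentum kept locally
    new_param = param - lr * (update + wd * param)
    return new_param, new_m, update

def pack_signs(update):
    # 1 bit per coordinate: positive -> 1, everything else -> 0 (i.e. -1).
    return np.packbits(update > 0)

def unpack_signs(bits, n):
    return np.where(np.unpackbits(bits)[:n], 1.0, -1.0)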
Global Variational Inference Enhanced Robust Domain Adaptation
Deep learning-based domain adaptation (DA) methods have shown strong performance by learning transferable representations. However, their reliance on mini-batch training limits global distribution modeling, leading to unstable alignment and suboptimal generalization. We propose Global Variational Inference Enhanced Domain Adaptation (GVI-DA), a framework that learns continuous, class-conditional global priors via variational inference to enable structure-aware cross-domain alignment. GVI-DA minimizes domain gaps through latent feature reconstruction, and mitigates posterior collapse using global codebook learning with randomized sampling. It further improves robustness by discarding low-confidence pseudo-labels and generating reliable target-domain samples. Extensive experiments on four benchmarks and thirty-eight DA tasks demonstrate consistent state-of-the-art performance. We also derive the model's evidence lower bound (ELBO) and analyze the effects of prior continuity, codebook size, and pseudo-label noise tolerance. In addition, we compare GVI-DA with diffusion-based generative frameworks in terms of optimization principles and efficiency, highlighting both its theoretical soundness and practical advantages.
Updated: 2025-07-04 04:43:23
标题: 全局变分推断增强的鲁棒领域适应
摘要: 基于深度学习的领域自适应(DA)方法已经展现出了学习可转移表示的强大性能。然而,它们对小批量训练的依赖限制了全局分布建模,导致不稳定的对齐和次优的泛化。我们提出了全局变分推断增强领域自适应(GVI-DA)框架,通过变分推断学习连续的、类条件的全局先验,以实现结构感知的跨领域对齐。GVI-DA通过潜在特征重建减小了领域差距,并通过随机抽样的全局码书学习来减轻后验崩溃。它通过丢弃低置信度伪标签和生成可靠的目标域样本进一步提高了鲁棒性。在四个基准测试和三十八个DA任务上的大量实验证明了持续的最先进性能。我们还推导了模型的证据下界(ELBO),并分析了先验连续性、码书大小和伪标签噪声容忍度的影响。此外,我们将GVI-DA与基于扩散的生成框架在优化原则和效率方面进行了比较,突出了其理论的合理性和实际优势。
更新时间: 2025-07-04 04:43:23
领域: cs.LG
Do Tensorized Large-Scale Spatiotemporal Dynamic Atmospheric Data Exhibit Low-Rank Properties?
In this study, we investigate for the first time the low-rank properties of a tensorized large-scale spatio-temporal dynamic atmospheric variable. We focus on the Sentinel-5P tropospheric NO2 product (S5P-TN) over a four-year period in an area that encompasses the contiguous United States (CONUS). Here, it is demonstrated that a low-rank approximation of such a dynamic variable is feasible. We apply the low-rank properties of the S5P-TN data to inpaint gaps in the Sentinel-5P product by adopting a low-rank tensor model (LRTM) based on the CANDECOMP / PARAFAC (CP) decomposition and alternating least squares (ALS). Furthermore, we evaluate the LRTM's results by comparing them with spatial interpolation using geostatistics, and conduct a comprehensive spatial statistical and temporal analysis of the S5P-TN product. The results of this study demonstrated that the tensor completion successfully reconstructs the missing values in the S5P-TN product, particularly in the presence of extended cloud obscuration, predicting outliers and identifying hotspots, when the data is tensorized over extended spatial and temporal scales.
Updated: 2025-07-04 04:38:49
标题: 张量化的大规模时空动态大气数据表现出低秩特性吗?
摘要: 在这项研究中,我们首次研究了张量化的大规模时空动态大气变量的低秩特性。我们重点关注了一个横跨美国本土的区域在四年的时间内的Sentinel-5P对流层NO2产品(S5P-TN)。在这里,我们证明了这样一个动态变量的低秩逼近是可行的。我们应用S5P-TN数据的低秩特性来填补Sentinel-5P产品中的缺失数据,采用基于CANDECOMP/PARAFAC(CP)分解和交替最小二乘法(ALS)的低秩张量模型(LRTM)。此外,我们通过将其与地统计学中的空间插值进行比较,并对S5P-TN产品进行全面的空间统计和时间分析来评估LRTM的结果。本研究的结果表明,当数据在较大的空间和时间尺度上张量化时,张量补全能够成功重建S5P-TN产品中的缺失值,尤其是在存在大范围云遮蔽的情况下,并能预测异常值和识别热点。
更新时间: 2025-07-04 04:38:49
领域: cs.LG,physics.ao-ph
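A minimal EM-flavoured CP-ALS completion sketch matching the abstract's ingredients (CP decomposition, ALS, gap inpainting) for a 3-way array with a Boolean observation mask; this is the textbook update, not the authors' exact implementation.

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # Column-wise Kronecker product: row (i, j) -> A[i] * B[j], shape (I*J, R).
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_complete(X, mask, rank=5, n_iter=50, seed=0):
    """Alternate plain ALS on a filled-in tensor with re-imputation of the
    unobserved entries from the current rank-`rank` CP model."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    Xf = np.where(mask, X, np.mean(X[mask]))            # crude initial fill
    for _ in range(n_iter):
        A = unfold(Xf, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(Xf, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(Xf, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        recon = np.einsum('ir,jr,kr->ijk', A, B, C)
        Xf = np.where(mask, X, recon)                   # E-step: re-impute gaps
    return Xf, (A, B, C)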
Memory Mosaics at scale
Memory Mosaics [Zhang et al., 2025], networks of associative memories, have demonstrated appealing compositional and in-context learning capabilities on medium-scale networks (GPT-2 scale) and synthetic small datasets. This work shows that these favorable properties remain when we scale memory mosaics to large language model sizes (llama-8B scale) and real-world datasets. To this end, we scale memory mosaics to 10B size, we train them on one trillion tokens, we introduce a couple architectural modifications ("Memory Mosaics v2"), we assess their capabilities across three evaluation dimensions: training-knowledge storage, new-knowledge storage, and in-context learning. Throughout the evaluation, memory mosaics v2 match transformers on the learning of training knowledge (first dimension) and significantly outperforms transformers on carrying out new tasks at inference time (second and third dimensions). These improvements cannot be easily replicated by simply increasing the training data for transformers. A memory mosaics v2 trained on one trillion tokens still perform better on these tasks than a transformer trained on eight trillion tokens.
Updated: 2025-07-04 04:23:03
标题: 规模化的记忆马赛克
摘要: 记忆马赛克[Zhang等人,2025年],即关联记忆网络,已经在中等规模网络(GPT-2规模)和合成小数据集上展示出具有吸引力的组合和上下文学习能力。本文表明,当我们将记忆马赛克扩展到大型语言模型尺寸(llama-8B规模)和真实世界数据集时,这些有利特性仍然存在。 为此,我们将记忆马赛克扩展到10B大小,用一万亿标记进行训练,引入了一些架构修改(“记忆马赛克v2”),我们评估了它们在三个评估维度上的能力:训练知识存储,新知识存储和上下文学习。 在整个评估过程中,记忆马赛克v2在学习训练知识(第一维度)方面与Transformer相匹配,并在推理时执行新任务(第二和第三维度)方面明显优于Transformer。这些改进不能简单地通过增加Transformer的训练数据来复制。用一万亿标记训练的记忆马赛克v2在这些任务上仍然优于用八万亿标记训练的Transformer。
更新时间: 2025-07-04 04:23:03
领域: cs.AI
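The associative memories at the heart of memory mosaics can be pictured as kernel regression over stored key-value pairs. A minimal Gaussian-kernel stand-in (in the networks themselves, keys and values are learned features; everything below is illustrative):

import numpy as np

class AssociativeMemory:
    """Store (key, value) pairs; retrieve a kernel-smoothed value estimate."""
    def __init__(self, beta=8.0):
        self.keys, self.values, self.beta = [], [], beta

    def write(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def read(self, query):
        K = np.stack(self.keys)                 # (n, d)
        V = np.stack(self.values)               # (n, d_v)
        w = np.exp(-self.beta * np.sum((K - query) ** 2, axis=1))
        w /= w.sum()                            # softmax-like kernel weights
        return w @ V                            # weighted average of values

This retrieval is the attention-like limit of kernel regression, which is what lets independently written memories compose in context.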
VGMShield: Mitigating Misuse of Video Generative Models
With the rapid advancement in video generation, people can conveniently use video generation models to create videos tailored to their specific desires. As a result, there are also growing concerns about the potential misuse of video generation for spreading illegal content and misinformation. In this work, we introduce VGMShield: a set of straightforward but effective mitigations through the lifecycle of fake video generation. We start from fake video detection, trying to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos; then, we investigate the fake video source tracing problem, which maps a fake video back to the model that generated it. Towards these, we propose to leverage pre-trained models that focus on spatial-temporal dynamics as the backbone to identify inconsistencies in videos. In detail, we analyze fake videos from the perspective of the generation process. Based on the observation of attention shifts, motion variations, and frequency fluctuations, we identify common patterns in the generated video. These patterns serve as the foundation for our experiments on fake video detection and source tracing. Through experiments on seven state-of-the-art open-source models, we demonstrate that current models still cannot reliably reproduce spatial-temporal relationships, and thus, we can accomplish detection and source tracing with over 90% accuracy. Furthermore, anticipating future generative model improvements, we propose a prevention method that adds invisible perturbations to the query images to make the generated videos look unreal. Together with detection and tracing, our multi-faceted set of solutions can effectively mitigate misuse of video generative models.
Updated: 2025-07-04 04:21:23
标题: VGMShield:减轻视频生成模型的滥用
摘要: 随着视频生成技术的快速发展,人们可以方便地利用视频生成模型创建符合他们特定需求的视频。因此,人们也越来越担心视频生成技术可能被滥用来传播非法内容和虚假信息。 在这项工作中,我们介绍了VGMShield:一组简单但有效的缓解措施,涵盖了虚假视频生成的整个生命周期。我们从虚假视频检测开始,试图了解生成的视频是否具有独特性,以及我们是否能够将其与真实视频区分开;然后,我们调查了虚假视频来源追踪问题,即将虚假视频追溯到生成它的模型。为了实现这些目标,我们建议利用专注于时空动态的预训练模型作为骨干,以识别视频中的不一致之处。具体而言,我们从生成过程的角度分析虚假视频。基于对注意力转移、运动变化和频率波动的观察,我们识别了生成视频中的共同模式。这些模式为我们在虚假视频检测和来源追踪实验中的基础。 通过对七个最先进的开源模型进行实验,我们证明当前模型仍然无法可靠地再现时空关系,因此,我们可以以超过90%的准确率完成检测和来源追踪。 此外,为预测未来生成模型的改进,我们提出了一种预防方法,即向查询图像添加看不见的扰动,使生成的视频看起来不真实。结合检测和追踪,我们的多方面解决方案可以有效地缓解视频生成模型的滥用。
更新时间: 2025-07-04 04:21:23
领域: cs.CR,cs.AI,cs.CV,cs.LG,eess.IV
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
Large language models (LLMs) have become ubiquitous in providing different forms of writing assistance to different writers. However, LLM-powered writing systems often fall short in capturing the nuanced personalization and control needed to effectively support users -- particularly for those who lack experience with prompt engineering. To address these challenges, we introduce GhostWriter, an AI-enhanced design probe that enables users to exercise enhanced agency and personalization during writing. GhostWriter leverages LLMs to implicitly learn the user's intended writing style for seamless personalization, while exposing explicit teaching moments for style refinement and reflection. We study 18 participants who use GhostWriter on two distinct writing tasks, observing that it helps users craft personalized text generations and empowers them by providing multiple ways to control the system's writing style. Based on this study, we present insights on how specific design choices can promote greater user agency in AI-assisted writing and discuss people's evolving relationships with such systems. We conclude by offering design recommendations for future work.
Updated: 2025-07-04 04:19:09
标题: GhostWriter:通过个性化和代理增强协作人工智能写作体验
摘要: 大型语言模型(LLMs)已经变得无处不在,为不同的写作者提供不同形式的写作辅助。然而,以LLM为动力的写作系统往往在捕捉需要有效支持用户的微妙个性化和控制方面表现不佳,特别是对于缺乏提示工程经验的用户。为了解决这些挑战,我们引入了GhostWriter,这是一个AI增强的设计探针,使用户在写作过程中能够行使增强的代理权和个性化。GhostWriter利用LLMs隐含地学习用户的意图写作风格,以实现无缝个性化,并暴露显式的教学时刻以进行风格的细化和反思。我们研究了18名参与者在两个不同的写作任务上使用GhostWriter的情况,观察到它帮助用户打造个性化的文本生成,并通过提供多种控制系统写作风格的方式使他们具有更大的权力。根据这项研究,我们提出了关于如何通过具体的设计选择促进AI辅助写作中更大用户代理权的见解,并讨论人们与这类系统的关系如何演变。最后,我们提出了未来工作的设计建议。
更新时间: 2025-07-04 04:19:09
领域: cs.HC,cs.AI
QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction
Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans in recent years. However, the reliability of these tools is hampered by the presence of poor-quality segmentation outliers, particularly in out-of-distribution samples, making their implementation in clinical practice difficult. Therefore, there is a need for quality control (QC) to screen the quality of the segmentation results. Although numerous automatic QC methods have been developed for segmentation quality screening, most were designed for cardiac MRI segmentation, which involves a single modality and a single tissue type. Furthermore, most prior works only provided subject-level predictions of segmentation quality and did not identify erroneous parts segmentation that may require refinement. To address these limitations, we proposed a novel multi-task deep learning architecture, termed QCResUNet, which produces subject-level segmentation-quality measures as well as voxel-level segmentation error maps for each available tissue class. To validate the effectiveness of the proposed method, we conducted experiments on assessing its performance on evaluating the quality of two distinct segmentation tasks. First, we aimed to assess the quality of brain tumor segmentation results. For this task, we performed experiments on one internal and two external datasets. Second, we aimed to evaluate the segmentation quality of cardiac Magnetic Resonance Imaging (MRI) data from the Automated Cardiac Diagnosis Challenge. The proposed method achieved high performance in predicting subject-level segmentation-quality metrics and accurately identifying segmentation errors on a voxel basis. This has the potential to be used to guide human-in-the-loop feedback to improve segmentations in clinical settings.
Updated: 2025-07-04 03:56:02
标题: QCResUNet: 联合主体级别和体素级别的分割质量预测
摘要: 近年来,深度学习在自动化磁共振成像(MRI)扫描中的脑肿瘤分割方面取得了重大进展。然而,这些工具的可靠性受到质量较差的分割异常值的影响,特别是在超出分布范围的样本中,使它们在临床实践中的应用变得困难。因此,有必要进行质量控制(QC)来筛选分割结果的质量。尽管已经开发了许多自动QC方法用于分割质量筛选,但大多数是为心脏MRI分割设计的,其中仅涉及单一模态和单一组织类型。此外,大多数先前的工作只提供了主体级别的分割质量预测,并未识别可能需要细化的错误分割区域。为了解决这些限制,我们提出了一种新颖的多任务深度学习架构,称为QCResUNet,它同时产生主体级别的分割质量指标以及每个可用组织类别的体素级分割错误图。为了验证所提出方法的有效性,我们进行了实验,评估其在两个不同分割任务上评估分割质量的表现。首先,我们旨在评估脑肿瘤分割结果的质量。为此,我们在一个内部和两个外部数据集上进行了实验。其次,我们旨在评估来自自动心脏诊断挑战赛的心脏磁共振成像(MRI)数据的分割质量。所提出的方法在预测主体级别分割质量指标方面表现出色,并能在体素级别上准确识别分割错误。这有潜力用于引导人在回路的反馈,以改善临床环境中的分割。
更新时间: 2025-07-04 03:56:02
领域: eess.IV,cs.CV,cs.LG
Conformal Information Pursuit for Interactively Guiding Large Language Models
A significant use case of instruction-finetuned Large Language Models (LLMs) is to solve question-answering tasks interactively. In this setting, an LLM agent is tasked with making a prediction by sequentially querying relevant information from the user, as opposed to a single-turn conversation. This paper explores sequential querying strategies that aim to minimize the expected number of queries. One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty. However, obtaining accurate estimates of mutual information or conditional entropy for LLMs is very difficult in practice due to over- or under-confident LLM probabilities, which leads to suboptimal query selection and predictive performance. To better estimate the uncertainty at each iteration, we propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets. More specifically, C-IP leverages a relationship between prediction sets and conditional entropy at each iteration to estimate uncertainty based on the average size of conformal prediction sets. In contrast to conditional entropy, we find that conformal prediction sets are a distribution-free and robust method of measuring uncertainty. Experiments with 20 Questions show that C-IP obtains better predictive performance and shorter query-answer chains compared to previous approaches to IP and uncertainty-based chain-of-thought methods. Furthermore, extending to an interactive medical setting between a doctor and a patient on the MediQ dataset, C-IP achieves competitive performance with direct single-turn prediction while offering greater interpretability.
Updated: 2025-07-04 03:55:39
标题: Conformal Information Pursuit 用于交互指导大型语言模型
摘要: 指令微调的大型语言模型(LLMs)的一个重要用例是交互式地解决问答任务。在这种情境中,LLM代理需要通过向用户顺序查询相关信息来进行预测,而不是单轮对话。本文探讨了旨在最小化预期查询次数的顺序查询策略。其中一种策略是信息追寻(IP),一种贪婪算法,每次迭代都选择最大化信息增益或等效地最小化不确定性的查询。然而,在实践中,由于LLM概率过于自信或不够自信,很难准确估计互信息或条件熵,这导致次优的查询选择和预测性能。为了更好地估计每次迭代的不确定性,我们提出了共形信息追寻(C-IP),一种基于共形预测集的顺序信息增益替代方法。具体来说,C-IP利用每次迭代中预测集与条件熵之间的关系,根据共形预测集的平均大小来估计不确定性。与条件熵相比,我们发现共形预测集是一种无分布假设且稳健的不确定性度量方法。在20个问题(20 Questions)游戏上的实验表明,与以往的IP方法和基于不确定性的思维链方法相比,C-IP获得了更好的预测性能和更短的查询-回答链。此外,在MediQ数据集上扩展到医生与患者之间的交互医疗场景时,C-IP在提供更强可解释性的同时,实现了与直接单轮预测相媲美的性能。
更新时间: 2025-07-04 03:55:39
领域: cs.LG,cs.AI,stat.ML
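The uncertainty proxy in C-IP is the average size of conformal prediction sets. A minimal split-conformal sketch for classification (the 0.1 miscoverage level and the 1 - p score are conventional choices assumed here, not necessarily the paper's):

import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    # Split conformal: nonconformity score = 1 - probability of the true class.
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, q):
    # Keep every class whose score 1 - p falls under the calibrated threshold.
    return np.where(1.0 - probs <= q)[0]

# The average |prediction_set| over candidate queries plays the role of
# conditional entropy: IP then asks the question that shrinks the sets most.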
Securing Transformer-based AI Execution via Unified TEE and Crypto-protected Accelerators
Recent advances in Transformer models, e.g., large language models (LLMs), have brought tremendous breakthroughs in various artificial intelligence (AI) tasks, leading to their wide applications in many security-critical domains. Due to their unprecedented scale and prohibitively high development cost, these models have become highly valuable intellectual property for AI stakeholders and are increasingly deployed via machine learning as a service (MLaaS). However, MLaaS often runs on untrusted cloud infrastructure, exposing data and models to potential breaches. Mainstream protection mechanisms leverage trusted execution environments (TEEs) where confidentiality and integrity for sensitive data are shielded using hardware-based encryption and integrity checking. Unfortunately, running model inference entirely within TEEs is subject to non-trivial slowdown, which is further exacerbated in LLMs due to the substantial computation and memory footprint involved. Recent studies reveal that the hybrid TEE-based scheme offloading partial model inference operations to the untrusted accelerators (e.g., GPU) is a promising solution. However, prior offloading schemes fail to ensure dual protection of data and model in Transformer inference, as they cannot securely offload critical operations, i.e., Attention and SoftMax, forcing these computations to remain confined within TEEs. To address these challenges, we propose TwinShield, a framework enabling secure Transformer inference in heterogeneous TEE and accelerator systems with dual protection for both model and data. TwinShield offloads ~87% of computation to GPUs and delivers 4.0x - 6.1x speedups over previous approaches across various Transformer models.
Updated: 2025-07-04 03:52:53
标题: 通过统一的TEE和加密保护加速器来保护基于Transformer的AI执行
摘要: 最近在Transformer模型方面取得了重大进展,例如大型语言模型(LLMs),在各种人工智能(AI)任务中带来了巨大突破,导致它们广泛应用于许多安全关键领域。由于它们前所未有的规模和成本高昂的开发成本,这些模型已成为人工智能利益相关者非常宝贵的知识产权,并且越来越多地通过机器学习作为服务(MLaaS)进行部署。然而,MLaaS通常在不受信任的云基础设施上运行,将数据和模型暴露于潜在的攻击风险之中。主流的保护机制利用受信任的执行环境(TEEs),通过基于硬件的加密和完整性检查保护机密数据的机密性和完整性。不幸的是,完全在TEEs内部运行模型推断会导致非常严重的减速,这在LLMs中会进一步恶化,因为涉及了大量的计算和内存占用。最近的研究表明,基于混合TEE的方案将部分模型推断操作卸载到不受信任的加速器(例如GPU)是一个有希望的解决方案。然而,先前的卸载方案未能确保在Transformer推断中数据和模型的双重保护,因为它们无法安全地卸载关键操作,即注意力和SoftMax,迫使这些计算仍然局限在TEE中。为了解决这些挑战,我们提出了TwinShield,一个框架,可以在异构TEE和加速器系统中实现安全的Transformer推断,同时对模型和数据进行双重保护。TwinShield将约87%的计算卸载到GPU,并在各种Transformer模型上比以前的方法提供4.0倍至6.1倍的加速。
更新时间: 2025-07-04 03:52:53
领域: cs.CR,cs.LG
Recommender systems, stigmergy, and the tyranny of popularity
Scientific recommender systems, such as Google Scholar and Web of Science, are essential tools for discovery. The search algorithms that power them work through stigmergy, a collective intelligence mechanism that surfaces useful paths through repeated engagement. While generally effective, this "rich-get-richer" dynamic results in a small number of high-profile papers dominating visibility. This essay argues that these algorithms' over-reliance on popularity fosters intellectual homogeneity and exacerbates structural inequities, stifling the innovative and diverse perspectives critical for scientific progress. We propose an overhaul of search platforms to incorporate user-specific calibration, allowing researchers to manually adjust the weights of factors like popularity, recency, and relevance. We also advise platform developers on how text embeddings and LLMs could be implemented in ways that increase user autonomy. While our suggestions are particularly pertinent to aligning recommender systems with scientific values, these ideas are broadly applicable to information access systems in general. Designing platforms that increase user autonomy is an important step toward more robust and dynamic information access.
Updated: 2025-07-04 03:51:55
标题: 推荐系统、共识主动性与流行度的暴政
摘要: 科学推荐系统(如Google学术和Web of Science)是科研发现的重要工具。驱动它们的搜索算法依靠共识主动性(stigmergy)工作,这是一种通过反复参与浮现有用路径的集体智慧机制。虽然通常有效,但这种“富者愈富”的动态导致少数高知名度论文主导了可见性。本文认为,这些算法对流行度的过度依赖助长了知识的同质化,加剧了结构性不平等,抑制了对科学进步至关重要的创新和多样化观点。我们提议对搜索平台进行改革,引入用户特定的校准,允许研究人员手动调整流行度、新颖性和相关性等因素的权重。我们还建议平台开发者如何以增加用户自主权的方式应用文本嵌入和LLMs。虽然我们的建议特别与使推荐系统符合科学价值观相关,但这些想法广泛适用于一般的信息获取系统。设计增加用户自主权的平台,是迈向更加健壮和动态的信息获取的重要一步。
更新时间: 2025-07-04 03:51:55
领域: cs.CY,cs.AI,cs.HC,cs.IR
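The user-specific calibration proposed above amounts to exposing the ranking weights to the researcher. A toy sketch with hypothetical factor names (nothing here corresponds to any real platform's API):

def rank(papers, w_pop=0.2, w_rec=0.3, w_rel=0.5):
    """papers: list of dicts with normalized 'popularity', 'recency', and
    'relevance' in [0, 1]; the weights are set by the user, not the platform."""
    score = lambda p: (w_pop * p["popularity"] + w_rec * p["recency"]
                       + w_rel * p["relevance"])
    return sorted(papers, key=score, reverse=True)

# A user wary of the rich-get-richer loop can simply turn popularity down:
# rank(papers, w_pop=0.0, w_rec=0.4, w_rel=0.6)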
REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models
Reinforcement Learning from Human Feedback (RLHF) is crucial in aligning large language models (LLMs) with human values and preferences. While state-of-the-art applications like ChatGPT/GPT-4 commonly employ Proximal Policy Optimization (PPO), including a critic network introduces significant computational overhead. REINFORCE-based methods, such as REINFORCE Leave One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO), address this limitation by eliminating the critic network. However, these approaches face challenges in accurate advantage estimation. Specifically, they estimate advantages independently for responses to each prompt, which can lead to overfitting on more straightforward prompts and vulnerability to reward hacking. To address these challenges, we introduce REINFORCE++, a novel approach that removes the critic model while using the normalized reward of a batch as the baseline. Our empirical evaluation demonstrates that REINFORCE++ exhibits robust performance across various reward models without requiring prompt set truncation. Furthermore, it achieves superior generalization in RLHF and long chain-of-thought (CoT) settings compared to REINFORCE-based methods. The implementation is available at https://github.com/OpenRLHF/OpenRLHF.
Updated: 2025-07-04 03:51:01
标题: REINFORCE++:一个对提示和奖励模型都具有鲁棒性的高效RLHF算法
摘要: 从人类反馈中学习的强化学习(RLHF)在调整大型语言模型(LLMs)与人类价值观和偏好方面至关重要。尽管像ChatGPT/GPT-4这样的最先进应用通常采用Proximal Policy Optimization(PPO),但引入评论家网络会带来显着的计算开销。基于REINFORCE的方法,如REINFORCE Leave One-Out(RLOO)、ReMax和Group Relative Policy Optimization(GRPO),通过消除评论家网络来应对这一局限。然而,这些方法在准确估计优势方面面临挑战。具体来说,它们独立估计对每个提示的响应的优势,这可能导致在更简单的提示上过度拟合,并容易受到奖励欺骗的影响。为了解决这些挑战,我们引入了REINFORCE++,一种新颖的方法,它在移除评论家模型的同时使用批次的归一化奖励作为基线。我们的实证评估表明,REINFORCE++在各种奖励模型中展现出稳健的性能,无需截断提示集。此外,与基于REINFORCE的方法相比,它在RLHF和长思维链(CoT)设置中实现了更优越的泛化性能。该实现可在https://github.com/OpenRLHF/OpenRLHF 上找到。
更新时间: 2025-07-04 03:51:01
领域: cs.CL,cs.LG
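The central mechanic of REINFORCE++, a global batch-normalized reward as the baseline rather than a critic or per-prompt statistics, fits in a few lines. A minimal sketch (shapes and the epsilon are illustrative):

import numpy as np

def global_batch_advantages(rewards, eps=1e-8):
    """rewards: one scalar per sampled response, pooled across *all*
    prompts in the batch rather than grouped per prompt."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Each token of a response is then reinforced with its response-level
# advantage, e.g. loss = -(advantage * logprob) under PPO-style clipping.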
DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning
Distributed backdoor attacks (DBA) have shown a higher attack success rate than centralized attacks in centralized federated learning (FL). However, they have not been investigated in decentralized FL. In this paper, we experimentally demonstrate that, when directly applying DBA to decentralized FL, the attack success rate depends on the distribution of attackers in the network architecture. Considering that the attackers cannot decide their location, this paper aims to achieve a high attack success rate regardless of the attackers' location distribution. Specifically, we first design a method to detect the network by predicting the distance between any two attackers on the network. Then, based on the distance, we organize the attackers in different clusters. Lastly, we propose an algorithm to dynamically embed local patterns decomposed from a global pattern into the different attackers in each cluster. We conduct a thorough empirical investigation and find that our method can, on benchmark datasets, outperform both centralized attacks and naive DBA in different decentralized frameworks.
Updated: 2025-07-04 03:49:02
标题: DBA-DFL:面向去中心化联邦学习中具备网络检测的分布式后门攻击
摘要: 分布式后门攻击(DBA)在集中式联邦学习(FL)中显示出比集中攻击更高的攻击成功率。然而,在分散式FL中尚未进行调查。本文实验性地表明,直接将DBA应用于分散式FL时,攻击成功率取决于网络架构中攻击者的分布。考虑到攻击者无法决定其位置,本文旨在实现高攻击成功率,无论攻击者的位置分布如何。具体地,我们首先设计了一种方法来通过预测网络上任意两个攻击者之间的距离来检测网络。然后,基于距离,我们将攻击者组织成不同的簇。最后,我们提出了一种算法,将从全局模式分解的本地模式动态地嵌入到每个簇中的不同攻击者中。我们进行了彻底的实证研究,并发现我们的方法在基准数据集中可以在不同的分散式框架中优于集中攻击和天真的DBA。
更新时间: 2025-07-04 03:49:02
领域: cs.LG
SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suffers from slow convergence speed and knowledge forgetting problems. Recent studies have leveraged the power of designed LoRA initialization, to enhance the fine-tuning efficiency, or to preserve knowledge in the pre-trained LLM. However, none of these works can address the two cases at the same time. To this end, we introduce Subspace-Constrained LoRA (SC-LoRA), a novel LoRA initialization framework engineered to navigate the trade-off between efficient fine-tuning and knowledge preservation. We achieve this by constraining the output of trainable LoRA adapters in a low-rank subspace, where the context information of fine-tuning data is most preserved while the context information of preserved knowledge is least retained, in a balanced way. Such constraint enables the trainable weights to primarily focus on the main features of fine-tuning data while avoiding damaging the preserved knowledge features. We provide theoretical analysis on our method, and conduct extensive experiments including safety preservation and world knowledge preservation, on various downstream tasks. In our experiments, SC-LoRA succeeds in delivering superior fine-tuning performance while markedly diminishing knowledge forgetting, surpassing contemporary LoRA initialization methods.
Updated: 2025-07-04 03:46:58
标题: SC-LoRA:通过子空间约束LoRA平衡高效微调和知识保留
摘要: 参数高效微调(PEFT)方法,特别是低秩适应(LoRA),对于高效定制大型语言模型(LLMs)至关重要。然而,普通的LoRA存在收敛速度慢和知识遗忘问题。最近的研究利用了设计良好的LoRA初始化的力量,以增强微调效率,或保留预训练的LLM中的知识。然而,这些工作都无法同时解决这两种情况。因此,我们引入了子空间约束LoRA(SC-LoRA),这是一个新颖的LoRA初始化框架,旨在平衡高效微调和知识保留之间的权衡。我们通过将可训练的LoRA适配器的输出约束在低秩子空间中来实现这一目标,在这个子空间中,微调数据的上下文信息得到最大程度的保留,同时保留知识的上下文信息得到最小程度的保留,以平衡的方式。这种约束使可训练的权重主要集中在微调数据的主要特征上,同时避免损害保留的知识特征。我们对我们的方法进行了理论分析,并在各种下游任务上进行了包括安全保留和世界知识保留在内的大量实验。在我们的实验中,SC-LoRA在显著减少知识遗忘的同时,成功提供了优越的微调性能,超越了当代LoRA初始化方法。
更新时间: 2025-07-04 03:46:58
领域: cs.LG,cs.AI
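One way to picture SC-LoRA's constraint is as a projection of the adapter output onto directions rich in fine-tuning context but poor in preserved-knowledge context. The sketch below is an illustrative reading of that idea, not the paper's actual initialization; H_ft and H_keep are assumed activation matrices collected from the two data sources.

import numpy as np

def balanced_subspace(H_ft, H_keep, r, lam=0.5):
    # Top-r eigenvectors of C_ft - lam * C_keep: directions that carry
    # fine-tuning context while avoiding preserved-knowledge context.
    C = H_ft.T @ H_ft / len(H_ft) - lam * (H_keep.T @ H_keep / len(H_keep))
    eigvals, eigvecs = np.linalg.eigh(C)
    return eigvecs[:, np.argsort(eigvals)[::-1][:r]]      # (d_out, r)

def constrained_lora_delta(x, A, B, U):
    # Standard LoRA delta B @ A @ x, projected into span(U).
    return U @ (U.T @ (B @ (A @ x)))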
Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos
Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose "Memory Storyboard" that groups recent past frames into temporal segments for more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, and the storyboard temporal segments are then transferred to a long-term memory. Experiments on real-world egocentric video datasets including SAYCam and KrishnaCam show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.
Updated: 2025-07-04 03:28:30
标题: 记忆故事板:利用时间分割从自我中心视频中进行流式自监督学习
摘要: 自监督学习有望从现实世界的连续未经筛选的数据流中学习良好的表示。然而,目前大多数现有的视觉自监督学习作品都集中在静态图像或人工数据流上。为了探索更现实的学习基础,我们研究了从长篇现实世界自我中心视频流中进行流式自监督学习。受人类感知和记忆中的事件分割机制启发,我们提出了"记忆故事板",将最近的过去帧分组成时间段,以更有效地总结过去的视觉流以进行记忆重播。为了适应高效的时间分割,我们提出了一个两层记忆层次结构:最近的过去存储在短期记忆中,然后故事板时间段被转移到长期记忆中。在包括SAYCam和KrishnaCam在内的真实世界自我中心视频数据集上的实验表明,在故事板帧之上的对比学习目标会产生语义有意义的表示,这些表示优于最先进的无监督持续学习方法产生的表示。
更新时间: 2025-07-04 03:28:30
领域: cs.CV,cs.LG
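A minimal sketch of the two-tier memory described above: frames accumulate in a short-term buffer, a boundary rule closes off a temporal segment, and closed segments move to long-term storage for replay. The distance-threshold boundary detector here is an illustrative stand-in for the paper's segmentation mechanism.

import numpy as np
from collections import deque

class StoryboardMemory:
    def __init__(self, short_capacity=256, boundary_thresh=1.0):
        self.short = deque(maxlen=short_capacity)   # recent raw frame features
        self.long = []                              # list of temporal segments
        self.thresh = boundary_thresh

    def add(self, feature):
        # Declare a segment boundary when the stream "jumps" in feature space.
        if self.short and np.linalg.norm(feature - self.short[-1]) > self.thresh:
            self.long.append(list(self.short))      # flush the finished segment
            self.short.clear()
        self.short.append(feature)

    def replay(self, rng, k=8):
        # Sample one summarizing frame per segment for contrastive replay
        # (assumes at least one stored segment).
        segs = rng.choice(len(self.long), size=min(k, len(self.long)), replace=False)
        return [self.long[s][len(self.long[s]) // 2] for s in segs]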
Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies
The proliferation of ride-hailing aggregator platforms presents significant growth opportunities for ride-service providers by increasing order volume and gross merchandise value (GMV). On most ride-hailing aggregator platforms, service providers that offer lower fares are ranked higher in listings and, consequently, are more likely to be selected by passengers. This competitive ranking mechanism creates a strong incentive for service providers to adopt coupon strategies that lower prices to secure a greater number of orders, as order volume directly influences their long-term viability and sustainability. Thus, designing an effective coupon strategy that can dynamically adapt to market fluctuations while optimizing order acquisition under budget constraints is a critical research challenge. However, existing studies in this area remain scarce. To bridge this gap, we propose FCA-RL, a novel reinforcement learning-based subsidy strategy framework designed to rapidly adapt to competitors' pricing adjustments. Our approach integrates two key techniques: Fast Competition Adaptation (FCA), which enables swift responses to dynamic price changes, and Reinforced Lagrangian Adjustment (RLA), which ensures adherence to budget constraints while optimizing coupon decisions on new price landscape. Furthermore, we introduce RideGym, the first dedicated simulation environment tailored for ride-hailing aggregators, facilitating comprehensive evaluation and benchmarking of different pricing strategies without compromising real-world operational efficiency. Experimental results demonstrate that our proposed method consistently outperforms baseline approaches across diverse market conditions, highlighting its effectiveness in subsidy optimization for ride-hailing service providers.
Updated: 2025-07-04 03:27:45
标题: 竞争压力下的订单获取:一种用于网约车补贴策略的快速自适应强化学习方法
摘要: 网约车聚合平台的激增为网约车服务提供商带来了重要的增长机会,通过增加订单量和总商品价值(GMV)。在大多数网约车聚合平台上,提供更低票价的服务提供商在排名中更靠前,因此更有可能被乘客选择。这种竞争性排名机制为服务提供商采用降低价格的优惠券策略提供了强烈的动力,以确保获得更多订单,因为订单量直接影响着它们的长期生存能力和可持续性。因此,设计一种能够在市场波动中动态适应并在预算约束下优化订单获取的有效优惠券策略是一个重要的研究挑战。然而,目前在这一领域的研究仍然很少。 为了填补这一空白,我们提出了FCA-RL,这是一种基于强化学习的新型补贴策略框架,旨在快速适应竞争对手的定价调整。我们的方法集成了两种关键技术:快速竞争适应(FCA),这使得能够迅速响应动态价格变化,以及强化拉格朗日调整(RLA),这确保在优化新的价格格局下的优惠券决策时遵守预算约束。此外,我们还引入了RideGym,这是专为网约车聚合平台量身定制的第一个模拟环境,可以在不影响现实运营效率的情况下全面评估和基准不同定价策略。实验结果表明,我们提出的方法在各种市场条件下始终优于基线方法,突显了其在网约车服务提供商补贴优化中的有效性。
更新时间: 2025-07-04 03:27:45
领域: cs.LG,cs.AI
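The Reinforced Lagrangian Adjustment component rests on the standard dual update for budget-constrained RL, which the sketch below illustrates (the step size, names, and numbers are made up): the policy then maximizes reward - lambda * cost under the current multiplier.

def lagrangian_update(lmbda, spend, budget, lr=0.01):
    """Raise the penalty when average subsidy spend exceeds the budget,
    relax it (down to zero) otherwise."""
    return max(0.0, lmbda + lr * (spend - budget))

lmbda = 0.0
for spend in [1.2, 1.1, 0.9, 0.8]:   # observed per-round spend vs budget 1.0
    lmbda = lagrangian_update(lmbda, spend, budget=1.0)
    print(f"spend={spend:.1f} -> lambda={lmbda:.3f}")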
ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization
Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We introduce ConceptMix++, a framework that disentangles prompt phrasing from visual generation capabilities by applying iterative prompt optimization. Building on ConceptMix, our approach incorporates a multimodal optimization pipeline that leverages vision-language model feedback to refine prompts systematically. Through extensive experiments across multiple diffusion models, we show that optimized prompts significantly improve compositional generation performance, revealing previously hidden model capabilities and enabling fairer comparisons across T2I models. Our analysis reveals that certain visual concepts -- such as spatial relationships and shapes -- benefit more from optimization than others, suggesting that existing benchmarks systematically underestimate model performance in these categories. Additionally, we find strong cross-model transferability of optimized prompts, indicating shared preferences for effective prompt phrasing across models. These findings demonstrate that rigid benchmarking approaches may significantly underrepresent true model capabilities, while our framework provides more accurate assessment and insights for future development.
Updated: 2025-07-04 03:27:04
标题: ConceptMix++:通过迭代提示优化实现文本到图像基准测试的公平比较
摘要: 当前的文本到图像(T2I)基准评估模型在严格的提示下,可能会低估真正的生成能力,因为提示的敏感性并且会导致偏向某些模型而对其他模型造成不利。我们引入ConceptMix ++,一个框架,通过应用迭代提示优化来解开提示措辞与视觉生成能力。在ConceptMix的基础上,我们的方法结合了一个多模态优化管道,利用视觉语言模型的反馈系统地完善提示。通过在多个扩散模型上进行大量实验,我们展示了优化提示明显改善了组合生成性能,揭示了之前隐藏的模型能力,并实现了在T2I模型之间更公平的比较。我们的分析显示,某些视觉概念 -- 如空间关系和形状 -- 比其他概念更受优化的影响,表明现有基准系统地低估了这些类别中的模型性能。此外,我们发现优化提示具有强大的跨模型可传递性,表明跨模型之间存在共享偏好的有效提示措辞。这些发现表明,严格的基准评估方法可能会显著地低估真实的模型能力,而我们的框架为未来的发展提供了更准确的评估和洞察。
更新时间: 2025-07-04 03:27:04
领域: cs.CV,cs.LG
MoralBench: Moral Evaluation of LLMs
In the rapidly evolving field of artificial intelligence, large language models (LLMs) have emerged as powerful tools for a myriad of applications, from natural language processing to decision-making support systems. However, as these models become increasingly integrated into societal frameworks, the imperative to ensure they operate within ethical and moral boundaries has never been more critical. This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of LLMs. We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs, addressing a wide range of ethical dilemmas and scenarios reflective of real-world complexities. The main contribution of this work lies in the development of benchmark datasets and metrics for assessing the moral identity of LLMs, which accounts for nuance, contextual sensitivity, and alignment with human ethical standards. Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance. By applying our benchmark across several leading LLMs, we uncover significant variations in moral reasoning capabilities of different models. These findings highlight the importance of considering moral reasoning in the development and evaluation of LLMs, as well as the need for ongoing research to address the biases and limitations uncovered in our study. We publicly release the benchmark at https://drive.google.com/drive/u/0/folders/1k93YZJserYc2CkqP8d4B3M3sgd3kA8W7 and also open-source the code of the project at https://github.com/agiresearch/MoralBench.
Updated: 2025-07-04 03:19:53
标题: MoralBench:LLM的道德评估
摘要: 在快速发展的人工智能领域,大型语言模型(LLMs)已经成为强大的工具,可用于各种应用,从自然语言处理到决策支持系统。然而,随着这些模型越来越多地融入社会框架,确保它们在道德和伦理范围内运作的迫切性从未如此重要。本文介绍了一个新颖的基准,旨在衡量和比较LLMs的道德推理能力。我们提出了第一个专门策划的数据集,用于探究LLMs输出的道德维度,涵盖了一系列反映现实复杂性的伦理困境和情景。 这项工作的主要贡献在于开发了用于评估LLMs道德身份的基准数据集和指标,考虑到细微差别、上下文敏感性以及与人类伦理标准的一致性。我们的方法涉及多方面的途径,将定量分析与伦理学者的定性见解相结合,以确保对模型性能进行彻底评估。通过将我们的基准应用于几个领先的LLMs,我们发现不同模型的道德推理能力存在显著差异。这些发现突显了在LLMs的开发和评估中考虑道德推理的重要性,以及需要进行持续研究来解决我们研究中发现的偏见和局限性。我们公开发布了基准数据集,并在https://drive.google.com/drive/u/0/folders/1k93YZJserYc2CkqP8d4B3M3sgd3kA8W7,同时也在https://github.com/agiresearch/MoralBench上开源了项目代码。
更新时间: 2025-07-04 03:19:53
领域: cs.CL,cs.AI
DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift
In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. In particular, a reconfigurable intelligent surface (RIS) is employed to enhance MIMO transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects. The traditional exhaustive search (ES) for optimal codewords in the continuous phase shift is computationally intensive and time-consuming. To reduce computational complexity, permuted discrete Fourier transform (DFT) vectors are used for finding codebook design, incorporating amplitude responses for practical or ideal RIS systems. However, even if the discrete phase shift is adopted in the ES, it results in significant computation and is time-consuming. Instead, the trained deep neural network (DNN) is developed to facilitate faster codeword selection. Simulation results show that the DNN maintains sub-optimal spectral efficiency even as the distance between the end-user and the RIS has variations in the testing phase. These results highlight the potential of DNN in advancing RIS-aided systems.
Updated: 2025-07-04 03:10:52
标题: 基于DNN的实用相移辅助mmWave MIMO系统中的预编码
摘要: 本文研究了在直达通信路径受阻情况下,最大化毫米波(mmWave)多输入多输出(MIMO)系统吞吐量的预编码设计。具体而言,考虑到与视距(LoS)和多径效应相关的毫米波特性,采用可重构智能表面(RIS)来增强MIMO传输。传统的穷举搜索(ES)在连续相移中寻找最佳码字,计算密集且耗时。为了降低计算复杂性,使用置换离散傅里叶变换(DFT)向量进行码本设计,并结合实际或理想RIS系统的幅度响应。然而,即使在ES中采用离散相移,仍会带来大量计算且耗时。为此,我们开发了经过训练的深度神经网络(DNN)来实现更快的码字选择。仿真结果显示,即使在测试阶段终端用户与RIS之间的距离发生变化,DNN仍能保持接近最优的频谱效率。这些结果突显了DNN在推动RIS辅助系统方面的潜力。
更新时间: 2025-07-04 03:10:52
领域: eess.SP,cs.AI,cs.LG
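A minimal sketch of the DFT codebook and the exhaustive codeword search the DNN learns to shortcut, using a generic cascaded-channel rate expression log2(1 + SNR * |g|^2); the channel model and sizes below are illustrative, not the paper's setup.

import numpy as np

def dft_codebook(n_elements):
    # Columns are candidate unit-modulus RIS phase configurations.
    n = np.arange(n_elements)
    return np.exp(-2j * np.pi * np.outer(n, n) / n_elements)

def best_codeword(h_bs_ris, h_ris_ue, codebook, snr=10.0):
    """Exhaustive search: pick the column maximizing log2(1 + SNR*|g|^2)
    for the cascaded channel g = h_ris_ue^T diag(c) h_bs_ris."""
    rates = [np.log2(1 + snr * np.abs(h_ris_ue @ (c * h_bs_ris)) ** 2)
             for c in codebook.T]
    return int(np.argmax(rates)), max(rates)

N = 32
rng = np.random.default_rng(0)
h1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h2 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print(best_codeword(h1, h2, dft_codebook(N)))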
GenSim: A General Social Simulation Platform with Large Language Model based Agents
With the rapid advancement of large language models (LLMs), recent years have witnessed many promising studies on leveraging LLM-based agents to simulate human social behavior. While prior work has demonstrated significant potential across various domains, much of it has focused on specific scenarios involving a limited number of agents and has lacked the ability to adapt when errors occur during simulation. To overcome these limitations, we propose a novel LLM-agent-based simulation platform called GenSim, which: (1) abstracts a set of general functions to simplify the simulation of customized social scenarios; (2) supports one hundred thousand agents to better simulate large-scale populations in real-world contexts; and (3) incorporates error-correction mechanisms to ensure more reliable and long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, promising to further advance the field of social science.
Updated: 2025-07-04 03:07:07
标题: GenSim:基于大型语言模型的通用社会仿真平台
摘要: 随着大型语言模型(LLMs)的快速发展,近年来出现了许多有前途的研究,利用基于LLM的代理来模拟人类社会行为。虽然先前的工作已经在各个领域展示出了显著的潜力,但其中大部分集中在涉及有限数量代理的特定场景,并且在模拟过程中出现错误时缺乏适应能力。为了克服这些限制,我们提出了一种新颖的基于LLM代理的仿真平台,名为\textit{GenSim},具有以下特点:(1)\textbf{提炼一组通用功能}以简化定制社会场景的仿真;(2)\textbf{支持十万个代理}以更好地模拟现实世界中的大规模人群;(3)\textbf{融入纠错机制}以确保更可靠和长期的仿真。为了评估我们的平台,我们评估了大规模代理仿真的效率和纠错机制的有效性。据我们所知,GenSim代表了朝着一个基于LLM代理的通用、大规模和可纠正的社会仿真平台迈出的初步步伐,有望进一步推动社会科学领域的发展。
更新时间: 2025-07-04 03:07:07
领域: cs.MA,cs.AI
LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Inference
Causal forest methods are powerful tools in causal inference. Similar to traditional random forests in machine learning, causal forests independently consider each causal tree. However, this independence consideration increases the likelihood that classification errors in one tree are repeated in others, potentially leading to significant bias in causal effect estimation. In this paper, we propose a novel approach that establishes connections between causal trees through the Limit Inferior Leaf Interval (LILI) clustering algorithm. LILIs are constructed based on the leaves of all causal trees, emphasizing the similarity of dataset confounders. When two instances with different treatments are grouped into the same leaf across a sufficient number of causal trees, they are treated as counterfactual outcomes of each other. Through this clustering mechanism, LILI clustering reduces bias present in traditional causal tree methods and enhances the prediction accuracy for the average treatment effect (ATE). By integrating LILIs into a causal forest, we develop an efficient causal inference method. Moreover, we explore several key properties of LILI by relating it to the concepts of limit inferior and limit superior in set theory. Theoretical analysis rigorously proves the convergence of the estimated ATE using LILI clustering. Empirically, extensive comparative experiments demonstrate the superior performance of LILI clustering.
Updated: 2025-07-04 03:04:00
标题: LILI聚类算法:将下极限叶区间集成到因果森林中用于因果推断
摘要: 因果森林方法是因果推断中的强大工具。与传统的机器学习中的随机森林类似,因果森林独立考虑每个因果树。然而,这种独立考虑增加了一个树中的分类错误在其他树中重复的可能性,可能导致因果效应估计中的显著偏差。在本文中,我们提出了一种新方法,通过极限下叶间隔(LILI)聚类算法在因果树之间建立连接。LILI是基于所有因果树的叶子构建的,强调数据集混杂因素的相似性。当两个具有不同处理的实例在足够数量的因果树中被分组到相同的叶子中时,它们被视为彼此的反事实结果。通过这种聚类机制,LILI聚类减少了传统因果树方法中存在的偏差,并提高了平均处理效应(ATE)的预测准确性。通过将LILI整合到因果森林中,我们开发了一种高效的因果推断方法。此外,我们通过将其与集合论中的极限下限和极限上限的概念联系起来,探索了几个LILI的关键特性。理论分析严格证明了使用LILI聚类估计ATE的收敛性。在实证方面,广泛的比较实验展示了LILI聚类的优越性能。
更新时间: 2025-07-04 03:04:00
领域: stat.ML,cs.LG
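The co-leaf grouping idea can be sketched directly: two instances cluster when they share a leaf in at least a given fraction of the trees, and treated/control members of a cluster act as each other's counterfactuals. The threshold and the naive pairwise ATE below are illustrative simplifications of the paper's construction.

import numpy as np

def co_leaf_matrix(leaf_ids, min_frac=0.8):
    """leaf_ids: (n, n_trees) leaf index of each instance in each tree.
    Returns an (n, n) boolean matrix of pairs co-located often enough.
    Note: the broadcast below costs O(n^2 * n_trees) memory."""
    co = (leaf_ids[:, None, :] == leaf_ids[None, :, :]).mean(axis=2)
    return co >= min_frac

def ate_from_clusters(co, treat, y):
    """Average y(treated) - y(control) over counterfactual pairs."""
    diffs = [y[i] - y[j]
             for i in range(len(y)) for j in range(len(y))
             if co[i, j] and treat[i] and not treat[j]]
    return float(np.mean(diffs))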
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Dynamic Text-Attributed Graphs (DyTAGs), which intricately integrate structural, temporal, and textual attributes, are crucial for modeling complex real-world systems. However, most of the existing DyTAG datasets exhibit poor textual quality, which severely limits their utility for DyTAG generation tasks requiring semantically rich inputs. Additionally, prior work mainly focuses on discriminative tasks on DyTAGs, resulting in a lack of standardized task formulations and evaluation protocols tailored for DyTAG generation. To address these critical issues, we propose the Generative DyTAG Benchmark (GDGB), which comprises eight meticulously curated DyTAG datasets with high-quality textual features for both nodes and edges, overcoming limitations of prior datasets. Building on GDGB, we define two novel DyTAG generation tasks: Transductive Dynamic Graph Generation (TDGG) and Inductive Dynamic Graph Generation (IDGG). TDGG transductively generates a target DyTAG based on the given source and destination node sets, while the more challenging IDGG introduces new node generation to inductively model the dynamic expansion of real-world graph data. To enable holistic evaluation, we design multifaceted metrics that assess the structural, temporal, and textual quality of the generated DyTAGs. We further propose GAG-General, an LLM-based multi-agent generative framework tailored for reproducible and robust benchmarking of DyTAG generation. Experimental results demonstrate that GDGB enables rigorous evaluation of TDGG and IDGG, with key insights revealing the critical interplay of structural and textual features in DyTAG generation. These findings establish GDGB as a foundational resource for advancing generative DyTAG research and unlocking further practical applications in DyTAG generation. GDGB datasets, source code, and leaderboards are available at https://gdgb-algo.github.io/.
Updated: 2025-07-04 02:55:32
标题: GDGB:生成式动态文本属性图学习基准
摘要: 动态文本属性图(DyTAGs)精细地整合了结构、时间和文本属性,对于建模复杂的现实世界系统至关重要。然而,大多数现有的DyTAG数据集存在文本质量较差的问题,严重限制了它们在需要语义丰富输入的DyTAG生成任务中的实用性。此外,先前的工作主要集中在DyTAG上的判别任务,导致缺乏专为DyTAG生成量身定制的标准化任务表述和评估协议。为解决这些关键问题,我们提出了生成式DyTAG基准(GDGB),包括八个精心策划的DyTAG数据集,具有高质量的节点和边的文本特征,克服了先前数据集的局限性。在GDGB的基础上,我们定义了两个新颖的DyTAG生成任务:传导动态图生成(TDGG)和归纳动态图生成(IDGG)。TDGG根据给定的源节点集和目标节点集传导生成目标DyTAG,而更具挑战性的IDGG引入新节点生成,归纳地模拟现实世界图数据的动态扩展。为了进行全面评估,我们设计了多方面的度量标准,评估生成的DyTAG的结构、时间和文本质量。我们进一步提出了GAG-General,这是一个基于LLM的多代理生成框架,专为可复现且稳健的DyTAG生成基准测试而设计。实验结果表明,GDGB使得对TDGG和IDGG进行严格评估成为可能,关键的见解揭示了DyTAG生成中结构和文本特征的重要相互作用。这些发现将GDGB确立为推动生成式DyTAG研究并解锁更多实际应用的基础资源。GDGB数据集、源代码和排行榜可以在\href{https://gdgb-algo.github.io/}{这里}找到。
更新时间: 2025-07-04 02:55:32
领域: cs.AI,cs.CL
On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization
We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO's potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.
Updated: 2025-07-04 02:40:45
标题: 使用近端策略优化对ANFIS策略进行同策略优化
摘要: 我们提出了一种使用Proximal Policy Optimization (PPO)训练神经模糊控制器的强化学习方法。与先前将深度Q网络(DQN)与自适应神经模糊推理系统(ANFIS)结合的方法不同,我们基于PPO的框架利用了稳定的同策略演员-评论家设置。在CartPole-v1环境中跨多个随机种子进行评估,经过PPO训练的模糊代理在20000次更新后以零方差稳定达到最大回报500,在稳定性和收敛速度方面均优于ANFIS-DQN基线。这突显了PPO在强化学习任务中训练可解释神经模糊代理的潜力。
更新时间: 2025-07-04 02:40:45
领域: cs.LG,cs.AI
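As a rough illustration of the setup above, the sketch below pairs an ANFIS-style policy head (Gaussian memberships feeding a normalized rule layer) with the standard PPO clipped surrogate loss in PyTorch. The module layout is an assumption; the rollout loop, value network, and advantage estimation are omitted.

import torch
import torch.nn as nn

class FuzzyPolicy(nn.Module):
    """ANFIS-style policy: Gaussian memberships -> normalized rule firing -> logits."""
    def __init__(self, obs_dim, n_rules, n_actions):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, obs_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, obs_dim))
        self.consequents = nn.Linear(n_rules, n_actions)  # rule -> action logits

    def forward(self, obs):                                # obs: (B, obs_dim)
        d = (obs.unsqueeze(1) - self.centers) / self.log_sigma.exp()
        firing = torch.exp(-0.5 * (d ** 2).sum(-1))        # (B, n_rules)
        firing = firing / (firing.sum(-1, keepdim=True) + 1e-8)
        return self.consequents(firing)

def ppo_clip_loss(logits, actions, old_logp, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (negated for minimization)."""
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    ratio = torch.exp(logp - old_logp)
    return -torch.min(ratio * advantages,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * advantages).mean()

policy = FuzzyPolicy(obs_dim=4, n_rules=8, n_actions=2)    # CartPole-v1 sizes
logits = policy(torch.randn(5, 4))
loss = ppo_clip_loss(logits, torch.randint(0, 2, (5,)), torch.zeros(5), torch.randn(5))
print(loss.item())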
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Multimodal Large Language Models (MLLMs) increasingly adopt multiple vision encoders to capture diverse visual information, ranging from coarse semantics to fine-grained details. While this approach is intended to enhance visual understanding capability, we observe that the performance gains from adding encoders often diminish and can even lead to performance degradation, a phenomenon we term encoder redundancy. This paper presents a systematic investigation into this issue. Through comprehensive ablation studies on state-of-the-art multi-encoder MLLMs, we empirically demonstrate that significant redundancy exists. To quantify each encoder's unique contribution, we propose a principled metric: the Conditional Utilization Rate (CUR). Building on CUR, we introduce the Information Gap (IG) to capture the overall disparity in encoder utility within a model. Our experiments reveal that certain vision encoders contribute little, or even negatively, to overall performance, confirming the prevalence of redundancy. These findings highlight critical inefficiencies in current multi-encoder designs and establish that our proposed metrics can serve as valuable diagnostic tools for developing more efficient and effective multimodal architectures.
Updated: 2025-07-04 02:38:59
标题: 使用多个视觉编码器研究多模态大语言模型中的冗余性
摘要: 多模态大型语言模型(MLLMs)越来越多地采用多个视觉编码器来捕获各种视觉信息,从粗略语义到细节。虽然这种方法旨在增强视觉理解能力,但我们观察到添加编码器带来的性能提升往往减弱甚至导致性能下降,这种现象被称为编码器冗余。本文对这个问题进行了系统调查。通过对最先进的多编码器MLLMs进行全面消融研究,我们经验性地证明存在显著的冗余。为了量化每个编码器的独特贡献,我们提出了一个原则性指标:条件利用率(CUR)。基于CUR,我们引入了信息差(IG)来捕捉模型内编码器效用的整体差异。我们的实验表明,某些视觉编码器对整体性能贡献甚微,甚至产生负面影响,证实了显著的冗余。这些发现突显了当前多编码器设计中的关键低效性,并建立了我们提出的度量指标可以作为开发更高效和更有效的多模态架构的有价值的诊断工具。
更新时间: 2025-07-04 02:38:59
领域: cs.CV,cs.AI
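The abstract does not give closed-form definitions for CUR and IG, so the following is one plausible formalization under the assumption that CUR is measured by ablation: the relative performance drop when a single encoder is removed, with IG as the spread of CUR values across encoders.

def conditional_utilization_rate(perf_full, perf_without):
    """One plausible formalization: relative performance drop when a single
    encoder is ablated, conditioned on all other encoders staying in place."""
    return {enc: (perf_full - p) / perf_full for enc, p in perf_without.items()}

def information_gap(cur):
    """Overall disparity in encoder utility within one model."""
    vals = list(cur.values())
    return max(vals) - min(vals)

# toy ablation accuracies: full model vs. model with one encoder removed
cur = conditional_utilization_rate(0.80, {"clip": 0.70, "dino": 0.79, "sam": 0.81})
print(cur)                  # "sam" gets a negative CUR: removing it helps
print(information_gap(cur))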
Novel Blockchain-based Protocols for Electronic Voting and Auctions
Programmable blockchains have long been a hot research topic given their tremendous use in decentralized applications. Smart contracts, using blockchains as their underlying technology, inherit desired properties such as verifiability, immutability, and transparency, which make them a great fit for trustless environments. In this thesis, we consider several decentralized protocols to be built on blockchains, specifically using smart contracts on Ethereum. We used algorithmic and cryptographic tools in our implementations to further improve the level of security and efficiency beyond state-of-the-art works. We proposed a new approach called Blind Vote, which is an untraceable, secure, efficient, secrecy-preserving, and fully on-chain electronic voting protocol based on the well-known concept of Chaum's blind signatures. We illustrate that our approach achieves the same security guarantees as previous methods such as Tornado Vote [1], while consuming significantly less gas. Thus, we provide a cheaper and considerably more gas-efficient alternative for anonymous blockchain-based voting. On the other hand, we propose a new family of algorithms for private, trustless auctions that protect bidder identities and bid values while remaining practical for smart contract execution. We ensure trustlessness by running the auction logic in a smart contract, thereby eliminating reliance on any single trusted party. This approach prevents bid tampering, front-running, and collusion by enforcing immutability and decentralized verification of bids. The resulting protocol uniquely combines efficiency, trustlessness, and enduring bid privacy, offering a scalable and secure solution for blockchain-based marketplaces and other decentralized applications.
Updated: 2025-07-04 02:26:04
标题: 基于区块链的电子投票和拍卖的新型协议
摘要: 可编程区块链长期以来一直是一个热门的研究课题,因为它们在去中心化应用中的巨大用途。智能合约利用区块链作为基础技术,继承了可验证性、不可变性和透明性等理想特性,使其在无信任环境中成为一个极其适用的工具。 在本论文中,我们考虑了几种分布式协议的构建方式,特别是在以太坊上利用智能合约。我们在实现中使用了算法和加密工具,进一步提高了安全性和效率水平,超越了现有技术。我们提出了一种名为Blind Vote的新方法,这是一种基于Chaum的盲签名概念的不可追踪、安全、高效、保密的完全链上电子投票协议。我们展示了我们的方法实现了与之前方法(如Tornado Vote [1])相同的安全保证,同时消耗的气体显著更少。因此,我们提供了一个更便宜且更具气体效率的匿名区块链投票替代方案。另一方面,我们提出了一组新的算法,用于私密、无信任拍卖,保护竞标者身份和竞标价值,同时保持对智能合约执行的实用性。我们通过在智能合约中运行拍卖逻辑来确保无信任性,从而消除对任何单个受信任方的依赖。这种方法通过强制不可变性和分散验证竞标来防止竞标篡改、前置交易和串通。由此产生的协议独特地结合了效率、无信任性和持久的竞标隐私,为基于区块链的市场和其他去中心化应用提供了可扩展且安全的解决方案。
更新时间: 2025-07-04 02:26:04
领域: cs.CR,cs.DC
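The Chaum blind-signature primitive underlying Blind Vote can be demonstrated with textbook RSA. The sketch below uses toy parameters for clarity; it is a pedagogical illustration, not production cryptography and not the thesis's exact on-chain construction.

from math import gcd
import random

# textbook RSA blind signature (Chaum); toy key, not production cryptography
n, e, d = 3233, 17, 2753           # n = 61 * 53

m = 1234                           # in practice, a hash of the ballot

# voter blinds the ballot with a random factor r coprime to n
while True:
    r = random.randrange(2, n)
    if gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n

# the election authority signs the blinded message without ever seeing m
blind_sig = pow(blinded, d, n)

# voter unblinds: s = s' * r^-1 mod n yields an ordinary signature on m
sig = (blind_sig * pow(r, -1, n)) % n

assert pow(sig, e, n) == m % n     # anyone can verify the unblinded signature
print("valid blind signature:", sig)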
ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis
We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on critical tasks such as Power, Performance, and Area (PPA) optimization, highlighting its ability to expose performance gaps and drive advancements. Additionally, ForgeEDA's scale and diversity facilitate the training of AI models for EDA tasks, demonstrating its potential to improve model performance and generalization. By addressing limitations in existing datasets, ForgeEDA aims to catalyze breakthroughs in modern IC design and support the next generation of innovations in EDA.
Updated: 2025-07-04 02:23:46
标题: ForgeHLS:一个用于高层次综合的大规模开源数据集
摘要: 我们介绍了ForgeEDA,这是一个涵盖各种类别的开源综合电路数据集。ForgeEDA包括各种电路表示形式,如寄存器传输级(RTL)代码,后映射(PM)网表,与非图(AIGs)和放置的网表,使得进行全面的分析和开发成为可能。我们通过在关键任务(如功耗、性能和面积(PPA)优化)上对最先进的EDA算法进行基准测试,展示了ForgeEDA的实用性,突出了其揭示性能差距和推动进步的能力。此外,ForgeEDA的规模和多样性有助于训练用于EDA任务的AI模型,展示了其提高模型性能和泛化能力的潜力。通过解决现有数据集的局限性,ForgeEDA旨在推动现代IC设计中的突破,并支持EDA领域的下一代创新。
更新时间: 2025-07-04 02:23:46
领域: cs.AR,cs.AI
CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs
Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical factors such as token efficiency, modularity, and scalability in multi-agent environments. To address these limitations, we introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. In CodeAgents, all components of agent interaction - Task, Plan, Feedback, system roles, and external tool invocations - are codified into modular pseudocode enriched with control structures (e.g., loops, conditionals), boolean logic, and typed variables. This design transforms loosely connected agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. We evaluate the proposed framework across three diverse benchmarks - GAIA, HotpotQA, and VirtualHome - using a range of representative LLMs. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines. On VirtualHome, our method achieves a new state-of-the-art success rate of 56%. In addition, our approach reduces input and output token usage by 55-87% and 41-70%, respectively, underscoring the importance of token-aware evaluation metrics in the development of scalable multi-agent LLM systems. The code and resources are available at: https://anonymous.4open.science/r/CodifyingAgent-5A86
Updated: 2025-07-04 02:20:19
标题: CodeAgents:一种用于LLM中编码化多智能体推理的令牌高效框架
摘要: 有效的提示设计对于改善大型语言模型(LLM)驱动的代理的规划能力至关重要。然而,现有的结构化提示策略通常局限于单代理、仅规划的设置,并且经常仅基于任务准确性评估性能,忽视了多代理环境中的关键因素,如标记效率、模块化和可扩展性。为了解决这些限制,我们引入了CodeAgents,一个提示框架,将多代理推理编码化,并在多代理系统中实现结构化、标记高效的规划。在CodeAgents中,代理交互的所有组件——任务、计划、反馈、系统角色和外部工具调用——都被编码为包含控制结构(例如循环、条件)、布尔逻辑和类型变量的模块化伪代码。这种设计将松散连接的代理计划转化为连贯的、可解释的和可验证的多代理推理程序。我们在三个不同的基准测试(GAIA、HotpotQA和VirtualHome)上使用一系列代表性LLM评估了提出的框架。结果显示,规划性能持续改善,与自然语言提示基线相比,绝对增益为3-36个百分点。在VirtualHome上,我们的方法实现了56%的最新最先进成功率。此外,我们的方法将输入和输出标记使用量分别降低了55-87%和41-70%,强调了标记感知评估指标在可扩展多代理LLM系统开发中的重要性。代码和资源可在以下网址获得:https://anonymous.4open.science/r/CodifyingAgent-5A86
更新时间: 2025-07-04 02:20:19
领域: cs.AI
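A hypothetical example of what a codified plan might look like, with typed variables, a bounded loop, and explicit role/tool invocations. The stub agents and schema here are our own illustration of the described idea, not the paper's exact prompt format.

from typing import List

def search_agent(query: str) -> List[str]:                # stub: retrieval role/tool
    return [f"doc about {query}"]

def reader_agent(docs: List[str], question: str) -> str:  # stub: reading role
    return f"answer drawn from {len(docs)} docs"

def verifier_agent(answer: str) -> bool:                  # stub: feedback role
    return "answer" in answer

def plan(question: str) -> str:
    docs: List[str] = []                                  # typed intermediate state
    for hop in range(2):                                  # bounded multi-hop loop
        docs += search_agent(f"{question} (hop {hop})")
    answer: str = reader_agent(docs, question)
    if not verifier_agent(answer):                        # feedback branch replans
        answer = reader_agent(search_agent(question), question)
    return answer

print(plan("Which year was the HotpotQA benchmark released?"))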
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
The foundational capabilities of large language models (LLMs) are deeply influenced by the quality of their pre-training corpora. However, enhancing data quality at scale remains a significant challenge, primarily due to the trade-off between refinement effectiveness and processing efficiency. While rule-based filtering remains the dominant paradigm, it typically operates at the document level and lacks the granularity needed to refine specific content within documents. Inspired by emerging work such as ProX, we propose $\textbf{RefineX}$, a novel framework for large-scale, surgical refinement of pre-training data through programmatic editing tasks. RefineX enables efficient and fine-grained data refinement while reliably preserving the diversity and naturalness of raw text. The core strength of RefineX lies in distilling high-quality, expert-guided end-to-end refinement results into minimal edit-based deletion programs. This high-precision distillation pipeline is used to train an efficient and reliable refine model that can systematically improve every instance in the corpus at scale. We evaluate RefineX across from-scratch pre-training at multiple model scales and find that it consistently outperforms models trained on raw, filtered, or alternatively refined data across diverse downstream tasks. On the 750M model, RefineX yields 2.6%-7.2% average gains on lighteval tasks, and achieves comparable performance using significantly fewer training tokens. Further analysis shows that RefineX reliably enhances text quality with both high efficiency and precision, outperforming prior approaches such as end-to-end generation and Prox-C. These results position RefineX as a scalable, effective, and reliable solution for optimizing pre-training data in modern LLM pipelines.
Updated: 2025-07-04 02:19:58
标题: RefineX:从专家引导的程序中学习大规模精炼预训练数据
摘要: 大型语言模型(LLMs)的基础能力深受其预训练语料库质量的影响。然而,大规模提升数据质量仍然是一个重大挑战,主要是因为在细化效果和处理效率之间存在权衡。虽然基于规则的过滤仍然是主导范式,但通常在文档级别操作,并且缺乏在文档中细化特定内容所需的粒度。受到ProX等新兴工作的启发,我们提出了$\textbf{RefineX}$,这是一个通过程序化编辑任务对预训练数据进行大规模、精细化细化的新框架。RefineX能够在可靠地保留原始文本的多样性和自然性的同时,实现高效和细粒度的数据细化。RefineX的核心优势在于将高质量、专家引导的端到端细化结果提炼为最小的基于编辑的删除程序。这种高精度的提炼管道用于训练一个高效而可靠的细化模型,能够大规模地系统性改进语料库中的每个实例。我们评估了RefineX在多个模型规模的从头开始预训练中的表现,并发现它在各种下游任务中始终优于在原始、过滤或其他方式细化的数据上训练的模型。在750M模型上,RefineX在lighteval任务上平均获得了2.6%-7.2%的增益,并且使用更少的训练标记实现了可比的性能。进一步的分析显示,RefineX可靠地提升文本质量,具有高效性和精准性,优于以往的端到端生成和Prox-C等方法。这些结果将RefineX定位为现代LLM管道中优化预训练数据的一个可扩展、有效和可靠的解决方案。
更新时间: 2025-07-04 02:19:58
领域: cs.CL,cs.AI
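A minimal sketch of the edit-based deletion-program idea: the refine model emits delete operations rather than rewritten text, which guarantees the surviving lines stay verbatim and natural. The program format is an assumption; RefineX's actual edit operations may be finer-grained.

raw = [
    "Subscribe to our newsletter!!!",
    "Transformers use self-attention to weight token interactions.",
    "Click here | Home | About",
    "Attention scales quadratically with sequence length.",
]

def apply_deletion_program(lines, program):
    """program: indices of lines to delete -- the only operation allowed,
    which keeps the surviving text verbatim from the raw corpus."""
    drop = set(program)
    return [ln for i, ln in enumerate(lines) if i not in drop]

# a trained refine model would predict this program; it is hand-written here
program = [0, 2]
print("\n".join(apply_deletion_program(raw, program)))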
HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration
The rapid growth of deep learning has driven exponential increases in model parameters and computational demands. NVIDIA GPUs and their CUDA-based software ecosystem provide robust support for parallel computing, significantly alleviating computational bottlenecks. Meanwhile, due to the cultivation of user programming habits and the high performance of GPUs, the CUDA ecosystem has established a dominant position in the field of parallel software. This dominance requires other hardware platforms to support CUDA-based software with performance portability. However, translating CUDA code to other platforms poses significant challenges due to differences in parallel programming paradigms and hardware architectures. Existing approaches rely on language extensions, domain-specific languages (DSLs), or compilers but face limitations in workload coverage and generalizability. Moreover, these methods often incur substantial development costs. Recently, LLMs have demonstrated extraordinary potential in various vertical domains, especially in code-related tasks. However, the performance of existing LLMs in CUDA transpilation, particularly for high-performance code, remains suboptimal. To address these challenges, we propose a novel framework for generating high-performance CUDA and corresponding platform code pairs, leveraging AI compiler and automatic optimization technology. We further enhance the framework with a graph-based data augmentation method and introduce HPCTransEval, a benchmark for evaluating LLM performance on CUDA transpilation. We conduct experiments using CUDA-to-CPU transpilation as a case study on leading LLMs. The speedup ratio of the CPU operators has an average improvement of 43.8\%, highlighting the potential of LLMs to address compatibility challenges within the CUDA ecosystem. Our code is available at https://github.com/PJLAB-CHIP/HPCTransCompile.
Updated: 2025-07-04 02:01:57
标题: HPCTransCompile:用于高性能CUDA转译和LLM初步探索的AI编译器生成的数据集
摘要: 深度学习的快速增长推动了模型参数和计算需求的指数级增长。NVIDIA GPU及其基于CUDA的软件生态系统为并行计算提供了强大的支持,显著缓解了计算瓶颈。与此同时,由于用户编程习惯的培养和GPU的高性能,CUDA生态系统在并行软件领域建立了主导地位。这种主导地位要求其他硬件平台支持具有性能可移植性的基于CUDA的软件。然而,将CUDA代码转换为其他平台面临着重大挑战,因为并行编程范式和硬件架构存在差异。现有方法依赖于语言扩展、领域特定语言(DSL)或编译器,但在工作负载覆盖范围和泛化性方面存在局限性。此外,这些方法通常会产生重大的开发成本。最近,大型语言模型(LLMs)在各个垂直领域展示了出色的潜力,特别是在与代码相关的任务中。然而,现有LLMs在CUDA转译中的性能,特别是对于高性能代码,仍然不理想。为了解决这些挑战,我们提出了一个新颖的框架,用于生成高性能的CUDA和相应的平台代码对,利用AI编译器和自动优化技术。我们进一步利用基于图的数据增强方法增强框架,并引入HPCTransEval,一个用于评估LLM在CUDA转译中性能的基准测试。我们在领先的LLM上以CUDA-to-CPU转译为案例研究进行实验。CPU运算符的加速比平均提高了43.8\%,突显了LLMs在解决CUDA生态系统内的兼容性挑战方面的潜力。我们的代码可在https://github.com/PJLAB-CHIP/HPCTransCompile上找到。
更新时间: 2025-07-04 02:01:57
领域: cs.DC,cs.AI
Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention
Speech Emotion Recognition (SER) traditionally relies on auditory data analysis for emotion classification. Several studies have adopted different methods for SER. However, existing SER methods often struggle to capture subtle emotional variations and generalize across diverse datasets. In this article, we use Mel-Frequency Cepstral Coefficients (MFCCs) as spectral features to bridge the gap between computational emotion processing and human auditory perception. To further improve robustness and feature diversity, we propose a novel 1D-CNN-based SER framework that integrates data augmentation techniques. MFCC features extracted from the augmented data are processed using a 1D Convolutional Neural Network (CNN) architecture enhanced with channel and spatial attention mechanisms. These attention modules allow the model to highlight key emotional patterns, enhancing its ability to capture subtle variations in speech signals. The proposed method delivers cutting-edge performance, achieving the accuracy of 97.49% for SAVEE, 99.23% for RAVDESS, 89.31% for CREMA-D, 99.82% for TESS, 99.53% for EMO-DB, and 96.39% for EMOVO. Experimental results show new benchmarks in SER, demonstrating the effectiveness of our approach in recognizing emotional expressions with high precision. Our evaluation demonstrates that the integration of advanced Deep Learning (DL) methods substantially enhances generalization across diverse datasets, underscoring their potential to advance SER for real-world deployment in assistive technologies and human-computer interaction.
Updated: 2025-07-04 01:55:49
标题: 朝向通过谱学习和注意力实现高效的语音情感识别
摘要: 语音情感识别(SER)传统上依赖听觉数据分析进行情感分类。一些研究采用不同的方法进行SER。然而,现有的SER方法经常难以捕捉微妙的情感变化,并且在不同数据集之间难以泛化。在本文中,我们使用梅尔频率倒谱系数(MFCCs)作为光谱特征,以弥合计算情感处理和人类听觉感知之间的差距。为了进一步提高鲁棒性和特征多样性,我们提出了一种基于1D-CNN的SER框架,该框架集成了数据增强技术。从增强数据中提取的MFCC特征被使用1D卷积神经网络(CNN)架构处理,并增加了通道和空间注意机制。这些注意模块允许模型突出显示关键的情感模式,增强其捕捉语音信号中微妙变化的能力。提出的方法实现了最先进的性能,在SAVEE数据集上实现了97.49%的准确率,在RAVDESS数据集上实现了99.23%的准确率,在CREMA-D数据集上实现了89.31%的准确率,在TESS数据集上实现了99.82%的准确率,在EMO-DB数据集上实现了99.53%的准确率,在EMOVO数据集上实现了96.39%的准确率。实验结果展示了SER领域的新基准,证明了我们的方法在高精度识别情感表达方面的有效性。我们的评估表明,整合先进的深度学习(DL)方法显著增强了在不同数据集之间的泛化能力,突显了它们在辅助技术和人机交互的实际部署中推进SER的潜力。
更新时间: 2025-07-04 01:55:49
领域: cs.SD,cs.AI,eess.AS
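A compact PyTorch sketch of the described pipeline: MFCC input into a 1D CNN with channel (squeeze-and-excitation style) and spatial attention. Layer sizes here are illustrative assumptions; the paper's exact architecture and augmentation stack are not reproduced.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):          # squeeze-and-excitation style
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):                   # x: (B, C, T)
        w = self.fc(x.mean(dim=2))          # global average pool over time
        return x * w.unsqueeze(-1)

class SpatialAttention(nn.Module):          # attends over the time axis
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        stats = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class SERNet(nn.Module):
    def __init__(self, n_mfcc=40, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, 5, padding=2), nn.BatchNorm1d(64), nn.ReLU(),
            ChannelAttention(64), SpatialAttention(),
            nn.AdaptiveAvgPool1d(1))
        self.head = nn.Linear(64, n_classes)
    def forward(self, mfcc):                # mfcc: (B, n_mfcc, frames)
        return self.head(self.features(mfcc).squeeze(-1))

print(SERNet()(torch.randn(2, 40, 100)).shape)   # torch.Size([2, 7])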
Subject Invariant Contrastive Learning for Human Activity Recognition
The high cost of annotating data makes self-supervised approaches, such as contrastive learning methods, appealing for Human Activity Recognition (HAR). Effective contrastive learning relies on selecting informative positive and negative samples. However, HAR sensor signals are subject to significant domain shifts caused by subject variability. These domain shifts hinder model generalization to unseen subjects by embedding subject-specific variations rather than activity-specific features. As a result, human activity recognition models trained with contrastive learning often struggle to generalize to new subjects. We introduce Subject-Invariant Contrastive Learning (SICL), a simple yet effective loss function to improve generalization in human activity recognition. SICL re-weights negative pairs drawn from the same subject to suppress subject-specific cues and emphasize activity-specific information. We evaluate our loss function on three public benchmarks: UTD-MHAD, MMAct, and DARai. We show that SICL improves performance by up to 11% over traditional contrastive learning methods. Additionally, we demonstrate the adaptability of our loss function across various settings, including multiple self-supervised methods, multimodal scenarios, and supervised learning frameworks.
Updated: 2025-07-04 01:55:33
标题: 主体不变对比学习用于人体活动识别
摘要: 数据注释的高成本使得自监督方法,如对比学习方法,在人体活动识别(HAR)中备受青睐。有效的对比学习依赖于选择信息丰富的正样本和负样本。然而,HAR传感器信号受到主体变异引起的显著领域转移的影响。这些领域转移阻碍了模型对未知主体的泛化,因为它嵌入了主体特定的变异而不是活动特定的特征。因此,使用对比学习训练的人体活动识别模型通常难以泛化到新的主体。我们引入了主体不变的对比学习(SICL),这是一种简单而有效的损失函数,可提高人体活动识别的泛化能力。SICL重新权衡从同一主体中提取的负对,以抑制主体特定的线索并强调活动特定的信息。我们在三个公共基准测试中评估了我们的损失函数:UTD-MHAD、MMAct和DARai。我们表明,与传统的对比学习方法相比,SICL的性能提高了高达11%。此外,我们展示了我们的损失函数在各种设置中的适应性,包括多种自监督方法、多模态场景和监督学习框架。
更新时间: 2025-07-04 01:55:33
领域: cs.CV,cs.LG
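Since SICL is described as re-weighting negatives drawn from the same subject, one plausible instantiation is an InfoNCE-style loss with per-pair weights, sketched below. The specific weighting scheme and hyperparameters are our assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def sicl_loss(z1, z2, subjects, tau=0.1, same_subject_weight=0.2):
    """InfoNCE-style loss with down-weighted same-subject negatives.
    z1, z2: (B, D) embeddings of two views; subjects: (B,) subject ids."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                        # (B, B) similarities
    same_subj = subjects.unsqueeze(0) == subjects.unsqueeze(1)
    weights = torch.ones_like(logits)
    weights[same_subj] = same_subject_weight          # suppress subject cues
    weights.fill_diagonal_(1.0)                       # positives keep full weight
    exp = torch.exp(logits) * weights
    return -torch.log(exp.diag() / exp.sum(dim=1)).mean()

loss = sicl_loss(torch.randn(8, 32), torch.randn(8, 32),
                 torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
print(loss.item())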
EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora
Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRAG achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy performance. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.
Updated: 2025-07-04 01:31:36
标题: EraRAG: 高效和增量检索增强生成方法,用于不断增长的文集
摘要: 基于图的检索增强生成(Graph-RAG)通过在外部语料库上构建检索来增强大型语言模型(LLMs)。然而,现有方法通常假定静态语料库,当新文档到达时需要昂贵的全图重建,限制了它们在动态、不断演变的环境中的可扩展性。为了解决这些限制,我们引入了EraRAG,一个支持高效且可扩展动态更新的新型多层次Graph-RAG框架。我们的方法利用基于超平面的局部敏感哈希(LSH)将原始语料库分区并组织成分层图结构,实现对新数据的高效和局部化插入,而不会破坏现有拓扑结构。该设计消除了重新训练或昂贵重新计算的需求,同时保持高检索准确性和低延迟。在大规模基准测试中的实验表明,EraRag相对于现有的Graph-RAG系统在更新时间和令牌消耗上实现了一个数量级的降低,同时提供了更优越的准确性表现。这项工作为必须在不断增长的语料库上运行的RAG系统提供了一条实用的前进道路,弥合了检索效率和适应性之间的差距。我们的代码和数据可在https://github.com/EverM0re/EraRAG-Official上找到。
更新时间: 2025-07-04 01:31:36
领域: cs.IR,cs.LG
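The hyperplane-LSH partitioning that enables localized inserts can be sketched as follows; the bucket-level graph construction and EraRAG's multi-layer hierarchy are abstracted away, so this shows only the core incremental-update idea.

import numpy as np

class LSHGraphIndex:
    """Hyperplane LSH buckets: a new document drops into one existing bucket,
    so only that local region is touched on insert -- no full rebuild."""
    def __init__(self, dim, n_planes=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_planes, dim))
        self.buckets = {}                       # hash code -> list of doc ids

    def _code(self, v):
        return tuple((self.planes @ v > 0).astype(int))

    def insert(self, doc_id, embedding):        # localized, incremental update
        self.buckets.setdefault(self._code(embedding), []).append(doc_id)

    def candidates(self, query_emb):            # bucket lookup for retrieval
        return self.buckets.get(self._code(query_emb), [])

idx = LSHGraphIndex(dim=64)
rng = np.random.default_rng(1)
for i in range(1000):
    idx.insert(i, rng.standard_normal(64))
q = rng.standard_normal(64)
print(len(idx.candidates(q)), "candidates share the query's bucket")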
Horus: A Protocol for Trustless Delegation Under Uncertainty
Correctness is an emergent property of systems where exposing error is cheaper than committing it. In dynamic, low-trust environments, autonomous AI agents benefit from delegating work to sub-agents, yet correctness cannot be assured through upfront specification or centralized oversight. We propose a protocol that enforces correctness through collateralized claims in a recursive verification game. Tasks are published as intents, and solvers compete to fulfill them. Selected solvers carry out tasks under risk, with correctness checked post hoc by verifiers. Any challenger can challenge a result by staking against it to trigger the verification process. Incorrect agents are slashed and correct opposition is rewarded, with an escalation path that penalizes erroneous verifiers themselves. When incentives are aligned across solvers, challengers, and verifiers, falsification conditions make correctness the Nash equilibrium.
Updated: 2025-07-04 01:19:50
标题: 荷鲁斯:一种在不确定性情况下无需信任的委托协议
摘要: 正确性是一种涌现属性,出现在暴露错误比犯错成本更低的系统中。在动态、低信任的环境中,自主AI代理可以从将工作委托给子代理中受益,然而通过事先规范或集中监督无法保证正确性。我们提出了一个通过抵押索赔在递归验证博弈中强制正确性的协议。任务以意图的形式发布,解决者竞争完成它们。被选中的解决者在风险下执行任务,正确性由验证者事后检查。任何挑战者都可以通过质押反对某个结果来发起挑战,从而触发验证过程。出错的代理会被罚没,正确的质疑会得到奖励,并且存在一个升级路径来惩罚出错的验证者本身。当解决者、挑战者和验证者之间的激励一致时,可证伪条件使正确性成为纳什均衡。
更新时间: 2025-07-04 01:19:50
领域: cs.GT,cs.AI,cs.MA,I.2.11; F.2.2
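A toy state machine for the collateralized-claim flow described above: the solver posts a bond, a challenger stakes against the result, and the verifier's ruling slashes the losing side. The amounts and payout rules are illustrative, not the protocol's actual parameters.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    solver: str
    bond: float                     # solver's collateral at risk
    challenger: Optional[str] = None
    stake: float = 0.0
    status: str = "posted"          # posted -> challenged -> resolved

    def challenge(self, challenger: str, stake: float):
        self.challenger, self.stake, self.status = challenger, stake, "challenged"

    def resolve(self, verifier_says_correct: bool) -> dict:
        """Slash the losing side and pay the winner from the slashed funds."""
        self.status = "resolved"
        if verifier_says_correct:
            return {self.solver: self.bond + self.stake, self.challenger: 0.0}
        return {self.solver: 0.0, self.challenger: self.stake + self.bond}

c = Claim(solver="agent-A", bond=10.0)
c.challenge("agent-B", stake=5.0)
print(c.resolve(verifier_says_correct=False))   # incorrect solver is slashed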
Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach
The integration of distributed energy resources (DERs) into wholesale energy markets can greatly enhance grid flexibility, improve market efficiency, and contribute to a more sustainable energy future. As DERs -- such as solar PV panels and energy storage -- proliferate, effective mechanisms are needed to ensure that small prosumers can participate meaningfully in these markets. We study a wholesale market model featuring multiple DER aggregators, each controlling a portfolio of DER resources and bidding into the market on behalf of the DER asset owners. The key of our approach lies in recognizing the repeated nature of market interactions the ability of participants to learn and adapt over time. Specifically, Aggregators repeatedly interact with each other and with other suppliers in the wholesale market, collectively shaping wholesale electricity prices (aka the locational marginal prices (LMPs)). We model this multi-agent interaction using a mean-field game (MFG), which uses market information -- reflecting the average behavior of market participants -- to enable each aggregator to predict long-term LMP trends and make informed decisions. For each aggregator, because they control the DERs within their portfolio under certain contract structures, we employ a mean-field control (MFC) approach (as opposed to a MFG) to learn an optimal policy that maximizes the total rewards of the DERs under their management. We also propose a reinforcement learning (RL)-based method to help each agent learn optimal strategies within the MFG framework, enhancing their ability to adapt to market conditions and uncertainties. Numerical simulations show that LMPs quickly reach a steady state in the hybrid mean-field approach. Furthermore, our results demonstrate that the combination of energy storage and mean-field learning significantly reduces price volatility compared to scenarios without storage.
Updated: 2025-07-04 01:17:58
标题: 评估多个分布式能源资源聚合商对批发能源市场的影响:混合均场方法
摘要: 将分布式能源资源(DERs)整合到批发能源市场中可以极大地增强电网的灵活性,改善市场效率,并有助于实现更可持续的能源未来。随着太阳能光伏面板和储能等DERs的普及,需要有效的机制确保小型生产者消费者能够有意义地参与这些市场。我们研究了一个具有多个DER聚合器的批发市场模型,每个聚合器控制着一组DER资源,并代表DER资产所有者向市场投标。我们方法的关键在于认识到市场互动的重复性,以及参与者随时间学习和适应的能力。具体来说,聚合器与彼此和其他供应商不断互动,在批发市场中共同塑造批发电力价格(即定位边际价格(LMPs))。我们使用均场博弈(MFG)模型来建模这种多代理互动,该模型使用市场信息(反映市场参与者的平均行为)来使每个聚合器能够预测长期LMP趋势并做出明智决策。对于每个聚合器,由于他们根据某些合同结构控制其组合中的DERs,我们采用均场控制(MFC)方法(而不是MFG)来学习最大化其管理下DERs的总奖励的最优策略。我们还提出了一种基于强化学习(RL)的方法,帮助每个代理在MFG框架内学习最优策略,增强其适应市场条件和不确定性的能力。数值模拟显示,在混合均场方法中,LMPs迅速达到稳定状态。此外,我们的结果表明,能量存储和均场学习的结合相比没有存储的情况显著降低了价格波动。
更新时间: 2025-07-04 01:17:58
领域: eess.SY,cs.AI,cs.LG,cs.SY,econ.GN,math.OC,q-fin.EC
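A toy fixed-point iteration in the spirit of the hybrid mean-field setup: each aggregator best-responds to the current price signal, and the LMP is re-cleared from the aggregate bid. The linear supply/demand curves, capacities, and damping factor below are purely illustrative and not the paper's model.

import numpy as np

def clearing_price(total_supply_bid, demand=100.0, slope=0.05):
    # inverse demand: price falls as more aggregate supply is bid in
    return max(0.0, slope * (demand - total_supply_bid))

def best_response(price, capacity=30.0, marginal_cost=1.0):
    # an aggregator offers more DER output when price exceeds its cost
    return float(np.clip((price - marginal_cost) * 20.0, 0.0, capacity))

n_aggregators = 4
bids = [10.0] * n_aggregators
alpha = 0.1                         # damping keeps the dynamics stable
for t in range(200):
    lmp = clearing_price(sum(bids))
    target = best_response(lmp)
    bids = [(1 - alpha) * b + alpha * target for b in bids]
print(f"steady-state LMP ~ {clearing_price(sum(bids)):.2f}, bid per aggregator ~ {bids[0]:.1f}")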
On Jailbreaking Quantized Language Models Through Fault Injection Attacks
The safety alignment of Language Models (LMs) is a critical concern, yet their integrity can be challenged by direct parameter manipulation attacks, such as those potentially induced by fault injection. As LMs are increasingly deployed using low-precision quantization for efficiency, this paper investigates the efficacy of such attacks for jailbreaking aligned LMs across different quantization schemes. We propose gradient-guided attacks, including a tailored progressive bit-level search algorithm introduced herein and a comparative word-level (single weight update) attack. Our evaluation on Llama-3.2-3B, Phi-4-mini, and Llama-3-8B across FP16 (baseline), and weight-only quantization (FP8, INT8, INT4) reveals that quantization significantly influences attack success. While attacks readily achieve high success (>80\% Attack Success Rate, ASR) on FP16 models, within an attack budget of 25 perturbations, FP8 and INT8 models exhibit ASRs below 20\% and 50\%, respectively. Increasing the perturbation budget up to 150 bit-flips, FP8 models maintained ASR below 65\%, demonstrating some resilience compared to INT8 and INT4 models that have high ASR. In addition, analysis of perturbation locations revealed differing architectural targets across quantization schemes, with (FP16, INT4) and (INT8, FP8) showing similar characteristics. Besides, jailbreaks induced in FP16 models were highly transferable to subsequent FP8/INT8 quantization (<5\% ASR difference), though INT4 significantly reduced transferred ASR (avg. 35\% drop). These findings highlight that while common quantization schemes, particularly FP8, increase the difficulty of direct parameter manipulation jailbreaks, vulnerabilities can still persist, especially through post-attack quantization.
Updated: 2025-07-04 00:48:48
标题: 通过故障注入攻击对量化语言模型进行越狱
摘要: 语言模型(LMs)的安全对齐是一个关键问题,然而,它们的完整性可能会受到直接参数操纵攻击的挑战,这些攻击可能是由故障注入引起的。随着LMs越来越多地使用低精度量化以提高效率,本文研究了这类攻击对跨不同量化方案对齐LMs的有效性。我们提出了梯度引导攻击,包括本文介绍的定制渐进位级搜索算法和比较单词级(单个权重更新)攻击。我们在FP16(基准)、仅权重量化(FP8、INT8、INT4)上对Llama-3.2-3B、Phi-4-mini和Llama-3-8B进行评估,结果显示量化显著影响攻击的成功率。尽管攻击很容易在FP16模型上取得高成功率(>80\%攻击成功率),在攻击预算为25次扰动时,FP8和INT8模型的攻击成功率分别低于20\%和50\%。将扰动预算增加到150个比特翻转时,FP8模型的攻击成功率仍然保持在65\%以下,相比之下,INT8和INT4模型的攻击成功率较高。此外,扰动位置的分析显示,在不同的量化方案中存在不同的架构目标,(FP16、INT4)和(INT8、FP8)具有类似的特征。此外,在FP16模型中引发的越狱攻击对后续FP8/INT8量化高度可转移(<5\%的攻击成功率差异),尽管INT4显著降低了转移攻击成功率(平均降低35\%)。这些发现强调了尽管常见的量化方案,特别是FP8,增加了直接参数操纵越狱攻击的难度,但漏洞仍可能存在,特别是通过攻击后的量化。
更新时间: 2025-07-04 00:48:48
领域: cs.CR,cs.AI
Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks
Despite the great achievements of deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns of DNNs in many application domains requiring high reliability.We propose the fault sneaking attack on DNNs, where the adversary aims to misclassify certain input images into any target labels by modifying the DNN parameters. We apply ADMM (alternating direction method of multipliers) for solving the optimization problem of the fault sneaking attack with two constraints: 1) the classification of the other images should be unchanged and 2) the parameter modifications should be minimized. Specifically, the first constraint requires us not only to inject designated faults (misclassifications), but also to hide the faults for stealthy or sneaking considerations by maintaining model accuracy. The second constraint requires us to minimize the parameter modifications (using L0 norm to measure the number of modifications and L2 norm to measure the magnitude of modifications). Comprehensive experimental evaluation demonstrates that the proposed framework can inject multiple sneaking faults without losing the overall test accuracy performance.
Updated: 2025-07-04 00:44:05
标题: 故障潜入攻击:一种用于误导深度神经网络的隐蔽框架
摘要: 尽管深度神经网络(DNNs)取得了巨大的成就,但最新的DNNs的脆弱性引发了对许多需要高可靠性应用领域中DNNs安全性的担忧。我们提出了对DNNs的故障潜入攻击,其中对手旨在通过修改DNN参数将某些输入图像误分类为任何目标标签。我们应用交替方向乘子法(ADMM)来解决故障潜入攻击的优化问题,其中有两个约束条件:1)其他图像的分类应保持不变,2)参数修改应最小化。具体来说,第一个约束要求我们不仅要注入指定的故障(误分类),还要通过保持模型准确性来隐藏故障,以考虑潜入性或隐蔽性。第二个约束要求我们最小化参数修改(使用L0范数来衡量修改数量和L2范数来衡量修改幅度)。全面的实验评估表明,提出的框架可以注入多个潜入性故障,而不会损失整体测试准确性表现。
更新时间: 2025-07-04 00:44:05
领域: cs.LG,cs.CR,cs.CV,stat.ML
Performance-Driven QUBO for Recommender Systems on Quantum Annealers
We propose Counterfactual Analysis Quadratic Unconstrained Binary Optimization (CAQUBO) to solve QUBO problems for feature selection in recommender systems. CAQUBO leverages counterfactual analysis to measure the impact of individual features and feature combinations on model performance and employs the measurements to construct the coefficient matrix for a quantum annealer to select the optimal feature combinations for recommender systems, thereby improving their final recommendation performance. By establishing explicit connections between features and the recommendation performance, the proposed approach demonstrates superior performance compared to the state-of-the-art quantum annealing methods. Extensive experiments indicate that integrating quantum computing with counterfactual analysis holds great promise for addressing these challenges.
Updated: 2025-07-04 00:35:42
标题: 性能驱动的QUBO用于量子退火机上的推荐系统
摘要: 我们提出了反事实分析二次无约束二进制优化(CAQUBO)来解决推荐系统中特征选择的QUBO问题。CAQUBO利用反事实分析来衡量单个特征和特征组合对模型性能的影响,并利用这些测量结果构建系数矩阵,以便量子退火器选择最佳的特征组合用于推荐系统,从而提高最终的推荐性能。通过建立特征与推荐性能之间的明确联系,所提出的方法表现出比最先进的量子退火方法更优越的性能。大量实验表明,将量子计算与反事实分析相结合对解决这些挑战具有巨大的潜力。
更新时间: 2025-07-04 00:35:42
领域: cs.IR,cs.AI
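To illustrate the mechanics, the sketch below fills a small QUBO matrix from hypothetical counterfactual measurements and minimizes it by brute force, standing in for the annealer at this toy size. The coefficient construction is our assumption; CAQUBO's actual measurement procedure is more involved.

import itertools
import numpy as np

n_features = 4
single = np.array([-0.30, -0.10, 0.05, -0.20])   # perf. gain of each feature alone
pair = np.zeros((n_features, n_features))         # interaction terms from ablations
pair[0, 1] = pair[1, 0] = 0.25                    # features 0 and 1 are redundant

Q = np.diag(single) + np.triu(pair, k=1)          # upper-triangular QUBO matrix

def qubo_energy(x, Q):
    x = np.asarray(x)
    return float(x @ Q @ x)                       # binary x, so x_i^2 == x_i

best = min(itertools.product([0, 1], repeat=n_features), key=lambda x: qubo_energy(x, Q))
print("selected feature mask:", best)   # an annealer would return the same minimizer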
Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Token-based video representation has emerged as a promising approach for enabling large language models (LLMs) to interpret video content. However, existing token reduction techniques, such as pruning and merging, often disrupt essential positional embeddings and rely on continuous visual tokens sampled from nearby pixels with similar spatial-temporal locations. By removing only a small fraction of tokens, these methods still produce relatively lengthy continuous sequences, which falls short of the extreme compression required to balance computational efficiency and token count in video LLMs. In this paper, we introduce the novel task of Extreme Short Token Reduction, which aims to represent entire videos using a minimal set of discrete tokens. We propose VQToken, a neural discrete token representation framework that (i) applies adaptive vector quantization to continuous ViT embeddings to learn a compact codebook and (ii) preserves spatial-temporal positions via a token hash function by assigning each grid-level token to its nearest codebook entry. On the Extreme Short Token Reduction task, our VQToken compresses sequences to just 0.07 percent of their original length while incurring only a 0.66 percent drop in accuracy on the NextQA-MC benchmark. It also achieves comparable performance on ActNet-QA, Long Video Bench, and VideoMME. We further introduce the Token Information Density (TokDense) metric and formalize fixed-length and adaptive-length subtasks, achieving state-of-the-art results in both settings. Our approach dramatically lowers theoretical complexity, increases information density, drastically reduces token counts, and enables efficient video LLMs in resource-constrained environments.
Updated: 2025-07-04 00:31:19
标题: 视频大语言模型中的神经离散标记表示学习,用于极端标记减少
摘要: 基于标记的视频表示已经成为一种有前途的方法,可以让大型语言模型(LLMs)解释视频内容。然而,现有的标记减少技术,如修剪和合并,通常会破坏必要的位置嵌入,并依赖于从具有相似时空位置的相邻像素采样的连续视觉标记。通过仅移除一小部分标记,这些方法仍然会产生相对较长的连续序列,这远远不足以实现在视频LLMs中平衡计算效率和标记数量所需的极端压缩。在本文中,我们引入了极端短标记减少的新任务,旨在使用最少的离散标记表示整个视频。我们提出了VQToken,这是一个神经离散标记表示框架,它(i)将自适应矢量量化应用于连续的ViT嵌入,以学习一个紧凑的代码簿,(ii)通过一个标记哈希函数来保留时空位置,将每个网格级标记分配给其最近的代码簿条目。在极端短标记减少任务中,我们的VQToken将序列压缩到原始长度的仅0.07%,在NextQA-MC基准测试上仅产生0.66%的准确度下降。它还在ActNet-QA、Long Video Bench和VideoMME上取得了可比较的性能。我们进一步引入了Token信息密度(TokDense)度量,并形式化了固定长度和自适应长度子任务,在两种设置中实现了最先进的结果。我们的方法大大降低了理论复杂性,提高了信息密度,大幅减少了标记数量,并在资源受限的环境中实现了高效的视频LLMs。
更新时间: 2025-07-04 00:31:19
领域: cs.CV,cs.AI,cs.CL,cs.LG
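The core quantization step, assigning each grid-level ViT token to its nearest codebook entry while keeping its spatial position, can be sketched in a few lines of PyTorch. Codebook learning and the temporal hashing details are omitted, and the sizes below are illustrative.

import torch

def quantize_grid_tokens(vit_tokens, codebook):
    """vit_tokens: (H, W, D) continuous ViT grid embeddings; codebook: (K, D).
    Returns (H, W) codebook indices -- each code stays attached to its
    original grid position, preserving spatial-temporal structure."""
    H, W, D = vit_tokens.shape
    flat = vit_tokens.reshape(-1, D)
    dists = torch.cdist(flat, codebook)            # (H*W, K) pairwise distances
    return dists.argmin(dim=1).reshape(H, W)       # nearest codebook entry per cell

codebook = torch.randn(64, 256)                    # tiny learned codebook (K=64)
frame_tokens = torch.randn(14, 14, 256)            # one video frame's ViT grid
codes = quantize_grid_tokens(frame_tokens, codebook)
print(codes.shape, codes.unique().numel(), "distinct codes")  # extreme compression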
Treatment, evidence, imitation, and chat
Large language models are thought to have potential to aid in medical decision making. We investigate this here. We start with the treatment problem, the patient's core medical decision-making task, which is solved in collaboration with a healthcare provider. We discuss approaches to solving the treatment problem, including -- within evidence-based medicine -- trials and observational data. We then discuss the chat problem, and how this differs from the treatment problem -- in particular as it relates to imitation. We then discuss how a large language model might be used to solve the treatment problem and highlight some of the challenges that emerge. We finally discuss how these challenges relate to evidence-based medicine, and how this might inform next steps.
Updated: 2025-07-04 00:25:07
标题: 治疗、证据、模仿和聊天
摘要: 大型语言模型被认为有潜力帮助医学决策。我们在这里进行了调查。我们从治疗问题开始,病人的核心医学决策任务,这是在与医疗保健提供者合作中解决的。我们讨论了解决治疗问题的方法,包括基于证据的医学中的试验和观察数据。然后我们讨论了聊天问题,以及这与治疗问题的不同之处 - 特别是与模仿有关。然后我们讨论了如何使用大型语言模型来解决治疗问题,并强调了出现的一些挑战。最后,我们讨论了这些挑战与基于证据的医学的关系,以及这如何可能指导下一步。
更新时间: 2025-07-04 00:25:07
领域: stat.OT,cs.AI
Federated Continual Learning: Concepts, Challenges, and Solutions
Federated Continual Learning (FCL) has emerged as a robust solution for collaborative model training in dynamic environments, where data samples are continuously generated and distributed across multiple devices. This survey provides a comprehensive review of FCL, focusing on key challenges such as heterogeneity, model stability, communication overhead, and privacy preservation. We explore various forms of heterogeneity and their impact on model performance. Solutions to non-IID data, resource-constrained platforms, and personalized learning are reviewed in an effort to show the complexities of handling heterogeneous data distributions. Next, we review techniques for ensuring model stability and avoiding catastrophic forgetting, which are critical in non-stationary environments. Privacy-preserving techniques are another aspect of FCL that have been reviewed in this work. This survey has integrated insights from federated learning and continual learning to present strategies for improving the efficacy and scalability of FCL systems, making it applicable to a wide range of real-world scenarios.
Updated: 2025-07-04 00:22:16
标题: 联邦式持续学习:概念、挑战和解决方案
摘要: 联邦式持续学习(Federated Continual Learning,FCL)已经成为动态环境中协作模型训练的强大解决方案,其中数据样本不断生成并分布在多个设备上。本调查全面审视了FCL,重点关注关键挑战,如异构性、模型稳定性、通信开销和隐私保护。我们探讨了各种形式的异构性及其对模型性能的影响。解决非IID数据、资源受限平台和个性化学习的方法被审查,以展示处理异构数据分布的复杂性。接下来,我们审查了确保模型稳定性和避免灾难性遗忘的技术,这在非稳态环境中至关重要。隐私保护技术是本文中审查的另一个FCL方面。这项调查将联邦学习和持续学习的见解整合起来,提出了改进FCL系统效能和可扩展性的策略,使其适用于各种真实场景。
更新时间: 2025-07-04 00:22:16
领域: cs.LG,cs.AI
AgentPS: Agentic Process Supervision for Content Moderation with Multimodal LLMs
The advanced processing and reasoning capabilities of multimodal large language models (MLLMs) have driven substantial progress in vision-language (VL) understanding tasks. However, while effective for tasks governed by straightforward logic, MLLMs often struggle with reasoning complex, detail-intensive logical structures. To address this limitation, we introduce AgentPS, a novel framework that integrates Agentic Process Supervision into MLLMs by sequentially reasoning over ancillary questions during fine-tuning. AgentPS achieves substantial improvements over baseline MLLMs on both public benchmarks and proprietary datasets. Notably, we show that using MLLM-generated ancillary labels in place of human annotations yields only minimal performance degradation, highlighting the method's scalability. These results establish AgentPS as a scalable and effective solution for complex multimodal classification in large-scale industrial applications.
Updated: 2025-07-04 00:16:22
标题: AgentPS:使用多模态LLM进行内容审核的代理过程监督
摘要: 大型多模态语言模型(MLLMs)具有先进的处理和推理能力,推动了视觉语言(VL)理解任务的实质性进展。然而,虽然对于受简单逻辑控制的任务有效,但MLLMs常常在推理复杂、细节密集的逻辑结构时遇到困难。为了解决这一限制,我们引入了AgentPS,这是一个将代理过程监督集成到MLLMs中的新框架,通过在微调过程中依次推理辅助问题。AgentPS在公共基准测试和专有数据集上相比基准MLLMs取得了实质性改进。值得注意的是,我们证明使用MLLM生成的辅助标签代替人工注释仅导致性能略微降低,突显了该方法的可扩展性。这些结果确立了AgentPS作为大型工业应用中复杂多模态分类的可扩展和有效解决方案。
更新时间: 2025-07-04 00:16:22
领域: cs.CL,cs.AI
Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems
We propose a scalable and cost-efficient framework for deploying Graph-based Retrieval Augmented Generation (GraphRAG) in enterprise environments. While GraphRAG has shown promise for multi-hop reasoning and structured retrieval, its adoption has been limited by the high computational cost of constructing knowledge graphs using large language models (LLMs) and the latency of graph-based retrieval. To address these challenges, we introduce two core innovations: (1) a dependency-based knowledge graph construction pipeline that leverages industrial-grade NLP libraries to extract entities and relations from unstructured text completely eliminating reliance on LLMs; and (2) a lightweight graph retrieval strategy that combines hybrid query node identification with efficient one-hop traversal for high-recall, low-latency subgraph extraction. We evaluate our framework on two SAP datasets focused on legacy code migration and demonstrate strong empirical performance. Our system achieves up to 15% and 4.35% improvements over traditional RAG baselines based on LLM-as-Judge and RAGAS metrics, respectively. Moreover, our dependency-based construction approach attains 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) while significantly reducing cost and improving scalability. These results validate the feasibility of deploying GraphRAG systems in real-world, large-scale enterprise applications without incurring prohibitive resource requirements paving the way for practical, explainable, and domain-adaptable retrieval-augmented reasoning.
Updated: 2025-07-04 00:05:55
标题: 大规模RAG系统中基于非结构化文本的高效知识图构建和检索
摘要: 我们提出了一种可扩展且成本效益高的框架,用于在企业环境中部署基于图的检索增强生成(GraphRAG)。尽管GraphRAG在多跳推理和结构化检索方面显示出潜力,但由于使用大型语言模型(LLMs)构建知识图的高计算成本和基于图的检索的延迟,其采用受到限制。为解决这些挑战,我们引入了两个核心创新:(1)基于依赖关系的知识图构建管道,利用工业级NLP库从非结构化文本中提取实体和关系,完全消除对LLMs的依赖;以及(2)轻量级图检索策略,将混合查询节点标识与高召回率、低延迟的一跳遍历相结合,用于高效提取子图。我们在两个重点关注传统代码迁移的SAP数据集上评估了我们的框架,并展示了强大的实证性能。我们的系统在基于LLM为判断者和RAGAS指标的传统RAG基线上分别实现了高达15%和4.35%的改进。此外,我们的基于依赖性的构建方法实现了相当于LLM生成的知识图性能的94%(61.87% vs. 65.83%),同时显著降低了成本并提高了可扩展性。这些结果验证了在现实世界中大规模企业应用中部署GraphRAG系统的可行性,而不会带来过高的资源要求,为实用、可解释和领域可适应的检索增强推理铺平了道路。
更新时间: 2025-07-04 00:05:55
领域: cs.AI
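A minimal example of dependency-based triple extraction with an industrial NLP library (spaCy here, assuming the small English model is installed). A production pipeline along the lines described above would add coreference resolution, entity normalization, and typed edges.

import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this model has been downloaded

def extract_triples(text):
    triples = []
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in tok.children if c.dep_ in ("dobj", "attr")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.text, tok.lemma_, o.text))
    return triples

print(extract_triples("SAP migrated the legacy module. The module stores customer data."))
# e.g. [('SAP', 'migrate', 'module'), ('module', 'store', 'data')]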
7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement
Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs due to the ease to customize and deploy the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed, adhering to principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with SOTA post-training framework and instruction data to obtain Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision language model based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
Updated: 2025-07-04 00:04:42
标题: 7B 全开源 Moxin-LLM/VLM -- 从预训练到基于GRPO的强化学习增强
摘要: 最近,大型语言模型(LLMs)经历了重大转变,其受欢迎程度和功能能力迅速上升。主导这一演变的是像GPT-4和GPT-o1这样的专有LLMs,由于其出色的性能和多功能性,在人工智能社区引起了广泛关注。与此同时,像LLaMA这样的开源LLMs也因其易于定制和在各种应用中部署模型而对LLMs的日益增长的受欢迎度做出了巨大贡献。尽管开源LLMs为创新和研究带来了前所未有的机遇,但LLMs的商业化引发了关于透明度、可再现性和安全性的担忧。许多开源LLMs未能满足基本的透明度要求,因为它们隐瞒了训练代码和数据等关键组件,这可能会阻碍对LLMs的进一步创新。为了缓解这一问题,我们推出了Moxin 7B,这是一个完全开源的LLM,遵循开放科学、开源、开放数据和开放获取原则进行开发。我们发布了预训练代码和配置、训练和微调数据集,以及中间和最终检查点,旨在持续承诺全面开源LLMs。在预训练基础模型后,我们使用SOTA后训练框架和指导数据微调Moxin Base模型,获得Moxin Instruct模型。为了提高推理能力,我们进一步使用从DeepSeek R1中提炼出的思维链数据对我们的Instruct模型进行微调,然后按照DeepSeek R1的做法使用组相对策略优化(GRPO)对我们的模型进行微调,得到Moxin Reasoning模型。此外,我们基于我们的Moxin模型开发了我们的视觉语言模型。实验证明,我们的模型在零样本评估、少样本评估和CoT评估等各种评估中表现出优越的性能。
更新时间: 2025-07-04 00:04:42
领域: cs.CL,cs.AI,cs.LG
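The GRPO step mentioned above normalizes each sampled completion's reward against its group, removing the need for a value network. A minimal sketch follows; the KL-penalty term and token-level aggregation used in full GRPO training are omitted.

import torch

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's reward
    by its group's mean and std (one group = G samples for one prompt)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

def grpo_policy_loss(logp, old_logp, advantages, eps=0.2):
    ratio = torch.exp(logp - old_logp)             # per-completion likelihood ratio
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],      # group of 4 samples per prompt
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)
print(adv)   # correct completions get positive advantage within their group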