DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders
Video diffusion models have revolutionized generative video synthesis, but they are imprecise, slow, and can be opaque during generation -- keeping users in the dark for a prolonged period. In this work, we propose DiffusionBrowser, a model-agnostic, lightweight decoder framework that allows users to interactively generate previews at any point (timestep or transformer block) during the denoising process. Our model generates multi-modal preview representations, including RGB and scene intrinsics, at more than 4$\times$ real-time speed (less than 1 second for a 4-second video), conveying appearance and motion consistent with the final video. With the trained decoder, we show that it is possible to interactively guide the generation at intermediate noise steps via stochasticity reinjection and modal steering, unlocking a new control capability. Moreover, we systematically probe the model using the learned decoders, revealing how scene, object, and other details are composed and assembled during the otherwise black-box denoising process.
Updated: 2025-12-15 18:59:57
Subjects: cs.CV,cs.AI,cs.GR,cs.LG
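For context, every denoising step already admits a crude preview: the one-step posterior-mean estimate of the clean signal, which DiffusionBrowser's learned decoders improve on. A minimal numpy sketch under the standard DDPM parameterization (the function name is ours, not from the paper):

```python
import numpy as np

def preview_x0(x_t, eps_hat, alpha_bar_t):
    """One-step estimate of the clean signal x0 from a noisy latent x_t,
    given the model's noise prediction eps_hat and the cumulative noise
    schedule value alpha_bar_t (standard DDPM x0-parameterization)."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
```

If the noise prediction were exact, this recovers x0 exactly at any timestep; in practice it is blurry at high noise levels, which is the gap a learned preview decoder closes.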
Feedforward 3D Editing via Text-Steerable Image-to-3D
Recent progress in image-to-3D has opened up immense possibilities for design, AR/VR, and robotics. However, to use AI-generated 3D assets in real applications, a critical requirement is the capability to edit them easily. We present a feedforward method, Steer3D, to add text steerability to image-to-3D models, which enables editing of generated 3D assets with language. Our approach is inspired by ControlNet, which we adapt to image-to-3D generation to enable text steering directly in a forward pass. We build a scalable data engine for automatic data generation, and develop a two-stage training recipe based on flow-matching training and Direct Preference Optimization (DPO). Compared to competing methods, Steer3D more faithfully follows the language instruction and maintains better consistency with the original 3D asset, while being 2.4x to 28.5x faster. Steer3D demonstrates that it is possible to add a new modality (text) to steer the generation of pretrained image-to-3D generative models with only 100k training examples. Project website: https://glab-caltech.github.io/steer3d/
Updated: 2025-12-15 18:58:55
Subjects: cs.CV,cs.AI
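ControlNet-style steering, which Steer3D adapts to image-to-3D, hinges on a zero-initialized output projection so that the added condition branch leaves the pretrained model's behavior untouched at the start of training. A minimal numpy sketch of that principle (class and weight names are ours; the real model is a full network, not linear maps):

```python
import numpy as np

rng = np.random.default_rng(0)

class ZeroInitAdapter:
    """ControlNet-style adapter: a trainable conditioning branch whose
    output projection starts at zero, so the frozen base mapping is
    reproduced exactly at initialization."""
    def __init__(self, dim, cond_dim):
        self.W_base = rng.standard_normal((dim, dim)) / np.sqrt(dim)      # frozen
        self.W_cond = rng.standard_normal((dim, cond_dim)) / np.sqrt(cond_dim)
        self.W_zero = np.zeros((dim, dim))  # zero-initialized projection

    def forward(self, x, cond):
        base = self.W_base @ x
        steer = self.W_zero @ np.tanh(self.W_cond @ cond)  # zero at init
        return base + steer
```

Training updates W_cond and W_zero only, so steering capacity grows gradually without degrading the pretrained generator.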
Directional Textual Inversion for Personalized Text-to-Image Generation
Textual Inversion (TI) is an efficient approach to text-to-image personalization but often fails on complex prompts. We trace these failures to embedding norm inflation: learned tokens drift to out-of-distribution magnitudes, degrading prompt conditioning in pre-norm Transformers. Empirically, we show semantics are primarily encoded by direction in CLIP token space, while inflated norms harm contextualization; theoretically, we analyze how large magnitudes attenuate positional information and hinder residual updates in pre-norm blocks. We propose Directional Textual Inversion (DTI), which fixes the embedding magnitude to an in-distribution scale and optimizes only direction on the unit hypersphere via Riemannian SGD. We cast direction learning as MAP with a von Mises-Fisher prior, yielding a constant-direction prior gradient that is simple and efficient to incorporate. Across personalization tasks, DTI improves text fidelity over TI and TI-variants while maintaining subject similarity. Crucially, DTI's hyperspherical parameterization enables smooth, semantically coherent interpolation between learned concepts (slerp), a capability that is absent in standard TI. Our findings suggest that direction-only optimization is a robust and scalable path for prompt-faithful personalization.
Updated: 2025-12-15 18:57:07
Subjects: cs.LG,cs.CV
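DTI's core update keeps the embedding on the unit hypersphere: project the Euclidean gradient onto the tangent space at the current point, take a step, then renormalize (a retraction). Together with slerp, this can be sketched as follows (function names are ours; the actual optimization targets CLIP token embeddings):

```python
import numpy as np

def riemannian_sgd_step(v, grad, lr):
    """One Riemannian SGD step on the unit sphere: remove the radial
    component of the gradient, step in the tangent space, retract."""
    g_tan = grad - np.dot(grad, v) * v   # tangent-space projection
    v_new = v - lr * g_tan
    return v_new / np.linalg.norm(v_new)  # retraction back to the sphere

def slerp(u, v, t):
    """Spherical linear interpolation between unit vectors u and v."""
    omega = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return u
    return (np.sin((1 - t) * omega) * u + np.sin(t * omega) * v) / np.sin(omega)
```

Because every learned direction lives on the same sphere, slerp between two concepts stays on-distribution at every interpolation weight, which is the capability the abstract highlights as absent in standard TI.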
Template-Guided Reconstruction of Pulmonary Segments with Neural Implicit Functions
High-quality 3D reconstruction of pulmonary segments plays a crucial role in segmentectomy and surgical planning for the treatment of lung cancer. Due to the resolution requirement of the target reconstruction, conventional deep learning-based methods often suffer from computational resource constraints or limited granularity. Conversely, implicit modeling is favored due to its computational efficiency and continuous representation at any resolution. We propose a neural implicit function-based method that learns a 3D surface to achieve anatomy-aware, precise pulmonary segment reconstruction, representing each shape as the deformation of a learnable template. Additionally, we introduce two clinically relevant evaluation metrics to comprehensively assess the quality of the reconstruction. Furthermore, to address the lack of publicly available shape datasets for benchmarking reconstruction algorithms, we developed a shape dataset named Lung3D, which includes the 3D models of 800 labeled pulmonary segments and their corresponding airways, arteries, veins, and intersegmental veins. We demonstrate that the proposed approach outperforms existing methods, providing a new perspective for pulmonary segment reconstruction. Code and data will be available at https://github.com/HINTLab/ImPulSe.
Updated: 2025-12-15 18:56:33
Subjects: cs.GR,cs.AI,cs.CV
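The template-deformation idea is easy to make concrete with an analytic template: the reconstructed shape's implicit function is the template evaluated at warped query points. In this sketch the template is a unit sphere and `deform` stands in for the learned deformation MLP (both names are ours):

```python
import numpy as np

def template_sdf(p):
    """Template shape: signed distance to the unit sphere."""
    return np.linalg.norm(p, axis=-1) - 1.0

def deformed_sdf(p, deform):
    """Evaluate the template after warping query points by a deformation
    field -- the core idea of template-guided implicit reconstruction.
    `deform` is a stand-in for a learned MLP mapping points to offsets."""
    return template_sdf(p + deform(p))
```

Because the output is a continuous function of the query point, the surface can be extracted at any resolution, which is the efficiency argument the abstract makes for implicit modeling.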
A Scientific Reasoning Model for Organic Synthesis Procedure Generation
Solving computer-aided synthesis planning is essential for enabling fully automated, robot-assisted synthesis workflows and improving the efficiency of drug discovery. A key challenge, however, is bridging the gap between computational route design and practical laboratory execution, particularly the accurate prediction of viable experimental procedures for each synthesis step. In this work, we present QFANG, a scientific reasoning language model capable of generating precise, structured experimental procedures directly from reaction equations, with explicit chain-of-thought reasoning. To develop QFANG, we curated a high-quality dataset comprising 905,990 chemical reactions paired with structured action sequences, extracted and processed from patent literature using large language models. We introduce a Chemistry-Guided Reasoning (CGR) framework that produces chain-of-thought data grounded in chemical knowledge at scale. The model subsequently undergoes supervised fine-tuning to elicit complex chemistry reasoning. Finally, we apply Reinforcement Learning from Verifiable Rewards (RLVR) to further enhance procedural accuracy. Experimental results demonstrate that QFANG outperforms advanced general-purpose reasoning models and nearest-neighbor retrieval baselines, measured by traditional NLP similarity metrics and a chemically aware evaluator using an LLM-as-a-judge. Moreover, QFANG generalizes to certain out-of-domain reaction classes and adapts to variations in laboratory conditions and user-specific constraints. We believe that QFANG's ability to generate high-quality synthesis procedures represents an important step toward bridging the gap between computational synthesis planning and fully automated laboratory synthesis.
Updated: 2025-12-15 18:55:39
Subjects: cs.LG
SEDULity: A Proof-of-Learning Framework for Distributed and Secure Blockchains with Efficient Useful Work
The security and decentralization of Proof-of-Work (PoW) have been well-tested in existing blockchain systems. However, its tremendous energy waste has raised concerns about sustainability. Proof-of-Useful-Work (PoUW) aims to redirect the meaningless computation to meaningful tasks such as solving machine learning (ML) problems, giving rise to the branch of Proof-of-Learning (PoL). While previous studies have proposed various PoLs, they all, to some degree, suffer from security, decentralization, or efficiency issues. In this paper, we propose a PoL framework that trains ML models efficiently while maintaining blockchain security in a fully distributed manner. We name the framework SEDULity, which stands for a Secure, Efficient, Distributed, and Useful Learning-based blockchain system. Specifically, we encode the template block into the training process and design a useful function that is difficult to solve but relatively easy to verify, as a substitute for the PoW puzzle. We show that our framework is distributed, secure, and efficiently trains ML models. We further demonstrate that the proposed PoL framework can be extended to other types of useful work and design an incentive mechanism to incentivize task verification. We show theoretically that a rational miner is incentivized to train fully honestly with well-designed system parameters. Finally, we present simulation results to demonstrate the performance of our framework and validate our analysis.
Updated: 2025-12-15 18:55:20
Subjects: cs.CR,cs.DC,cs.IT,cs.LG
Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance
As the online learning landscape evolves, the need for personalization is increasingly evident. Although educational resources are burgeoning, educators face challenges selecting materials that both align with intended learning outcomes and address diverse learner needs. Large Language Models (LLMs) are attracting growing interest for their potential to create learning resources that better support personalization, but verifying coverage of intended outcomes still requires human alignment review, which is costly and limits scalability. We propose a framework that supports the cost-effective automation of evaluating alignment between educational resources and intended learning outcomes. Using human-generated materials, we benchmarked LLM-based text-embedding models and found that the most accurate model (Voyage) achieved 79% accuracy in detecting alignment. We then applied the optimal model to LLM-generated resources and, via expert evaluation, confirmed that it reliably assessed correspondence to intended outcomes (83% accuracy). Finally, in a three-group experiment with 360 learners, higher alignment scores were positively related to greater learning performance, chi-squared(2, N = 360) = 15.39, p < 0.001. These findings show that embedding-based alignment scores can facilitate scalable personalization by confirming alignment with learning outcomes, which allows teachers to focus on tailoring content to diverse learner needs.
Updated: 2025-12-15 18:51:00
Subjects: cs.CY,cs.AI
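The alignment evaluation at the core of the framework reduces to comparing text embeddings of a resource and an intended learning outcome. A minimal sketch (function names and the 0.5 threshold are ours; the paper benchmarks commercial embedding models such as Voyage):

```python
import numpy as np

def alignment_score(resource_emb, outcome_emb):
    """Cosine similarity between a resource embedding and a
    learning-outcome embedding, both arbitrary non-zero vectors."""
    a = resource_emb / np.linalg.norm(resource_emb)
    b = outcome_emb / np.linalg.norm(outcome_emb)
    return float(np.dot(a, b))

def is_aligned(resource_emb, outcome_emb, threshold=0.5):
    """Binary alignment decision; the threshold would be calibrated
    against human-labeled alignment judgments."""
    return alignment_score(resource_emb, outcome_emb) >= threshold
```

In the study's setting, the threshold calibration is exactly what the 79%/83% accuracy figures measure against human and expert labels.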
Large-Language Memorization During the Classification of United States Supreme Court Cases
Large-language models (LLMs) have been shown to respond in a variety of ways for classification tasks outside of question-answering. LLM responses are sometimes called "hallucinations" since the output is not what is expected. Memorization strategies in LLMs are being studied in detail, with the goal of understanding how LLMs respond. We perform a deep dive into a classification task based on United States Supreme Court (SCOTUS) decisions. The SCOTUS corpus is an ideal classification task for studying LLM memory accuracy because it presents significant challenges due to extensive sentence length, complex legal terminology, non-standard structure, and domain-specific vocabulary. Experimentation is performed with the latest LLM fine-tuning and retrieval-based approaches, such as parameter-efficient fine-tuning, auto-modeling, and others, on two traditional category-based SCOTUS classification tasks: one with 15 labeled topics and another with 279. We show that prompt-based models with memories, such as DeepSeek, can be more robust than previous BERT-based models on both tasks, scoring about 2 points better than previous models not based on prompting.
Updated: 2025-12-15 18:47:48
Subjects: cs.CL,cs.AI,cs.ET,cs.IR
World Models Can Leverage Human Videos for Dexterous Manipulation
Dexterous manipulation is challenging because it requires understanding how subtle hand motion influences the environment through contact with objects. We introduce DexWM, a Dexterous Manipulation World Model that predicts the next latent state of the environment conditioned on past states and dexterous actions. To overcome the scarcity of dexterous manipulation datasets, DexWM is trained on over 900 hours of human and non-dexterous robot videos. To enable fine-grained dexterity, we find that predicting visual features alone is insufficient; therefore, we introduce an auxiliary hand consistency loss that enforces accurate hand configurations. DexWM outperforms prior world models conditioned on text, navigation, and full-body actions, achieving more accurate predictions of future states. DexWM also demonstrates strong zero-shot generalization to unseen manipulation skills when deployed on a Franka Panda arm equipped with an Allegro gripper, outperforming Diffusion Policy by over 50% on average in grasping, placing, and reaching tasks.
Updated: 2025-12-15 18:37:12
Subjects: cs.RO,cs.AI,cs.CV
From Code to Field: Evaluating the Robustness of Convolutional Neural Networks for Disease Diagnosis in Mango Leaves
The validation and verification of artificial intelligence (AI) models through robustness assessment are essential to guarantee the reliable performance of intelligent systems facing real-world challenges, such as image corruptions including noise, blurring, and weather variations. Despite the global importance of mango (Mangifera indica L.), there is a lack of studies on the robustness of models for the diagnosis of disease in its leaves. This paper proposes a methodology to evaluate convolutional neural networks (CNNs) under adverse conditions. We adapted the MangoLeafDB dataset, generating MangoLeafDB-C with 19 types of artificial corruptions at five severity levels. We conducted a benchmark comparing five architectures: ResNet-50, ResNet-101, VGG-16, Xception, and LCNN (the latter being a lightweight architecture designed specifically for mango leaf diagnosis). The metrics include the F1 score, the corruption error (CE) and the relative mean corruption error (relative mCE). The results show that LCNN outperformed complex models in corruptions that can be present in real-world scenarios such as Defocus Blur, Motion Blur, while also achieving the lowest mCE. Modern architectures (e.g., ResNet-101) exhibited significant performance degradation in corrupted scenarios, despite their high accuracy under ideal conditions. These findings suggest that lightweight and specialized models may be more suitable for real-world applications in edge devices, where robustness and efficiency are critical. The study highlights the need to incorporate robustness assessments in the development of intelligent systems for agriculture, particularly in regions with technological limitations.
Updated: 2025-12-15 18:36:48
Subjects: cs.LG,cs.AI,cs.CV
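The CE and mCE metrics follow the common corruption-benchmark convention: per-corruption error summed over severity levels and normalized by a reference model's error, then averaged over corruption types. A sketch under that assumption (function names are ours; the relative-mCE variant additionally subtracts clean error, which we omit here):

```python
import numpy as np

def corruption_error(err_model, err_baseline):
    """CE for one corruption type: the model's error summed over the
    severity levels, normalized by a baseline model's summed error."""
    return np.sum(err_model) / np.sum(err_baseline)

def mean_corruption_error(errs_model, errs_baseline):
    """mCE: the average CE over all corruption types. Each list entry
    is an array of per-severity errors for one corruption type."""
    return float(np.mean([corruption_error(m, b)
                          for m, b in zip(errs_model, errs_baseline)]))
```

An mCE below 1.0 means the evaluated model degrades less than the baseline under corruption, which is how the abstract's claim that LCNN "achieves the lowest mCE" should be read.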
Universality of high-dimensional scaling limits of stochastic gradient descent
We consider statistical tasks in high dimensions whose loss depends on the data only through its projection into a fixed-dimensional subspace spanned by the parameter vectors and certain ground truth vectors. This includes classifying mixture distributions with cross-entropy loss with one and two-layer networks, and learning single and multi-index models with one and two-layer networks. When the data is drawn from an isotropic Gaussian mixture distribution, it is known that the evolution of a finite family of summary statistics under stochastic gradient descent converges to an autonomous ordinary differential equation (ODE), as the dimension and sample size go to $\infty$ and the step size goes to $0$ commensurately. Our main result is that these ODE limits are universal in that this convergence occurs even when the data is drawn from mixtures of product measures provided the first two moments match the corresponding Gaussian distribution and the initialization and ground truth vectors are sufficiently coordinate-delocalized. We complement this by proving two corresponding non-universality results. We provide a simple example where the ODE limits are non-universal if the initialization is coordinate aligned. We also show that the stochastic differential equation limits arising as fluctuations of the summary statistics around their ODE's fixed points are not universal.
Updated: 2025-12-15 18:30:26
Subjects: stat.ML,cs.LG,math.PR,math.ST
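The scaling regime the abstract refers to can be sketched as follows (the notation here is ours, not the paper's):

```latex
% Summary statistics: overlaps of the parameters with each other and
% with the ground-truth directions,
%   m^{ij}_n = \langle \theta^i_n, \theta^j_n \rangle, \qquad
%   r^{ik}_n = \langle \theta^i_n, \mu^k \rangle ,
% evolving under online SGD with step size $\delta$:
%   \theta_{n+1} = \theta_n - \delta \, \nabla_\theta L(\theta_n; Y_{n+1}).
% As the dimension $d \to \infty$ with $\delta \sim 1/d$, the rescaled
% trajectory of summary statistics converges,
%   \big( m_{\lfloor t/\delta \rfloor} \big)_{t \ge 0}
%     \;\longrightarrow\; \big( m(t) \big)_{t \ge 0},
% to the solution of an autonomous ODE
%   \frac{d}{dt}\, m(t) = f\big( m(t) \big).
% The universality result states that $f$ is unchanged when the Gaussian
% mixture data is replaced by mixtures of product measures with matching
% first two moments, under sufficient coordinate-delocalization.
```

The two non-universality results then say this breaks for coordinate-aligned initializations and for the SDE fluctuation limits around the ODE's fixed points.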
Developing synthetic microdata through machine learning for firm-level business surveys
Public-use microdata samples (PUMS) from the United States (US) Census Bureau on individuals have been available for decades. However, large increases in computing power and the greater availability of Big Data have dramatically increased the probability of re-identifying anonymized data, potentially violating the pledge of confidentiality given to survey respondents. Data science tools can be used to produce synthetic data that preserve critical moments of the empirical data but do not contain the records of any existing individual respondent or business. Developing public-use firm data from surveys presents unique challenges different from demographic data, because there is a lack of anonymity and certain industries can be easily identified in each geographic area. This paper briefly describes a machine learning model used to construct a synthetic PUMS based on the Annual Business Survey (ABS) and discusses various quality metrics. Although the ABS PUMS is currently being refined and results are confidential, we present two synthetic PUMS developed for the 2007 Survey of Business Owners, similar to the ABS business data. Econometric replication of a high impact analysis published in Small Business Economics demonstrates the verisimilitude of the synthetic data to the true data and motivates discussion of possible ABS use cases.
Updated: 2025-12-15 18:28:42
Subjects: cs.LG,econ.GN,stat.AP,stat.ME
StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion
Stuttering detection breaks down when disfluencies overlap. Existing parametric models struggle to distinguish complex, simultaneous disfluencies (e.g., a 'block' with a 'prolongation') due to the scarcity of these specific combinations in training data. While Retrieval-Augmented Generation (RAG) has revolutionized NLP by grounding models in external knowledge, this paradigm remains unexplored in pathological speech processing. To bridge this gap, we introduce StutterFuse, the first Retrieval-Augmented Classifier (RAC) for multi-label stuttering detection. By conditioning a Conformer encoder on a non-parametric memory bank of clinical examples, we allow the model to classify by reference rather than memorization. We further identify and solve "Modality Collapse", an "Echo Chamber" effect where naive retrieval boosts recall but degrades precision. We mitigate this using: (1) SetCon, a Jaccard-Weighted Metric Learning objective that optimizes for multi-label set similarity, and (2) a Gated Mixture-of-Experts fusion strategy that dynamically arbitrates between acoustic evidence and retrieved context. On the SEP-28k dataset, StutterFuse achieves a weighted F1-score of 0.65, outperforming strong baselines and demonstrating remarkable zero-shot cross-lingual generalization.
Updated: 2025-12-15 18:28:39
Subjects: cs.LG
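The Jaccard weighting can be illustrated with a pairwise contrastive loss whose pull/push balance is set by the overlap of the two samples' label sets. This is our simplified sketch of set-similarity-weighted metric learning, not the paper's exact SetCon objective:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two binary multi-label vectors."""
    inter = np.sum(np.logical_and(a, b))
    union = np.sum(np.logical_or(a, b))
    return inter / union if union > 0 else 1.0

def jaccard_weighted_contrastive(z1, z2, y1, y2, margin=1.0):
    """Pull embeddings together in proportion to label-set overlap,
    push them beyond a margin otherwise. Pairs with partial overlap
    get a graded target instead of a hard positive/negative split."""
    w = jaccard(y1, y2)
    d = np.linalg.norm(z1 - z2)
    return w * d ** 2 + (1 - w) * max(0.0, margin - d) ** 2
```

The graded weight is what lets the embedding space respect multi-label structure, e.g. a clip with both a 'block' and a 'prolongation' sits between the two single-label clusters rather than in either.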
Certified-Everlasting Quantum NIZK Proofs
We study non-interactive zero-knowledge proofs (NIZKs) for NP satisfying: 1) statistical soundness, 2) computational zero-knowledge and 3) certified-everlasting zero-knowledge (CE-ZK). The CE-ZK property allows a verifier of a quantum proof to revoke the proof in a way that can be checked (certified) by the prover. Conditioned on successful certification, the verifier's state can be efficiently simulated with only the statement, in a statistically indistinguishable way. Our contributions regarding these certified-everlasting NIZKs (CE-NIZKs) are as follows: - We identify a barrier to obtaining CE-NIZKs in the CRS model via generalizations of known interactive proofs that satisfy CE-ZK. - We circumvent this by constructing CE-NIZK from black-box use of NIZK for NP satisfying certain properties, along with OWFs. As a result, we obtain CE-NIZKs for NP in the CRS model, based on polynomial hardness of the learning with errors (LWE) assumption. - In addition, we observe that the aforementioned barrier does not apply to the shared EPR model. Consequently, we present a CE-NIZK for NP in this model based on any statistical binding hidden-bits generator, which can be based on LWE. The only quantum computation in this protocol involves single-qubit measurements of the shared EPR pairs.
Updated: 2025-12-15 18:23:48
Subjects: quant-ph,cs.CR
BlurDM: A Blur Diffusion Model for Image Deblurring
Diffusion models show promise for dynamic scene deblurring; however, existing studies often fail to leverage the intrinsic nature of the blurring process within diffusion models, limiting their full potential. To address this, we present a Blur Diffusion Model (BlurDM), which seamlessly integrates the blur formation process into diffusion for image deblurring. Observing that motion blur stems from continuous exposure, BlurDM implicitly models the blur formation process through a dual-diffusion forward scheme, diffusing both noise and blur onto a sharp image. During the reverse generation process, we derive a dual denoising and deblurring formulation, enabling BlurDM to recover the sharp image by simultaneously denoising and deblurring, given pure Gaussian noise conditioned on the blurred image as input. Additionally, to efficiently integrate BlurDM into deblurring networks, we perform BlurDM in the latent space, forming a flexible prior generation network for deblurring. Extensive experiments demonstrate that BlurDM significantly and consistently enhances existing deblurring methods on four benchmark datasets. The project page is available at https://jin-ting-he.github.io/Blur-Diffusion-Model/.
Updated: 2025-12-15 18:15:43
Subjects: cs.CV,cs.AI
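One plausible reading of the dual-diffusion forward scheme is that, as the timestep grows, the sharp signal is simultaneously more blurred and more noised. A 1-D numpy sketch under that assumption (the schedule and blur operator here are illustrative choices of ours, not BlurDM's exact formulation):

```python
import numpy as np

def box_blur(x, k):
    """1-D box blur of width k, a stand-in for accumulated motion blur."""
    if k <= 1:
        return x
    return np.convolve(x, np.ones(k) / k, mode="same")

def dual_forward(x0, t, eps, alpha_bar):
    """Dual-diffusion forward sketch: diffuse both blur and noise onto
    the sharp signal x0. At t = 0 (alpha_bar[0] = 1) the signal is
    returned unchanged; larger t means more blur and more noise."""
    k = 1 + 2 * t  # blur width grows with the timestep
    blurred = box_blur(x0, k)
    return np.sqrt(alpha_bar[t]) * blurred + np.sqrt(1.0 - alpha_bar[t]) * eps
```

The reverse process would then have to undo both corruptions at once, which is the "dual denoising and deblurring" formulation the abstract derives.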
AI Copilots for Reproducibility in Science: A Case Study
Open science initiatives seek to make research outputs more transparent, accessible, and reusable, but ensuring that published findings can be independently reproduced remains a persistent challenge. In this paper we describe an AI-driven "Reproducibility Copilot" that analyzes manuscripts, code, and supplementary materials to generate structured Jupyter Notebooks and recommendations aimed at facilitating computational, or "rote", reproducibility. Our initial results suggest that the copilot has the potential to substantially reduce reproduction time (in one case from over 30 hours to about 1 hour) while achieving high coverage of figures, tables, and results suitable for computational reproduction. The system systematically detects barriers to reproducibility, including missing values for hyperparameters, undocumented preprocessing steps, and incomplete or inaccessible datasets. Although preliminary, these findings suggest that AI tools can meaningfully reduce the burden of reproducibility efforts and contribute to more transparent and verifiable scientific communication.
Updated: 2025-12-15 18:11:39
Subjects: cs.AI
Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models
Representing continuous time is a critical and under-explored challenge in modeling temporal event sequences with large language models (LLMs). Various strategies like byte-level representations or calendar tokens have been proposed. However, the optimal approach remains unclear, especially given the diverse statistical distributions of real-world event data, which range from smooth log-normal to discrete, spiky patterns. This paper presents the first empirical study of temporal tokenization for event sequences, comparing distinct encoding strategies: naive numeric strings, high-precision byte-level representations, human-semantic calendar tokens, classic uniform binning, and adaptive residual scalar quantization. We evaluate these strategies by fine-tuning LLMs on real-world datasets that exemplify these diverse distributions. Our analysis reveals that no single strategy is universally superior; instead, prediction performance depends heavily on aligning the tokenizer with the data's statistical properties, with log-based strategies excelling on skewed distributions and human-centric formats proving robust for mixed modalities.
Updated: 2025-12-15 18:10:51
Fields: cs.CL,cs.LG
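The binning strategies the study compares can be illustrated with a minimal sketch; the bin counts, time ranges, and token format below are hypothetical choices for illustration, not the paper's settings:

```python
import math

def uniform_bin_token(dt, lo=0.0, hi=3600.0, n_bins=32):
    """Uniform binning: map an inter-event time (seconds) to a token id."""
    dt = min(max(dt, lo), hi)
    idx = min(int((dt - lo) / (hi - lo) * n_bins), n_bins - 1)
    return f"<t_{idx}>"

def log_bin_token(dt, lo=1.0, hi=3600.0, n_bins=32):
    """Log-spaced binning: finer resolution for short gaps, which suits the
    skewed (e.g. log-normal) gap distributions the study highlights."""
    dt = min(max(dt, lo), hi)
    idx = min(int(math.log(dt / lo) / math.log(hi / lo) * n_bins), n_bins - 1)
    return f"<t_{idx}>"

gaps = [2.0, 15.0, 120.0, 1800.0]
print([uniform_bin_token(g) for g in gaps])  # → ['<t_0>', '<t_0>', '<t_1>', '<t_16>']
print([log_bin_token(g) for g in gaps])      # → ['<t_2>', '<t_10>', '<t_18>', '<t_29>']
```

Note how uniform binning collapses all short gaps into the same token while log binning spreads them out, which is one way the tokenizer/data-distribution alignment discussed above manifests.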
LightTopoGAT: Enhancing Graph Attention Networks with Topological Features for Efficient Graph Classification
Graph Neural Networks have demonstrated significant success in graph classification tasks, yet they often require substantial computational resources and struggle to capture global graph properties effectively. We introduce LightTopoGAT, a lightweight graph attention network that enhances node features through topological augmentation by incorporating node degree and local clustering coefficient to improve graph representation learning. The proposed approach maintains parameter efficiency through streamlined attention mechanisms while integrating structural information that is typically overlooked by local message passing schemes. Through comprehensive experiments on three benchmark datasets, MUTAG, ENZYMES, and PROTEINS, we show that LightTopoGAT achieves superior performance compared to established baselines including GCN, GraphSAGE, and standard GAT, with a 6.6 percent improvement in accuracy on MUTAG and a 2.2 percent improvement on PROTEINS. Ablation studies further confirm that these performance gains arise directly from the inclusion of topological features, demonstrating a simple yet effective strategy for enhancing graph neural network performance without increasing architectural complexity.
Updated: 2025-12-15 18:09:59
Fields: cs.LG
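The topological augmentation described above (appending node degree and local clustering coefficient to node features before the attention layers) can be sketched in plain Python; the toy graph and feature values are illustrative only:

```python
def local_clustering(adj, v):
    """Fraction of realised links among v's neighbours."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in nbrs for j in nbrs if i < j and j in adj[i])
    return 2.0 * links / (k * (k - 1))

def augment_features(adj, features):
    """Append [degree, clustering] to each node's features before the GAT layers."""
    return {v: features[v] + [float(len(adj[v])), local_clustering(adj, v)]
            for v in adj}

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
feats = {v: [1.0] for v in adj}
aug = augment_features(adj, feats)
print(aug[0])  # → [1.0, 3.0, 0.3333333333333333]
```

These two scalars inject exactly the kind of structural signal that local message passing tends to miss, without adding any learnable parameters.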
QoeSiGN: Towards Qualified Collaborative eSignatures
eSignatures ensure data's authenticity, non-repudiation, and integrity. The EU's eIDAS regulation specifies, among others, advanced and qualified (QES) eSignatures. While the concrete legal effects of eSignatures depend on the individual case, QESs constitute the highest level of technical protection and authenticity under eIDAS. QESs are based on a qualified certificate issued by a qualified trust service provider (QTSP). Despite legal requirements, a QTSP technically represents a single point of failure. In contrast, privacy-preserving collaborative computations (P2C2s) have become increasingly practical in recent years, yet an extensive investigation of their potential integration into the QES landscape is still lacking. We perform a threat analysis on the QES-creation process of Austria's national eID, using STRIDE and a DREAD-like model to extract requirement challenges (RCs) primarily related to: (1) Distributed Service Robustness, (2) Agile Crypto Deployment, and (3) Active User Involvement. To address these RCs, we present QoeSiGN, utilizing novel P2C2 technologies. While currently no P2C2 addresses all RCs, legal aspects, and practical efficiency simultaneously, QoeSiGN offers instantiation possibilities for different needs: for instance, "Multi-Party HSMs" for distributed hardware-secured computations, or secure multi-party computation (software) for the highest crypto agility and user involvement, where the user participates in the QES computation. Deployment-wise, QTSPs would need to adapt the signing process and set up trusted communication channels. Legally, QoeSiGN's implementation appears permissible, though its realization needs further analysis. Technically, QoeSiGN addresses some regulatory requirements, such as "sole control" and crypto agility, better than the current solution. Our identified threats and extracted requirements transfer to the general QES ecosystem.
Updated: 2025-12-15 18:07:17
Fields: cs.CR
Do-Undo: Generating and Reversing Physical Actions in Vision-Language Models
We introduce the Do-Undo task and benchmark to address a critical gap in vision-language models: understanding and generating physically plausible scene transformations driven by real-world actions. Unlike prior work focused on object-level edits, Do-Undo requires models to simulate the outcome of a physical action and then accurately reverse it, reflecting true cause-and-effect in the visual world. We curate a large-scale dataset of reversible actions from real-world videos and design a training strategy enforcing consistency for robust action grounding. Our experiments reveal that current models struggle with physical reversibility, underscoring the importance of this task for embodied AI, robotics, and physics-aware generative modeling. Do-Undo establishes an intuitive testbed for evaluating and advancing physical reasoning in multimodal systems.
Updated: 2025-12-15 18:03:42
Fields: cs.CV,cs.LG
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and verification latency. Such variability complicates the RL infrastructure, slows training, and makes training curriculum (e.g., response length extension) and hyperparameter selection challenging. In this work, we propose cascaded domain-wise reinforcement learning (Cascade RL) to develop general-purpose reasoning models, Nemotron-Cascade, capable of operating in both instruct and deep thinking modes. Departing from conventional approaches that blend heterogeneous prompts from different domains, Cascade RL orchestrates sequential, domain-wise RL, reducing engineering complexity and delivering state-of-the-art performance across a wide range of benchmarks. Notably, RLHF for alignment, when used as a pre-step, boosts the model's reasoning ability far beyond mere preference optimization, and subsequent domain-wise RLVR stages rarely degrade the benchmark performance attained in earlier domains and may even improve it (see an illustration in Figure 1). Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6/Pro and achieves silver-medal performance in the 2025 International Olympiad in Informatics (IOI). We transparently share our training and data recipes.
Updated: 2025-12-15 18:02:35
Fields: cs.CL,cs.AI,cs.LG
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Supervised Fine-Tuning (SFT) is used to specialize model behavior by training weights to produce intended target responses for queries. In contrast, In-Context Learning (ICL) adapts models during inference with instructions or demonstrations in the prompt. ICL can offer better generalizability and more calibrated responses compared to SFT in data scarce settings, at the cost of more inference compute. In this work, we ask the question: Can ICL's internal computations be used to improve the qualities of SFT? We first show that ICL and SFT produce distinct activation patterns, indicating that the two methods achieve adaptation through different functional mechanisms. Motivated by this observation and to use ICL's rich functionality, we introduce ICL Activation Alignment (IA2), a self-distillation technique which aims to replicate ICL's activation patterns in SFT models and incentivizes ICL-like internal reasoning. Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by our extensive empirical results on 12 popular benchmarks and two model families. This finding is not only practically useful, but also offers a conceptual window into the inner mechanics of model adaptation.
Updated: 2025-12-15 18:01:10
Fields: cs.LG,cs.AI,cs.CL
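As a rough illustration of the alignment idea (not the paper's exact objective; the layer selection, pooling, and weighting here are assumptions), an activation-matching loss between SFT-mode and ICL-mode hidden states could look like:

```python
def ia2_alignment_loss(sft_acts, icl_acts):
    """Mean squared error between hidden activations produced in plain SFT
    mode and those recorded from the same inputs run with in-context
    demonstrations. Minimising it "primes" the model toward ICL-like
    internal computation before standard supervised fine-tuning.
    (Sketch only; real activations would be tensors per layer.)"""
    total, count = 0.0, 0
    for s_layer, i_layer in zip(sft_acts, icl_acts):
        for s, i in zip(s_layer, i_layer):
            total += (s - i) ** 2
            count += 1
    return total / count

# Toy activations: two layers, two units each.
print(ia2_alignment_loss([[1.0, 2.0], [0.0, 0.0]],
                         [[0.0, 2.0], [0.0, 1.0]]))  # → 0.5
```

In the paper's pipeline this self-distillation term is applied as a priming step, after which ordinary SFT proceeds on the aligned weights.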
DA-SSL: Self-Supervised Domain Adaptor to Leverage Foundational Models in TURBT Histopathology Slides
Recent deep learning frameworks in histopathology, particularly multiple instance learning (MIL) combined with pathology foundational models (PFMs), have shown strong performance. However, PFMs exhibit limitations on certain cancer or specimen types due to domain shifts: these cancer types were rarely used for pretraining, or the specimens contain tissue-based artifacts rarely seen within the pretraining population. Such is the case for transurethral resection of bladder tumor (TURBT) specimens, which are essential for diagnosing muscle-invasive bladder cancer (MIBC) but contain fragmented tissue chips and electrocautery artifacts and were not widely represented in publicly available PFMs. To address this, we propose a simple yet effective domain-adaptive self-supervised adaptor (DA-SSL) that realigns pretrained PFM features to the TURBT domain without fine-tuning the foundational model itself. We pilot this framework for predicting treatment response in TURBT, where histomorphological features are currently underutilized and identifying patients who will benefit from neoadjuvant chemotherapy (NAC) is challenging. In our multi-center study, DA-SSL achieved an AUC of 0.77+/-0.04 in five-fold cross-validation and, with majority voting, an external test accuracy of 0.84, sensitivity of 0.71, and specificity of 0.91. Our results demonstrate that lightweight domain adaptation with self-supervision can effectively enhance PFM-based MIL pipelines for clinically challenging histopathology tasks. Code is available at https://github.com/zhanghaoyue/DA_SSL_TURBT.
Updated: 2025-12-15 17:53:18
Fields: cs.CV,cs.AI
Textual Gradients are a Flawed Metaphor for Automatic Prompt Optimization
A well-engineered prompt can increase the performance of large language models; automatic prompt optimization techniques aim to increase performance without requiring human effort to tune the prompts. One leading class of prompt optimization techniques introduces the analogy of textual gradients. We investigate the behavior of these textual gradient methods through a series of experiments and case studies. While such methods often result in a performance improvement, our experiments suggest that the gradient analogy does not accurately explain their behavior. Our insights may inform the selection of prompt optimization strategies, and development of new approaches.
Updated: 2025-12-15 17:52:16
Fields: cs.CL,cs.LG
SoK: Can Fully Homomorphic Encryption Support General AI Computation? A Functional and Cost Analysis
Artificial intelligence (AI) increasingly powers sensitive applications in domains such as healthcare and finance, relying on both linear operations (e.g., matrix multiplications in large language models) and non-linear operations (e.g., sorting in retrieval-augmented generation). Fully homomorphic encryption (FHE) has emerged as a promising tool for privacy-preserving computation, but it remains unclear whether existing methods can support the full spectrum of AI workloads that combine these operations. In this SoK, we ask: Can FHE support general AI computation? We provide both a functional analysis and a cost analysis. First, we categorize ten distinct FHE approaches and evaluate their ability to support general computation. We then identify three promising candidates and benchmark workloads that mix linear and non-linear operations across different bit lengths and SIMD parallelization settings. Finally, we evaluate five real-world, privacy-sensitive AI applications that instantiate these workloads. Our results quantify the costs of achieving general computation in FHE and offer practical guidance on selecting FHE methods that best fit specific AI application requirements. Our codes are available at https://github.com/UCF-ML-Research/FHE-AI-Generality.
Updated: 2025-12-15 17:51:44
Fields: cs.CR
Scalable Formal Verification via Autoencoder Latent Space Abstraction
Finite Abstraction methods provide a powerful formal framework for proving that systems satisfy their specifications. However, these techniques face scalability challenges for high-dimensional systems, as they rely on state-space discretization which grows exponentially with dimension. Learning-based approaches to dimensionality reduction, utilizing neural networks and autoencoders, have shown great potential to alleviate this problem. However, ensuring the correctness of the resulting verification results remains an open question. In this work, we provide a formal approach to reduce the dimensionality of systems via convex autoencoders and learn the dynamics in the latent space through a kernel-based method. We then construct a finite abstraction from the learned model in the latent space and guarantee that the abstraction contains the true behaviors of the original system. We show that the verification results in the latent space can be mapped back to the original system. Finally, we demonstrate the effectiveness of our approach on multiple systems, including a 26D system controlled by a neural network, showing significant scalability improvements without loss of rigor.
Updated: 2025-12-15 17:48:07
Fields: cs.LG
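The abstraction step can be sketched as follows, assuming a trained encoder is given; a 1D identity "encoder" stands in for the convex autoencoder, and the soundness machinery (error inflation, coverage guarantees) that the paper provides is deliberately omitted:

```python
def abstract_transitions(samples, encode, cell=0.5):
    """Finite abstraction over a learned latent space: each latent point maps
    to a grid cell; observed (state, next_state) pairs induce transitions
    between cells. A sound abstraction would additionally inflate cells to
    cover encoder/model error; this sketch records observed transitions only."""
    def cell_of(z):
        return tuple(int(v // cell) for v in z)
    trans = {}
    for x, x_next in samples:
        a, b = cell_of(encode(x)), cell_of(encode(x_next))
        trans.setdefault(a, set()).add(b)
    return trans

# Hypothetical identity "encoder" on a 1D system with dynamics x' = 0.5 * x.
encode = lambda x: (x,)
samples = [(1.0, 0.5), (0.9, 0.45), (0.4, 0.2)]
print(abstract_transitions(samples, encode))  # cells 2 → 1 → 0, with 0 absorbing
```

The key payoff is that the grid lives in the low-dimensional latent space, so its size is governed by the latent dimension rather than the original (e.g. 26D) state dimension.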
Image Diffusion Preview with Consistency Solver
The slow inference process of image diffusion models significantly degrades interactive user experiences. To address this, we introduce Diffusion Preview, a novel paradigm employing rapid, low-step sampling to generate preliminary outputs for user evaluation, deferring full-step refinement until the preview is deemed satisfactory. Existing acceleration methods, including training-free solvers and post-training distillation, struggle to deliver high-quality previews or ensure consistency between previews and final outputs. We propose ConsistencySolver derived from general linear multistep methods, a lightweight, trainable high-order solver optimized via Reinforcement Learning, that enhances preview quality and consistency. Experimental results demonstrate that ConsistencySolver significantly improves generation quality and consistency in low-step scenarios, making it ideal for efficient preview-and-refine workflows. Notably, it achieves FID scores on-par with Multistep DPM-Solver using 47% fewer steps, while outperforming distillation baselines. Furthermore, user studies indicate our approach reduces overall user interaction time by nearly 50% while maintaining generation quality. Code is available at https://github.com/G-U-N/consolver.
Updated: 2025-12-15 17:47:49
Fields: cs.LG,cs.CV
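Since ConsistencySolver is derived from general linear multistep methods, a fixed-coefficient two-step instance (Adams-Bashforth 2) conveys the structure; in the paper the coefficients would be trainable (and tuned with RL) rather than the constants used here, and the ODE is the diffusion probability flow rather than this toy example:

```python
def multistep_solve(f, x0, t0, t1, n_steps, coeffs=(1.5, -0.5)):
    """Explicit two-step linear multistep integration (Adams-Bashforth-2
    coefficients by default), bootstrapped with one Euler step. A solver in
    this family reuses derivatives from earlier steps, which is what makes
    high-order accuracy possible at low step counts."""
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    prev = f(x, t)                 # derivative at the previous step
    x, t = x + h * prev, t + h     # Euler bootstrap for the first step
    for _ in range(n_steps - 1):
        cur = f(x, t)
        x = x + h * (coeffs[0] * cur + coeffs[1] * prev)
        prev, t = cur, t + h
    return x

# Probe ODE dx/dt = -x from x(0) = 1; exact solution at t = 1 is e^{-1} ≈ 0.3679.
approx = multistep_solve(lambda x, t: -x, 1.0, 0.0, 1.0, 20)
print(approx)  # close to 0.3679
```

Making `coeffs` learnable per step, as the paper proposes, lets the solver trade the classical-order guarantees of fixed coefficients for empirical quality and preview/final consistency.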
SparsePixels: Efficient Convolution for Sparse Data on FPGAs
Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and a long initiation interval due to the deep nested loops required to densely convolve every input pixel regardless of its feature value. However, input features can be spatially sparse in some image data, where semantic information may occupy only a small fraction of the pixels and most computation would be wasted on empty regions. In this work, we introduce SparsePixels, a framework that implements sparse convolution on FPGAs by selectively retaining and computing on a small subset of active pixels while ignoring the rest. We show that, for identifying neutrino interactions in naturally sparse LArTPC images with 4k pixels, a standard CNN with a compact size of 4k parameters incurs an inference latency of 48.665 $\mu$s on an FPGA, whereas a sparse CNN of the same base architecture, computing on less than 1% of the input pixels, achieves a $73\times$ speedup to 0.665 $\mu$s with resource utilization well within on-chip budgets, trading only a small percent-level performance loss. This work aims to benefit future algorithm development for efficient data readout in modern experiments with latency requirements of microseconds or below.
Updated: 2025-12-15 17:47:05
Fields: cs.AR,cs.LG,hep-ex
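The core saving can be sketched independently of the FPGA: evaluate the kernel only where its centre lands on an active pixel, so work scales with the active-pixel count rather than the image size. This is a software sketch of the idea, not the paper's hardware implementation, and the active-site-only output rule is one simple variant of sparse convolution:

```python
def sparse_conv2d(active, kernel, bias=0.0):
    """Convolution evaluated only where the kernel centre lands on an active
    pixel. `active` maps (row, col) -> value; inactive sites cost nothing,
    so the loop count is (active pixels) x (kernel size), independent of
    the full image resolution."""
    k = len(kernel)
    r0 = k // 2
    out = {}
    for (r, c) in active:
        acc = bias
        for dr in range(k):
            for dc in range(k):
                v = active.get((r + dr - r0, c + dc - r0))
                if v is not None:
                    acc += kernel[dr][dc] * v
        out[(r, c)] = acc
    return out

active = {(5, 5): 1.0, (5, 6): 2.0}          # 2 active pixels in a large image
kernel = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]   # identity + right neighbour
print(sparse_conv2d(active, kernel))  # → {(5, 5): 3.0, (5, 6): 2.0}
```

With 4k-pixel LArTPC images at <1% occupancy, this turns millions of multiply-accumulates per layer into a few thousand, which is the source of the reported speedup.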
Bilevel ZOFO: Efficient LLM Fine-Tuning and Meta-Training
Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO) optimizers presents significant computational challenges. Parameter-Efficient Fine-Tuning (PEFT) methods address these by freezing most model parameters and training only a small subset. However, PEFT often underperforms compared to full fine-tuning when high task-specific accuracy is required. Zeroth-Order (ZO) methods fine-tune the entire pre-trained model without back-propagation, estimating gradients through forward passes only. While memory-efficient, ZO methods suffer from slow convergence and high sensitivity to prompt selection. We bridge these two worlds with Bilevel-ZOFO, a bilevel optimization method that couples fast, local FO-PEFT adaptation at the inner level with stable, memory-efficient ZO updates of the full backbone at the outer level. The FO-PEFT inner loop performs fast, low-memory local adaptation that reduces the variance of ZO estimates and stabilizes the search, guiding the outer ZO updates of the full backbone and reducing prompt sensitivity. Meanwhile, the outer ZO updates provide better generalization for PEFT. We provide theoretical convergence guarantees and empirically demonstrate that Bilevel-ZOFO significantly outperforms existing ZO and FO-PEFT methods, achieving 2-4 times faster training while maintaining similar memory efficiency. Additionally, we show that by updating the backbone with ZO and adapting only a tiny FO-PEFT block per task, Bilevel-ZOFO combines full-model capacity with few-shot efficiency, making it a very efficient meta-learning algorithm that quickly adapts to new tasks.
Updated: 2025-12-15 17:47:03
Fields: cs.LG
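The ZO side of the method relies on gradient estimates obtained from forward passes only; a minimal two-point estimator (in the style of MeZO-like ZO fine-tuning, not the paper's exact recipe) looks like this:

```python
import random

def zo_gradient(loss, params, mu=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate along a random Gaussian
    direction u: g ≈ [L(θ + μu) − L(θ − μu)] / (2μ) · u. Only two forward
    evaluations of `loss` are needed -- no back-propagation."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in params]
    plus = loss([p + mu * ui for p, ui in zip(params, u)])
    minus = loss([p - mu * ui for p, ui in zip(params, u)])
    scale = (plus - minus) / (2 * mu)
    return [scale * ui for ui in u]

# Quadratic loss: the true gradient at θ = (1, 2) is (2, 4). A single estimate
# is noisy, but it is unbiased, so averaging over directions recovers (2, 4).
loss = lambda th: th[0] ** 2 + th[1] ** 2
print(zo_gradient(loss, [1.0, 2.0]))
```

The variance of such estimates is exactly what the paper's inner FO-PEFT loop is designed to dampen, which is why the bilevel coupling stabilizes the outer ZO search.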
Explainable Quantum Machine Learning for Multispectral Images Segmentation: Case Study
The emergence of Big Data changed how we approach information systems engineering. Nowadays, when we can use remote sensing techniques for Big Data acquisition, the issues such data introduce are as important as ever. One of those concerns is the processing of the data. Classical methods often fail to address that problem or are incapable of processing the data in a reasonable time. With that in mind, information system engineers are required to investigate different approaches to data processing. Recent advancements in the implementation of noisy intermediate-scale quantum (NISQ) devices allow us to investigate their application to real-life computational problems. This field of study is called quantum (information) systems engineering and usually focuses on technical problems with contemporary devices. However, hardware challenges are not the only ones that hinder our quantum computation capabilities. Software limitations are the other, less explored side of the coin. Using multispectral image segmentation as an example task, we investigated how difficult it is to run a hybrid quantum-classical model on a real, publicly available quantum device. To quantify how, and explain why, the performance of our model changed when run on a real device, we propose new explainability metrics. These metrics introduce new meaning to explainable quantum machine learning: the explanation of the performance issue comes from the quantum device's behavior. We also analyzed the expected monetary cost of running similar experiments on contemporary quantum devices using standard market prices.
Updated: 2025-12-15 17:46:00
Fields: quant-ph,cs.CV,cs.LG
Faster Results from a Smarter Schedule: Reframing Collegiate Cross Country through Analysis of the National Running Club Database
Collegiate cross country teams often build their season schedules on intuition rather than evidence, partly because large-scale performance datasets are not publicly accessible. To address this limitation, we introduce the National Running Club Database (NRCD), the first openly available dataset to aggregate 23,725 race results from 7,594 collegiate club athletes across the 2023-2025 seasons. Unlike existing resources, NRCD includes detailed course metadata, allowing us to develop two standardized performance metrics: Converted Only (distance correction) and Standardized (distance, weather, and elevation adjusted). Using these standardized measures, we find that athletes with slower initial performances exhibit the greatest improvement within a season, and that race frequency is the strongest predictor of improvement. Using six machine learning models, random forest achieves the highest accuracy (r squared equals 0.92), revealing that athletes who race more frequently progress significantly faster than those who do not. At the team level, programs whose athletes race at least four times during the regular season have substantially higher odds of placing in the top 15 at nationals (chi-squared less than 0.01). These results challenge common coaching practices that favor minimal racing before championship meets. Our findings demonstrate that a data-informed scheduling strategy improves both individual development and team competitiveness. The NRCD provides a new foundation for evidence-based decision-making in collegiate cross country and opens opportunities for further research on standardized, longitudinal athlete performance modeling.
Updated: 2025-12-15 17:45:12
Fields: cs.CY,cs.AI,cs.LG
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV) caching, and incoherent generation arising from learning dependencies over an intractable space of token combinations. To address these limitations, we introduce ReFusion, a novel masked diffusion model that achieves superior performance and efficiency by elevating parallel decoding from the token level to a higher slot level, where each slot is a fixed-length, contiguous sub-sequence. This is achieved through an iterative ``plan-and-infill'' decoding process: a diffusion-based planning step first identifies a set of weakly dependent slots, and an autoregressive infilling step then decodes these selected slots in parallel. The slot-based design simultaneously unlocks full KV cache reuse with a unified causal framework and reduces the learning complexity from the token combination space to a manageable slot-level permutation space. Extensive experiments on seven diverse benchmarks show that ReFusion not only overwhelmingly surpasses prior MDMs with 34% performance gains and an over 18$\times$ speedup on average, but also bridges the performance gap to strong ARMs while maintaining a 2.33$\times$ average speedup.
Updated: 2025-12-15 17:41:19
Fields: cs.CL,cs.AI,cs.LG
DP-CSGP: Differentially Private Stochastic Gradient Push with Compressed Communication
In this paper, we propose a Differentially Private Stochastic Gradient Push with Compressed communication (termed DP-CSGP) for decentralized learning over directed graphs. Different from existing works, the proposed algorithm is designed to maintain high model utility while ensuring both rigorous differential privacy (DP) guarantees and efficient communication. For general non-convex and smooth objective functions, we show that the proposed algorithm achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log(1/\delta)}/(\sqrt{n}J\varepsilon) \right)$ ($J$ and $d$ are the number of local samples and the dimension of decision variables, respectively) with an $(\varepsilon, \delta)$-DP guarantee for each node, matching that of decentralized counterparts with exact communication. Extensive experiments on benchmark tasks show that, under the same privacy budget, DP-CSGP achieves comparable model accuracy with significantly lower communication cost than existing decentralized counterparts with exact communication.
Updated: 2025-12-15 17:37:02
领域: cs.LG,cs.AI
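The generic ingredients behind a scheme like the one abstracted above can be sketched in a few lines. This is a hypothetical illustration, assuming gradient clipping, the Gaussian mechanism, and top-k sparsification as the compression step; the exact DP-CSGP update and its push-sum mixing are defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_compressed_gradient(grad, clip_norm, sigma, k):
    """One node's local step: clip, add Gaussian noise, then top-k compress.

    These are generic ingredients of DP decentralized SGD with compressed
    communication, not the paper's exact algorithm.
    """
    # 1) Clip the gradient to bound sensitivity.
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / (norm + 1e-12))
    # 2) Gaussian mechanism: noise scale proportional to the sensitivity.
    noisy = grad + rng.normal(0.0, sigma * clip_norm, size=grad.shape)
    # 3) Top-k sparsification before sending to out-neighbors.
    idx = np.argsort(np.abs(noisy))[-k:]
    msg = np.zeros_like(noisy)
    msg[idx] = noisy[idx]
    return msg

g = rng.normal(size=100)
msg = dp_compressed_gradient(g, clip_norm=1.0, sigma=0.5, k=10)
print(np.count_nonzero(msg))  # only k entries are communicated
```

The communication saving comes from step 3: each node transmits k values instead of the full d-dimensional vector.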
From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?
The rapid advancement of large language models (LLMs) has opened new possibilities for AI-for-good applications. As LLMs increasingly mediate online communication, their potential to foster empathy and constructive dialogue becomes an important frontier for responsible AI research. This work explores whether LLMs can serve not only as moderators that detect harmful content, but also as mediators capable of understanding and de-escalating online conflicts. Our framework decomposes mediation into two subtasks: judgment, where an LLM evaluates the fairness and emotional dynamics of a conversation, and steering, where it generates empathetic, de-escalatory messages to guide participants toward resolution. To assess mediation quality, we construct a large Reddit-based dataset and propose a multi-stage evaluation pipeline combining principle-based scoring, user simulation, and human comparison. Experiments show that API-based models outperform open-source counterparts in both reasoning and intervention alignment when doing mediation. Our findings highlight both the promise and limitations of current LLMs as emerging agents for online social mediation.
Updated: 2025-12-15 17:31:26
Categories: cs.AI
Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability
Neural networks achieve remarkable performance through superposition: encoding multiple features as overlapping directions in activation space rather than dedicating individual neurons to each feature. This challenges interpretability, yet we lack principled methods to measure superposition. We present an information-theoretic framework measuring a neural representation's effective degrees of freedom. We apply Shannon entropy to sparse autoencoder activations to compute the number of effective features as the minimum neurons needed for interference-free encoding. Equivalently, this measures how many "virtual neurons" the network simulates through superposition. When networks encode more effective features than actual neurons, they must accept interference as the price of compression. Our metric strongly correlates with ground truth in toy models, detects minimal superposition in algorithmic tasks, and reveals systematic reduction under dropout. Layer-wise patterns mirror intrinsic dimensionality studies on Pythia-70M. The metric also captures developmental dynamics, detecting sharp feature consolidation during grokking. Surprisingly, adversarial training can increase effective features while improving robustness, contradicting the hypothesis that superposition causes vulnerability. Instead, the effect depends on task complexity and network capacity: simple tasks with ample capacity allow feature expansion (abundance regime), while complex tasks or limited capacity force reduction (scarcity regime). By defining superposition as lossy compression, this work enables principled measurement of how neural networks organize information under computational constraints, connecting superposition to adversarial robustness.
Updated: 2025-12-15 17:25:39
Categories: cs.LG,cs.AI
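The core measurement idea above, applying Shannon entropy to sparse autoencoder activations to count effective features, can be illustrated concretely. The sketch below is one plausible reading of the metric (perplexity of the per-feature activation mass), not the paper's exact estimator:

```python
import numpy as np

def effective_features(sae_acts, eps=1e-12):
    """Effective degrees of freedom of a representation via Shannon entropy.

    sae_acts: (n_samples, n_features) nonnegative sparse-autoencoder
    activations. Normalize the total activation mass per feature into a
    distribution p and return exp(H(p)), a perplexity-style count of the
    minimum features needed for interference-free encoding.
    """
    mass = sae_acts.clip(min=0).sum(axis=0)
    p = mass / (mass.sum() + eps)
    H = -(p * np.log(p + eps)).sum()
    return float(np.exp(H))

rng = np.random.default_rng(0)
# Four equally active features -> about 4 effective features.
uniform = rng.random((1000, 4))
# One dominant feature -> about 1 effective feature.
peaked = np.zeros((1000, 4))
peaked[:, 0] = 1.0
print(effective_features(uniform), effective_features(peaked))
```

When a layer's effective-feature count exceeds its neuron count, the layer is simulating "virtual neurons" through superposition and must pay an interference cost.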
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
We present MixtureVitae, an open-access pretraining corpus built to minimize legal risk while providing strong downstream performance. MixtureVitae follows a permissive-first, risk-mitigated sourcing strategy that combines public-domain and permissively licensed text (e.g., CC-BY/Apache) with carefully justified low-risk additions (e.g., government works and EU TDM-eligible sources). MixtureVitae adopts a simple, single-stage pretraining recipe that integrates a large proportion of permissive synthetic instruction and reasoning data: signals typically introduced during post-training and generally scarce in permissive web corpora. We categorize all sources into a three-tier scheme that reflects varying risk levels and provide shard-level provenance metadata to enable risk-aware usage. In controlled experiments using the open-sci-ref training protocol (fixed architectures and hyperparameters; 50B and 300B token budgets across 130M-1.7B parameters), models trained on MixtureVitae consistently outperform other permissive datasets across a suite of standard benchmarks, and at the 1.7B-parameters/300B-tokens setting, they surpass FineWeb-Edu and approach DCLM late in training. Performance is particularly strong on MMLU and on math and code benchmarks: a 1.7B model pretrained on 300B MixtureVitae tokens matches or exceeds a strong 1.7B instruction-tuned baseline on GSM8K, HumanEval, and MBPP, despite using over 36 times fewer tokens (300B vs. ~11T). Supported by a thorough decontamination analysis, these results show that permissive-first data with high instruction and reasoning density, tiered by licensing and provenance-related risk, can provide a practical and risk-mitigated foundation for training capable LLMs, reducing reliance on broad web scrapes without sacrificing competitiveness. Code: https://github.com/ontocord/mixturevitae
Updated: 2025-12-15 17:24:36
Categories: cs.CL,cs.AI,cs.LG
A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees
This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y | \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begins with feature selection in deep neural networks, then generalizes the results to Hölder smooth functions by exploiting the strong approximation capabilities of neural networks. Unlike conventional optimization-based deep learning methods, we reformulate neural networks as index models and estimate $\mathcal{S}_0$ using the second-order Stein's formula. This gradient-descent-free strategy guarantees feature selection consistency with a sample size requirement of $n = \Omega(p^2)$, where $p$ is the feature dimension. To handle high-dimensional scenarios, we further introduce a screening-and-selection mechanism that achieves nonlinear selection consistency when $n = \Omega(s \log p)$, with $s$ representing the sparsity level. Additionally, we refit a neural network on the selected features for prediction and establish performance guarantees under a relaxed sparsity assumption. Extensive simulations and real-data analyses demonstrate the strong performance of our method even in the presence of complex feature interactions.
Updated: 2025-12-15 17:22:49
Categories: stat.ML,cs.LG
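The gradient-descent-free estimation step above rests on the second-order Stein identity for Gaussian designs: E[y (xx^T - I)] equals the expected Hessian of G, so rows of that matrix with large norm flag relevant features. A minimal numpy sketch on a toy model, assuming x ~ N(0, I) (the paper's full procedure with screening and refitting is more involved):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20000, 10
X = rng.standard_normal((n, p))   # Gaussian design, x ~ N(0, I)
y = X[:, 0] * X[:, 1]             # G depends only on features {0, 1}

# Second-order Stein identity: E[ y (x x^T - I) ] = E[ Hessian of G(x) ],
# estimated here by a sample mean; no gradient descent is needed.
M = (y[:, None, None] * (X[:, :, None] * X[:, None, :] - np.eye(p))).mean(axis=0)

# Score each feature by the norm of its row in the estimated Hessian.
scores = np.linalg.norm(M, axis=1)
selected = set(np.argsort(scores)[-2:])
print(selected)  # {0, 1}
```

Note that a purely first-order Stein estimate E[y x] would miss this interaction (its population value is zero for y = x0 x1), which is exactly why the second-order formula matters here.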
Improvement of AMPs Identification with Generative Adversarial Network and Ensemble Classification
Identification of antimicrobial peptides is an important and necessary task today. Antimicrobial peptides are essential as an alternative to antibiotics in biomedical and many other practical applications. These oligopeptides are useful in drug design and confer innate immunity against microorganisms. Artificial intelligence algorithms have played a significant role in making these peptides easier to identify. This research improves on a previously proposed method for antimicrobial peptide prediction: it combines the best encoding schemes from different perspectives and then applies a deep neural network to balance the imbalanced combined datasets. The results show that the proposed method significantly improves the accuracy and efficiency of antimicrobial peptide prediction and provides the best results compared with existing methods. These developments in the prediction and classification of antimicrobial peptides are highly effective and applicable, especially in medicine and the pharmaceutical industry.
Updated: 2025-12-15 17:22:43
Categories: cs.LG,cs.AI
Memory in the Age of AI Agents
Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the proliferation of loosely defined memory terminologies has further obscured conceptual clarity. Traditional taxonomies such as long/short-term memory have proven insufficient to capture the diversity of contemporary agent memory systems. This work aims to provide an up-to-date landscape of current agent memory research. We begin by clearly delineating the scope of agent memory and distinguishing it from related concepts such as LLM memory, retrieval augmented generation (RAG), and context engineering. We then examine agent memory through the unified lenses of forms, functions, and dynamics. From the perspective of forms, we identify three dominant realizations of agent memory, namely token-level, parametric, and latent memory. From the perspective of functions, we propose a finer-grained taxonomy that distinguishes factual, experiential, and working memory. From the perspective of dynamics, we analyze how memory is formed, evolved, and retrieved over time. To support practical development, we compile a comprehensive summary of memory benchmarks and open-source frameworks. Beyond consolidation, we articulate a forward-looking perspective on emerging research frontiers, including memory automation, reinforcement learning integration, multimodal memory, multi-agent memory, and trustworthiness issues. We hope this survey serves not only as a reference for existing work, but also as a conceptual foundation for rethinking memory as a first-class primitive in the design of future agentic intelligence.
Updated: 2025-12-15 17:22:34
Categories: cs.CL,cs.AI
Improving Generative Ad Text on Facebook using Reinforcement Learning
Generative artificial intelligence (AI), in particular large language models (LLMs), is poised to drive transformative economic change. LLMs are pre-trained on vast text data to learn general language patterns, but a subsequent post-training phase is critical to align them for specific real-world tasks. Reinforcement learning (RL) is the leading post-training technique, yet its economic impact remains largely underexplored and unquantified. We examine this question through the lens of the first deployment of an RL-trained LLM for generative advertising on Facebook. Integrated into Meta's Text Generation feature, our model, "AdLlama," powers an AI tool that helps advertisers create new variations of human-written ad text. To train this model, we introduce reinforcement learning with performance feedback (RLPF), a post-training method that uses historical ad performance data as a reward signal. In a large-scale 10-week A/B test on Facebook spanning nearly 35,000 advertisers and 640,000 ad variations, we find that AdLlama improves click-through rates by 6.7% (p=0.0296) compared to a supervised imitation model trained on curated ads. This represents a substantial improvement in advertiser return on investment on Facebook. We also find that advertisers who used AdLlama generated more ad variations, indicating higher satisfaction with the model's outputs. To our knowledge, this is the largest study to date on the use of generative AI in an ecologically valid setting, offering an important data point quantifying the tangible impact of RL post-training. Furthermore, the results show that RLPF is a promising and generalizable approach for metric-driven post-training that bridges the gap between highly capable language models and tangible outcomes.
Updated: 2025-12-15 17:21:22
Categories: cs.LG
Verifying Rumors via Stance-Aware Structural Modeling
Verifying rumors on social media is critical for mitigating the spread of false information. The stances of conversation replies often provide important cues to determine a rumor's veracity. However, existing models struggle to jointly capture semantic content, stance information, and conversation structure, especially under the sequence length constraints of transformer-based encoders. In this work, we propose a stance-aware structural modeling approach that encodes each post in a discourse with its stance signal and aggregates reply embeddings by stance category, enabling a scalable and semantically enriched representation of the entire thread. To enhance structural awareness, we introduce stance distribution and hierarchical depth as covariates, capturing stance imbalance and the influence of reply depth. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms prior methods in predicting the truthfulness of a rumor. We also demonstrate that our model is versatile for early detection and cross-platform generalization.
Updated: 2025-12-15 17:16:56
Categories: cs.CL,cs.AI,cs.CY
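The stance-bucketed aggregation with distribution and depth covariates described above can be sketched as follows. This is a hypothetical simplification (mean-pooling per stance bucket over the common support/deny/query/comment scheme), not the paper's exact architecture:

```python
import numpy as np

STANCES = ["support", "deny", "query", "comment"]

def thread_representation(reply_embs, stances, depths):
    """Aggregate a conversation thread by stance category.

    reply_embs: (n_replies, d) reply embeddings; stances: per-reply stance
    labels; depths: per-reply depth in the reply tree. Returns one
    fixed-length vector per thread regardless of thread length, which is
    what sidesteps the encoder's sequence-length constraint.
    """
    reply_embs = np.asarray(reply_embs, dtype=float)
    d = reply_embs.shape[1]
    pooled, dist = [], []
    for s in STANCES:
        mask = np.array([st == s for st in stances])
        pooled.append(reply_embs[mask].mean(axis=0) if mask.any() else np.zeros(d))
        dist.append(mask.mean())  # stance distribution covariate
    covariates = np.array(dist + [float(np.mean(depths))])  # + depth covariate
    return np.concatenate(pooled + [covariates])

vec = thread_representation(
    np.ones((3, 8)), ["support", "deny", "deny"], depths=[1, 2, 2])
print(vec.shape)  # 4 buckets * 8 dims + 4 distribution + 1 depth = (37,)
```

The stance-distribution slice of the vector directly exposes stance imbalance (e.g., mostly "deny" replies) to the downstream veracity classifier.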
WALINET: A water and lipid identification convolutional Neural Network for nuisance signal removal in 1H MR Spectroscopic Imaging
Purpose. Proton Magnetic Resonance Spectroscopic Imaging (1H-MRSI) provides non-invasive spectral-spatial mapping of metabolism. However, long-standing problems in whole-brain 1H-MRSI are spectral overlap of metabolite peaks with large lipid signal from scalp, and overwhelming water signal that distorts spectra. Fast and effective methods are needed for high-resolution 1H-MRSI to accurately remove lipid and water signals while preserving the metabolite signal. The potential of supervised neural networks for this task remains unexplored, despite their success for other MRSI processing. Methods. We introduce a deep-learning method based on a modified Y-NET network for water and lipid removal in whole-brain 1H-MRSI. The WALINET (WAter and LIpid neural NETwork) was compared to conventional methods such as the state-of-the-art lipid L2 regularization and Hankel-Lanczos singular value decomposition (HLSVD) water suppression. Methods were evaluated on simulated and in-vivo whole-brain MRSI using NRMSE, SNR, CRLB, and FWHM metrics. Results. WALINET is significantly faster and needs 8s for high-resolution whole-brain MRSI, compared to 42 minutes for conventional HLSVD+L2. Quantitative analysis shows WALINET has better performance than HLSVD+L2: 1) more lipid removal with 41% lower NRMSE, 2) better metabolite signal preservation with 71% lower NRMSE in simulated data, 155% higher SNR and 50% lower CRLB in in-vivo data. Metabolic maps obtained by WALINET in healthy subjects and patients show better gray/white-matter contrast with more visible structural details. Conclusions. WALINET has superior performance for nuisance signal removal and metabolite quantification on whole-brain 1H-MRSI compared to conventional state-of-the-art techniques. This represents a new application of deep-learning for MRSI processing, with potential for automated high-throughput workflow.
Updated: 2025-12-15 17:08:11
Categories: eess.IV,cs.CV,cs.LG
Can Language Models Discover Scaling Laws?
Discovering scaling laws for predicting model performance at scale is a fundamental and open-ended challenge, mostly reliant on slow, case-specific human experimentation. To investigate the potential for LLMs to automate this process, we collect over 5,000 experiments from existing literature and curate seven diverse scaling law discovery tasks. While existing agents struggle to produce accurate law formulas, this paper introduces SLDAgent, an evolution-based agent that co-optimizes the scaling law model and its parameters, enabling it to autonomously explore complex relationships between variables. For the first time, we demonstrate that SLDAgent can automatically discover laws that exhibit consistently more accurate extrapolation than their established, human-derived counterparts across all tasks. Through comprehensive analysis, we elucidate why these discovered laws are superior and verify their practical utility in both pretraining and finetuning applications. This work establishes a new paradigm for agentic scientific discovery, showing that AI systems can understand their own scaling behavior, and can contribute novel and practical knowledge back to the research community.
Updated: 2025-12-15 17:03:31
Categories: cs.LG,cs.AI,cs.CL
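For context on what "fitting a scaling law" means concretely, the classic parametric family is a saturating power law, loss(N) = a * N^(-b) + c. A minimal sketch of fitting it to synthetic experiment data (grid search over the exponent with a linear solve for the remaining coefficients; the agent in the paper searches over the law's functional form as well):

```python
import numpy as np

def fit_power_law(N, loss, b_grid=np.linspace(0.05, 1.0, 96)):
    """Fit loss ~= a * N**(-b) + c by grid search over the exponent b,
    with a linear least-squares solve for (a, c) at each candidate b."""
    best = None
    for b in b_grid:
        A = np.column_stack([N ** (-b), np.ones_like(N)])
        sol, *_ = np.linalg.lstsq(A, loss, rcond=None)
        r = ((A @ sol - loss) ** 2).sum()
        if best is None or r < best[0]:
            best = (r, sol[0], b, sol[1])
    return best[1:]  # (a, b, c)

N = np.logspace(6, 10, 20)         # hypothetical model sizes
loss = 5e3 * N ** (-0.5) + 1.7     # synthetic noiseless "experiments"
a, b, c = fit_power_law(N, loss)
print(round(b, 2))  # 0.5
```

The irreducible-loss term c is what makes naive log-log regression fail and motivates joint optimization of the law and its parameters.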
Pancakes: Consistent Multi-Protocol Image Segmentation Across Biomedical Domains
A single biomedical image can be meaningfully segmented in multiple ways, depending on the desired application. For instance, a brain MRI can be segmented according to tissue types, vascular territories, broad anatomical regions, fine-grained anatomy, or pathology, etc. Existing automatic segmentation models typically either (1) support only a single protocol, the one they were trained on, or (2) require labor-intensive manual prompting to specify the desired segmentation. We introduce Pancakes, a framework that, given a new image from a previously unseen domain, automatically generates multi-label segmentation maps for multiple plausible protocols, while maintaining semantic consistency across related images. Pancakes introduces a new problem formulation that is not currently attainable by existing foundation models. In a series of experiments on seven held-out datasets, we demonstrate that our model can significantly outperform existing foundation models in producing several plausible whole-image segmentations, that are semantically coherent across images.
Updated: 2025-12-15 17:00:21
Categories: cs.CV,cs.LG
Adaptive Sampling for Hydrodynamic Stability
An adaptive sampling approach for efficient detection of bifurcation boundaries in parametrized fluid flow problems is presented herein. The study extends the machine-learning approach of Silvester (Machine Learning for Hydrodynamic Stability, arXiv:2407.09572), where a classifier network was trained on preselected simulation data to identify bifurcated and nonbifurcated flow regimes. In contrast, the proposed methodology introduces adaptivity through a flow-based deep generative model that automatically refines the sampling of the parameter space. The strategy has two components: a classifier network maps the flow parameters to a bifurcation probability, and a probability density estimation technique (KRnet) for the generation of new samples at each adaptive step. The classifier output provides a probabilistic measure of flow stability, and the Shannon entropy of these predictions is employed as an uncertainty indicator. KRnet is trained to approximate a probability density function that concentrates sampling in regions of high entropy, thereby directing computational effort towards the evolving bifurcation boundary. This coupling between classification and generative modeling establishes a feedback-driven adaptive learning process analogous to error-indicator based refinement in contemporary partial differential equation solution strategies. Starting from a uniform parameter distribution, the new approach achieves accurate bifurcation boundary identification with significantly fewer Navier--Stokes simulations, providing a scalable foundation for high-dimensional stability analysis.
Updated: 2025-12-15 17:00:09
Categories: physics.flu-dyn,cs.LG
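The uncertainty indicator described above is simple to state: the Shannon entropy of the classifier's bifurcation probability peaks where the classifier is least certain, i.e., along the evolving bifurcation boundary. The sketch below illustrates only that indicator on a toy 1-D parameter sweep; in the actual method, KRnet fits a density concentrated on these high-entropy regions rather than picking grid points:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a Bernoulli bifurcation probability."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Toy stand-in for the trained classifier: bifurcation probability over a
# 1-D flow parameter (hypothetically a Reynolds number), boundary near 50.
Re = np.linspace(0, 100, 201)
prob = 1 / (1 + np.exp(-(Re - 50) / 5))

# Adaptive step: propose the next Navier-Stokes simulations where the
# classifier's prediction entropy is highest.
H = entropy(prob)
candidates = Re[np.argsort(H)[-10:]]
print(candidates.min(), candidates.max())  # clustered near the boundary
```

Entropy is maximal (log 2) at p = 0.5 and vanishes as p approaches 0 or 1, so computational effort concentrates on the boundary rather than on regimes that are already confidently classified.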
Actively Learning Joint Contours of Multiple Computer Experiments
Contour location, the process of sequentially training a surrogate model to identify the design inputs that result in a pre-specified response value from a single computer experiment, is a well-studied active learning problem. Here, we tackle a related but distinct problem: identifying the input configuration that returns pre-specified values of multiple independent computer experiments simultaneously. Motivated by computer experiments of the rotational torques acting upon a vehicle in flight, we aim to identify stable flight conditions which result in zero torque forces. We propose a "joint contour location" (jCL) scheme that strikes a strategic balance between exploring the multiple response surfaces while exploiting learning of the intersecting contours. We employ both shallow and deep Gaussian process surrogates, but our jCL procedure is applicable to any surrogate that can provide posterior predictive distributions. Our jCL designs significantly outperform existing (single response) CL strategies, enabling us to efficiently locate the joint contour of our motivating computer experiments.
Updated: 2025-12-15 17:00:04
Categories: stat.ME,cs.LG
Enhancing lithological interpretation from petrophysical well log of IODP expedition 390/393 using machine learning
Enhanced lithological interpretation from well logs plays a key role in geological resource exploration and mapping, as well as in geo-environmental modeling studies. Core and cutting information is useful for making sound interpretations of well logs; however, these are rarely collected at each depth due to high costs. Moreover, well log interpretation using traditional methods is constrained by poor borehole conditions. Traditional statistical methods are mostly linear, often failing to discriminate between lithology and rock facies, particularly when dealing with overlapping well log signals characterized by the structural and compositional variation of rock types. In this study, we develop multiple supervised and unsupervised machine learning algorithms to jointly analyze multivariate well log data from Integrated Ocean Drilling Program (IODP) expeditions 390 and 393 for enhanced lithological interpretations. Among the algorithms evaluated (Logistic Regression, Decision Trees, Gradient Boosting, Support Vector Machines (SVM), k-Nearest Neighbors (KNN), and a Multi-Layer Perceptron (MLP) neural network), the Decision Tree and Gradient Boosting models outperformed the others, achieving an accuracy of 0.9950 and an F1-score of 0.9951. While unsupervised machine learning (ML) provides the foundation for cluster information that inherently supports the classification algorithm, supervised ML is applied to devise a data-driven lithology clustering mechanism for IODP datasets. The joint ML-based method developed here has the potential to be further explored for analyzing other well log datasets from the world's oceans.
Updated: 2025-12-15 16:59:13
Categories: physics.geo-ph,cs.LG
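The supervised-classification setup above maps a vector of log responses at each depth to a lithology label. A minimal scikit-learn sketch on synthetic data, assuming hypothetical gamma-ray/resistivity/density/porosity features and invented class centers (the study uses real IODP 390/393 logs and a wider model suite):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical multivariate well-log features: gamma ray, resistivity,
# bulk density, porosity. Class centers are invented for illustration.
def make_logs(center, n):
    return rng.normal(loc=center, scale=0.5, size=(n, 4))

X = np.vstack([make_logs([30, 2, 2.7, 0.05], 300),   # lithology class 0
               make_logs([80, 1, 2.3, 0.30], 300),   # lithology class 1
               make_logs([55, 5, 2.5, 0.15], 300)])  # lithology class 2
y = np.repeat([0, 1, 2], 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, test_size=0.25)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

Tree ensembles like this one handle the nonlinear, overlapping log responses that defeat linear statistical discriminators, which is consistent with Gradient Boosting and Decision Trees leading the comparison in the study.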
Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
LLM-based software engineering agents are increasingly used in real-world development tasks, often with access to sensitive data or security-critical codebases. Such agents could intentionally sabotage these codebases if they were misaligned. We investigate asynchronous monitoring, in which a monitoring system reviews agent actions after the fact. Unlike synchronous monitoring, this approach does not impose runtime latency, while still attempting to disrupt attacks before irreversible harm occurs. We treat monitor development as an adversarial game between a blue team (who design monitors) and a red team (who create sabotaging agents). We attempt to set the game rules such that they upper bound the sabotage potential of an agent based on Claude 4.1 Opus. To ground this game in a realistic, high-stakes deployment scenario, we develop a suite of 5 diverse software engineering environments that simulate tasks that an agent might perform within an AI developer's internal infrastructure. Over the course of the game, we develop an ensemble monitor that achieves a 6% false negative rate at a 1% false positive rate on a held-out test environment. Then, we estimate risk of sabotage at deployment time by extrapolating from our monitor's false negative rate. We describe one simple model for this extrapolation, present a sensitivity analysis, and describe situations in which the model would be invalid. Code is available at: https://github.com/UKGovernmentBEIS/async-control.
Updated: 2025-12-15 16:56:28
Categories: cs.LG
Taint Analysis for Graph APIs Focusing on Broken Access Control
We present the first systematic approach to static and dynamic taint analysis for Graph APIs focusing on broken access control. The approach comprises the following. We taint nodes of the Graph API if they represent data requiring specific privileges in order to be retrieved or manipulated, and identify API calls which are related to sources and sinks. Then, we statically analyze whether a tainted information flow between API source and sink calls occurs. To this end, we model the API calls using graph transformation rules. We subsequently use Critical Pair Analysis to automatically analyze potential dependencies between rules representing source calls and rules representing sink calls. We distinguish direct from indirect tainted information flow and argue under which conditions the Critical Pair Analysis is able to detect not only direct, but also indirect tainted flow. The static taint analysis (i) identifies flows that need to be further reviewed, since tainted nodes may be created by an API call and used or manipulated by another API call later without having the necessary privileges, and (ii) can be used to systematically design dynamic security tests for broken access control. The dynamic taint analysis checks if potential broken access control risks detected during the static taint analysis really occur. We apply the approach to a part of the GitHub GraphQL API. The application illustrates that our analysis supports the detection of two types of broken access control systematically: the case where users of the API may not be able to access or manipulate information, although they should be able to do so; and the case where users (or attackers) of the API may be able to access/manipulate information that they should not.
Updated: 2025-12-15 16:54:27
Categories: cs.CR,cs.LO,cs.SE
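At a very high level, the static check described above can be viewed as reachability from source-call rules to sink-call rules over a rule-dependency graph, where a multi-hop path corresponds to indirect tainted flow. The sketch below illustrates that view; the rule names are hypothetical, and the paper derives the actual dependencies via Critical Pair Analysis on graph transformation rules, not via a plain graph search.

```python
# Toy sketch of the static taint check: a tainted flow exists when a rule
# that creates tainted nodes (a "source" call) can reach, via rule
# dependencies, a rule that uses them (a "sink" call). Rule names are
# hypothetical; the paper computes these dependencies with Critical Pair Analysis.

def tainted_flows(dependencies, sources, sinks):
    """Return (source, sink) pairs connected by a dependency path."""
    flows = set()
    for src in sources:
        # depth-first search over the rule-dependency graph
        stack, seen = [src], {src}
        while stack:
            rule = stack.pop()
            if rule in sinks and rule != src:
                flows.add((src, rule))
            for nxt in dependencies.get(rule, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return flows

# "createIssue" produces a tainted node; "readIssue" consumes it directly,
# while "exportRepo" consumes it only via an intermediate rule (indirect flow).
deps = {
    "createIssue": ["readIssue", "linkIssue"],
    "linkIssue": ["exportRepo"],
}
flows = tainted_flows(deps, sources={"createIssue"}, sinks={"readIssue", "exportRepo"})
print(sorted(flows))  # → [('createIssue', 'exportRepo'), ('createIssue', 'readIssue')]
```

Each reported pair is a candidate broken-access-control flow that, per the paper, would then be confirmed or refuted by the dynamic taint analysis.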
Human-computer interactions predict mental health
Scalable assessments of mental illness, the leading driver of disability worldwide, remain a critical roadblock toward accessible and equitable care. Here, we show that human-computer interactions encode mental health with state-of-the-art biomarker precision. We introduce MAILA, a MAchine-learning framework for Inferring Latent mental states from digital Activity. We trained MAILA to predict 1.3 million mental-health self-reports from 20,000 cursor and touchscreen recordings collected from 9,000 online participants. The dataset includes 2,000 individuals assessed longitudinally, 1,500 diagnosed with depression, and 500 with obsessive-compulsive disorder. MAILA tracks dynamic mental states along three orthogonal dimensions, identifies individuals living with mental illness, and achieves near-ceiling accuracy when predicting group-level mental health. By extracting non-verbal signatures of psychological function that have so far remained untapped, MAILA represents a key step toward foundation models for mental health. The ability to decode mental states at zero marginal cost creates new opportunities in neuroscience, medicine, and public health, while raising urgent questions about privacy, agency, and autonomy online.
Updated: 2025-12-15 16:47:48
Categories: q-bio.NC,cs.AI,cs.HC
MALLM: Multi-Agent Large Language Models Framework
Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for multi-agent debate are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.
Updated: 2025-12-15 16:45:40
Categories: cs.MA,cs.AI,cs.CL
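The abstract says a debate is defined through simple configuration files spanning the four component axes. A hypothetical configuration, shown here as a Python dict, might look like the following; the four axes and example values come from the abstract, but the concrete keys and file format used by MALLM are assumptions.

```python
# Hypothetical sketch of a MALLM-style debate configuration. The four
# component axes (personas, response generators, discussion paradigms,
# decision protocols) are from the paper; key names and the dict format
# are illustrative stand-ins for the framework's actual config files.
debate_config = {
    "task": "MMLU-Pro",                 # any textual Hugging Face dataset
    "agents": 3,
    "personas": "Expert",               # e.g. Expert, Personality
    "response_generator": "Critical",   # e.g. Critical, Reasoning
    "discussion_paradigm": "Relay",     # e.g. Memory, Relay
    "decision_protocol": "Voting",      # e.g. Voting, Consensus
    "max_turns": 7,
}
```

Sweeping such configurations over a fixed dataset is what enables the systematic component-wise comparison the paper describes.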
FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing
Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong performance in point cloud analysis, originally targeting small-scale inputs. However, as PNNs evolve to process large-scale point clouds with hundreds of thousands of points, all-to-all computation and global memory access in point cloud processing introduce substantial overhead, causing $O(n^2)$ computational complexity and memory traffic, where $n$ is the number of points. Existing accelerators, primarily optimized for small-scale workloads, overlook this challenge and scale poorly due to inefficient partitioning and non-parallel architectures. To address these issues, we propose FractalCloud, a fractal-inspired hardware architecture for efficient large-scale 3D point cloud processing. FractalCloud introduces two key optimizations: (1) a co-designed Fractal method for shape-aware and hardware-friendly partitioning, and (2) block-parallel point operations that decompose and parallelize all point operations. A dedicated hardware design with on-chip fractal and flexible parallelism further enables fully parallel processing within limited memory resources. Implemented in 28 nm technology as a chip layout with a core area of 1.5 mm$^2$, FractalCloud achieves 21.7x speedup and 27x energy reduction over state-of-the-art accelerators while maintaining network accuracy, demonstrating its scalability and efficiency for PNN inference.
Updated: 2025-12-15 16:45:22
Categories: cs.AR,cs.AI
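The payoff of partitioning can be seen with a back-of-envelope count: all-to-all interaction over $n$ points costs $n^2$ operations, while restricting interactions to $b$ equal blocks costs $b \cdot (n/b)^2 = n^2/b$. This is only the motivating arithmetic; FractalCloud's shape-aware Fractal partitioning is more sophisticated than the equal-block split assumed here.

```python
# Back-of-envelope motivation for partitioning: all-to-all interaction over
# n points costs n^2 operations, while restricting interactions to b equal
# blocks costs b * (n/b)^2 = n^2 / b. Illustrative only; FractalCloud's
# shape-aware Fractal partitioning is not an equal-block split.
def all_to_all_ops(n: int) -> int:
    return n * n

def blockwise_ops(n: int, b: int) -> int:
    assert n % b == 0, "toy model assumes equal-size blocks"
    return b * (n // b) ** 2

n, b = 200_000, 1_000          # hundreds of thousands of points, 1k blocks
speedup = all_to_all_ops(n) / blockwise_ops(n, b)
print(speedup)                 # → 1000.0, i.e. a factor of b
```

The block count $b$ thus directly sets the asymptotic savings, which is why inefficient partitioning makes prior accelerators scale poorly on large inputs.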
A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments
Mental rotation -- the ability to compare objects seen from different viewpoints -- is a fundamental example of mental simulation and spatial world modelling in humans. Here we propose a mechanistic model of human mental rotation, leveraging advances in deep, equivariant, and neuro-symbolic learning. Our model consists of three stacked components: (1) an equivariant neural encoder, taking images as input and producing 3D spatial representations of objects, (2) a neuro-symbolic object encoder, deriving symbolic descriptions of objects from these spatial representations, and (3) a neural decision agent, comparing these symbolic descriptions to prescribe rotation simulations in 3D latent space via a recurrent pathway. Our model design is guided by the abundant experimental literature on mental rotation, which we complemented with experiments in VR where participants could at times manipulate the objects to compare, providing us with additional insights into the cognitive process of mental rotation. Our model captures well the performance, response times and behavior of participants in our and others' experiments. The necessity of each model component is shown through systematic ablations. Our work adds to a recent collection of deep neural models of human spatial reasoning, further demonstrating the potency of integrating deep, equivariant, and symbolic representations to model the human mind.
Updated: 2025-12-15 16:43:50
Categories: q-bio.NC,cs.LG
Towards a Common Framework for Autoformalization
Autoformalization has emerged as a term referring to the automation of formalization - specifically, the formalization of mathematics using interactive theorem provers (proof assistants). Its rapid development has been driven by progress in deep learning, especially large language models (LLMs). More recently, the term has expanded beyond mathematics to describe the broader task of translating informal input into formal logical representations. At the same time, a growing body of research explores using LLMs to translate informal language into formal representations for reasoning, planning, and knowledge representation - often without explicitly referring to this process as autoformalization. As a result, despite addressing similar tasks, the largely independent development of these research areas has limited opportunities for shared methodologies, benchmarks, and theoretical frameworks that could accelerate progress. The goal of this paper is to review - explicit or implicit - instances of what can be considered autoformalization and to propose a unified framework, encouraging cross-pollination between different fields to advance the development of next generation AI systems.
Updated: 2025-12-15 16:43:14
Categories: cs.AI
Comparative Analysis of Wave Scattering Numerical Modeling Using the Boundary Element Method and Physics-Informed Neural Networks
This study compares the Boundary Element Method (BEM) and Physics-Informed Neural Networks (PINNs) for solving the two-dimensional Helmholtz equation in wave scattering problems. The objective is to evaluate the performance of both methods under the same conditions. We solve the Helmholtz equation using BEM and PINNs for the same scattering problem. PINNs are trained by minimizing the residual of the governing equations and boundary conditions with their configuration determined through hyperparameter optimization, while BEM is applied using boundary discretization. Both methods are evaluated in terms of solution accuracy and computation time. We conducted numerical experiments by varying the number of boundary integration points for the BEM and the number of hidden layers and neurons per layer for the PINNs. We performed a hyperparameter tuning to identify an adequate PINN configuration for this problem as a network with 3 hidden layers and 25 neurons per layer, using a learning rate of $10^{-2}$ and a sine activation function. At comparable levels of accuracy, the assembly and solution of the BEM system required a computational time on the order of $10^{-2}$ s, whereas the training time of the PINN was on the order of $10^{2}$ s, corresponding to a difference of approximately four orders of magnitude. However, once trained, the PINN achieved evaluation times on the order of $10^{-2}$ s, which is about two orders of magnitude faster than the evaluation of the BEM solution at interior points. This work establishes a procedure for comparing BEM and PINNs. It also presents a direct comparison between the two methods for the scattering problem. The analysis provides quantitative data on their performance, supporting their use in future research on wave propagation problems and outlining challenges and directions for further investigation.
Updated: 2025-12-15 16:39:42
Categories: cs.LG
MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph
Large language models with reasoning capabilities have demonstrated impressive performance across a wide range of domains. In clinical applications, a transparent, step-by-step reasoning process provides physicians with strong evidence to support decision-making. While reinforcement learning has effectively enhanced reasoning performance in medical contexts, the clinical reliability of these reasoning processes remains limited because their accuracy and validity are often overlooked during training. To address this gap, we propose MedCEG, a framework that augments medical language models with clinically valid reasoning pathways by explicitly supervising the reasoning process through a Critical Evidence Graph (CEG). We curate a dataset of challenging clinical cases and algorithmically construct a CEG for each sample to represent a high-quality verifiable reasoning pathway. To guide the reasoning process, we introduce a Clinical Reasoning Procedure Reward, which evaluates Node Coverage, Structural Correctness, and Chain Completeness, thereby providing a holistic assessment of reasoning quality. Experimental results show that MedCEG surpasses existing methods in performance while producing clinically valid reasoning chains, representing a solid advancement in reliable medical AI reasoning. The code and models are available at https://github.com/LinjieMu/MedCEG.
Updated: 2025-12-15 16:38:46
Categories: cs.AI
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate tokens sequentially conditioned on all previous tokens, have been the predominant paradigm for LLMs. While these models have achieved high accuracy across a range of downstream tasks, they exhibit low arithmetic intensity due to the inherent sequential dependency in next-token prediction. Recently, Diffusion Language Models (DLMs) have emerged as a promising alternative architecture. DLMs generate output tokens in parallel, mitigating the limitations of sequential decoding. However, the performance implications of DLMs relative to commonly deployed ARMs are not fully understood. In this work, we present a comprehensive study of the performance characteristics of ARMs and DLMs, combining theoretical analysis with empirical profiling to characterize the trade-offs between these approaches. We show that although DLMs can achieve higher arithmetic intensity than ARMs by leveraging parallelism across token positions, they fail to scale effectively with longer contexts. We then explore block-wise decoding for DLMs, which decouples arithmetic intensity from sequence length and enables better scaling to long contexts (similar to ARMs). We also examine batched inference and find that ARMs exhibit superior throughput as they benefit more from parallelism across sequences in the batch. Finally, we highlight opportunities for accelerating DLM inference, emphasizing that reducing the number of sampling steps is key for open-source DLMs to achieve lower latency relative to ARMs.
Updated: 2025-12-15 16:36:36
Categories: cs.LG,cs.AI,cs.CL
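The arithmetic-intensity gap the abstract describes can be captured with a roofline-style toy model: a batch-1 projection in ARM decoding performs about $2mn$ FLOPs while streaming $mn$ weights, whereas decoding $k$ positions in parallel reuses each weight $k$ times. This is a simplification of my own (one layer, weights only, ignoring activations and KV-cache traffic), not the paper's profiling methodology.

```python
# Toy roofline model: arithmetic intensity (FLOPs per weight byte) of a
# single m x n projection when decoding k token positions at once.
# Simplified: one layer, fp16 weights, activation and KV-cache traffic ignored.
def arithmetic_intensity(m, n, k, bytes_per_weight=2):
    flops = 2 * m * n * k                   # one multiply-accumulate per weight per token
    weight_bytes = m * n * bytes_per_weight
    return flops / weight_bytes

ar_decode = arithmetic_intensity(4096, 4096, k=1)    # ARM: one token at a time
dlm_decode = arithmetic_intensity(4096, 4096, k=64)  # DLM: 64 positions in parallel
print(ar_decode, dlm_decode)   # → 1.0 64.0
```

In this model intensity scales linearly with the number of positions decoded together, which is also why block-wise decoding (a fixed block of positions per step) decouples intensity from total sequence length.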
Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource
Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility - the extent to which its sampling process can remain governed by a consistent underlying distribution in the presence of both exogenous change and endogenous feedback. Formally, $C_T$ is defined as the cumulative Fisher-Rao path length of the coupled learner-environment evolution, measuring the total distributional motion accumulated during learning. From this construct we derive a drift-feedback generalization bound of order $O(T^{-1/2} + C_T/T)$, and we prove a matching minimax lower bound showing that this rate is minimax-optimal. Consequently, the results establish a reproducibility speed limit: no algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$ of the data-generating process. The framework situates exogenous drift, adaptive data analysis, and performative prediction within a common geometric structure, with $C_T$ emerging as the intrinsic quantity measuring distributional motion across these settings.
Updated: 2025-12-15 16:34:47
Categories: cs.LG,stat.ML
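One plausible formalization of the reproducibility budget, assuming discrete time steps; the distributions $P_t$ and the metric notation $d_{\mathrm{FR}}$ are my shorthand for the coupled learner-environment law and the Fisher-Rao distance, and only the bound's rate is stated in the abstract:

```latex
% C_T as a cumulative Fisher--Rao path length over the coupled
% learner--environment distributions P_0, P_1, \dots, P_T (notation assumed):
C_T \;=\; \sum_{t=1}^{T} d_{\mathrm{FR}}\!\left(P_{t-1}, P_t\right),
\qquad
\text{generalization error} \;=\; O\!\left(T^{-1/2} + \frac{C_T}{T}\right).
```

The matching minimax lower bound then says that the average drift rate $C_T/T$ is an irreducible floor on worst-case generalization error, i.e., the "reproducibility speed limit."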
Defending the Hierarchical Result Models of Precedential Constraint
In recent years, hierarchical case-based-reasoning models of precedential constraint have been proposed. In various papers, Trevor Bench-Capon criticised these models on the grounds that they would give incorrect outcomes in some cases. In particular, the models would not account for the possibility that intermediate factors are established with different strengths by different base-level factors. In this paper we respond to these criticisms for van Woerkom's result-based hierarchical models. We argue that in some examples Bench-Capon seems to interpret intermediate factors as dimensions, and that applying van Woerkom's dimension-based version of the hierarchical result model to these examples avoids Bench-Capon's criticisms.
Updated: 2025-12-15 16:33:33
Categories: cs.AI
Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS
Machine learning based intrusion detection systems are increasingly targeted by black box adversarial attacks, where attackers craft evasive inputs using indirect feedback such as binary outputs or behavioral signals like response time and resource usage. While several defenses have been proposed, including input transformation, adversarial training, and surrogate detection, they often fall short in practice. Most are tailored to specific attack types, require internal model access, or rely on static mechanisms that fail to generalize across evolving attack strategies. Furthermore, defenses such as input transformation can degrade intrusion detection system performance, making them unsuitable for real time deployment. To address these limitations, we propose Adaptive Feature Poisoning, a lightweight and proactive defense mechanism designed specifically for realistic black box scenarios. Adaptive Feature Poisoning assumes that probing can occur silently and continuously, and introduces dynamic and context aware perturbations to selected traffic features, corrupting the attacker feedback loop without impacting detection capabilities. The method leverages traffic profiling, change point detection, and adaptive scaling to selectively perturb features that an attacker is likely exploiting, based on observed deviations. We evaluate Adaptive Feature Poisoning against multiple realistic adversarial attack strategies, including silent probing, transferability based attacks, and decision boundary based attacks. The results demonstrate its ability to confuse attackers, degrade attack effectiveness, and preserve detection performance. By offering a generalizable, attack agnostic, and undetectable defense, Adaptive Feature Poisoning represents a significant step toward practical and robust adversarial resilience in machine learning based intrusion detection systems.
Updated: 2025-12-15 16:29:23
Categories: cs.CR,cs.AI
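The core loop the abstract describes (profile traffic, detect which features deviate, perturb only those, scaled by the deviation) can be sketched as follows. The z-score detector, thresholds, and noise model below are stand-ins of mine for the paper's traffic-profiling, change-point-detection, and adaptive-scaling components.

```python
import random
import statistics

# Minimal sketch of the Adaptive-Feature-Poisoning idea: flag features whose
# recent mean drifts from a baseline profile (a crude change-point proxy) and
# perturb only those features, scaled by the deviation. All thresholds and the
# z-score detector are illustrative stand-ins for the paper's components.
def poison_features(baseline, recent, response, z_threshold=3.0, rng=None):
    rng = rng or random.Random(0)
    perturbed = list(response)
    for i, column in enumerate(baseline):
        mu = statistics.mean(column)
        sigma = statistics.pstdev(column) or 1.0
        z = abs(statistics.mean(recent[i]) - mu) / sigma
        if z > z_threshold:                          # feature likely being probed
            perturbed[i] += rng.gauss(0.0, 0.1 * z)  # deviation-scaled noise
    return perturbed

baseline = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]  # per-feature history
recent = [[1.0, 1.0], [5.0, 5.0]]                        # feature 1 has drifted
perturbed = poison_features(baseline, recent, response=[10.0, 20.0])
print(perturbed[0] == 10.0, perturbed[1] != 20.0)        # → True True
```

Because only the probed (drifted) features are corrupted, the attacker's feedback loop degrades while responses on unprobed traffic are left untouched, which is the property the paper relies on to preserve detection performance.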
On-Device Continual Learning for Unsupervised Visual Anomaly Detection in Dynamic Manufacturing
In modern manufacturing, Visual Anomaly Detection (VAD) is essential for automated inspection and consistent product quality. Yet, increasingly dynamic and flexible production environments introduce key challenges: First, frequent product changes in small-batch and on-demand manufacturing require rapid model updates. Second, legacy edge hardware lacks the resources to train and run large AI models. Finally, both anomalous and normal training data are often scarce, particularly for newly introduced product variations. We investigate on-device continual learning for unsupervised VAD with localization, extending the PatchCore to incorporate online learning for real-world industrial scenarios. The proposed method leverages a lightweight feature extractor and an incremental coreset update mechanism based on k-center selection, enabling rapid, memory-efficient adaptation from limited data while eliminating costly cloud retraining. Evaluations on an industrial use case are conducted using a testbed designed to emulate flexible production with frequent variant changes in a controlled environment. Our method achieves a 12% AUROC improvement over the baseline, an 80% reduction in memory usage, and faster training compared to batch retraining. These results confirm that our method delivers accurate, resource-efficient, and adaptive VAD suitable for dynamic and smart manufacturing.
Updated: 2025-12-15 16:27:23
Categories: cs.LG,cs.CV
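The k-center selection the abstract builds on is the greedy farthest-point procedure used for PatchCore-style memory banks: repeatedly add the point farthest from the current coreset. A minimal batch version is below; the paper's contribution is updating such a coreset incrementally as new normal data arrives, which this sketch does not show.

```python
# Greedy k-center (farthest-point) coreset selection, the mechanism behind
# PatchCore-style memory banks. The paper extends this with an incremental
# update; this is the plain batch version for illustration.
def kcenter_coreset(points, k):
    """Pick k indices such that each new pick is farthest from the coreset."""
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    chosen = [0]                                   # deterministic seed point
    min_d = [dist2(p, points[0]) for p in points]  # distance to nearest chosen
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_d.__getitem__)
        chosen.append(nxt)
        min_d = [min(d, dist2(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 5.0)]
print(kcenter_coreset(pts, 3))   # → [0, 3, 2]
```

The greedy rule keeps the memory bank small but spread out over feature space, which is what makes the 80% memory reduction compatible with maintained anomaly-detection accuracy.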
SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping
Large language models (LLM) have achieved remarkable performance across a wide range of tasks. However, their substantial parameter sizes pose significant challenges for deployment on edge devices with limited computational and memory resources. Low-rank compression is a promising approach to address this issue, as it reduces both computational and memory costs, making LLM more suitable for resource-constrained environments. Nonetheless, naïve low-rank compression methods require a significant reduction in the retained rank to achieve meaningful memory and computation savings. For a low-rank model, the ranks need to be reduced by more than half to yield efficiency gains. Such aggressive truncation, however, typically results in substantial performance degradation. To address this trade-off, we propose SkipCat, a novel low-rank compression framework that enables the use of higher ranks while achieving the same compression rates. First, we introduce an intra-layer shared low-rank projection method, where multiple matrices that share the same input use a common projection. This reduces redundancy and improves compression efficiency. Second, we propose a block skipping technique that omits computations and memory transfers for selected sub-blocks within the low-rank decomposition. These two techniques jointly enable our compressed model to retain more effective ranks under the same compression budget. Experimental results show that, without any additional fine-tuning, our method outperforms previous low-rank compression approaches by 7% accuracy improvement on zero-shot tasks under the same compression rate. These results highlight the effectiveness of our rank-maximized compression strategy in preserving model performance under tight resource constraints.
Updated: 2025-12-15 16:25:55
Categories: cs.CL,cs.AI
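The savings from intra-layer sharing can be made concrete by counting parameters: $t$ rank-$r$ factorizations of $d \times d$ matrices that read the same input need $t$ separate down-projections, whereas sharing one down-projection saves $(t-1)\,d\,r$ parameters, budget that can be spent on a higher rank. The notation is mine, and the additional block-skipping savings are not modeled.

```python
# Parameter counting for rank-r factorizations of t matrices of shape d x d
# that share the same input (e.g. Q, K, V projections). Notation is mine;
# SkipCat's block-skipping savings are not modeled here.
def separate_lowrank_params(d, r, t):
    return t * (d * r + r * d)        # each matrix: own down-proj + up-proj

def shared_projection_params(d, r, t):
    return d * r + t * r * d          # one shared down-proj, t up-projs

d, r, t = 4096, 512, 3
sep = separate_lowrank_params(d, r, t)
shared = shared_projection_params(d, r, t)
print(sep, shared, sep / shared)      # sharing saves (t - 1) * d * r parameters
```

At a fixed compression budget, those saved parameters translate directly into a larger retainable rank, which is the "rank-maximized" trade-off the abstract describes.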
From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis
We inspect the deductive connection between the neural scaling law and Zipf's law -- two statements discussed in machine learning and quantitative linguistics. The neural scaling law describes how the cross entropy rate of a foundation model -- such as a large language model -- changes with respect to the amount of training tokens, parameters, and compute. By contrast, Zipf's law posits that the distribution of tokens exhibits a power law tail. Whereas similar claims have been made in more specific settings, we show that the neural scaling law is a consequence of Zipf's law under certain broad assumptions that we reveal systematically. The derivation steps are as follows: We derive Heaps' law on the vocabulary growth from Zipf's law, Hilberg's hypothesis on the entropy scaling from Heaps' law, and the neural scaling from Hilberg's hypothesis. We illustrate these inference steps by a toy example of the Santa Fe process that satisfies all the four statistical laws.
Updated: 2025-12-15 16:25:06
Categories: cs.IT,cs.LG,math.ST
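The first link in the chain, Zipf to Heaps, can be checked numerically: if token probabilities follow $p_k \propto k^{-\alpha}$ with $\alpha > 1$, the expected number of distinct types after $n$ draws, $E[V(n)] = \sum_k \big(1 - (1 - p_k)^n\big)$, grows roughly like $n^{1/\alpha}$. The truncation at $K$ tokens in the sketch is an approximation of mine; the paper's derivation is analytic.

```python
import math

# Numerical check of Zipf -> Heaps: with token probabilities p_k ∝ k^(-alpha),
# the expected vocabulary after n draws is E[V(n)] = sum_k (1 - (1 - p_k)^n),
# which grows roughly like n^(1/alpha). Truncation at K tokens is approximate.
def expected_vocab(n, alpha=2.0, K=100_000):
    z = sum(k ** -alpha for k in range(1, K + 1))   # normalizer
    return sum(1.0 - (1.0 - (k ** -alpha) / z) ** n for k in range(1, K + 1))

v_small, v_large = expected_vocab(100), expected_vocab(10_000)
heaps_exponent = math.log(v_large / v_small) / math.log(10_000 / 100)
print(round(heaps_exponent, 2))   # close to 1/alpha = 0.5
```

With $\alpha = 2$ the measured Heaps exponent comes out near $1/\alpha = 0.5$, consistent with the deductive chain the paper formalizes before continuing to Hilberg's hypothesis and the neural scaling law.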
AI Agents with Decentralized Identifiers and Verifiable Credentials
A fundamental limitation of current LLM-based AI agents is their inability to build differentiated trust among each other at the onset of an agent-to-agent dialogue. However, autonomous and interoperable trust establishment becomes essential once agents start to operate beyond isolated environments and engage in dialogues across individual or organizational boundaries. A promising way to fill this gap in Agentic AI is to equip agents with long-lived digital identities and introduce tamper-proof and flexible identity-bound attestations of agents, provisioned by commonly trusted third parties and designed for cross-domain verifiability. This article presents a conceptual framework and a prototypical multi-agent system, where each agent is endowed with a self-sovereign digital identity. It combines a unique and ledger-anchored W3C Decentralized Identifier (DID) of an agent with a set of third-party issued W3C Verifiable Credentials (VCs). This enables agents at the start of a dialog to prove ownership of their self-controlled DIDs for authentication purposes and to establish various cross-domain trust relationships through the spontaneous exchange of their self-hosted DID-bound VCs. A comprehensive evaluation of the prototypical implementation demonstrates technical feasibility but also reveals limitations once an agent's LLM is in sole charge to control the respective security procedures.
Updated: 2025-12-15 16:24:03
Categories: cs.CR,cs.MA
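Concretely, a DID-bound attestation pairs an agent's DID with a credential whose subject points at that DID. The skeleton below follows the W3C Verifiable Credentials data model; the specific claims (such as "role") and the example DIDs are illustrative, not taken from the paper's prototype.

```python
# Skeleton of a W3C-style Verifiable Credential bound to an agent's DID.
# Top-level field names follow the W3C VC data model; the concrete claims
# and DIDs are hypothetical examples.
agent_did = "did:example:agent123"   # the agent's self-controlled, ledger-anchored DID

credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:trusted-issuer",      # commonly trusted third party
    "issuanceDate": "2025-12-15T00:00:00Z",
    "credentialSubject": {
        "id": agent_did,             # binds the attestation to the agent's DID
        "role": "procurement-agent", # illustrative claim
    },
    "proof": {                       # issuer's signature over the claims
        "type": "Ed25519Signature2020",
        "verificationMethod": "did:example:trusted-issuer#key-1",
        "proofValue": "...",         # omitted
    },
}
```

At the start of a dialogue, an agent proves control of `agent_did` (e.g., by signing a challenge with the DID's key) and presents such credentials, letting the peer verify issuer signatures across organizational boundaries.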
Real-Time AI-Driven Milling Digital Twin Towards Extreme Low-Latency
Digital twin (DT) enables smart manufacturing by leveraging real-time data, AI models, and intelligent control systems. This paper presents a state-of-the-art analysis on the emerging field of DTs in the context of milling. The critical aspects of DT are explored through the lens of virtual models of physical milling, data flow from physical milling to virtual model, and feedback from virtual model to physical milling. Live data streaming protocols and virtual modeling methods are highlighted. A case study showcases the transformative capability of a real-time machine learning-driven live DT of tool-work contact in a milling process. Future research directions are outlined to achieve the goals of Industry 4.0 and beyond.
Updated: 2025-12-15 16:18:36
Categories: eess.SY,cs.LG
neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings
Envy is a common human behavior that shapes competitiveness and can alter outcomes in team settings. As large language models (LLMs) increasingly act on behalf of humans in collaborative and competitive workflows, there is a pressing need to evaluate whether and under what conditions they exhibit envy-like preferences. In this paper, we test whether LLMs show envy-like behavior toward each other. We considered two scenarios: (1) a point-allocation game that tests whether a model tries to win over its peer, and (2) a workplace setting in which behavior is observed when recognition is unfair. Our findings reveal consistent evidence of envy-like patterns in certain LLMs, with large variation across models and contexts. For instance, GPT-5-mini and Claude-3.7-Sonnet show a clear tendency to pull down the peer model to equalize outcomes, whereas Mistral-Small-3.2-24B instead focuses on maximizing its own individual gains. These results highlight the need to consider competitive dispositions as a safety and design factor in LLM-based multi-agent systems.
Updated: 2025-12-15 16:17:12
Fields: cs.AI,cs.CL,cs.CY
Element-wise Modulation of Random Matrices for Efficient Neural Layers
Fully connected layers are a primary source of memory and computational overhead in deep neural networks due to their dense, often redundant parameterization. While various compression techniques exist, they frequently introduce complex engineering trade-offs or degrade model performance. We propose the Parametrized Random Projection (PRP) layer, a novel approach that decouples feature mixing from adaptation by utilizing a fixed random matrix modulated by lightweight, learnable element-wise parameters. This architecture drastically reduces the trainable parameter count to a linear scale while retaining reliable accuracy across various benchmarks. The design serves as a stable, computationally efficient solution for architectural scaling and deployment in resource-limited settings.
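A minimal pure-Python sketch of the idea. Note one assumption: the element-wise modulation is written here as the outer product of two small learnable scale vectors applied to a frozen random matrix, which keeps the trainable count linear in the layer's dimensions; the paper's exact parametrization may differ.

```python
import random

rng = random.Random(0)

class PRPLayer:
    """Sketch of a Parametrized Random Projection layer: a frozen random
    mixing matrix R, modulated element-wise by the outer product of two
    small learnable scale vectors u (per input) and v (per output). The
    rank-1 form of the modulation is an assumption made here so the
    trainable count is linear, O(d_in + d_out); the paper's exact
    parametrization may differ."""
    def __init__(self, d_in, d_out):
        self.R = [[rng.gauss(0, 1) for _ in range(d_out)]
                  for _ in range(d_in)]          # fixed, never trained
        self.u = [1.0] * d_in                    # learnable input scales
        self.v = [1.0] * d_out                   # learnable output scales

    def __call__(self, x):
        # y_j = v_j * sum_i u_i * x_i * R_ij  ==  x @ ((u v^T) * R)
        return [self.v[j] * sum(self.u[i] * x[i] * self.R[i][j]
                                for i in range(len(x)))
                for j in range(len(self.v))]

layer = PRPLayer(8, 4)
y = layer([0.5] * 8)
trainable = len(layer.u) + len(layer.v)          # 12, vs. 32 dense weights
```

A dense 8x4 layer would train all 32 weights; here only the 12 scale entries would receive gradients, while R stays fixed.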
Updated: 2025-12-15 16:16:53
Fields: cs.LG
How PARTs assemble into wholes: Learning the relative composition of images
The composition of objects and their parts, along with object-object positional relationships, provides a rich source of information for representation learning. Hence, spatial-aware pretext tasks have been actively explored in self-supervised learning. Existing works commonly start from a grid structure, where the goal of the pretext task involves predicting the absolute position index of patches within a fixed grid. However, grid-based approaches fall short of capturing the fluid and continuous nature of real-world object compositions. We introduce PART, a self-supervised learning approach that leverages continuous relative transformations between off-grid patches to overcome these limitations. By modeling how parts relate to each other in a continuous space, PART learns the relative composition of images: an off-grid structural relative positioning that is less tied to absolute appearance and can remain coherent under variations such as partial visibility or stylistic changes. In tasks requiring precise spatial understanding such as object detection and time series prediction, PART outperforms grid-based methods like MAE and DropPos, while maintaining competitive performance on global classification tasks. By breaking free from grid constraints, PART opens up a new trajectory for universal self-supervised pretraining across diverse data types, from images to EEG signals, with potential in medical imaging, video, and audio.
Updated: 2025-12-15 16:15:57
Fields: cs.CV,cs.AI,cs.LG
Non-Resolution Reasoning: A Framework for Preserving Semantic Ambiguity in Language Models
Premature semantic collapse -- the forced early commitment to a single meaning -- remains a core architectural limitation of current language models. Softmax-driven competition and greedy decoding cause models to discard valid interpretations before sufficient context is available, resulting in brittle reasoning and context failures. We introduce Non-Resolution Reasoning (NRR), a general computational framework that preserves semantic ambiguity during inference and performs resolution only when explicitly required. NRR integrates three components: (1) Multi-Vector Embeddings that maintain multiple viable interpretations per token, (2) Non-Collapsing Attention that prevents winner-take-all dynamics across layers, and (3) Contextual Identity Tracking (CIT), which assigns context-specific identities to recurring entities (e.g., distinguishing "Dr. Smith the cardiologist" from "Dr. Smith the researcher"). These mechanisms are unified by an external Resolution Operator $\rho$ that makes semantic commitment explicit, controllable, and task-dependent. Unlike standard architectures, NRR separates representation from resolution, allowing a single model to shift between creative, factual, and ambiguity-preserving reasoning without retraining. A synthetic evaluation demonstrates NRR's ability to preserve ambiguity and track context: CIT-enhanced models achieve 90.9% accuracy on out-of-distribution identity-shift tasks, compared to 9.1% for transformer baselines. NRR provides a principled alternative to premature collapse, reframing ambiguity as an explicit representational state rather than a failure mode. The question is not whether AI should resolve ambiguity, but when, how, and under whose control.
Updated: 2025-12-15 16:14:32
Fields: cs.CL,cs.AI,cs.LG
An upper bound of the silhouette validation metric for clustering
The silhouette coefficient quantifies, for each observation, the balance between within-cluster cohesion and between-cluster separation, taking values in [-1, 1]. The average silhouette width (ASW) is a widely used internal measure of clustering quality, with higher values indicating more cohesive and well-separated clusters. However, the dataset-specific maximum of ASW is typically unknown, and the standard upper limit of 1 is rarely attainable. In this work, we derive for each data point a sharp upper bound on its silhouette width and aggregate these to obtain a canonical upper bound on the ASW. This bound, often substantially below 1, enhances the interpretability of empirical ASW values by providing guidance on how close a given clustering result is to the best possible outcome for that dataset. We evaluate the usefulness of the upper bound on a variety of datasets and conclude that it can meaningfully enrich cluster quality evaluation, but its practical relevance depends on the given dataset. Finally, we extend the framework to establish an upper bound of the macro-averaged silhouette.
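To make the quantities concrete, here is a small self-contained computation of the per-point silhouette widths and the ASW on a toy 1-D dataset; the paper's sharp per-point upper bound itself is not reproduced here.

```python
# Plain-Python computation of per-point silhouette widths and the ASW on
# a toy 1-D dataset. This only illustrates the quantities a(i), b(i),
# s(i) = (b - a) / max(a, b) that the paper's sharp bound is derived
# for; even on this well-separated example, no s(i) reaches 1.
def silhouette_widths(points, labels):
    widths = []
    for i, (p, c) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, d) in enumerate(zip(points, labels))
                if d == c and j != i]
        a = sum(same) / len(same) if same else 0.0   # mean intra-cluster distance
        b = min(sum(abs(p - q) for q, d in zip(points, labels) if d == o)
                / labels.count(o)
                for o in set(labels) - {c})          # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

pts = [0.0, 0.2, 0.4, 10.0, 10.3]
lab = [0, 0, 0, 1, 1]
s = silhouette_widths(pts, lab)
asw = sum(s) / len(s)    # ~0.97: cohesive, well-separated, yet below 1
```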
Updated: 2025-12-15 16:10:19
Fields: cs.LG
Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization
Shampoo and its efficient variant, SOAP, employ structured second-moment estimations and have shown strong performance for training neural networks (NNs). In practice, however, Shampoo typically requires step-size grafting with Adam to be competitive, and SOAP mitigates this by applying Adam in Shampoo's eigenbasis -- at the cost of additional memory overhead from Adam in both methods. Prior analyses have largely relied on the Frobenius norm to motivate these estimation schemes. We instead recast their estimation procedures as covariance estimation under Kullback-Leibler (KL) divergence minimization, revealing a previously overlooked theoretical limitation and motivating principled redesigns. Building on this perspective, we develop $\textbf{KL-Shampoo}$ and $\textbf{KL-SOAP}$, practical schemes that match or exceed the performance of Shampoo and SOAP in NN pre-training while achieving SOAP-level per-iteration runtime. Notably, KL-Shampoo does not rely on Adam to attain competitive performance, eliminating the memory overhead introduced by Adam. Across our experiments, KL-Shampoo consistently outperforms SOAP, Shampoo, and even KL-SOAP, establishing the KL-based approach as a promising foundation for designing structured methods in NN optimization.
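The KL-based view can be illustrated with the standard closed form for the divergence between two zero-mean Gaussians, KL(N(0, S1) || N(0, S2)) = 0.5 (tr(S2^{-1} S1) - k + ln(det S2 / det S1)). The sketch below applies it to a 2x2 empirical second moment and a diagonal (Adam-like) fit; it is illustrative only and does not reproduce the paper's structured estimators.

```python
import math

# Closed-form KL( N(0, S1) || N(0, S2) ) for 2x2 covariances:
# 0.5 * ( tr(S2^-1 S1) - k + ln(det S2 / det S1) ).
# Illustrates the objective of the KL-based view only; the paper's
# structured (Shampoo/SOAP-style) estimators are not reproduced here.
def kl_zero_mean_gauss2(S1, S2):
    det = lambda S: S[0][0] * S[1][1] - S[0][1] * S[1][0]
    d2 = det(S2)
    S2_inv = [[ S2[1][1] / d2, -S2[0][1] / d2],
              [-S2[1][0] / d2,  S2[0][0] / d2]]
    trace = sum(S2_inv[i][k] * S1[k][i] for i in range(2) for k in range(2))
    return 0.5 * (trace - 2 + math.log(det(S2) / det(S1)))

S1 = [[2.0, 0.3], [0.3, 1.0]]           # empirical second moment
same = kl_zero_mean_gauss2(S1, S1)      # ~0: a perfect fit costs nothing
diag = [[2.0, 0.0], [0.0, 1.0]]         # diagonal (Adam-like) structure
gap = kl_zero_mean_gauss2(S1, diag)     # > 0: ignored correlation is penalized
```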
Updated: 2025-12-15 16:00:01
Fields: stat.ML,cs.LG
DP-EMAR: A Differentially Private Framework for Autonomous Model Weight Repair in Federated IoT Systems
Federated Learning (FL) enables decentralized model training without sharing raw data, but model weight distortion remains a major challenge in resource-constrained IoT networks. In multi-tier Federated IoT (Fed-IoT) systems, unstable connectivity and adversarial interference can silently alter transmitted parameters, degrading convergence. We propose DP-EMAR, a differentially private, error-model-based autonomous repair framework that detects and reconstructs transmission-induced distortions during FL aggregation. DP-EMAR estimates corruption patterns and applies adaptive correction before privacy noise is added, enabling reliable in-network repair without violating confidentiality. By integrating Differential Privacy (DP) with Secure Aggregation (SA), the framework distinguishes DP noise from genuine transmission errors. Experiments on heterogeneous IoT sensor and graph datasets show that DP-EMAR preserves convergence stability and maintains near-baseline performance under communication corruption while ensuring strict (epsilon, delta)-DP guarantees. The framework enhances robustness, communication efficiency, and trust in privacy-preserving Federated IoT learning.
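As a point of reference for the privacy step, here is a minimal sketch of the standard Gaussian mechanism (clip the update to bound its L2 sensitivity, then add calibrated noise). This shows only the generic (epsilon, delta)-DP ingredient, not DP-EMAR's repair logic; the sigma calibration is the textbook bound, valid for epsilon < 1.

```python
import math, random

# Standard Gaussian mechanism: clip, then add noise with
# sigma = clip_norm * sqrt(2 ln(1.25/delta)) / eps  (textbook bound).
def gaussian_sigma(clip_norm, eps, delta):
    return clip_norm * math.sqrt(2 * math.log(1.25 / delta)) / eps

def privatize_update(update, clip_norm, eps, delta, rng):
    norm = math.sqrt(sum(w * w for w in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [w * scale for w in update]      # L2 sensitivity now bounded
    sigma = gaussian_sigma(clip_norm, eps, delta)
    return [w + rng.gauss(0.0, sigma) for w in clipped]

rng = random.Random(0)
noisy = privatize_update([3.0, 4.0], clip_norm=1.0, eps=0.5, delta=1e-5,
                         rng=rng)
```

Distinguishing this zero-mean, known-variance noise from structured transmission corruption is exactly the separation the abstract describes.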
Updated: 2025-12-15 15:56:58
Fields: cs.LG,math.NA
SSAS: Cross-subject EEG-based Emotion Recognition through Source Selection with Adversarial Strategy
Electroencephalographic (EEG) signals have long been applied in the field of affective brain-computer interfaces (aBCIs). Cross-subject EEG-based emotion recognition has demonstrated significant potential in practical applications due to its suitability across diverse people. However, most studies on cross-subject EEG-based emotion recognition neglect the presence of inter-individual variability and negative transfer phenomena during model training. To address this issue, this paper introduces a cross-subject EEG-based emotion recognition method that performs source selection with an adversarial strategy. The proposed method comprises two modules: the source selection network (SS) and the adversarial strategies network (AS). The SS uses domain labels to reverse-engineer the training process of domain adaptation. Its key idea is to disrupt class separability and magnify inter-domain differences, thereby raising the classification difficulty and forcing the model to learn domain-invariant yet emotion-relevant representations. The AS receives the source-domain selection results and the pretrained domain discriminators from the SS. The pretrained domain discriminators compute a novel loss aimed at enhancing the performance of domain classification during adversarial training, ensuring the balance of adversarial strategies. This paper provides theoretical insights into the proposed method and achieves outstanding performance on two EEG-based emotion datasets, SEED and SEED-IV. The code can be found at https://github.com/liuyici/SSAS.
Updated: 2025-12-15 15:56:04
Fields: cs.LG,cs.AI,eess.SP
Deep-ER: Deep Learning ECCENTRIC Reconstruction for fast high-resolution neurometabolic imaging
Introduction: Altered neurometabolism is an important pathological mechanism in many neurological diseases and brain cancer, which can be mapped non-invasively by Magnetic Resonance Spectroscopic Imaging (MRSI). Advanced MRSI using non-cartesian compressed-sense acquisition enables fast high-resolution metabolic imaging but has lengthy reconstruction times that limits throughput and needs expert user interaction. Here, we present a robust and efficient Deep Learning reconstruction to obtain high-quality metabolic maps. Methods: Fast high-resolution whole-brain metabolic imaging was performed at 3.4 mm$^3$ isotropic resolution with acquisition times between 4:11-9:21 min:s using ECCENTRIC pulse sequence on a 7T MRI scanner. Data were acquired in a high-resolution phantom and 27 human participants, including 22 healthy volunteers and 5 glioma patients. A deep neural network using recurring interlaced convolutional layers with joint dual-space feature representation was developed for deep learning ECCENTRIC reconstruction (Deep-ER). 21 subjects were used for training and 6 subjects for testing. Deep-ER performance was compared to conventional iterative Total Generalized Variation reconstruction using image and spectral quality metrics. Results: Deep-ER demonstrated 600-fold faster reconstruction than conventional methods, providing improved spatial-spectral quality and metabolite quantification with 12%-45% (P<0.05) higher signal-to-noise and 8%-50% (P<0.05) smaller Cramer-Rao lower bounds. Metabolic images clearly visualize glioma tumor heterogeneity and boundary. Conclusion: Deep-ER provides efficient and robust reconstruction for sparse-sampled MRSI. The accelerated acquisition-reconstruction MRSI is compatible with high-throughput imaging workflow. It is expected that such improved performance will facilitate basic and clinical MRSI applications.
Updated: 2025-12-15 15:55:07
Fields: eess.IV,cs.LG
QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
We tackle the problem of quantifying the number of objects by a generative text-to-image model. Rather than retraining such a model for each new image domain of interest, which leads to high computational costs and limited scalability, we are the first to consider this problem from a domain-agnostic perspective. We propose QUOTA, an optimization framework for text-to-image models that enables effective object quantification across unseen domains without retraining. It leverages a dual-loop meta-learning strategy to optimize a domain-invariant prompt. Further, by integrating prompt learning with learnable counting and domain tokens, our method captures stylistic variations and maintains accuracy, even for object classes not encountered during training. For evaluation, we adopt a new benchmark specifically designed for object quantification in domain generalization, enabling rigorous assessment of object quantification accuracy and adaptability across unseen domains in text-to-image generation. Extensive experiments demonstrate that QUOTA outperforms conventional models in both object quantification accuracy and semantic consistency, setting a new benchmark for efficient and scalable text-to-image generation for any domain.
Updated: 2025-12-15 15:49:38
Fields: cs.CV,cs.LG
XNNTab -- Interpretable Neural Networks for Tabular Data using Sparse Autoencoders
In data-driven applications relying on tabular data, where interpretability is key, machine learning models such as decision trees and linear regression are applied. Although neural networks can provide higher predictive performance, they are not used because of their black-box nature. In this work, we present XNNTab, a neural architecture that combines the expressiveness of neural networks with interpretability. XNNTab first learns highly non-linear feature representations, which are decomposed into monosemantic features using a sparse autoencoder (SAE). These features are then assigned human-interpretable concepts, making the overall model prediction intrinsically interpretable. XNNTab outperforms interpretable predictive models and achieves comparable performance to its non-interpretable counterparts.
Updated: 2025-12-15 15:39:59
Fields: cs.LG
From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents
While Large Language Model (LLM) agents show great potential for automated UI navigation such as automated UI testing and AI assistants, their efficiency has been largely overlooked. Our motivating study reveals that inefficient UI representation creates a critical performance bottleneck. However, UI representation optimization, formulated as the task of automatically generating programs that transform UI representations, faces two unique challenges. First, the lack of Boolean oracles, which traditional program synthesis uses to decisively validate semantic correctness, poses a fundamental challenge to co-optimization of token efficiency and completeness. Second, the need to process large, complex UI trees as input while generating long, compositional transformation programs, making the search space vast and error-prone. Toward addressing the preceding limitations, we present UIFormer, the first automated optimization framework that synthesizes UI transformation programs by conducting constraint-based optimization with structured decomposition of the complex synthesis task. First, UIFormer restricts the program space using a domain-specific language (DSL) that captures UI-specific operations. Second, UIFormer conducts LLM-based iterative refinement with correctness and efficiency rewards, providing guidance for achieving the efficiency-completeness co-optimization. UIFormer operates as a lightweight plugin that applies transformation programs for seamless integration with existing LLM agents, requiring minimal modifications to their core logic. Evaluations across three UI navigation benchmarks spanning Android and Web platforms with five LLMs demonstrate that UIFormer achieves 48.7% to 55.8% token reduction with minimal runtime overhead while maintaining or improving agent performance. Real-world industry deployment at WeChat further validates the practical impact of UIFormer.
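A toy illustration of what a UI-representation transformation program might do, using hypothetical operations (prune invisible nodes, keep a whitelist of attributes, truncate long text) rather than the paper's actual DSL.

```python
# Toy illustration of a UI-representation transform: prune invisible
# nodes, keep a small whitelist of attributes, and truncate long text.
# The operations and attribute names are hypothetical stand-ins, not
# the paper's actual DSL.
KEEP_ATTRS = ("id", "text", "clickable")

def transform(node):
    if not node.get("visible", True):
        return None                      # drop nodes the agent cannot see
    out = {k: node[k] for k in KEEP_ATTRS if k in node}
    if isinstance(out.get("text"), str):
        out["text"] = out["text"][:30]   # cap per-node token cost
    kids = [t for c in node.get("children", []) if (t := transform(c))]
    if kids:
        out["children"] = kids
    return out

ui = {"id": "root", "visible": True, "children": [
    {"id": "ad", "visible": False, "text": "x" * 500},
    {"id": "btn", "visible": True, "clickable": True, "text": "Submit"},
]}
compact = transform(ui)                  # serialized form is much shorter
```

The difficulty the abstract points to is that there is no Boolean oracle certifying that `compact` still contains everything the agent needs, which is why completeness must be co-optimized with token savings.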
Updated: 2025-12-15 15:34:06
Fields: cs.SE,cs.AI
RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs
The Structure Gap between probabilistic LLM generation and deterministic schema requirements hinders automated workflows. We propose RL-Struct, a lightweight framework using Gradient Regularized Policy Optimization (GRPO) with a hierarchical reward function to align LLMs with structural constraints. This approach eliminates the critic network, reducing peak VRAM by 38% compared to PPO. On complex JSON tasks, RL-Struct achieves 89.7% structural accuracy and 92.1% validity, significantly outperforming SFT and zero-shot baselines. We also report an emergent curriculum--a self-organized learning process where the model prioritizes syntax before semantics. Our model is publicly available at https://huggingface.co/Freakz3z/Qwen-JSON.
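A minimal sketch of what a hierarchical reward for structured output could look like, with tiers for parseability, top-level structure, and schema coverage; the tiers and weights below are assumptions, not the paper's reward.

```python
import json

# Hypothetical hierarchical reward in the spirit of RL-Struct: graded
# credit for parseability, then top-level structure, then schema (key)
# coverage. The tiers and weights are assumptions, not the paper's.
def hierarchical_reward(output, required_keys):
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return 0.0                             # tier 1: not parseable
    if not isinstance(obj, dict):
        return 0.2                             # tier 2: wrong top-level type
    covered = sum(k in obj for k in required_keys) / len(required_keys)
    return 0.4 + 0.6 * covered                 # tier 3: schema coverage

r_invalid = hierarchical_reward("not json", ["a"])            # 0.0
r_partial = hierarchical_reward('{"a": 1}', ["a", "b"])       # ~0.7
r_full = hierarchical_reward('{"a": 1, "b": 2}', ["a", "b"])  # ~1.0
```

Graded tiers of this kind give the policy a learning signal even when outputs are only partially well-formed, consistent with the syntax-before-semantics curriculum the abstract reports.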
Updated: 2025-12-15 15:32:09
Fields: cs.AI,cs.LG
On the failure of ReLU activation for physics-informed machine learning
Physics-informed machine learning uses governing ordinary and/or partial differential equations to train neural networks to represent the solution field. Like any machine learning problem, the choice of activation function influences the characteristics and performance of the solution obtained from physics-informed training. Several studies have compared common activation functions on benchmark differential equations, and have unanimously found that the rectified linear unit (ReLU) is outperformed by competitors such as the sigmoid, hyperbolic tangent, and swish activation functions. In this work, we diagnose the poor performance of ReLU on physics-informed machine learning problems. While it is well-known that the piecewise linear form of ReLU prevents it from being used on second-order differential equations, we show that ReLU fails even on variational problems involving only first derivatives. We identify the cause of this failure as second derivatives of the activation, which are taken not in the formulation of the loss, but in the process of training. Namely, we show that automatic differentiation in PyTorch fails to characterize derivatives of discontinuous fields, which causes the gradient of the physics-informed loss to be mis-specified, thus explaining the poor performance of ReLU.
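The failure mode can be reproduced without any autodiff library: the pointwise chain rule (what reverse-mode autodiff implements) differentiates the piecewise-constant relu'(x) to zero everywhere, while a finite-difference probe of relu' reveals the Dirac mass at the kink that this misses.

```python
# Pointwise differentiation of relu'(x) gives 0 for every x != 0,
# silently dropping the Dirac delta at the kink; a central difference
# of relu' shows where the missing mass lives.
def relu_grad(x):                # what autodiff reports for d relu / dx
    return 1.0 if x > 0 else 0.0

def relu_grad2_pointwise(x):     # derivative of a piecewise constant: 0 a.e.
    return 0.0

h = 2 ** -10                     # exact binary step size
def relu_grad2_fd(x):            # central difference of relu_grad
    return (relu_grad(x + h) - relu_grad(x - h)) / (2 * h)

away = [relu_grad2_fd(x) for x in (-1.0, 1.0)]  # 0.0, matching the pointwise rule
at_kink = relu_grad2_fd(0.0)                    # 1/(2h) = 512.0: the missed delta
```

Away from the kink both views agree, so the mis-specification only surfaces through the distributional term, which is exactly what the gradient of the physics-informed loss needs.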
Updated: 2025-12-15 15:22:16
Fields: cs.LG
Credal Prediction based on Relative Likelihood
Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner's epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.
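A minimal sketch of relative-likelihood credal prediction for a Bernoulli parameter: every theta whose likelihood ratio to the best-fitting model exceeds the threshold alpha is kept as plausible. The grid search and interval summary are simplifications of the paper's ensemble-based approximation.

```python
import math

# Relative-likelihood credal set for a Bernoulli parameter: keep every
# theta whose likelihood ratio to the maximum exceeds alpha. A higher
# alpha trades correctness (coverage) for precision (a smaller set).
def credal_interval(successes, trials, alpha=0.15, grid=1000):
    def loglik(t):
        return (successes * math.log(t)
                + (trials - successes) * math.log(1 - t))
    thetas = [i / grid for i in range(1, grid)]
    max_ll = max(loglik(t) for t in thetas)
    plausible = [t for t in thetas if math.exp(loglik(t) - max_ll) >= alpha]
    return min(plausible), max(plausible)

lo10, hi10 = credal_interval(7, 10)       # little evidence: wide set
lo100, hi100 = credal_interval(70, 100)   # same rate, more data: tighter set
```

The shrinking set under more data mirrors the epistemic-uncertainty reading: the credal set narrows toward a single distribution as evidence accumulates.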
Updated: 2025-12-15 15:19:55
Fields: stat.ML,cs.LG
Weak Enforcement and Low Compliance in PCI DSS: A Comparative Security Study
Although credit and debit card data continue to be a prime target for attackers, organizational adherence to the Payment Card Industry Data Security Standard (PCI DSS) remains surprisingly low. Despite prior work showing that PCI DSS can reduce card fraud, only 32.4% of organizations were fully compliant in 2022, suggesting possible deficiencies in enforcement mechanisms. This study compares PCI DSS with three data security frameworks, HIPAA, NIS2, and GDPR, to examine how enforcement mechanisms relate to implementation success. The analysis reveals that PCI DSS lags far behind these security frameworks and that its sanctions are orders of magnitude smaller than those under GDPR and NIS2. The findings indicate a positive association between stronger, multi-modal enforcement (including public disclosure, license actions, and imprisonment) and higher implementation rates, and highlight the structural weakness of PCI DSS's bank-dependent monitoring model. Enhanced non-monetary sanctions and the creation of an independent supervisory authority are recommended to increase transparency, reduce conflicts of interest, and improve PCI DSS compliance without discouraging card acceptance.
Updated: 2025-12-15 15:19:33
Fields: cs.CR,cs.CY
MineTheGap: Automatic Mining of Biases in Text-to-Image Models
Text-to-Image (TTI) models generate images based on text prompts, which often leave certain aspects of the desired image ambiguous. When faced with these ambiguities, TTI models have been shown to exhibit biases in their interpretations. These biases can have societal impacts, e.g., when showing only a certain race for a stated occupation. They can also affect user experience when creating redundancy within a set of generated images instead of spanning diverse possibilities. Here, we introduce MineTheGap, a method for automatically mining prompts that cause a TTI model to generate biased outputs. Our method goes beyond merely detecting bias for a given prompt. Rather, it leverages a genetic algorithm to iteratively refine a pool of prompts, seeking those that expose biases. This optimization process is driven by a novel bias score, which ranks biases according to their severity, as we validate on a dataset with known biases. For a given prompt, this score is obtained by comparing the distribution of generated images to the distribution of LLM-generated texts that constitute variations on the prompt. Code and examples are available on the project's webpage.
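The genetic search can be caricatured in a few lines: mutate a pool of prompts and keep the highest scorers. The bias score below is a deliberately crude stand-in (counting ambiguity-prone words); the paper's score instead compares the distribution of generated images with LLM-generated prompt variations.

```python
import random

rng = random.Random(0)

# Caricature of the genetic mining loop: elitism plus random single-word
# mutations, driving the pool toward prompts with a high (surrogate)
# bias score. Vocabulary and score are toy stand-ins, not the paper's.
WORDS = ["a", "photo", "of", "the", "doctor", "nurse", "person", "smiling"]
AMBIGUOUS = {"doctor", "nurse", "person"}

def bias_score(prompt):
    return sum(w in AMBIGUOUS for w in prompt)

def mutate(prompt):
    p = list(prompt)
    p[rng.randrange(len(p))] = rng.choice(WORDS)
    return p

pool = [[rng.choice(WORDS) for _ in range(5)] for _ in range(8)]
initial_best = max(bias_score(p) for p in pool)
for _ in range(30):
    pool.sort(key=bias_score, reverse=True)
    pool = pool[:4] + [mutate(p) for p in pool[:4]]   # elitism + mutation
best = max(pool, key=bias_score)   # elitism: best score never decreases
```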
Updated: 2025-12-15 15:17:02
Fields: cs.CV,cs.LG
Behavioral Machine Learning? Regularization and Forecast Bias
Standard forecast efficiency tests interpret violations as evidence of behavioral bias. We show theoretically and empirically that rational forecasters using optimal regularization systematically violate these tests. Machine learning forecasts show near-zero bias at the one-year horizon, but strong overreaction at two years, consistent with predictions from a model of regularization and measurement noise. We provide three complementary tests: experimental variation in regularization parameters, cross-sectional heterogeneity in firm signal quality, and quasi-experimental evidence from ML adoption around 2013. Technically trained analysts shift sharply toward overreaction post-2013. Our findings suggest reported violations may reflect statistical sophistication rather than cognitive failure.
Updated: 2025-12-15 15:05:52
Fields: q-fin.ST,cs.LG,econ.GN,q-fin.GN
Improved Segmentation of Polyps and Visual Explainability Analysis
Colorectal cancer (CRC) remains one of the leading causes of cancer-related morbidity and mortality worldwide, with gastrointestinal (GI) polyps serving as critical precursors according to the World Health Organization (WHO). Early and accurate segmentation of polyps during colonoscopy is essential for reducing CRC progression, yet manual delineation is labor-intensive and prone to observer variability. Deep learning methods have demonstrated strong potential for automated polyp analysis, but their limited interpretability remains a barrier to clinical adoption. In this study, we present PolypSeg-GradCAM, an explainable deep learning framework that integrates a U-Net architecture with a pre-trained ResNet-34 backbone and Gradient-weighted Class Activation Mapping (Grad-CAM) for transparent polyp segmentation. To ensure rigorous benchmarking, the model was trained and evaluated using 5-Fold Cross-Validation on the Kvasir-SEG dataset of 1,000 annotated endoscopic images. Experimental results show a mean Dice coefficient of 0.8902 +/- 0.0125, a mean Intersection-over-Union (IoU) of 0.8023, and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.9722. Advanced quantitative analysis using an optimal threshold yielded a Sensitivity of 0.9058 and Precision of 0.9083. Additionally, Grad-CAM visualizations confirmed that predictions were guided by clinically relevant regions, offering insight into the model's decision-making process. This study demonstrates that integrating segmentation accuracy with interpretability can support the development of trustworthy AI-assisted colonoscopy tools.
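For reference, the two overlap metrics reported above can be computed directly on binary masks; the toy pair below also checks the identity dice = 2 IoU / (1 + IoU). The masks are made up for illustration; the paper's values come from Kvasir-SEG.

```python
# Dice and IoU on binary masks, the overlap metrics reported for
# PolypSeg-GradCAM, computed for a toy prediction / ground-truth pair.
def dice(pred, gt):
    inter = sum(p & g for p, g in zip(pred, gt))
    return 2 * inter / (sum(pred) + sum(gt))

def iou(pred, gt):
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union

pred = [1, 1, 1, 0, 0, 1]
gt   = [1, 1, 0, 0, 1, 1]
d, j = dice(pred, gt), iou(pred, gt)
print(round(d, 3), round(j, 3))   # 0.75 0.6
# The two are linked by dice = 2*iou / (1 + iou), so Dice is always the
# higher of the pair on imperfect masks.
```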
Updated: 2025-12-15 15:03:15
Categories: cs.CV,cs.LG
Few-shot Class-incremental Fault Diagnosis by Preserving Class-Agnostic Knowledge with Dual-Granularity Representations
Few-Shot Class-Incremental Fault Diagnosis (FSC-FD), which aims to continuously learn from new fault classes with only a few samples without forgetting old ones, is critical for real-world industrial systems. However, this challenging task severely amplifies the issues of catastrophic forgetting of old knowledge and overfitting on scarce new data. To address these challenges, this paper proposes a novel framework built upon Dual-Granularity Representations, termed the Dual-Granularity Guidance Network (DGGN). Our DGGN explicitly decouples feature learning into two parallel streams: 1) a fine-grained representation stream, which utilizes a novel Multi-Order Interaction Aggregation module to capture discriminative, class-specific features from the limited new samples. 2) a coarse-grained representation stream, designed to model and preserve general, class-agnostic knowledge shared across all fault types. These two representations are dynamically fused by a multi-semantic cross-attention mechanism, where the stable coarse-grained knowledge guides the learning of fine-grained features, preventing overfitting and alleviating feature conflicts. To further mitigate catastrophic forgetting, we design a Boundary-Aware Exemplar Prioritization strategy. Moreover, a decoupled Balanced Random Forest classifier is employed to counter the decision boundary bias caused by data imbalance. Extensive experiments on the TEP benchmark and a real-world MFF dataset demonstrate that our proposed DGGN achieves superior diagnostic performance and stability compared to state-of-the-art FSC-FD approaches. Our code is publicly available at https://github.com/MentaY/DGGN
Updated: 2025-12-15 15:02:16
Categories: cs.LG,cs.AI
"All of Me": Mining Users' Attributes from their Public Spotify Playlists
In the age of digital music streaming, playlists on platforms like Spotify have become an integral part of individuals' musical experiences. People create and publicly share their own playlists to express their musical tastes, promote the discovery of their favorite artists, and foster social connections. In this work, we aim to address the question: can we infer users' private attributes from their public Spotify playlists? To this end, we conducted an online survey involving 739 Spotify users, resulting in a dataset of 10,286 publicly shared playlists comprising over 200,000 unique songs and 55,000 artists. Then, we utilize statistical analyses and machine learning algorithms to build accurate predictive models for users' attributes.
Updated: 2025-12-15 15:00:26
Categories: cs.CR,cs.LG,cs.SI
Multiclass Graph-Based Large Margin Classifiers: Unified Approach for Support Vectors and Neural Networks
While large margin classifiers are originally an outcome of an optimization framework, support vectors (SVs) can be obtained from geometric approaches. This article presents advances in the use of Gabriel graphs (GGs) in binary and multiclass classification problems. For Chipclass, a hyperparameter-less and optimization-less GG-based binary classifier, we discuss how activation functions and support edge (SE)-centered neurons affect the classification, proposing smoother functions and structural SV (SSV)-centered neurons to achieve margins with low probabilities and smoother classification contours. We extend the neural network architecture, which can be trained with backpropagation with a softmax function and a cross-entropy loss, or by solving a system of linear equations. A new subgraph-/distance-based membership function for graph regularization is also proposed, along with a new GG recomputation algorithm that is less computationally expensive than the standard approach. Experimental results with the Friedman test show that our method was better than previous GG-based classifiers and statistically equivalent to tree-based models.
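For readers unfamiliar with Gabriel graphs: an edge (u, v) is kept iff no third point lies strictly inside the ball whose diameter is the segment uv. A brute-force O(n^3) sketch of that definition (illustration only, not the paper's less expensive recomputation algorithm):

```python
from itertools import combinations

def gabriel_edges(points):
    """Return Gabriel-graph edges as index pairs.

    Edge (i, j) survives iff no other point k lies strictly inside the
    ball whose diameter is segment (i, j), i.e.
    d(i,k)^2 + d(j,k)^2 >= d(i,j)^2 for every k != i, j.
    """
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    edges = []
    for i, j in combinations(range(len(points)), 2):
        dij = d2(points[i], points[j])
        if all(d2(points[i], points[k]) + d2(points[j], points[k]) >= dij
               for k in range(len(points)) if k not in (i, j)):
            edges.append((i, j))
    return edges
```

For three collinear points, the middle point blocks the long edge between the outer two, leaving only the two short edges, which matches the intuition that Gabriel edges connect "unobstructed" neighbours.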
Updated: 2025-12-15 15:00:13
Categories: cs.LG,stat.ML
OM4OV: Leveraging Ontology Matching for Ontology Versioning
Due to the dynamic nature of the Semantic Web, version control is necessary to capture time-varying information for widely used ontologies. Despite the long-standing recognition of ontology versioning (OV) as a crucial component of efficient ontology management, many views treat OV as similar to ontology matching (OM) and directly reuse OM systems for OV tasks. In this study, we systematically analyse the similarities and differences between OM and OV and formalise the OM4OV pipeline. The pipeline is implemented and evaluated in the state-of-the-art OM system Agent-OM. The experimental results indicate that OM systems can be reused for OV tasks, but without necessary modifications, the current OM4OV pipeline can produce skewed measurements, poor performance in detecting update entities, and less explainability for false mappings. To tackle these issues, we propose an optimisation method called the cross-reference (CR) mechanism, building upon the existing alignment(s) from OM to reduce the number of matching candidates and improve overall OV performance.
Updated: 2025-12-15 14:59:48
Categories: cs.AI,cs.CL,cs.IR
A Deep-Learning Iterative Stacked Approach for Prediction of Reactive Dissolution in Porous Media
Simulating reactive dissolution of solid minerals in porous media has many subsurface applications, including carbon capture and storage (CCS), geothermal systems and oil & gas recovery. As traditional direct numerical simulators are computationally expensive, it is of paramount importance to develop faster and more efficient alternatives. Deep-learning-based solutions, most of them built upon convolutional neural networks (CNNs), have been recently designed to tackle this problem. However, these solutions were limited to approximating one field over the domain (e.g. velocity field). In this manuscript, we present a novel deep learning approach that incorporates both temporal and spatial information to predict the future states of the dissolution process at a fixed time-step horizon, given a sequence of input states. The overall performance, in terms of speed and prediction accuracy, is demonstrated on a numerical simulation dataset, comparing its prediction results against state-of-the-art approaches, also achieving a speedup around $10^4$ over traditional numerical simulators.
Updated: 2025-12-15 14:54:33
Categories: cs.LG
End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery
Purpose: Intraoperative navigation in spine surgery demands millimeter-level accuracy. Current systems based on intraoperative radiographic imaging and bone-anchored markers are invasive, radiation-intensive and workflow disruptive. Recent markerless RGB-D registration methods offer a promising alternative, but existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, which can propagate errors throughout registration. Methods: We present End2Reg, an end-to-end deep learning framework that jointly optimizes segmentation and registration, eliminating the need for weak segmentation labels and manual steps. The network learns segmentation masks specifically optimized for registration, guided solely by the registration objective without direct segmentation supervision. Results: The proposed framework achieves state-of-the-art performance on ex- and in-vivo benchmarks, reducing median Target Registration Error by 32% to 1.83mm and mean Root Mean Square Error by 45% to 3.95mm, respectively. An ablation study confirms that end-to-end optimization significantly improves registration accuracy. Conclusion: The presented end-to-end RGB-D registration pipeline removes dependency on weak labels and manual steps, advancing towards fully automatic, markerless intraoperative navigation. Code and interactive visualizations are available at: https://lorenzopettinari.github.io/end-2-reg/.
Updated: 2025-12-15 14:53:20
Categories: cs.CV,cs.AI
Differentiable Evolutionary Reinforcement Learning
The design of effective reward functions presents a central and often arduous challenge in reinforcement learning (RL), particularly when developing autonomous agents for complex reasoning tasks. While automated reward optimization approaches exist, they typically rely on derivative-free evolutionary heuristics that treat the reward function as a black box, failing to capture the causal relationship between reward structure and task performance. To bridge this gap, we propose Differentiable Evolutionary Reinforcement Learning (DERL), a bilevel framework that enables the autonomous discovery of optimal reward signals. In DERL, a Meta-Optimizer evolves a reward function (i.e., Meta-Reward) by composing structured atomic primitives, guiding the training of an inner-loop policy. Crucially, unlike previous evolutionary approaches, DERL is differentiable in its meta-optimization: it treats the inner-loop validation performance as a signal to update the Meta-Optimizer via reinforcement learning. This allows DERL to approximate the "meta-gradient" of task success, progressively learning to generate denser and more actionable feedback. We validate DERL across three distinct domains: robotic agent (ALFWorld), scientific simulation (ScienceWorld), and mathematical reasoning (GSM8k, MATH). Experimental results show that DERL achieves state-of-the-art performance on ALFWorld and ScienceWorld, significantly outperforming methods relying on heuristic rewards, especially in out-of-distribution scenarios. Analysis of the evolutionary trajectory demonstrates that DERL successfully captures the intrinsic structure of tasks, enabling self-improving agent alignment without human intervention.
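The "composing structured atomic primitives" step can be pictured as follows; the primitive set and weighting scheme here are invented for illustration and are not the paper's actual Meta-Reward space:

```python
# Hypothetical atomic reward primitives (NOT the paper's primitive set):
# each maps an environment state summary to a scalar reward term.
PRIMITIVES = {
    "progress":  lambda s: s["subgoals_done"] / s["subgoals_total"],
    "success":   lambda s: 1.0 if s["task_done"] else 0.0,
    "step_cost": lambda s: -0.01 * s["steps"],
}

def compose_reward(weights):
    """Build a Meta-Reward as a weighted composition of atomic primitives.

    In DERL the composition would be proposed and refined by the
    Meta-Optimizer; here the weights are simply passed in.
    """
    def reward(state):
        return sum(w * PRIMITIVES[name](state) for name, w in weights.items())
    return reward
```

A meta-optimizer would then score each candidate composition by the validation performance of the policy trained under it.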
Updated: 2025-12-15 14:50:08
Categories: cs.AI,cs.CL
rNCA: Self-Repairing Segmentation Masks
Accurately predicting topologically correct masks remains a difficult task for general segmentation models, which often produce fragmented or disconnected outputs. Fixing these artifacts typically requires hand-crafted refinement rules or architectures specialized to a particular task. Here, we show that Neural Cellular Automata (NCA) can be directly re-purposed as an effective refinement mechanism, using local, iterative updates guided by image context to repair segmentation masks. By training on imperfect masks and ground truths, the automaton learns the structural properties of the target shape while relying solely on local information. When applied to coarse, globally predicted masks, the learned dynamics progressively reconnect broken regions, prune loose fragments and converge towards stable, topologically consistent results. We show how refinement NCA (rNCA) can be easily applied to repair common topological errors produced by different base segmentation models and tasks: for fragmented retinal vessels, it yields 2-3% gains in Dice/clDice and improves Betti errors, reducing $\beta_0$ errors by 60% and $\beta_1$ by 20%; for myocardium, it repairs 61.5% of broken cases in a zero-shot setting while lowering ASSD and HD by 19% and 16%, respectively. This showcases NCAs as effective and broadly applicable refiners.
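The mechanism, stripped of learning, can be illustrated with a hand-coded local rule; rNCA learns its update from data and image context, so this fixed rule is only a toy analogue of the iterative, neighborhood-driven repair:

```python
def repair_step(mask):
    """One local update: a background pixel with >= 2 foreground
    4-neighbours is switched on. rNCA *learns* its update rule; this
    hand-coded rule only illustrates the local, iterative mechanism."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                n = sum(mask[y + dy][x + dx]
                        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= y + dy < h and 0 <= x + dx < w)
                if n >= 2:
                    out[y][x] = 1
    return out

# A horizontal "vessel" with a one-pixel break is reconnected in one step.
broken = [[0, 0, 0, 0, 0],
          [1, 1, 0, 1, 1],
          [0, 0, 0, 0, 0]]
```

Running `repair_step(broken)` fills the gap in the middle row while leaving the background untouched; iterating such updates until a fixed point is the cellular-automaton analogue of the convergence behaviour described above.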
Updated: 2025-12-15 14:49:55
Categories: cs.CV,cs.LG,eess.IV
The prediction of the quality of results in Logic Synthesis using Transformer and Graph Neural Networks
In the logic synthesis stage, structure transformations in the synthesis tool need to be combined into optimization sequences and act on the circuit to meet the specified circuit area and delay. However, logic synthesis optimization sequences are time-consuming to run, and predicting the quality of results (QoR) for a given circuit and synthesis optimization sequence can help engineers find a better optimization sequence faster. In this work, we propose a deep learning method to predict the QoR of unseen circuit-optimization sequence pairs. Specifically, the structure transformations are translated into vectors by embedding methods and advanced natural language processing (NLP) technology (Transformer) is used to extract the features of the optimization sequences. In addition, to enable the prediction process of the model to generalize from circuit to circuit, the graph representation of the circuit is represented as an adjacency matrix and a feature matrix. Graph neural networks (GNNs) are used to extract the structural features of the circuits. For this problem, the Transformer and three typical GNNs are used. Furthermore, the Transformer and GNNs are adopted as a joint learning policy for the QoR prediction of the unseen circuit-optimization sequences. The methods resulting from the combination of Transformer and GNNs are benchmarked. The experimental results show that the joint learning of Transformer and GraphSage gives the best results. The Mean Absolute Error (MAE) of the predicted result is 0.412.
Updated: 2025-12-15 14:45:23
Categories: cs.AR,cs.LG
Neural stochastic Volterra equations: learning path-dependent dynamics
Stochastic Volterra equations (SVEs) serve as mathematical models for the time evolutions of random systems with memory effects and irregular behaviour. We introduce neural stochastic Volterra equations as a physics-inspired architecture, generalizing the class of neural stochastic differential equations, and provide some theoretical foundation. Numerical experiments on various SVEs, like the disturbed pendulum equation, the generalized Ornstein--Uhlenbeck process, the rough Heston model and a monetary reserve dynamics, are presented, comparing the performance of neural SVEs, neural SDEs and Deep Operator Networks (DeepONets).
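A minimal Euler discretization shows where the path dependence comes from: because the kernel K(t, s) depends on both arguments, each step re-weights the entire history rather than updating a Markov state (a sketch, not the paper's architecture or code):

```python
import math
import random

def sve_euler(x0, b, sigma, kernel, T=1.0, n=200, seed=0):
    """Euler scheme for a stochastic Volterra equation
        X_t = x0 + int_0^t K(t,s) b(X_s) ds + int_0^t K(t,s) sigma(X_s) dW_s.

    Because K depends on both t and s, computing X at each grid time
    requires a fresh weighted sum over the whole past: the dynamics are
    genuinely path-dependent, with no Markovian one-step update.
    """
    rng = random.Random(seed)
    dt = T / n
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    xs = [x0]
    for i in range(1, n + 1):
        t = i * dt
        x = x0 + sum(kernel(t, j * dt) * (b(xs[j]) * dt + sigma(xs[j]) * dW[j])
                     for j in range(i))
        xs.append(x)
    return xs
```

With a singular kernel such as K(t, s) = (t - s)**(H - 0.5), H < 1/2, the same loop produces the rough, memory-laden paths that motivate models like rough Heston; the O(n^2) cost of the history sum is one reason learned surrogates are attractive.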
Updated: 2025-12-15 14:36:12
Categories: cs.LG,math.PR,stat.ML
Dual-Phase Federated Deep Unlearning via Weight-Aware Rollback and Reconstruction
Federated Unlearning (FUL) focuses on client data and computing power to offer a privacy-preserving solution. However, high computational demands, complex incentive mechanisms, and disparities in client-side computing power often lead to long times and higher costs. To address these challenges, many existing methods rely on server-side knowledge distillation that solely removes the updates of the target client, overlooking the privacy embedded in the contributions of other clients, which can lead to privacy leakage. In this work, we introduce DPUL, a novel server-side unlearning method that deeply unlearns all influential weights to prevent privacy pitfalls. Our approach comprises three components: (i) identifying high-weight parameters by filtering client update magnitudes, and rolling them back to ensure deep removal. (ii) leveraging the variational autoencoder (VAE) to reconstruct and eliminate low-weight parameters. (iii) utilizing a projection-based technique to recover the model. Experimental results on four datasets demonstrate that DPUL surpasses state-of-the-art baselines, providing a 1%-5% improvement in accuracy and up to 12x reduction in time cost.
Updated: 2025-12-15 14:32:47
Categories: cs.LG
Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection
Recent advances in Large Language Models (LLMs) have opened new perspectives for automation in optimization. While several studies have explored how LLMs can generate or solve optimization models, far less is understood about what these models actually learn regarding problem structure or algorithmic behavior. This study investigates how LLMs internally represent combinatorial optimization problems and whether such representations can support downstream decision tasks. We adopt a twofold methodology combining direct querying, which assesses LLM capacity to explicitly extract instance features, with probing analyses that examine whether such information is implicitly encoded within their hidden layers. The probing framework is further extended to a per-instance algorithm selection task, evaluating whether LLM-derived representations can predict the best-performing solver. Experiments span four benchmark problems and three instance representations. Results show that LLMs exhibit moderate ability to recover feature information from problem instances, either through direct querying or probing. Notably, the predictive power of LLM hidden-layer representations proves comparable to that achieved through traditional feature extraction, suggesting that LLMs capture meaningful structural information relevant to optimization performance.
Updated: 2025-12-15 14:28:35
Categories: cs.AI
Detecting Emotion Drift in Mental Health Text Using Pre-Trained Transformers
This study investigates emotion drift: the change in emotional state across a single text, within mental health-related messages. While sentiment analysis typically classifies an entire message as positive, negative, or neutral, the nuanced shift of emotions over the course of a message is often overlooked. This study detects sentence-level emotions and measures emotion drift scores using pre-trained transformer models such as DistilBERT and RoBERTa. The results provide insights into patterns of emotional escalation or relief in mental health conversations. This methodology can be applied to better understand emotional dynamics in content.
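Given sentence-level valence scores from an upstream classifier (DistilBERT or RoBERTa in the study), a drift score can be as simple as the following; the helper and its exact definition are hypothetical, not the paper's formula:

```python
def emotion_drift(scores):
    """Drift metrics over sentence-level valence scores in [-1, 1].

    Returns (net_drift, max_swing): the net change from the first to
    the last sentence, and the largest single sentence-to-sentence
    shift. Scores are assumed to come from a pre-trained classifier.
    """
    if len(scores) < 2:
        return 0.0, 0.0
    net = scores[-1] - scores[0]
    swing = max(abs(b - a) for a, b in zip(scores, scores[1:]))
    return net, swing
```

A positive net drift with a large swing would flag a message that starts distressed and ends relieved (or vice versa), the kind of escalation/relief pattern described above.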
Updated: 2025-12-15 14:18:12
Categories: cs.CL,cs.AI
Automated User Identification from Facial Thermograms with Siamese Networks
The article analyzes the use of thermal imaging technologies for biometric identification based on facial thermograms. It presents a comparative analysis of infrared spectral ranges (NIR, SWIR, MWIR, and LWIR). The paper also defines key requirements for thermal cameras used in biometric systems, including sensor resolution, thermal sensitivity, and a frame rate of at least 30 Hz. Siamese neural networks are proposed as an effective approach for automating the identification process. In experiments conducted on a proprietary dataset, the proposed method achieved an accuracy of approximately 80%. The study also examines the potential of hybrid systems that combine visible and infrared spectra to overcome the limitations of individual modalities. The results indicate that thermal imaging is a promising technology for developing reliable security systems.
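The verification step of a Siamese pipeline reduces to thresholding a distance between two embeddings produced by the shared (twin) encoder; a minimal sketch under that assumption (threshold value is illustrative):

```python
import math

def verify(emb_a, emb_b, threshold=1.0):
    """Siamese-style identity verification.

    emb_a, emb_b: embeddings of two thermograms from the shared encoder
    (assumed computed upstream). Accept the identity claim when their
    Euclidean distance falls below the threshold.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    return dist, dist < threshold
```

In practice the threshold would be tuned on a validation set to trade off false accepts against false rejects.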
Updated: 2025-12-15 14:13:49
Categories: cs.CV,cs.CR
Meta Pruning via Graph Metanetworks : A Universal Meta Learning Framework for Network Pruning
We propose an entirely new meta-learning framework for network pruning. It is a general framework that can be theoretically applied to almost all types of networks with all kinds of pruning and has great generality and transferability. Experiments have shown that it can achieve outstanding results on many popular and representative pruning tasks (including both CNNs and Transformers). This contrasts with all prior works, which either rely on fixed, hand-crafted criteria to prune in a coarse manner, or employ learning-to-prune schemes that require special training during each pruning and lack generality. Our framework can learn complex pruning rules automatically via a neural network (metanetwork) and has great generality that allows it to prune without any special training. More specifically, we introduce the newly developed idea of metanetwork from meta-learning into pruning. A metanetwork is a network that takes another network as input and produces a modified network as output. In this paper, we first establish a bijective mapping between neural networks and graphs, and then employ a graph neural network as our metanetwork. We train a metanetwork that learns the pruning strategy automatically and can transform a network that is hard to prune into another network that is much easier to prune. Once the metanetwork is trained, our pruning needs nothing more than a feedforward through the metanetwork and some standard finetuning to prune at state-of-the-art. Our code is available at https://github.com/Yewei-Liu/MetaPruning
Updated: 2025-12-15 14:12:36
Categories: cs.LG,cs.AI,cs.CV
Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles
Autonomous Underwater Vehicles (AUVs) require reliable six-degree-of-freedom (6-DOF) position control to operate effectively in complex and dynamic marine environments. Traditional controllers are effective under nominal conditions but exhibit degraded performance when faced with unmodeled dynamics or environmental disturbances. Reinforcement learning (RL) provides a powerful alternative but training is typically slow and sim-to-real transfer remains challenging. This work introduces a GPU-accelerated RL training pipeline built in JAX and MuJoCo-XLA (MJX). By jointly JIT-compiling large-scale parallel physics simulation and learning updates, we achieve training times of under two minutes. Through systematic evaluation of multiple RL algorithms, we show robust 6-DOF trajectory tracking and effective disturbance rejection in real underwater experiments, with policies transferred zero-shot from simulation. Our results provide the first explicit real-world demonstration of RL-based AUV position control across all six degrees of freedom.
Updated: 2025-12-15 14:12:32
Categories: cs.RO,cs.LG
Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)
This paper proposes a reinforcement learning (RL) framework for controlling and stabilizing the Twin Rotor Aerodynamic System (TRAS) at specific pitch and azimuth angles and tracking a given trajectory. The complex dynamics and non-linear characteristics of the TRAS make it challenging to control using traditional control algorithms. However, recent developments in RL have attracted interest due to their potential applications in the control of multirotors. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was used in this paper to train the RL agent. This algorithm is used for environments with continuous state and action spaces, similar to the TRAS, as it does not require a model of the system. The simulation results illustrated the effectiveness of the RL control method. Next, external disturbances in the form of wind disturbances were used to test the controller's effectiveness compared to conventional PID controllers. Lastly, experiments on a laboratory setup were carried out to confirm the controller's effectiveness in real-world applications.
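The two TD3 ingredients the controller relies on, clipped double-Q targets and target policy smoothing, can be written out in a few lines (a scalar sketch over assumed inputs, not the paper's training code):

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3 critic target with clipped double-Q learning:
        y = r + gamma * (1 - done) * min(Q1'(s', a~), Q2'(s', a~)),
    where a~ is the smoothed target action (q1_next/q2_next are assumed
    to have been evaluated at it). Taking the min of the two target
    critics curbs the overestimation bias of a single Q-function."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)

def smoothed_action(mu, noise, clip=0.5, low=-1.0, high=1.0):
    """Target policy smoothing: add clipped noise to the target policy's
    action, then clip to the valid action bounds."""
    eps = max(-clip, min(clip, noise))
    return max(low, min(high, mu + eps))
```

The remaining TD3 tricks (delayed actor updates and Polyak-averaged target networks) only change *when* and *how slowly* parameters are copied, not these two formulas.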
Updated: 2025-12-15 14:10:04
Categories: cs.RO,cs.AI,cs.LG
On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models
Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. Two of the most prominent concerns are training data extraction and Membership Inference Attacks (MIAs). Prior research has shown that these threats are interconnected: adversaries can extract training data from an LLM by querying the model to generate a large volume of text and subsequently applying MIAs to verify whether a particular data point was included in the training set. In this study, we integrate multiple MIA techniques into the data extraction pipeline to systematically benchmark their effectiveness. We then compare their performance in this integrated setting against results from conventional MIA benchmarks, allowing us to evaluate their practical utility in real-world extraction scenarios.
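A common way to wire an MIA into such a pipeline is a loss-ratio score against a reference model; this minimal sketch uses hypothetical inputs and is not any specific attack from the study:

```python
def membership_score(target_loss, reference_loss):
    """Likelihood-ratio style MIA signal: how much lower the target
    model's loss is than a reference model's on the same text. Larger
    scores suggest membership. Losses are per-token negative
    log-likelihoods, assumed computed upstream by the two models."""
    return reference_loss - target_loss

def infer_members(loss_pairs, threshold=0.5):
    """Flag generated candidates whose score clears the threshold.

    loss_pairs: list of (target_loss, reference_loss) per candidate.
    Returns the indices predicted to be training-set members."""
    return [i for i, (t, r) in enumerate(loss_pairs)
            if membership_score(t, r) > threshold]
```

In the extraction setting described above, the candidates would be the texts sampled from the target LLM, and the MIA acts as the verification stage.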
Updated: 2025-12-15 14:05:49
Categories: cs.LG,cs.CL,cs.CR
IPR-1: Interactive Physical Reasoner
Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we aim to ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. To study this, we introduce a Game-to-Unseen (G2U) benchmark of 1,000+ heterogeneous games that exhibit significant visual domain gaps. Existing approaches, including VLMs and world models, struggle to capture underlying physics and causality since they are not focused on core mechanisms and overfit to visual details. VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyze physics and causality. We therefore propose IPR (Interactive Physical Reasoner), using world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code aligning semantic intent with dynamics to provide a shared action space for prediction and reasoning. Pretrained on 1,000+ games, our IPR performs robustly on levels from primitive intuition to goal-driven reasoning, and even surpasses GPT-5 overall. We find that performance improves with more training games and interaction steps, and that the model also zero-shot transfers to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning. Further demos and project details can be found at https://mybearyzhang.github.io/ipr-1.
Updated: 2025-12-15 14:03:42
Subjects: cs.AI,cs.CV
In-context Learning of Evolving Data Streams with Tabular Foundational Models
State-of-the-art data stream mining has long drawn from ensembles of the Very Fast Decision Tree, a seminal algorithm honored with the 2015 KDD Test-of-Time Award. However, the emergence of large tabular models, i.e., transformers designed for structured numerical data, marks a significant paradigm shift. These models move beyond traditional weight updates, instead employing in-context learning through prompt tuning. By using on-the-fly sketches to summarize unbounded streaming data, one can feed this information into a pre-trained model for efficient processing. This work bridges advancements from both areas, highlighting how transformers' implicit meta-learning abilities, pre-training on drifting natural data, and reliance on context optimization directly address the core challenges of adaptive learning in dynamic environments. Exploring real-time model adaptation, this research demonstrates that TabPFN, coupled with a simple sliding memory strategy, consistently outperforms ensembles of Hoeffding trees, such as Adaptive Random Forest, and Streaming Random Patches, across all non-stationary benchmarks.
Updated: 2025-12-15 13:59:16
Subjects: cs.LG,cs.AI
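The prequential "test-then-train" loop with a sliding memory described above can be sketched as follows. A nearest-centroid classifier stands in for TabPFN (which is likewise "fit" purely by being handed a context window, with no weight updates); the drifting-stream construction and all names are illustrative assumptions:

```python
import numpy as np
from collections import deque

class NearestCentroid:
    """Stand-in for an in-context learner such as TabPFN: conditioning on the
    current context window replaces any gradient-based weight update."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def prequential_sliding(stream, window=50):
    """Test-then-train: predict each arriving sample from the current sliding
    memory, then append it; the bounded window naturally forgets pre-drift data."""
    memory = deque(maxlen=window)
    correct, total = 0, 0
    for x, y in stream:
        if len(memory) >= 2 and len({yy for _, yy in memory}) >= 2:
            Xc = np.stack([xx for xx, _ in memory])
            yc = np.array([yy for _, yy in memory])
            pred = NearestCentroid().fit(Xc, yc).predict(x[None, :])[0]
            correct += int(pred == y)
            total += 1
        memory.append((x, y))
    return correct / max(total, 1)

# Toy non-stationary stream: the class means swap halfway through (abrupt drift).
rng = np.random.default_rng(1)
def make(n, m0, m1):
    X = np.concatenate([rng.normal(m0, 0.3, (n, 2)), rng.normal(m1, 0.3, (n, 2))])
    y = np.concatenate([np.zeros(n, int), np.ones(n, int)])
    idx = rng.permutation(len(y))
    return list(zip(X[idx], y[idx]))

stream = make(100, 0.0, 2.0) + make(100, 2.0, 0.0)
acc = prequential_sliding(stream, window=50)
```

The key design choice mirrored here is that adaptation happens entirely through the context (the sliding memory), so recovery from drift takes roughly one window length rather than a retraining cycle.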
Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models
Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver's ability and yields low-value problems, or reliance on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning in problem generation, leading to shallow problem variants. In this paper, we develop a problem generator that reasons explicitly to plan problem directions before synthesis and adapts difficulty to the solver's ability. Specifically, we construct related problem pairs and augment them with intermediate problem-design CoT produced by a reasoning model. These data bootstrap the generator's problem-design strategies. Then, we treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty and produce complementary problems near the edge of the solver's competence. Extensive experiments on 10 mathematical and general reasoning benchmarks show that our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models. Moreover, a solver trained on the synthesized data provides improved rewards for continued generator training, enabling co-evolution and yielding a further 0.7% performance gain. Our code will be made publicly available here.
Updated: 2025-12-15 13:55:27
Subjects: cs.AI,cs.CV
Link-Aware Energy-Frugal Continual Learning for Fault Detection in IoT Networks
The use of lightweight machine learning (ML) models in Internet of Things (IoT) networks enables resource-constrained IoT devices to perform on-device inference for several critical applications. However, the inference accuracy deteriorates due to the non-stationarity of the IoT environment and limited initial training data. To counteract this, the deployed models can be updated occasionally with newly observed data samples. However, this approach consumes additional energy, which is undesirable for energy-constrained IoT devices. This letter introduces an event-driven communication framework that strategically integrates continual learning (CL) in IoT networks for energy-efficient fault detection. Our framework enables the IoT device and the edge server (ES) to collaboratively update the lightweight ML model by adapting to the wireless link conditions and the available energy budget. Evaluations on real-world datasets show that the proposed approach outperforms both periodic sampling and non-adaptive CL in terms of inference recall, achieving up to a 42.8% improvement even under tight energy and bandwidth constraints.
Updated: 2025-12-15 13:54:38
Subjects: cs.LG,cs.NI
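An event-driven, link-aware update decision of the kind described above can be illustrated with a simple trigger rule. This is a hypothetical simplification: the thresholds, the link-quality cost model, and the function signature are our assumptions, not the letter's actual protocol:

```python
def should_update(drift_score, link_quality, energy_budget, update_cost,
                  drift_threshold=0.6, min_link=0.3):
    """Event-driven trigger (sketch): transmit new samples and refresh the
    on-device model only when (i) drift is evident, (ii) the wireless link is
    good enough, and (iii) the remaining energy budget covers the update.
    A poor link inflates the effective transmission energy cost."""
    effective_cost = update_cost / max(link_quality, 1e-6)
    return (drift_score >= drift_threshold
            and link_quality >= min_link
            and energy_budget >= effective_cost)
```

Compared with periodic sampling, a rule like this spends energy only when an update is both needed (drift) and cheap enough (good link, sufficient budget).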
FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs
Machine unlearning (MU) seeks to eliminate the influence of specific training examples from deployed models. As large language models (LLMs) become widely used, managing risks arising from insufficient forgetting or utility loss is increasingly crucial. Current MU techniques lack effective mechanisms for evaluating and controlling these risks, hindering the selection of strategies that appropriately balance safety and utility, and raising trust concerns surrounding the "right to be forgotten." To address these issues, we propose FROC, a unified framework with Risk-Optimized Control for machine unlearning in LLMs. FROC is built around a conformal-style risk-control formulation that expresses a user-specified risk budget on unlearning behavior. This probability-based constraint enables FROC to compare MU strategies, identify feasible operating regions, and guide hyperparameter selection according to desired trade-offs between forgetting sufficiency and utility preservation. To operationalize this constraint, FROC introduces a smoothly varying continuous risk model that aggregates forgetting deficiency and utility degradation into a single configuration-level score. Building on conformal risk analysis, FROC computes (1) the Conformal Unlearning Risk (CUR), a data-driven estimated value on the probability that forgotten samples continue to influence model predictions, and (2) risk-controlled configuration sets, which identify unlearning hyperparameters that are valid under the specified risk budget. Experiments across multiple LLM MU methods demonstrate that FROC produces stable, interpretable risk landscapes and reveals consistent relationships between unlearning configurations, semantic shift, and utility impact. FROC reframes MU as a controllable, risk-aware process and offers a practical foundation for managing unlearning behavior in large-scale LLM deployments.
Updated: 2025-12-15 13:53:12
Subjects: cs.LG
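A toy analogue of the conformal-style quantities above (a CUR-like risk estimate and a risk-controlled configuration set) might look as follows. The score distributions, the calibration recipe, and all names are illustrative assumptions, not FROC's actual construction:

```python
import numpy as np

def conformal_unlearning_risk(forget_scores, calib_scores, alpha=0.1):
    """Calibrate a membership-score threshold on held-out non-member data at
    level alpha, then report the fraction of 'forgotten' samples that still
    score above it, i.e., still look like training members."""
    n = len(calib_scores)
    q = np.quantile(calib_scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return float(np.mean(np.asarray(forget_scores) > q))

def risk_controlled_configs(configs, risks, budget):
    """Keep only unlearning configurations whose estimated risk fits the
    user-specified risk budget."""
    return [c for c, r in zip(configs, risks) if r <= budget]

# Toy scores: well-forgotten samples match the non-member distribution,
# poorly-forgotten ones remain clearly separated from it.
calib = np.random.default_rng(2).normal(0.0, 1.0, 500)
well_forgotten = np.random.default_rng(3).normal(0.0, 1.0, 200)
poorly_forgotten = np.random.default_rng(4).normal(3.0, 0.5, 200)
low_risk = conformal_unlearning_risk(well_forgotten, calib)
high_risk = conformal_unlearning_risk(poorly_forgotten, calib)
```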
KD-PINN: Knowledge-Distilled PINNs for ultra-low-latency real-time neural PDE solvers
This work introduces Knowledge-Distilled Physics-Informed Neural Networks (KD-PINN), a framework that transfers the predictive accuracy of a high-capacity teacher model to a compact student through a continuous adaptation of the Kullback-Leibler divergence. To confirm its generality for various dynamics and dimensionalities, the framework is evaluated on a representative set of partial differential equations (PDEs). In all tested cases, the student model preserved the teacher's physical accuracy, with a mean RMSE increase below 0.64%, and achieved inference speedups ranging from 4.8x (Navier-Stokes) to 6.9x (Burgers). The distillation process also revealed a regularizing effect. With an average inference latency of 5.3 ms on CPU, the distilled models enter the ultra-low-latency real-time regime defined by sub-10 ms performance. Finally, this study examines how knowledge distillation reduces inference latency in PINNs to contribute to the development of accurate ultra-low-latency neural PDE solvers.
Updated: 2025-12-15 13:51:20
Subjects: cs.LG,math.NA
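One way such a distillation objective could be instantiated: for Gaussian output distributions with a shared bandwidth, the KL term between teacher and student predictions reduces to a scaled squared error, combined with the student's own PDE residual. This is a sketch under our own assumptions, not the paper's exact loss:

```python
import numpy as np

def kd_pinn_loss(student_u, teacher_u, pde_residual, tau=0.1, lam=0.5):
    """Sketch of a KD-PINN objective: KL( N(teacher, tau) || N(student, tau) )
    reduces to (student - teacher)^2 / (2 tau^2), averaged over collocation
    points, plus a weighted physics residual term for the student."""
    kl = np.mean((np.asarray(student_u) - np.asarray(teacher_u)) ** 2) / (2 * tau ** 2)
    physics = np.mean(np.asarray(pde_residual) ** 2)
    return kl + lam * physics

# A student that matches the teacher and satisfies the PDE scores lower.
good = kd_pinn_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0])
bad = kd_pinn_loss([1.5, 2.5], [1.0, 2.0], [0.3, -0.3])
```

The bandwidth `tau` plays the role of a temperature: a smaller `tau` penalizes deviations from the teacher more sharply relative to the physics term.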
Quantum Disruption: An SOK of How Post-Quantum Attackers Reshape Blockchain Security and Performance
As quantum computing advances toward practical deployment, it threatens a wide range of classical cryptographic mechanisms, including digital signatures, key exchange protocols, public-key encryption, and certain hash-based constructions that underpin modern network infrastructures. These primitives form the security backbone of most blockchain platforms, raising serious concerns about the long-term viability of blockchain systems in a post-quantum world. Although migrating to post-quantum cryptography may appear straightforward, the substantially larger key sizes and higher computational costs of post-quantum primitives can introduce significant challenges and, in some cases, render such transitions impractical for blockchain environments. In this paper, we examine the implications of adopting post-quantum cryptography in blockchain systems across four key dimensions. We begin by identifying the cryptographic primitives within blockchain architectures that are most vulnerable to quantum attacks, particularly those used in consensus mechanisms, identity management, and transaction validation. We then survey proposed post-quantum adaptations across existing blockchain designs, analyzing their feasibility within decentralized and resource-constrained settings. Building on this analysis, we evaluate how replacing classical primitives with post-quantum alternatives affects system performance, protocol dynamics, and the incentive and trust structures that sustain blockchain ecosystems. Our study demonstrates that integrating post-quantum signature schemes into blockchain systems is not a simple drop-in replacement; instead, it requires careful architectural redesign, as naive substitutions risk undermining both security guarantees and operational efficiency.
Updated: 2025-12-15 13:48:14
Subjects: cs.CR
FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models
We introduce FIN-bench-v2, a unified benchmark suite for evaluating large language models in Finnish. FIN-bench-v2 consolidates Finnish versions of widely used benchmarks together with an updated and expanded version of the original FIN-bench into a single, consistently formatted collection, covering multiple-choice and generative tasks across reading comprehension, commonsense reasoning, sentiment analysis, world knowledge, and alignment. All datasets are converted to HuggingFace Datasets, which include both cloze and multiple-choice prompt formulations with five variants per task, and we incorporate human annotation or review for machine-translated resources such as GoldenSwag and XED. To select robust tasks, we pretrain a set of 2.15B-parameter decoder-only models and use their learning curves to compute monotonicity, signal-to-noise, non-random performance, and model ordering consistency, retaining only tasks that satisfy all criteria. We further evaluate a set of larger instruction-tuned models to characterize performance across tasks and prompt formulations. All datasets, prompts, and evaluation configurations are publicly available via our fork of the Language Model Evaluation Harness at https://github.com/LumiOpen/lm-evaluation-harness. Supplementary resources are released in a separate repository at https://github.com/TurkuNLP/FIN-bench-v2.
Updated: 2025-12-15 13:41:41
Subjects: cs.CL,cs.AI
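Two of the learning-curve criteria named above, monotonicity and signal-to-noise, can be sketched as follows. The exact definitions used for FIN-bench-v2 may differ, so treat these formulas as illustrative:

```python
import numpy as np

def monotonicity(curve):
    """Spearman rank correlation between training progress and task score:
    values near 1 mean the task reliably improves with more training."""
    curve = np.asarray(curve, float)
    steps = np.arange(len(curve))
    rs = np.argsort(np.argsort(steps))
    rc = np.argsort(np.argsort(curve))
    return float(np.corrcoef(rs, rc)[0, 1])

def signal_to_noise(curve, window=3):
    """Overall improvement divided by the local fluctuation around a
    moving average of the curve."""
    curve = np.asarray(curve, float)
    smooth = np.convolve(curve, np.ones(window) / window, mode="valid")
    noise = np.std(curve[window - 1:] - smooth) + 1e-8
    return float((curve[-1] - curve[0]) / noise)

clean = [0.25, 0.30, 0.38, 0.45, 0.55, 0.60, 0.66]   # learnable task
flat = [0.25, 0.27, 0.24, 0.26, 0.25, 0.24, 0.26]    # noise around chance
```

A task would be retained only if such statistics (together with non-random performance and model-ordering consistency) all clear their thresholds.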
Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models
Securing digital text is becoming increasingly relevant due to the widespread use of large language models. Individuals fear losing control over their data when it is used to train such machine learning models, or being unable to distinguish model-generated output from text written by humans. Digital watermarking provides additional protection by embedding an invisible watermark within the data that requires protection. However, little work has been done to analyze and verify whether existing digital text watermarking methods are secure and undetectable by large language models. In this paper, we investigate the security-related intersection of watermarking and machine learning models for text data. In a controlled testbed of three experiments, ten existing Unicode text watermarking methods were implemented and analyzed across six large language models: GPT-5, GPT-4o, Teuken 7B, Llama 3.3, Claude Sonnet 4, and Gemini 2.5 Pro. The findings of our experiments indicate that large language models, especially the latest reasoning models, can detect watermarked text. Nevertheless, all models fail to extract the watermark unless implementation details in the form of source code are provided. We discuss the implications for security researchers and practitioners and outline future research opportunities to address security concerns.
Updated: 2025-12-15 13:40:00
Subjects: cs.CR,cs.AI,cs.LG
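A minimal example of the Unicode-steganography family studied above: payload bits hidden as zero-width characters at word boundaries, invisible when rendered but mechanically recoverable. This is a generic sketch, not one of the ten evaluated methods:

```python
ZW0, ZW1 = "\u200b", "\u200c"  # ZERO WIDTH SPACE encodes 0, ZERO WIDTH NON-JOINER encodes 1

def embed(text, payload):
    """Hide the payload's bits as zero-width characters, one per word,
    with any remaining bits appended to the final word."""
    bits = "".join(f"{ord(c):08b}" for c in payload)
    out, i = [], 0
    for w in text.split(" "):
        carrier = ""
        if i < len(bits):
            carrier = ZW1 if bits[i] == "1" else ZW0
        out.append(w + carrier)
        i += 1
    if i < len(bits):
        out[-1] += "".join(ZW1 if b == "1" else ZW0 for b in bits[i:])
    return " ".join(out)

def extract(text):
    """Recover the payload by reading zero-width characters in order."""
    bits = "".join("1" if c == ZW1 else "0" for c in text if c in (ZW0, ZW1))
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits) - 7, 8))

stego = embed("the quick brown fox jumps over the lazy dog", "ok")
```

The paper's finding maps onto this sketch directly: a strong model may notice the anomalous codepoints (detection), yet still cannot recover the payload without knowing the bit-to-character mapping that the source code encodes (extraction).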
Error-Driven Prompt Optimization for Arithmetic Reasoning
Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular data workflows. A key capability for such systems is performing accurate arithmetic operations on structured data while ensuring sensitive information never leaves secure, on-premises environments. Here, we introduce an error-driven optimization framework for arithmetic reasoning that enhances a Code Generation Agent (CGA), specifically applied to on-premises small language models (SLMs). Through a systematic evaluation of a leading SLM (Qwen3 4B), we find that while the base model exhibits fundamental limitations in arithmetic tasks, our proposed error-driven method, which clusters erroneous predictions to refine prompt rules iteratively, dramatically improves performance, elevating the model's accuracy to 70.8%. Our results suggest that developing reliable, interpretable, and industrially deployable AI assistants can be achieved not only through costly fine-tuning but also via systematic, error-driven prompt optimization, enabling small models to surpass larger language models (GPT-3.5 Turbo) in a privacy-compliant manner.
Updated: 2025-12-15 13:39:14
Subjects: cs.AI,cs.LG
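One round of an error-driven loop of the kind described above, clustering failures by a coarse error type and appending the matching prompt rule, can be sketched as below. The error taxonomy, rule texts, and function names are invented for illustration and are not the paper's actual pipeline:

```python
def classify_error(expected, got):
    """Bucket a wrong numeric answer into a coarse error type (toy taxonomy).
    Returns None when the answer is correct."""
    if got is None:
        return "no_answer"
    if abs(got - expected) == 0:
        return None
    if got == -expected:
        return "sign_error"
    if abs(got - expected) % 10 == 0:
        return "magnitude_error"
    return "arithmetic_slip"

RULES = {  # hypothetical prompt rules keyed by error cluster
    "no_answer": "Always end with 'Answer: <number>'.",
    "sign_error": "Re-check the sign of every subtraction before answering.",
    "magnitude_error": "Verify place value: count the digits of each operand.",
    "arithmetic_slip": "Compute step by step, one operation per line.",
}

def refine_prompt(base_prompt, eval_cases):
    """One optimization round: cluster failures by error type and append the
    matching rule for each observed cluster to the prompt."""
    seen = []
    for expected, got in eval_cases:
        etype = classify_error(expected, got)
        if etype and etype not in seen:
            seen.append(etype)
    return base_prompt + "".join("\n- " + RULES[e] for e in seen)

prompt = refine_prompt("Solve the problem.", [(12, 12), (7, -7), (30, None)])
```

Iterating this loop grows the prompt only with rules that target observed failure modes, which is what lets a small model close the gap without any fine-tuning.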
Relational Anatomical Supervision for Accurate 3D Multi-Chamber Cardiac Mesh Reconstruction
Accurate reconstruction of multi-chamber cardiac anatomy from medical images is a cornerstone for patient-specific modeling, physiological simulation, and interventional planning. However, current reconstruction pipelines fundamentally rely on surface-wise geometric supervision and model each chamber in isolation, resulting in anatomically implausible inter-chamber violations despite apparently favorable overlap or distance metrics. In this work, we propose a relational anatomical supervision framework for multi-chamber cardiac mesh reconstruction by introducing a Mesh Interrelation Enhancement (MIE) loss. The proposed formulation explicitly encodes spatial relationships between cardiac structures into a differentiable occupancy-based objective, thereby transforming qualitative anatomical rules into quantitative geometric supervision. We further establish violation-aware evaluation metrics to directly quantify inter-chamber structural correctness, revealing systematic limitations of commonly used geometric measures such as Dice and Chamfer distance. Extensive experiments on multi-center CT data, densely sampled MR data, and two independent external cohorts, including a highly heterogeneous congenital heart disease population, demonstrate that the proposed method consistently suppresses clinically critical boundary violations by up to 83%, while maintaining competitive volumetric accuracy and achieving superior surface fidelity. Notably, the proposed relational supervision generalizes robustly across imaging modalities, centers, and pathological conditions, even under severe anatomical deformation. These results demonstrate that distance-based supervision alone is insufficient to guarantee anatomically faithful reconstruction, and that explicit enforcement of multi-structure anatomical relations provides a principled and robust pathway toward reliable patient-specific cardiac modeling.
Updated: 2025-12-15 13:35:29
Subjects: cs.CV,cs.AI
Face Identity Unlearning for Retrieval via Embedding Dispersion
Face recognition systems rely on learning highly discriminative and compact identity clusters to enable accurate retrieval. However, as with other surveillance-oriented technologies, such systems raise serious privacy concerns due to their potential for unauthorized identity tracking. While several works have explored machine unlearning as a means of privacy protection, their applicability to face retrieval - especially for modern embedding-based recognition models - remains largely unexplored. In this work, we study the problem of face identity unlearning for retrieval systems and present its inherent challenges. The goal is to make selected identities unretrievable by dispersing their embeddings on the hypersphere and preventing the formation of compact identity clusters that enable re-identification in the gallery. The primary challenge is to achieve this forgetting effect while preserving the discriminative structure of the embedding space and the retrieval performance of the model for the remaining identities. To address this, we evaluate several existing approximate class unlearning methods (e.g., Random Labeling, Gradient Ascent, Boundary Unlearning, and other recent approaches) in the context of face retrieval and propose a simple yet effective dispersion-based unlearning approach. Extensive experiments on standard benchmarks (VGGFace2, CelebA) demonstrate that our method achieves superior forgetting behavior while preserving retrieval utility.
Updated: 2025-12-15 13:35:28
Subjects: cs.CV,cs.AI,cs.LG
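The dispersion objective sketched above can be illustrated as the mean pairwise cosine similarity of a forget identity's embeddings: minimizing it spreads them over the hypersphere so no compact, retrievable cluster survives. A toy construction, not the paper's exact loss:

```python
import numpy as np

def normalize(E):
    """Project embeddings onto the unit hypersphere, as in cosine retrieval."""
    return E / np.linalg.norm(E, axis=1, keepdims=True)

def dispersion_loss(forget_emb):
    """Mean pairwise cosine similarity among the forget identity's embeddings.
    Near 1 means a tight, retrievable identity cluster; driving this down
    disperses the embeddings and breaks gallery re-identification."""
    E = normalize(np.asarray(forget_emb, float))
    sim = E @ E.T
    n = len(E)
    return float(sim[~np.eye(n, dtype=bool)].mean())

tight_cluster = [[1.0, 0.01], [1.0, -0.01], [0.99, 0.02]]   # retrievable identity
dispersed = [[1.0, 0.0], [-0.5, 0.87], [-0.5, -0.87]]       # spread ~120 degrees apart
```

The challenge noted in the abstract corresponds to minimizing this term only on forget identities while an ordinary metric-learning loss keeps the remaining identities' clusters intact.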
ALIGN-FL: Architecture-independent Learning through Invariant Generative component sharing in Federated Learning
We present ALIGN-FL, a novel approach to distributed learning that addresses the challenge of learning from highly disjoint data distributions through selective sharing of generative components. Instead of exchanging full model parameters, our framework enables privacy-preserving learning by transferring only generative capabilities across clients, while the server performs global training using synthetic samples. Through complementary privacy mechanisms (DP-SGD with adaptive clipping and Lipschitz-regularized VAE decoders) and a stateful architecture supporting heterogeneous clients, we experimentally validate our approach on MNIST and Fashion-MNIST datasets with cross-domain outliers. Our analysis demonstrates that both privacy mechanisms effectively map sensitive outliers to typical data points while maintaining utility in the extreme Non-IID scenarios typical of cross-silo collaborations. Index Terms: Client-invariant Learning, Federated Learning (FL), Privacy-preserving Generative Models, Non-Independent and Identically Distributed (Non-IID), Heterogeneous Architectures
Updated: 2025-12-15 13:35:27
Subjects: cs.LG,cs.AI
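A numpy sketch of DP-SGD with adaptive (quantile-based) clipping, one of the privacy mechanisms named above. The quantile target, noise scaling, and names are illustrative choices rather than ALIGN-FL's exact configuration:

```python
import numpy as np

def dp_mean_gradient(per_sample_grads, clip_quantile=0.5, noise_mult=1.0, rng=None):
    """One DP-SGD aggregation step with adaptive clipping (sketch):
    set the clip norm C to a quantile of the current per-sample gradient
    norms, clip each gradient to norm at most C, average, and add Gaussian
    noise whose scale is proportional to C / batch_size."""
    rng = rng or np.random.default_rng(0)
    G = np.asarray(per_sample_grads, float)
    norms = np.linalg.norm(G, axis=1)
    C = np.quantile(norms, clip_quantile)            # adaptive clip norm
    scale = np.minimum(1.0, C / (norms + 1e-12))     # per-sample clipping factor
    clipped = G * scale[:, None]
    noise = rng.normal(0.0, noise_mult * C / len(G), size=G.shape[1])
    return clipped.mean(axis=0) + noise, C

# One outlier gradient gets clipped to the median norm before averaging.
grads = [[3.0, 4.0], [0.3, 0.4], [30.0, 40.0]]
g, clip_norm = dp_mean_gradient(grads, noise_mult=0.0)
```

Because the clip norm tracks the batch's own norm distribution, outlier samples (here the third gradient) are bounded to look like typical ones, which is the "map sensitive outliers to typical data points" effect the abstract reports.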
DynamiX: Large-Scale Dynamic Social Network Simulator
Understanding the intrinsic mechanisms of social platforms is an urgent need for maintaining social stability. The rise of large language models provides significant potential for social network simulations to capture attitude dynamics and reproduce collective behaviors. However, existing studies mainly focus on scaling up agent populations, neglecting the dynamic evolution of social relationships. To address this gap, we introduce DynamiX, a novel large-scale social network simulator dedicated to dynamic social network modeling. DynamiX uses a dynamic hierarchy module for selecting core agents with key characteristics at each timestep, enabling accurate alignment with the real-world adaptive switching of user roles. Furthermore, we design distinct dynamic social relationship modeling strategies for different user types. For opinion leaders, we propose an information-stream-based link prediction method that recommends potential users with similar stances, simulating homogeneous connections and autonomous behavior decisions. For ordinary users, we construct an inequality-oriented behavior decision-making module, effectively addressing unequal social interactions and capturing the patterns of relationship adjustments driven by multi-dimensional factors. Experimental results demonstrate that DynamiX exhibits marked improvements in attitude evolution simulation and collective behavior analysis compared to static networks. Moreover, DynamiX opens a new theoretical perspective on follower growth prediction, providing empirical evidence for opinion leader cultivation.
Updated: 2025-12-15 13:32:12
Subjects: physics.soc-ph,cs.AI
Adaptive Risk Mitigation in Demand Learning
We study dynamic pricing of a product with an unknown demand distribution over a finite horizon. Departing from the standard no-regret learning environment in which prices can be adjusted at any time, we restrict price changes to predetermined points in time to reflect common retail practice. This constraint, coupled with demand model ambiguity and an unknown customer arrival pattern, imposes a high risk of revenue loss, as a price based on a misestimated demand model may be applied to many customers before it can be revised. We develop an adaptive risk learning (ARL) framework that embeds a data-driven ambiguity set (DAS) to quantify demand model ambiguity by adapting to the unknown arrival pattern. Initially, when arrivals are few, the DAS includes a broad set of plausible demand models, reflecting high ambiguity and revenue risk. As new data is collected through pricing, the DAS progressively shrinks, capturing the reduction in model ambiguity and associated risk. We establish the probabilistic convergence of the DAS to the true demand model and derive a regret bound for the ARL policy that explicitly links revenue loss to the data required for the DAS to identify the true model with high probability. The dependence of our regret bound on the arrival pattern is unique to our constrained dynamic pricing problem and contrasts with no-regret learning environments, where regret is constant and arrival-pattern independent. Relaxing the constraint on infrequent price changes, we show that ARL attains the known constant regret bound. Numerical experiments further demonstrate that ARL outperforms benchmarks that prioritize either regret or risk alone by adaptively balancing both without knowledge of the arrival pattern. This adaptive risk adjustment is crucial for achieving high revenues and low downside risk when prices are sticky and both demand and arrival patterns are unknown.
Updated: 2025-12-15 13:25:58
Subjects: cs.LG,cs.IR
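The data-driven ambiguity set (DAS) described above can be illustrated for a Bernoulli demand model with a Hoeffding confidence radius: with few arrivals the set contains many plausible purchase probabilities, and it shrinks as arrivals accumulate. The concrete bound, grid, and names are our assumptions, not the paper's construction:

```python
import numpy as np

def ambiguity_set(purchases, arrivals, candidate_probs, delta=0.05):
    """DAS sketch for Bernoulli demand: keep every candidate purchase
    probability consistent with the observed conversion rate up to a
    Hoeffding confidence radius sqrt(log(2/delta) / (2 n)), which shrinks
    as more customers arrive."""
    if arrivals == 0:
        return list(candidate_probs)  # no data yet: full ambiguity
    p_hat = purchases / arrivals
    radius = np.sqrt(np.log(2.0 / delta) / (2.0 * arrivals))
    return [p for p in candidate_probs if abs(p - p_hat) <= radius]

grid = [round(p, 2) for p in np.arange(0.05, 1.0, 0.05)]
early = ambiguity_set(purchases=3, arrivals=10, candidate_probs=grid)     # broad set
late = ambiguity_set(purchases=300, arrivals=1000, candidate_probs=grid)  # narrow set
```

Because the radius depends on the number of arrivals rather than on elapsed time, this construction adapts to an unknown arrival pattern exactly in the sense the abstract emphasizes: risk stays high until enough customers have actually arrived.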
Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs
Data center (DC) infrastructure serves as the backbone to support the escalating demand for computing capacity. Traditional design methodologies that blend human expertise with specialized simulation tools scale poorly with the increasing system complexity. Recent studies adopt generative artificial intelligence to design plausible human-centric indoor layouts. However, they do not consider the underlying physics, making them unsuitable for the DC design that sets quantifiable operational objectives and strict physical constraints. To bridge the gap, we propose Phythesis, a novel framework that synergizes large language models (LLMs) and physics-guided evolutionary optimization to automate simulation-ready (SimReady) scene synthesis for energy-efficient DC design. Phythesis employs an iterative bi-level optimization architecture, where (i) the LLM-driven optimization level generates physically plausible three-dimensional layouts and self-criticizes them to refine the scene topology, and (ii) the physics-informed optimization level identifies the optimal asset parameters and selects the best asset combination. Experiments on three generation scales show that Phythesis achieves 57.3% generation success rate increase and 11.5% power usage effectiveness (PUE) improvement, compared with the vanilla LLM-based solution.
Updated: 2025-12-15 13:25:12
Subjects: cs.AI,cs.NE
Can Large Language Models Automate Phishing Warning Explanations? A Controlled Experiment on Effectiveness and User Perception
Phishing has become a prominent risk in modern cybersecurity, often used to bypass technological defences by exploiting predictable human behaviour. Warning dialogues are a standard mitigation measure, but the lack of explanatory clarity and static content limits their effectiveness. In this paper, we report on our research to assess the capacity of Large Language Models (LLMs) to generate clear, concise, and scalable explanations for phishing warnings. We carried out a large-scale between-subjects user study (N = 750) to compare the influence of warning dialogues supplemented with manually generated explanations against those generated by two LLMs, Claude 3.5 Sonnet and Llama 3.3 70B. We investigated two explanatory styles (feature-based and counterfactual) for their effects on behavioural metrics (click-through rate) and perceptual outcomes (e.g., trust, risk, clarity). The results provide empirical evidence that LLM-generated explanations achieve a level of protection statistically comparable to expert-crafted messages, effectively automating a high-cost task. While Claude 3.5 Sonnet showed a trend towards reducing click-through rates compared to manual baselines, Llama 3.3, despite being perceived as clearer, did not yield the same behavioral benefits. Feature-based explanations were more effective for genuine phishing attempts, whereas counterfactual explanations diminished false-positive rates. Other variables, such as workload, gender, and prior familiarity with warning dialogues, significantly moderated the effectiveness of warnings. These results indicate that LLMs can be used to automatically build explanations for warning users against phishing, and that such solutions are scalable, adaptive, and consistent with human-centred values.
Updated: 2025-12-15 13:15:51
Domains: cs.CR,cs.HC
No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction
In most real-world online advertising systems, advertisers typically have diverse customer acquisition goals. A common solution is to use multi-task learning (MTL) to train a unified model on post-click data to estimate the conversion rate (CVR) for these diverse targets. In practice, CVR prediction often encounters missing conversion data, as many advertisers submit only a subset of user conversion actions due to privacy or other constraints, making the labels of multi-task data incomplete. If the model is trained on all available samples where advertisers submit user conversion actions, it may struggle when deployed to serve a subset of advertisers targeting specific conversion actions, as the training and deployment data distributions are mismatched. While considerable MTL efforts have been made, a long-standing challenge is how to effectively train a unified model with incomplete and skewed multi-label data. In this paper, we propose a fine-grained Knowledge transfer framework for Asymmetric Multi-Label data (KAML). We introduce an attribution-driven masking strategy (ADM) to better utilize asymmetric multi-label data in training. However, the more relaxed masking in ADM is a double-edged sword: it provides additional training signals but also introduces noise due to skewed data. To address this, we propose a hierarchical knowledge extraction mechanism (HKE) to model the sample discrepancy within the target task tower. Finally, to maximize the utility of unlabeled samples, we incorporate a ranking-loss strategy to further enhance our model. The effectiveness of KAML has been demonstrated through comprehensive evaluations on offline industry datasets and online A/B tests, which show significant performance improvements over existing MTL baselines.
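The core difficulty, training on labels that advertisers only partially submit, can be pictured with a toy masked loss. A minimal sketch, assuming a simple per-task binary cross-entropy with an observed-label mask; KAML's actual ADM masking and attribution logic go beyond this.

```python
import math

def masked_multilabel_bce(probs, labels, observed):
    """Binary cross-entropy over multiple conversion tasks, where only
    labels the advertiser actually submitted (observed mask = 1)
    contribute. Unobserved tasks are skipped instead of being treated
    as negatives, which would mismatch the deployment distribution."""
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, observed):
        if m:  # only observed labels contribute to the loss
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / max(count, 1)

# Example: three conversion tasks; the third label was never submitted.
loss = masked_multilabel_bce(probs=[0.9, 0.2, 0.5],
                             labels=[1, 0, 0],
                             observed=[1, 1, 0])
```

The third task's prediction (0.5) neither helps nor hurts, mirroring how missing conversions should not be punished as negatives.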
Updated: 2025-12-15 13:14:20
Domains: cs.LG,cs.AI,cs.IR
MiniLingua: A Small Open-Source LLM for European Languages
Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models with around one billion parameters can deliver strong results and enable on-device use. This paper introduces MiniLingua, a multilingual open-source LLM of one billion parameters trained from scratch for 13 European languages, designed to balance coverage and instruction-following capabilities. Based on evaluation results, the instruction-tuned version of MiniLingua outperforms EuroLLM, a model with a similar training approach but a larger training budget, on summarization, classification and both open- and closed-book question answering. Moreover, it remains competitive with more advanced state-of-the-art models on open-ended generation tasks. We release model weights, tokenizer and source code used for data processing and model training.
Updated: 2025-12-15 13:12:42
Domains: cs.CL,cs.AI
MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data
In medical data analysis, extracting deep insights from complex, multi-modal datasets is essential for improving patient care, increasing diagnostic accuracy, and optimizing healthcare operations. However, there is currently a lack of high-quality datasets specifically designed to evaluate the ability of large multi-modal models (LMMs) to discover medical insights. In this paper, we introduce MedInsightBench, the first benchmark that comprises 332 carefully curated medical cases, each annotated with thoughtfully designed insights. This benchmark is intended to evaluate the ability of LMMs and agent frameworks to analyze multi-modal medical image data, including posing relevant questions, interpreting complex findings, and synthesizing actionable insights and recommendations. Our analysis indicates that existing LMMs exhibit limited performance on MedInsightBench, which is primarily attributed to their challenges in extracting multi-step, deep insights and the absence of medical expertise. Therefore, we propose MedInsightAgent, an automated agent framework for medical data analysis, composed of three modules: Visual Root Finder, Analytical Insight Agent, and Follow-up Question Composer. Experiments on MedInsightBench highlight pervasive challenges and demonstrate that MedInsightAgent can improve the performance of general LMMs in medical data insight discovery.
Updated: 2025-12-15 13:10:42
Domains: cs.AI,cs.LG
Progressive Localisation in Localist LLMs
This paper demonstrates that progressive localization, the gradual increase of attention locality from early distributed layers to late localized layers, represents the optimal architecture for creating interpretable large language models (LLMs) while preserving performance. Through systematic experimentation with GPT-2 fine-tuned on The Psychology of Artificial Superintelligence, we evaluate five locality configurations: two uniform baselines (fully distributed and fully localist) and three progressive polynomial schedules. We investigate whether interpretability constraints can be aligned with natural semantic structure while being applied strategically across network depth. We demonstrate that progressive semantic localization, combining adaptive semantic block partitioning with steep polynomial locality schedules, achieves near-baseline language modeling performance while providing interpretable attention patterns. Multiple independent training runs with different random seeds establish that results are statistically robust and highly reproducible. The approach dramatically outperforms both fixed-window localization and naive uniform locality constraints. Analysis reveals that maintaining flexibility through low-fidelity constraints preserves model capacity while providing interpretability benefits, and that steep schedules concentrating locality in decision-critical final layers while preserving distributed learning in early layers achieve near-baseline attention distribution characteristics. These findings demonstrate that interpretability mechanisms should align with semantic structure to achieve practical performance-interpretability tradeoffs for trustworthy AI systems.
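The steep polynomial schedules the paper evaluates can be sketched as a per-layer locality curve. A minimal illustration, assuming locality grows as (layer / depth)^power; the paper's exact parameterization is not reproduced here, and `power` / `max_locality` are illustrative names.

```python
def locality_schedule(num_layers, power=3.0, max_locality=1.0):
    """Steep polynomial locality schedule: early layers stay fully
    distributed (locality ~ 0) and locality rises polynomially toward
    the final, decision-critical layers."""
    return [max_locality * (layer / (num_layers - 1)) ** power
            for layer in range(num_layers)]

sched = locality_schedule(num_layers=12, power=3.0)
```

With power = 3, half of the total locality budget is concentrated in roughly the last fifth of the network, matching the "distributed early, localized late" pattern.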
Updated: 2025-12-15 13:10:19
Domains: cs.AI
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems
Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.
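The disturbance types the framework simulates (sensor drift, additive noise, irregular sampling) can be sketched as a simple perturbation function over a series. The generator below is an assumption for illustration, not the benchmark's actual code, and its parameter names are invented.

```python
import random

def perturb_series(series, drift_per_step=0.0, noise_std=0.0,
                   drop_prob=0.0, seed=0):
    """Apply CPS-style disturbances to a time series:
    - linear sensor drift that accumulates over time,
    - additive Gaussian sensor noise,
    - randomly dropped readings (None) to mimic irregular sampling."""
    rng = random.Random(seed)
    out = []
    for t, x in enumerate(series):
        if rng.random() < drop_prob:
            out.append(None)            # missing reading
            continue
        drift = drift_per_step * t      # slowly accumulating bias
        noise = rng.gauss(0.0, noise_std)
        out.append(x + drift + noise)
    return out

clean = [1.0] * 5
drifted = perturb_series(clean, drift_per_step=0.1)
```

Sweeping `drift_per_step`, `noise_std`, and `drop_prob` yields the graded disturbance levels a robustness score can then aggregate over.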
Updated: 2025-12-15 13:07:25
Domains: cs.LG,cs.AI
Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration
This paper investigates the application of reinforcement learning (RL) to multi-robot social formation navigation, a critical capability for enabling seamless human-robot coexistence. While RL offers a promising paradigm, the inherent unpredictability and often uncooperative dynamics of pedestrian behavior pose substantial challenges, particularly concerning the efficiency of coordinated exploration among robots. To address this, we propose a novel coordinated-exploration multi-robot RL algorithm introducing an intrinsic motivation exploration. Its core component is a self-learning intrinsic reward mechanism designed to collectively alleviate policy conservatism. Moreover, this algorithm incorporates a dual-sampling mode within the centralized training and decentralized execution framework to enhance the representation of both the navigation policy and the intrinsic reward, leveraging a two-time-scale update rule to decouple parameter updates. Empirical results on social formation navigation benchmarks demonstrate the proposed algorithm's superior performance over existing state-of-the-art methods across crucial metrics. Our code and video demos are available at: https://github.com/czxhunzi/CEMRRL.
Updated: 2025-12-15 13:03:08
Domains: cs.RO,cs.AI
LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models
Diffusion models (DMs) have achieved remarkable success in image and video generation. However, they still struggle with (1) physical alignment and (2) out-of-distribution (OOD) instruction following. We argue that these issues stem from the models' failure to learn causal directions and to disentangle causal factors for novel recombination. We introduce the Causal Scene Graph (CSG) and the Physical Alignment Probe (PAP) dataset to enable diagnostic interventions. This analysis yields three key insights. First, DMs struggle with multi-hop reasoning for elements not explicitly determined in the prompt. Second, the prompt embedding contains disentangled representations for texture and physics. Third, visual causal structure is disproportionately established during the initial, computationally limited denoising steps. Based on these findings, we introduce LINA (Learning INterventions Adaptively), a novel framework that learns to predict prompt-specific interventions, which employs (1) targeted guidance in the prompt and visual latent spaces, and (2) a reallocated, causality-aware denoising schedule. Our approach enforces both physical alignment and OOD instruction following in image and video DMs, achieving state-of-the-art performance on challenging causal generation tasks and the Winoground dataset. Our project page is at https://opencausalab.github.io/LINA.
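The finding that visual causal structure forms disproportionately in the earliest denoising steps suggests a schedule that front-loads computation. A hypothetical sketch, assuming a fixed two-segment step allocation over the noise range [0, 1]; LINA's actual reallocated, causality-aware schedule is learned, not fixed like this.

```python
def reallocate_steps(num_steps, frac_early=0.6, early_cutoff=0.3):
    """Place a larger fraction of denoising steps (frac_early) in the
    early portion [0, early_cutoff] of the trajectory, where causal
    structure is established, and spread the rest over the remainder.
    Returns num_steps + 1 increasing timestamps from 0.0 to 1.0."""
    n_early = round(num_steps * frac_early)
    n_late = num_steps - n_early
    early = [early_cutoff * i / n_early for i in range(n_early)]
    late = [early_cutoff + (1 - early_cutoff) * i / n_late
            for i in range(n_late)]
    return early + late + [1.0]

ts = reallocate_steps(10)
```

Here 6 of 10 steps land in the first 30% of the trajectory, instead of the uniform 3.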
Updated: 2025-12-15 12:59:59
Domains: cs.CV,cs.AI,cs.LG
Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling
Long-context supervised fine-tuning (Long-SFT) plays a vital role in enhancing the performance of large language models (LLMs) on long-context tasks. To smoothly adapt LLMs to long-context scenarios, this process typically entails training on mixed datasets containing both long and short sequences. However, this heterogeneous sequence length distribution poses significant challenges for existing training systems, as they fail to simultaneously achieve high training efficiency for both long and short sequences, resulting in sub-optimal end-to-end system performance in Long-SFT. In this paper, we present a novel perspective on data scheduling to address the challenges posed by the heterogeneous data distributions in Long-SFT. We propose Skrull, a dynamic data scheduler specifically designed for efficient long-SFT. Through dynamic data scheduling, Skrull balances the computation requirements of long and short sequences, improving overall training efficiency. Furthermore, we formulate the scheduling process as a joint optimization problem and thoroughly analyze the trade-offs involved. Based on those analysis, Skrull employs a lightweight scheduling algorithm to achieve near-zero cost online scheduling in Long-SFT. Finally, we implement Skrull upon DeepSpeed, a state-of-the-art distributed training system for LLMs. Experimental results demonstrate that Skrull outperforms DeepSpeed by 3.76x on average (up to 7.54x) in real-world long-SFT scenarios.
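Length-aware scheduling of mixed long and short sequences can be sketched with a greedy balancer: place the longest remaining sequence into the currently lightest micro-batch. This is an illustrative stand-in in the spirit of the problem Skrull solves, not its actual scheduling algorithm.

```python
import heapq

def balance_batches(seq_lengths, num_batches):
    """Greedy longest-first assignment: each sequence goes to the
    micro-batch with the smallest running token total, so long and
    short sequences end up with roughly equal compute per batch."""
    heap = [(0, i, []) for i in range(num_batches)]  # (tokens, id, seqs)
    heapq.heapify(heap)
    for length in sorted(seq_lengths, reverse=True):
        total, i, seqs = heapq.heappop(heap)
        seqs.append(length)
        heapq.heappush(heap, (total + length, i, seqs))
    return [seqs for _, _, seqs in sorted(heap, key=lambda t: t[1])]

batches = balance_batches([4096, 512, 512, 1024, 2048, 256],
                          num_batches=2)
```

One batch absorbs the single 4096-token sequence while the other packs several shorter ones, keeping the two totals close (4352 vs. 4096 tokens).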
Updated: 2025-12-15 12:58:53
Domains: cs.LG,cs.AI
Template-Free Retrosynthesis with Graph-Prior Augmented Transformers
Retrosynthesis reaction prediction aims to infer plausible reactant molecules for a given product and is an important problem in computer-aided organic synthesis. Despite recent progress, many existing models still fall short of the accuracy and robustness required for practical deployment. In this paper, we present a template-free, Transformer-based framework that removes the need for handcrafted reaction templates or additional chemical rule engines. Our model injects molecular graph information into the attention mechanism to jointly exploit SMILES sequences and structural cues, and further applies a paired data augmentation strategy to enhance training diversity and scale. Extensive experiments on the USPTO-50K benchmark demonstrate that our approach achieves state-of-the-art performance among template-free methods and substantially outperforms a vanilla Transformer baseline.
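A common way to inject graph structure into attention is an additive bias on bonded-atom pairs before the softmax. A minimal sketch under that assumption; the paper's exact injection mechanism may differ.

```python
import math

def graph_biased_attention(scores, adjacency, bias=1.0):
    """Add a bonus to raw attention scores between tokens whose atoms
    are bonded in the molecular graph, then softmax-normalize each row.
    `bias` is an illustrative scalar; a learned per-edge bias is the
    more typical choice in practice."""
    n = len(scores)
    biased = [[scores[i][j] + (bias if adjacency[i][j] else 0.0)
               for j in range(n)] for i in range(n)]
    out = []
    for row in biased:
        m = max(row)                       # stabilize the softmax
        exps = [math.exp(v - m) for v in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

attn = graph_biased_attention(scores=[[0.0, 0.0], [0.0, 0.0]],
                              adjacency=[[0, 1], [1, 0]], bias=1.0)
```

With flat raw scores, the bonded pair receives the larger attention weight, which is the structural cue the SMILES sequence alone does not carry.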
Updated: 2025-12-15 12:53:39
Domains: cs.LG
From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System
Generative query suggestion using large language models offers a powerful way to enhance conversational systems, but aligning outputs with nuanced user preferences remains a critical challenge. To address this, we introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our pipeline begins with prompt engineering as a cold-start strategy, followed by the Supervised Fine-Tuning stage, in which we introduce a distillation method on click logs to create a robust foundational model. To better model user preferences while capturing their inherent uncertainty, we develop a Gaussian Reward Model (GaRM) that represents user preferences as probability distributions rather than point estimates. Finally, we employ reinforcement learning to align the generation policy with these preferences, guided by a composite reward function that integrates GaRM with auxiliary heuristics to mitigate reward hacking. To maintain training stability, this process is enhanced by a novel out-of-distribution regularization method and a two-stage reward fusion technique. Extensive experiments demonstrate that our framework significantly outperforms baselines on both automatic and human evaluations and yields a 34% relative increase in user engagement as measured by click-through rate in live A/B tests.
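Representing preferences as distributions rather than point estimates makes the pairwise preference probability a normal CDF over the reward difference. A sketch of that GaRM-style computation, assuming independent Gaussian rewards; the model's actual training objective is not reproduced here.

```python
import math

def pref_prob(mu_a, var_a, mu_b, var_b):
    """Probability that suggestion A is preferred over B when each
    reward is Gaussian: P(r_a - r_b > 0) = Phi(diff_mu / diff_sd),
    computed via the error function."""
    diff_mu = mu_a - mu_b
    diff_sd = math.sqrt(var_a + var_b)
    return 0.5 * (1.0 + math.erf(diff_mu / (diff_sd * math.sqrt(2.0))))

p = pref_prob(mu_a=1.0, var_a=0.5, mu_b=0.0, var_b=0.5)
```

High variance shrinks the preference toward 0.5, so uncertain comparisons exert a weaker training signal than confident ones.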
Updated: 2025-12-15 12:51:55
Domains: cs.CL,cs.AI
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning
Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce ranking to refine consistent multi-step tool selection. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
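The Plackett-Luce model mentioned for multi-step tool selection factorizes a ranking's probability into successive softmax picks over the tools that remain. A minimal sketch of its log-likelihood, without the paper's KL regularization; the tool names are hypothetical.

```python
import math

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of an observed tool ranking under Plackett-Luce:
    at each step, the chosen tool's softmax probability is taken over
    the tools not yet selected."""
    remaining = list(ranking)
    ll = 0.0
    for choice in ranking:
        z = sum(math.exp(scores[t]) for t in remaining)
        ll += scores[choice] - math.log(z)
        remaining.remove(choice)
    return ll

ll = plackett_luce_log_likelihood(
    {"calc": 2.0, "search": 1.0, "code": 0.0},
    ranking=["calc", "search", "code"])
```

Maximizing this quantity pushes the model's scores to agree with the full multi-step selection order, not just the top choice.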
Updated: 2025-12-15 12:38:04
Domains: cs.CL,cs.LG
Fully Bayesian Differential Gaussian Processes through Stochastic Differential Equations
Deep Gaussian process models typically employ discrete hierarchies, but recent advancements in differential Gaussian processes (DiffGPs) have extended these models to infinite depths. However, existing DiffGP approaches often overlook the uncertainty in kernel hyperparameters by treating them as fixed and time-invariant, which degrades the model's predictive performance and neglects the posterior distribution. In this work, we introduce a fully Bayesian framework that models kernel hyperparameters as random variables and utilizes coupled stochastic differential equations (SDEs) to jointly learn their posterior distributions alongside those of inducing points. By incorporating the estimation uncertainty of hyperparameters, our method significantly enhances model flexibility and adaptability to complex dynamic systems. Furthermore, we employ a black-box adaptive SDE solver with a neural network to achieve realistic, time varying posterior approximations, thereby improving the expressiveness of the variational posterior. Comprehensive experimental evaluations demonstrate that our approach outperforms traditional methods in terms of flexibility, accuracy, and other key performance metrics. This work not only provides a robust Bayesian extension to DiffGP models but also validates its effectiveness in handling intricate dynamic behaviors, thereby advancing the applicability of Gaussian process models in diverse real-world scenarios.
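Coupled SDEs of the kind the framework relies on can be simulated with the standard Euler-Maruyama step; the paper uses a black-box adaptive neural solver, so the fixed-step sketch below is only illustrative.

```python
import random

def euler_maruyama(x0, drift, diffusion, dt, steps, seed=0):
    """One-dimensional Euler-Maruyama integration of
    dx = f(x) dt + g(x) dW:  x <- x + f(x) dt + g(x) sqrt(dt) N(0, 1)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x += drift(x) * dt + diffusion(x) * (dt ** 0.5) * rng.gauss(0.0, 1.0)
    return x

# Sanity check: with zero diffusion, an Ornstein-Uhlenbeck-style pull
# toward 0 reduces to deterministic exponential decay.
x_end = euler_maruyama(1.0, drift=lambda x: -x,
                       diffusion=lambda x: 0.0, dt=0.01, steps=100)
```

In the paper's setting, both the inducing points and the kernel hyperparameters would evolve under such coupled dynamics, with the solver learned rather than fixed-step.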
Updated: 2025-12-15 12:27:16
Domains: cs.LG,cs.AI
Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning
Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems often appear in diverse surface forms, often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space, a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively (1) maps natural language problems via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating decoupling understanding from reasoning as an effective strategy for strengthening SLMs.
Updated: 2025-12-15 12:12:19
Domains: cs.CL,cs.AI
BézierFlow: Bézier Stochastic Interpolant Schedulers for Few-Step Generation
We introduce BézierFlow, a lightweight training approach for few-step generation with pretrained diffusion and flow models. BézierFlow achieves a 2-3x performance improvement for sampling with $\leq$ 10 NFEs while requiring only 15 minutes of training. Recent lightweight training approaches have shown promise by learning optimal timesteps, but their scope remains restricted to ODE discretizations. To broaden this scope, we propose learning the optimal transformation of the sampling trajectory by parameterizing stochastic interpolant (SI) schedulers. The main challenge lies in designing a parameterization that satisfies critical desiderata, including boundary conditions, differentiability, and monotonicity of the SNR. To effectively meet these requirements, we represent scheduler functions as Bézier functions, where control points naturally enforce these properties. This reduces the problem to learning an ordered set of points in the time range, while the interpretation of the points changes from ODE timesteps to Bézier control points. Across a range of pretrained diffusion and flow models, BézierFlow consistently outperforms prior timestep-learning methods, demonstrating the effectiveness of expanding the search space from discrete timesteps to Bézier-based trajectory transformations.
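Representing the scheduler as a Bézier function means evaluating the Bernstein form over an ordered set of control points, which automatically yields the boundary and monotonicity properties the paper requires. A minimal sketch; the paper's exact boundary handling and the mapping onto SI schedulers are not reproduced here.

```python
from math import comb

def bezier(control_points, t):
    """Evaluate the Bernstein form
    B(t) = sum_i C(n, i) t^i (1 - t)^(n - i) P_i.
    With control points sorted in [0, 1], the curve is monotone and
    hits B(0) = P_0 and B(1) = P_n exactly."""
    n = len(control_points) - 1
    return sum(comb(n, i) * (t ** i) * ((1 - t) ** (n - i)) * p
               for i, p in enumerate(control_points))

# An ordered set of control points over the time range, as the paper
# describes; these particular values are illustrative.
pts = [0.0, 0.1, 0.6, 1.0]
curve = [bezier(pts, t / 4) for t in range(5)]
```

Learning the curve thus reduces to learning the ordered points `pts`, exactly the shift the abstract describes from ODE timesteps to Bézier control points.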
Updated: 2025-12-15 12:09:32
Domains: cs.LG
Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection
Direct Preference Optimization (DPO) has emerged as a lightweight and effective alternative to Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with AI Feedback (RLAIF) for aligning large language and vision-language models. However, the standard DPO formulation, in which both the chosen and rejected responses are generated by the same policy, suffers from a weak learning signal because the two responses often share similar errors and exhibit small Kullback-Leibler (KL) divergence. This leads to slow and unstable convergence. To address this limitation, we introduce Reflective Preference Optimization (RPO), a new framework that incorporates hint-guided reflection into the DPO paradigm. RPO uses external models to identify hallucination sources and generate concise reflective hints, enabling the construction of on-policy preference pairs with stronger contrastiveness and clearer preference signals. We theoretically show that conditioning on hints increases the expected preference margin through mutual information and improves sample efficiency while remaining within the policy distribution family. Empirically, RPO achieves superior alignment with fewer training samples and iterations, substantially reducing hallucination rates and delivering state-of-the-art performance across multimodal benchmarks.
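For reference, the standard DPO objective that RPO builds on scores a preference pair by the policy's log-ratio margin over a reference model; a larger margin (a more contrastive pair) gives a lower loss and a stronger gradient, which is exactly the signal hint-guided pairs aim to supply. A minimal sketch with illustrative log-probabilities:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * margin), where the margin
    is the policy's log-ratio for the chosen response minus that of the
    rejected response, both measured against a frozen reference."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A same-policy pair with near-identical errors (small KL gap) vs. a
# hint-contrastive pair with a clearly worse rejected response.
weak = dpo_loss(-10.0, -10.5, -10.0, -10.0, beta=0.1)
strong = dpo_loss(-10.0, -14.0, -10.0, -10.0, beta=0.1)
```

The weak pair sits near the -log sigmoid(0) plateau of ln 2, illustrating the slow-convergence problem the abstract attributes to low-divergence on-policy pairs.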
Updated: 2025-12-15 11:55:55
Domains: cs.AI,cs.LG
Statistical Taylor Expansion: A New and Path-Independent Method for Uncertainty Analysis
As a rigorous statistical approach, statistical Taylor expansion extends the conventional Taylor expansion by replacing precise input variables with random variables of known distributions, to compute means and standard deviations of the results. Statistical Taylor expansion traces the dependency of the input uncertainties through the intermediate steps, so that the variables in the intermediate analytic expressions can no longer be regarded as independent of each other, and the result of the analytic expression is path independent. Thus, it differs fundamentally from the conventional approaches in applied mathematics, which optimize the execution path for each calculation. In fact, statistical Taylor expansion may standardize numerical calculations for analytic expressions. Its statistical nature allows rigorous testing of its result when the sample size is large enough. This paper also introduces an implementation of statistical Taylor expansion called variance arithmetic and presents corresponding test results in a very wide range of mathematical applications. Another important conclusion of this paper is that the numerical errors in library functions can have significant effects on the result. For example, the periodic numerical errors in the trigonometric library functions can resonate with periodic signals, producing large numerical errors in the results.
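For independent inputs, the first-order case reduces to the classical delta method: var(f) is approximately the sum over inputs of (df/dx_i)^2 * sd_i^2. The sketch below uses numeric partials and deliberately omits the dependency tracking between intermediate terms that distinguishes the paper's method.

```python
import math

def propagate_uncertainty(f, means, sds, eps=1e-6):
    """First-order uncertainty propagation for independent inputs:
    evaluate f at the means, estimate each partial derivative by a
    forward difference, and combine variances in quadrature."""
    f0 = f(means)
    var = 0.0
    for i, sd in enumerate(sds):
        shifted = list(means)
        shifted[i] += eps
        deriv = (f(shifted) - f0) / eps
        var += (deriv * sd) ** 2
    return f0, math.sqrt(var)

# z = x * y with independent uncertainties on x and y.
mean, sd = propagate_uncertainty(lambda v: v[0] * v[1],
                                 means=[2.0, 3.0], sds=[0.1, 0.2])
```

For z = x * y the partials are y and x, so sd(z) = sqrt((3 * 0.1)^2 + (2 * 0.2)^2) = 0.5, which the numeric version reproduces.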
Updated: 2025-12-15 11:52:52
Fields: stat.CO,cs.LG
Learning to Retrieve with Weakened Labels: Robust Training under Label Noise
Neural encoders are frequently used in the NLP domain for dense retrieval tasks, for instance, to generate the candidate documents for a given query in question-answering tasks. However, sparse annotation and label noise in the training data make it challenging to train or fine-tune such retrieval models. Although existing works have attempted to mitigate these problems by incorporating modified loss functions or data cleaning, these approaches either require hyperparameters to tune during training or add substantial complexity to the training setup. In this work, we consider a label weakening approach to generate robust retrieval models in the presence of label noise. Instead of enforcing a single, potentially erroneous label for each query-document pair, we allow for a set of plausible labels derived from both the observed supervision and the model's confidence scores. We perform an extensive evaluation covering two retrieval models and one re-ranking model on four diverse ranking datasets. To this end, we also construct a realistic noisy setting by using a semantic-aware noise generation technique to inject different ratios of noise. Our initial results show that label weakening can improve retrieval performance in comparison to 10 different state-of-the-art loss functions.
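One plausible instantiation of the weakened label sets described above (the margin threshold, score format, and combination rule are my assumptions, not the paper's exact procedure):

```python
def weakened_labels(observed, scores, margin=0.1):
    """Return the set of plausible labels for one query-document pair:
    the observed (possibly noisy) label plus every candidate label whose
    model confidence is within `margin` of the top-scoring label."""
    best = max(scores.values())
    plausible = {lbl for lbl, s in scores.items() if best - s <= margin}
    plausible.add(observed)   # never discard the observed supervision
    return plausible

# Observed label says "irrelevant" (0), but the model is torn between 0 and 1,
# so both survive as plausible targets instead of a single hard label.
labels = weakened_labels(0, {0: 0.48, 1: 0.45, 2: 0.07}, margin=0.1)
```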
Updated: 2025-12-15 11:52:13
Fields: cs.LG,cs.IR
From Pruning to Grafting: Dynamic Knowledge Redistribution via Learnable Layer Fusion
Structured pruning of Generative Pre-trained Transformers (GPTs) offers a promising path to efficiency but often suffers from irreversible performance degradation due to the discarding of transformer blocks. In this paper, we introduce FuseGPT, a compression paradigm that reframes structured pruning as iterative knowledge grafting rather than simple removal. Motivated by the observation that linear block merging fails to capture non-linear feature disparities and that block importance fluctuates dynamically during pruning, FuseGPT employs a dual-strategy pipeline. First, we propose Macro Influence (MI), a dynamic fusion-aware metric that continuously re-evaluates block redundancy as the network topology evolves. Second, instead of rigid parameter averaging, we introduce a learnable low-rank fusion mechanism that adaptively grafts the knowledge of pruned blocks onto surviving layers via lightweight local distillation. Extensive experiments on LLaMA, Mistral, Qwen, and Phi families demonstrate that FuseGPT establishes a new state-of-the-art on the compression-accuracy Pareto frontier: at 25\% sparsity, FuseGPT achieves lower perplexity than prior methods at 20\% sparsity, improves zero-shot reasoning by up to 4.5 points, and delivers 1.33$\times$ inference speedup with 25\% memory reduction. Furthermore, FuseGPT is orthogonal to quantization, achieving 52.1\% total compression with negligible quality loss when combined with 4-bit GPTQ. We make our code publicly available at https://github.com/JarvisPei/FuseGPT.
Updated: 2025-12-15 11:49:40
Fields: cs.LG,cs.AI,cs.CL
CORE: Contrastive Masked Feature Reconstruction on Graphs
In the rapidly evolving field of self-supervised learning on graphs, generative and contrastive methodologies have emerged as two dominant approaches. Our study focuses on masked feature reconstruction (MFR), a generative technique where a model learns to restore the raw features of masked nodes in a self-supervised manner. We observe that both MFR and graph contrastive learning (GCL) aim to maximize agreement between similar elements. Building on this observation, we reveal a novel theoretical insight: under specific conditions, the objectives of MFR and node-level GCL converge, despite their distinct operational mechanisms. This theoretical connection suggests these approaches are complementary rather than fundamentally different, prompting us to explore their integration to enhance self-supervised learning on graphs. Our research presents Contrastive Masked Feature Reconstruction (CORE), a novel graph self-supervised learning framework that integrates contrastive learning into MFR. Specifically, we form positive pairs exclusively between the original and reconstructed features of masked nodes, encouraging the encoder to prioritize contextual information over the node's own features. Additionally, we leverage the masked nodes themselves as negative samples, combining MFR's reconstructive power with GCL's discriminative ability to better capture intrinsic graph structures. Empirically, our proposed framework CORE significantly outperforms MFR across node and graph classification tasks, demonstrating state-of-the-art results. In particular, CORE surpasses GraphMAE and GraphMAE2 by up to 2.80% and 3.72% on node classification tasks, and by up to 3.82% and 3.76% on graph classification tasks.
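The pairing scheme described above can be sketched as an InfoNCE-style loss in which each masked node's reconstructed feature is the positive for its original feature and the other masked nodes act as negatives (a minimal plain-Python illustration; the temperature, similarity function, and data are assumptions, not the authors' implementation):

```python
import math

def core_style_loss(orig, recon, tau=0.5):
    """InfoNCE-style loss over masked nodes: for node i the positive pair is
    (orig[i], recon[i]); reconstructions of the other masked nodes serve as
    negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    def sim(u, v):  # cosine similarity
        return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    total = 0.0
    for i, o in enumerate(orig):
        exps = [math.exp(sim(o, r) / tau) for r in recon]
        total += -math.log(exps[i] / sum(exps))
    return total / len(orig)

# Two masked nodes; each reconstruction is closest to its own original,
# so the loss is lower than with mismatched (swapped) reconstructions.
orig = [[1.0, 0.0], [0.0, 1.0]]
recon = [[0.9, 0.1], [0.1, 0.9]]
loss = core_style_loss(orig, recon)
```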
Updated: 2025-12-15 11:48:48
Fields: cs.LG,cs.AI
ModSSC: A Modular Framework for Semi-Supervised Classification on Heterogeneous Data
Semi-supervised classification leverages both labeled and unlabeled data to improve predictive performance, but existing software support is fragmented across methods and modalities. We introduce ModSSC, an open source Python framework that unifies inductive and transductive semi-supervised classification in a modular code base. ModSSC implements a broad range of classical and recent algorithms, provides loaders for tabular, image, text, audio and graph datasets, and exposes a single configuration interface for specifying datasets, models and evaluation protocols. It supports both lightweight classical methods on small datasets running on CPU and recent deep approaches that can exploit multiple GPUs within the same experimental framework. Experiments are described declaratively in YAML, which facilitates reproducing existing work and running large comparative studies. ModSSC 1.0.0 is released under the MIT license with extensive documentation and tests, and is available at https://github.com/ModSSC/ModSSC.
Updated: 2025-12-15 11:43:14
Fields: cs.LG
Better LMO-based Momentum Methods with Second-Order Information
The use of momentum in stochastic optimization algorithms has shown empirical success across a range of machine learning tasks. Recently, a new class of stochastic momentum algorithms has emerged within the Linear Minimization Oracle (LMO) framework--leading to state-of-the-art methods, such as Muon, Scion, and Gluon, that effectively solve deep neural network training problems. However, traditional stochastic momentum methods offer convergence guarantees no better than the ${O}(1/K^{1/4})$ rate. While several approaches--such as Hessian-Corrected Momentum (HCM)--have aimed to improve this rate, their theoretical results are generally restricted to the Euclidean norm setting. This limitation hinders their applicability in problems where arbitrary norms are often required. In this paper, we extend the LMO-based framework by integrating HCM, and provide convergence guarantees under relaxed smoothness and arbitrary norm settings. We establish improved convergence rates of ${O}(1/K^{1/3})$ for HCM, which can adapt to the geometry of the problem and achieve a faster rate than traditional momentum. Experimental results on training Multi-Layer Perceptrons (MLPs) and Long Short-Term Memory (LSTM) networks verify our theoretical observations.
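One common form of the Hessian-corrected momentum estimator mentioned above is m_t = (1-beta) * (m_{t-1} + H_t (x_t - x_{t-1})) + beta * g_t, which uses a Hessian-vector product to keep the momentum aligned with the current gradient. The toy below (a 1-D quadratic with a plain descent step; names and step rule are my assumptions, not the paper's LMO-based method) illustrates the key property: with exact curvature information, the momentum tracks the true gradient.

```python
def hcm_step(m, g_new, hvp, beta):
    """Hessian-corrected momentum (one common form):
    m_t = (1 - beta) * (m_{t-1} + H_t @ (x_t - x_{t-1})) + beta * g_t,
    where `hvp` is the Hessian-vector product H_t @ (x_t - x_{t-1})."""
    return (1 - beta) * (m + hvp) + beta * g_new

# 1-D quadratic f(x) = 0.5 * a * x^2: gradient = a * x, Hessian = a.
a, beta, lr = 3.0, 0.1, 0.05
x, m = 2.0, 3.0 * 2.0            # initialize m at the exact gradient
for _ in range(20):
    x_new = x - lr * m           # plain descent along the momentum direction
    m = hcm_step(m, a * x_new, a * (x_new - x), beta)
    x = x_new
# For a quadratic, the correction keeps m equal to the exact gradient a * x.
```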
Updated: 2025-12-15 11:43:09
Fields: math.OC,cs.LG
Decoding and Engineering the Phytobiome Communication for Smart Agriculture
Smart agriculture applications, integrating technologies like the Internet of Things and machine learning/artificial intelligence (ML/AI) into agriculture, hold promise to address the modern challenges of rising food demand, environmental pollution, and water scarcity. Alongside the concept of the phytobiome, which denotes the plant together with its environment and associated organisms, and the recent emergence of molecular communication (MC), there exists an important opportunity to advance agricultural science and practice using communication theory. In this article, we advocate a communication-engineering perspective for developing a holistic understanding of phytobiome communication and for bridging the gap between phytobiome communication and smart agriculture. First, we present an overview of phytobiome communication via molecular and electrophysiological signals and conceptualize a multi-scale framework that models the phytobiome as a communication network. We then demonstrate, with plant experiments, how this framework can be used to model electrophysiological signals. Furthermore, we propose possible smart agriculture applications enabled by engineering phytobiome communication, such as smart irrigation and targeted delivery of agrochemicals. These applications merge ML/AI methods with the Internet of Bio-Nano-Things enabled by MC and pave the way towards more efficient, sustainable, and eco-friendly agricultural production. Finally, we discuss the implementation challenges, open research issues, and industrial outlook for these applications.
Updated: 2025-12-15 11:34:13
Fields: eess.SP,cs.AI,cs.ET,cs.NI,q-bio.MN
Rethinking Physics-Informed Regression Beyond Training Loops and Bespoke Architectures
We revisit the problem of physics-informed regression, and propose a method that directly computes the state at the prediction point, simultaneously with the derivative and curvature information of the existing samples. We frame each prediction as a constrained optimisation problem, leveraging multivariate Taylor series expansions and explicitly enforcing physical laws. Each individual query can be processed with low computational cost without any pre- or re-training, in contrast to global function approximator-based solutions such as neural networks. Our comparative benchmarks on a reaction-diffusion system show competitive predictive accuracy relative to a neural network-based solution, while completely eliminating the need for long training loops, and remaining robust to changes in the sampling layout.
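A minimal 1-D sketch of this per-query constrained fit, assuming (for illustration only) that the physical law is the linear ODE u'' = lam * u: substituting the constraint into a local quadratic Taylor model eliminates the curvature coefficient, leaving a tiny least-squares problem at each prediction point, with no training loop.

```python
import math

def predict_with_ode(samples, x0, lam):
    """Predict u(x0) from nearby samples by a least-squares Taylor fit
    u(x) ~= c0 + c1*d + c2*d^2 with d = x - x0, while enforcing the
    physical law u'' = lam * u at x0, i.e. 2*c2 = lam*c0 (eliminates c2)."""
    a00 = a01 = a11 = b0 = b1 = 0.0
    for x, u in samples:
        d = x - x0
        p0, p1 = 1.0 + lam * d * d / 2.0, d   # constrained basis functions
        a00 += p0 * p0; a01 += p0 * p1; a11 += p1 * p1
        b0 += p0 * u;  b1 += p1 * u
    det = a00 * a11 - a01 * a01               # 2x2 normal equations
    c0 = (b0 * a11 - b1 * a01) / det
    return c0                                  # the predicted value u(x0)

# cosh solves u'' = u (lam = 1), so the prediction at x0 = 0 should be ~1.
pts = [(d, math.cosh(d)) for d in (-0.2, -0.1, 0.1, 0.2)]
u0 = predict_with_ode(pts, 0.0, 1.0)
```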
Updated: 2025-12-15 11:31:41
Fields: math.OC,cs.LG
Towards Secure Decentralized Applications and Consensus Protocols in Blockchains (on Selfish Mining, Undercutting Attacks, DAG-Based Blockchains, E-Voting, Cryptocurrency Wallets, Secure-Logging, and CBDC)
With the rise of cryptocurrencies, many new applications built on decentralized blockchains have emerged. Blockchains are full-stack distributed systems where multiple sub-systems interact. While many deployed blockchains and decentralized applications need better scalability and performance, security is also critical. Due to their complexity, assessing blockchain and DAPP security requires a more holistic view than for traditional distributed or centralized systems. In this thesis, we summarize our contributions to blockchain and decentralized application security. We propose a security reference architecture to support standardized vulnerability and threat analysis. We study consensus security in single-chain Proof-of-Work blockchains, including resistance to selfish mining, undercutting, and greedy transaction selection, as well as related issues in DAG-based systems. We contribute to wallet security with a new classification of authentication schemes and a two-factor method based on One-Time Passwords. We advance e-voting with a practical boardroom voting protocol, extend it to a scalable version for millions of participants while preserving security and privacy, and introduce a repetitive voting framework that enables vote changes between elections while avoiding peak-end effects. Finally, we improve secure logging using blockchains and trusted computing through a centralized ledger that guarantees non-equivocation, integrity, and censorship evidence, then build on it to propose an interoperability protocol for central bank digital currencies that ensures atomic transfers.
Updated: 2025-12-15 11:26:43
Fields: cs.CR,cs.DC,cs.GT
Generating Reliable Synthetic Clinical Trial Data: The Role of Hyperparameter Optimization and Domain Constraints
The generation of synthetic clinical trial data offers a promising approach to mitigating privacy concerns and data accessibility limitations in medical research. However, ensuring that synthetic datasets maintain high fidelity, utility, and adherence to domain-specific constraints remains a key challenge. While hyperparameter optimization (HPO) improves generative model performance, the effectiveness of different optimization strategies for synthetic clinical data remains unclear. This study systematically evaluates four HPO objectives across nine generative models, comparing single-metric to compound metric optimization. Our results demonstrate that HPO consistently improves synthetic data quality, with Tab DDPM achieving the largest relative gains, followed by TVAE (60%), CTGAN (39%), and CTAB-GAN+ (38%). Compound metric optimization outperformed single-metric objectives, producing more generalizable synthetic datasets. Despite improving overall quality, HPO alone fails to prevent violations of essential clinical survival constraints. Preprocessing and postprocessing played a crucial role in reducing these violations, as models lacking robust processing steps produced invalid data in up to 61% of cases. These findings underscore the necessity of integrating explicit domain knowledge alongside HPO to generate high-quality synthetic datasets. Our study provides actionable recommendations for improving synthetic data generation, with future work needed to refine metric selection and validate findings on larger datasets.
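The single- versus compound-metric comparison can be sketched with a toy HPO loop (the metric stubs, weights, and random-search budget below are illustrative assumptions, not the study's actual evaluation pipeline):

```python
import random

def compound_score(metrics, weights):
    """Compound HPO objective: weighted sum of per-metric scores in [0, 1]."""
    return sum(weights[k] * metrics[k] for k in weights)

def random_search(evaluate, space, n_trials=50, seed=0):
    """Minimal random-search HPO loop over a discrete hyperparameter space."""
    rng = random.Random(seed)
    best_cfg, best = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        s = evaluate(cfg)
        if s > best:
            best_cfg, best = cfg, s
    return best_cfg, best

# Stub evaluator standing in for fidelity/utility measured on synthetic data.
space = {"lr": [1e-4, 1e-3, 1e-2], "batch": [64, 128, 256]}
def evaluate(cfg):
    metrics = {"fidelity": 1.0 if cfg["lr"] == 1e-3 else 0.5,
               "utility": 1.0 if cfg["batch"] == 128 else 0.6}
    return compound_score(metrics, {"fidelity": 0.5, "utility": 0.5})

cfg, score = random_search(evaluate, space)
```

Note that even a perfect compound score says nothing about domain constraints (e.g. survival-time validity), which is why the abstract argues for explicit pre- and postprocessing alongside HPO.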
Updated: 2025-12-15 11:25:28
Fields: cs.LG,cs.AI
Evaluating Adversarial Attacks on Federated Learning for Temperature Forecasting
Deep learning and federated learning (FL) are becoming powerful partners for next-generation weather forecasting. Deep learning enables high-resolution spatiotemporal forecasts that can surpass traditional numerical models, while FL allows institutions in different locations to collaboratively train models without sharing raw data, addressing efficiency and security concerns. While FL has shown promise across heterogeneous regions, its distributed nature introduces new vulnerabilities. In particular, data poisoning attacks, in which compromised clients inject manipulated training data, can degrade performance or introduce systematic biases. These threats are amplified by spatial dependencies in meteorological data, allowing localized perturbations to influence broader regions through global model aggregation. In this study, we investigate how adversarial clients distort federated surface temperature forecasts trained on the Copernicus European Regional ReAnalysis (CERRA) dataset. We simulate geographically distributed clients and evaluate patch-based and global biasing attacks on regional temperature forecasts. Our results show that even a small fraction of poisoned clients can mislead predictions across large, spatially connected areas. A global temperature bias attack from a single compromised client shifts predictions by up to -1.7 K, while coordinated patch attacks more than triple the mean squared error and produce persistent regional anomalies exceeding +3.5 K. Finally, we assess trimmed mean aggregation as a defense mechanism, showing that it successfully defends against global bias attacks (2-13\% degradation) but fails against patch attacks (281-603\% amplification), exposing limitations of outlier-based defenses for spatially correlated data.
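The trimmed-mean defense evaluated in the last sentence can be sketched in a few lines (a generic coordinate-wise trimmed mean, not the paper's exact configuration):

```python
def trimmed_mean(updates, trim=1):
    """Coordinate-wise trimmed mean over client updates: per coordinate,
    drop the `trim` smallest and `trim` largest values, average the rest."""
    agg = []
    for coord in zip(*updates):
        vals = sorted(coord)[trim:len(coord) - trim]
        agg.append(sum(vals) / len(vals))
    return agg

# Four honest clients plus one attacker pushing a large negative bias on the
# first coordinate (in the spirit of the global temperature-bias attack above):
updates = [[0.10, 0.2], [0.12, 0.1], [0.09, 0.3], [0.11, 0.2], [-5.0, 0.2]]
agg = trimmed_mean(updates, trim=1)   # the -5.0 outlier is trimmed away
```

This illustrates why the defense works against a single global outlier but can fail against coordinated patch attacks: spatially correlated poisoned updates need not be coordinate-wise outliers.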
Updated: 2025-12-15 11:22:24
Fields: cs.LG,cs.CR
Investigation of a Bit-Sequence Reconciliation Protocol Based on Neural TPM Networks in Secure Quantum Communications
The article discusses a key reconciliation protocol for quantum key distribution (QKD) systems based on Tree Parity Machines (TPM). The idea of transforming key material into neural network weights is presented. Two experiments were conducted to study how the number of synchronization iterations and the amount of leaked information depend on the quantum bit error rate (QBER) and the range of neural network weights. The results show a direct relationship between the average number of synchronization iterations and QBER, an increase in iterations when the weight range is expanded, and a reduction in leaked information as the weight range increases. Based on these results, conclusions are drawn regarding the applicability of the protocol and the prospects for further research on neural cryptographic methods in the context of key reconciliation.
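The TPM synchronization loop at the heart of such protocols can be sketched as follows (the parameters K, N, L, the tie-breaking convention, and the plain Hebbian rule are illustrative assumptions; the paper's exact variant may differ):

```python
import random

def tpm_output(w, x):
    """Tree Parity Machine: sigma_k = sign(w_k . x_k), tau = prod(sigma_k)."""
    sigmas = []
    for wk, xk in zip(w, x):
        s = sum(wi * xi for wi, xi in zip(wk, xk))
        sigmas.append(1 if s >= 0 else -1)   # tie-break sign(0) = +1 (a convention)
    tau = 1
    for s in sigmas:
        tau *= s
    return tau, sigmas

def hebbian_update(w, x, sigmas, tau, L=3):
    """When both parties' outputs agree, update only the hidden units whose
    sigma matches tau, clipping weights to the bounded range [-L, L]."""
    for wk, xk, s in zip(w, x, sigmas):
        if s == tau:
            for i in range(len(wk)):
                wk[i] = max(-L, min(L, wk[i] + xk[i] * tau))

# Two machines synchronize on shared random +-1 inputs; synchronized weights
# become the shared (reconciled) key material.
rng = random.Random(0)
K, N, L = 3, 4, 3
A = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
B = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
for _ in range(20000):
    x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
    tA, sA = tpm_output(A, x)
    tB, sB = tpm_output(B, x)
    if tA == tB:                  # outputs agree: both sides update
        hebbian_update(A, x, sA, tA, L)
        hebbian_update(B, x, sB, tB, L)
    if A == B:                    # weights coincide: synchronized
        break
```

The experiments in the abstract vary exactly these knobs: the number of agreement iterations grows with QBER-induced disagreement and with the weight range L.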
Updated: 2025-12-15 11:14:00
Fields: quant-ph,cs.CR
MicroPhaseNO: Adapting an Earthquake-Trained Phase Neural Operator for Microseismic Phase Picking
Seismic phase picking is widely used for microseismic monitoring and subsurface imaging. Traditional manual processing is not feasible for either real-time applications or large arrays. Deep learning-based pickers trained on large earthquake catalogs offer an automated alternative. However, they are typically optimized for high signal-to-noise, long-duration networks and struggle with the challenges presented by microseismic datasets, which typically come from purpose-built deployments of limited duration in areas without previously detected seismicity. In this study, we demonstrate how a network-wide earthquake phase picker, the Phase Neural Operator (PhaseNO), can be adapted to microseismic monitoring using transfer learning. Starting from a PhaseNO model pre-trained on more than 57,000 three-component earthquake and noise records, we fine-tune the model using only 200 labeled and noise seismograms from induced events in hydraulic-fracturing settings. The fine-tuned model thus preserves the rich spatio-temporal representation learned from abundant earthquake data, while adapting to the characteristics and labeling conventions of microseismic phases, which are often picked on peaks or troughs rather than onsets. We evaluate performance on three distinct real-world microseismic datasets with different network geometries and acquisition parameters. Compared to the original PhaseNO and a conventional workflow, the adapted model increases F1 score and accuracy by up to 30%, and strongly reduces systematic timing bias and pick uncertainty. Because the adaptation relies on a small, campaign-specific calibration set, the approach is readily transferable to other microseismic tasks where public earthquake data and pre-trained models are accessible. The associated code will be released openly at https://github.com/ayratabd/MicroPhaseNO.
Updated: 2025-12-15 11:13:21
Fields: physics.geo-ph,cs.LG
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
In this paper we present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA), or a grasp action instruction. The system tackles all cases in a task-agnostic fashion through the utilization of a shared library of primitive skills. Each primitive handles an independent sub-task, such as reasoning about visual attributes, spatial relation comprehension, logic and enumeration, as well as arm control. A language parser maps the input query to an executable program composed of such primitives, depending on the context. While some primitives are purely symbolic operations (e.g. counting), others are trainable neural functions (e.g. visual grounding), therefore marrying the interpretability and systematic generalization benefits of discrete symbolic approaches with the scalability and representational power of deep networks. We generate a 3D vision-and-language synthetic dataset of tabletop scenes in a simulation environment to train our approach and perform extensive evaluations in both synthetic and real-world scenes. Results showcase the benefits of our approach in terms of accuracy, sample-efficiency, and robustness to the user's vocabulary, while being transferable to real-world scenes with few-shot visual fine-tuning. Finally, we integrate our method with a robot framework and demonstrate how it can serve as an interpretable solution for an interactive object-picking task, achieving an average success rate of 80.2\%, both in simulation and with a real robot. We make supplementary material available in https://gtziafas.github.io/neurosymbolic-manipulation.
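The primitive-library execution model described above can be sketched as follows (the scene format, primitive names, and result-threading convention are illustrative assumptions, not the paper's implementation; in the real system, primitives like visual grounding are trainable neural modules rather than lambdas):

```python
# A toy symbolic scene; in the paper this comes from visual grounding.
SCENE = [
    {"id": 0, "color": "red", "shape": "cube"},
    {"id": 1, "color": "blue", "shape": "cube"},
    {"id": 2, "color": "red", "shape": "ball"},
]

PRIMITIVES = {
    "scene": lambda env: list(SCENE),                                   # all objects
    "filter_color": lambda env, objs, c: [o for o in objs if o["color"] == c],
    "filter_shape": lambda env, objs, s: [o for o in objs if o["shape"] == s],
    "count": lambda env, objs: len(objs),                               # symbolic op
    "grasp": lambda env, objs: f"grasp(object {objs[0]['id']})",        # arm control stub
}

def execute(program):
    """Run a parsed program: each step is (primitive, *args); the previous
    step's output is threaded through as the next primitive's input."""
    env, out = {}, None
    for name, *args in program:
        fn = PRIMITIVES[name]
        out = fn(env) if out is None else fn(env, out, *args)
    return out

# VQA query "How many red objects?" -> scene . filter_color(red) . count
n = execute([("scene",), ("filter_color", "red"), ("count",)])
# Instruction "Pick up the red ball" -> ... . filter_shape(ball) . grasp
act = execute([("scene",), ("filter_color", "red"),
               ("filter_shape", "ball"), ("grasp",)])
```

The interpretability benefit is visible here: the intermediate output of every primitive can be inspected, so a wrong answer can be traced to the failing sub-task.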
Updated: 2025-12-15 11:12:03
Fields: cs.RO,cs.AI,cs.HC
Noise-Resilient Quantum Aggregation on NISQ for Federated ADAS Learning
Advanced Driver Assistance Systems (ADAS) increasingly employ Federated Learning (FL) to collaboratively train models across distributed vehicular nodes while preserving data privacy. Yet, conventional FL aggregation remains susceptible to noise, latency, and security constraints inherent to real-time vehicular networks. This paper introduces Noise-Resilient Quantum Federated Learning (NR-QFL), a hybrid quantum-classical framework that enables secure, low-latency aggregation through variational quantum circuits (VQCs) operating under Noisy Intermediate-Scale Quantum (NISQ) conditions. The framework encodes model parameters as quantum states with adaptive gate reparameterization, ensuring bounded-error convergence and provable resilience under Completely Positive Trace-Preserving (CPTP) dynamics. NR-QFL employs quantum entropy-based client selection and multi-server coordination for fairness and stability. Empirical validation shows consistent convergence with reduced gradient variance, lower communication overhead, and enhanced noise tolerance under constrained edge conditions. The framework establishes a scalable foundation for quantum-enhanced federated learning, enabling secure, efficient, and dynamically stable ADAS intelligence at the vehicular edge.
Updated: 2025-12-15 11:10:40
Fields: cs.LG,cs.AR
Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models
Speculative Decoding is a prominent technique for accelerating the autoregressive inference of large language models (LLMs) by employing a fast draft model to propose candidate token sequences and a large target model to verify them in parallel. However, its core component -- the rejection sampling mechanism -- relies on a fixed, context-independent random threshold. This leads to a significant "random rejection" problem in high-uncertainty generation scenarios, where plausible candidate tokens are frequently rejected due to random chance, undermining inference efficiency. This paper introduces Efficient Adaptive Rejection Sampling (EARS), a novel method that dynamically adjusts the acceptance threshold by incorporating the target model's own predictive uncertainty, measured as \(1 - \max(P_{\mathrm{target}})\). By introducing a tolerance term proportional to this uncertainty, EARS intelligently relaxes the acceptance criterion when the model is uncertain, effectively reducing random rejections while maintaining strict standards when the model is confident. Experiments on creative writing and open-domain QA tasks demonstrate that EARS significantly enhances the efficiency of speculative decoding, achieving up to an 18.12% increase in throughput with a negligible 0.84% accuracy drop on the GSM8K benchmark. The method requires no modifications to model architectures and can be seamlessly integrated into existing speculative decoding frameworks.
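The mechanism can be sketched against the standard verification rule: accept a drafted token t with probability min(1, p_target(t)/p_draft(t)). The sketch below softens that threshold with a tolerance proportional to 1 - max(p_target), as the abstract describes; the scaling factor alpha and the exact placement of the tolerance are my assumptions, not the paper's exact formulation.

```python
import random

def accept_token(p_target, p_draft, token, alpha=0.5, rng=random):
    """Speculative-decoding verification with an EARS-style relaxed threshold:
    the usual ratio test u < p_target(t)/p_draft(t) is softened by a tolerance
    proportional to the target model's uncertainty 1 - max(p_target)."""
    ratio = p_target[token] / p_draft[token]
    tolerance = alpha * (1.0 - max(p_target.values()))
    return rng.random() < min(1.0, ratio + tolerance)

# Uncertain target (max prob 0.4): sizeable tolerance keeps plausible draft
# tokens; here ratio + tolerance >= 1, so "b" is always accepted.
always = accept_token({"a": 0.40, "b": 0.35, "c": 0.25},
                      {"a": 0.50, "b": 0.30, "c": 0.20}, "b")
# Confident target (max prob 1.0): the tolerance vanishes and the test reduces
# to standard rejection sampling; a zero-probability token is never kept.
never = accept_token({"a": 1.0, "b": 0.0}, {"a": 0.5, "b": 0.5}, "b")
```

Note that unlike exact speculative sampling, any added tolerance trades a small distributional deviation for throughput, which matches the reported ~0.84% accuracy drop.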
Updated: 2025-12-15 11:08:56
Fields: cs.CL,cs.AI
CRPS-Based Targeted Sequential Design with Application in Chemical Space
Sequential design of real and computer experiments via Gaussian Process (GP) models has proven useful for parsimonious, goal-oriented data acquisition purposes. In this work, we focus on acquisition strategies for a GP model that needs to be accurate within a predefined range of the response of interest. Such an approach is useful in various fields including synthetic chemistry, where finding molecules with particular properties is essential for developing useful materials and effective medications. GP modeling and sequential design of experiments have been successfully applied to a plethora of domains, including molecule research. Our main contribution here is to use the threshold-weighted Continuous Ranked Probability Score (CRPS) as a basic building block for acquisition functions employed within sequential design. We study pointwise and integral criteria relying on two different weighting measures and benchmark them against competitors, demonstrating improved performance with respect to considered goals. The resulting acquisition strategies are applicable to a wide range of fields and pave the way to further developing sequential design relying on scoring rules.
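The threshold-weighted CRPS used as the building block takes the form twCRPS(F, y) = integral of w(z) * (F(z) - 1{y <= z})^2 dz, where w concentrates the score on the response range of interest. A minimal numeric sketch for a Gaussian GP predictive distribution with an indicator weight (the grid, range, and parameters are illustrative assumptions):

```python
import math

def gaussian_cdf(z, mu, sigma):
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

def tw_crps(mu, sigma, y, w, lo=-10.0, hi=10.0, n=4000):
    """Threshold-weighted CRPS of a Gaussian forecast N(mu, sigma^2) against
    observation y: integral of w(z) * (F(z) - 1{y <= z})**2 dz, approximated
    with a midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        err = gaussian_cdf(z, mu, sigma) - (1.0 if y <= z else 0.0)
        total += w(z) * err * err * h
    return total

def in_range(z):
    """Weight function: only responses in [0, 1] matter for the design goal."""
    return 1.0 if 0.0 <= z <= 1.0 else 0.0

# A forecast centred in the range of interest scores better (lower) there than
# one centred far outside it, for an observation y = 0.5 inside the range.
good = tw_crps(0.5, 0.3, 0.5, in_range)
bad = tw_crps(5.0, 0.3, 0.5, in_range)
```

Acquisition criteria can then, for instance, favour candidate points whose expected weighted score improvement is largest, focusing sampling on the targeted response range.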
Updated: 2025-12-15 10:58:38
Categories: stat.ML,cs.LG,stat.AP,stat.CO
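The basic building block named above, the CRPS of a Gaussian predictive distribution such as a GP posterior, has a well-known closed form. The sketch below implements the plain (unweighted) score; the paper's acquisition criteria build on its threshold-weighted variant:

```python
import math

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of the Gaussian predictive distribution N(mu, sigma^2)
    evaluated at observation y:
        CRPS = sigma * ( z*(2*Phi(z)-1) + 2*phi(z) - 1/sqrt(pi) ),  z = (y-mu)/sigma
    Lower is better; the score is zero only for a point mass at y.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)      # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))             # Phi(z)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

A pointwise acquisition criterion can then rank candidate inputs by the expected score of the GP's predictive distribution at each candidate.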
WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory
The Automatic Identification System (AIS) enables data-driven maritime surveillance but suffers from reliability issues and irregular intervals. We address vessel destination estimation using global-scope AIS data by proposing a differentiated approach that recasts long port-to-port trajectories as a nested sequence structure. Using spatial grids, this method mitigates spatio-temporal bias while preserving detailed resolution. We introduce a novel deep learning architecture, WAY, designed to process these reformulated trajectories for long-term destination estimation days to weeks in advance. WAY comprises a trajectory representation layer and Channel-Aggregative Sequential Processing (CASP) blocks. The representation layer generates multi-channel vector sequences from kinematic and non-kinematic features. CASP blocks utilize multi-headed channel- and self-attention for aggregation and sequential information delivery. Additionally, we propose a task-specialized Gradient Dropout (GD) technique to enable many-to-many training on single labels, preventing biased feedback surges by stochastically blocking gradient flow based on sample length. Experiments on 5-year AIS data demonstrate WAY's superiority over conventional spatial grid-based approaches regardless of trajectory progression. Results further confirm that adopting GD leads to performance gains. Finally, we explore WAY's potential for real-world application through multitask learning for ETA estimation.
Updated: 2025-12-15 10:55:20
Categories: cs.LG,cs.AI
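The spatial-grid step can be illustrated by mapping an AIS position to a discrete cell token; the 1-degree cell size and row-major indexing below are assumptions for illustration, not the paper's configuration:

```python
import math

def grid_token(lat, lon, cell_deg=1.0):
    """Map an AIS position (degrees) to a discrete spatial-grid cell index.

    Nearby positions share a token, which mitigates spatio-temporal bias
    from irregular AIS reporting intervals while keeping resolution
    controllable via cell_deg (an illustrative hyperparameter).
    """
    row = int(math.floor((lat + 90.0) / cell_deg))    # 0 at the south pole
    col = int(math.floor((lon + 180.0) / cell_deg))   # 0 at the antimeridian
    ncols = int(round(360.0 / cell_deg))
    return row * ncols + col                          # row-major cell id
```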
PolySet: Restoring the Statistical Ensemble Nature of Polymers for Machine Learning
Machine-learning (ML) models in polymer science typically treat a polymer as a single, perfectly defined molecular graph, even though real materials consist of stochastic ensembles of chains with distributed lengths. This mismatch between physical reality and digital representation limits the ability of current models to capture polymer behaviour. Here we introduce PolySet, a framework that represents a polymer as a finite, weighted ensemble of chains sampled from an assumed molar-mass distribution. This ensemble-based encoding is independent of chemical detail, compatible with any molecular representation and illustrated here in the homopolymer case using a minimal language model. We show that PolySet retains higher-order distributional moments (such as Mz, Mz+1), enabling ML models to learn tail-sensitive properties with greatly improved stability and accuracy. By explicitly acknowledging the statistical nature of polymer matter, PolySet establishes a physically grounded foundation for future polymer machine learning, naturally extensible to copolymers, block architectures, and other complex topologies.
Updated: 2025-12-15 10:50:48
Categories: cs.LG,cond-mat.mtrl-sci,cs.AI
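The higher-order moments mentioned above are the standard molar-mass averages. A minimal sketch computes them from a finite ensemble of chain masses with per-chain counts (the paper instead uses weighted samples from an assumed molar-mass distribution; counts play the role of weights here):

```python
def molar_mass_averages(masses, counts):
    """Molar-mass averages of a finite chain ensemble.

    With S_k = sum_i n_i * M_i**k, successive ratios give the standard
    averages; Mz and Mz+1 are increasingly dominated by the high-mass
    tail, which is what makes tail-sensitive properties learnable only
    from an ensemble representation.
    """
    def S(k):
        return sum(n * m ** k for m, n in zip(masses, counts))
    Mn = S(1) / S(0)    # number-average molar mass
    Mw = S(2) / S(1)    # weight-average molar mass
    Mz = S(3) / S(2)    # z-average (tail-sensitive)
    Mz1 = S(4) / S(3)   # (z+1)-average (more tail-sensitive)
    return Mn, Mw, Mz, Mz1
```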
Error Estimate and Convergence Analysis for Data Valuation
Data valuation quantifies data importance, but existing methods cannot ensure validity in a single training process. The neural dynamic data valuation (NDDV) method [3] addresses this limitation. Based on NDDV, we are the first to explore error estimation and convergence analysis in data valuation. Under Lipschitz and smoothness assumptions, we derive quadratic error bounds for loss differences that scale inversely with time steps and quadratically with control variations, ensuring stability. We also prove that the expected squared gradient norm for the training loss vanishes asymptotically, and that the meta loss converges sublinearly over iterations. In particular, NDDV achieves sublinear convergence.
Updated: 2025-12-15 10:50:29
Categories: cs.LG
A Collectivist, Economic Perspective on AI
Information technology is in the midst of a revolution in which omnipresent data collection and machine learning are impacting the human world as never before. The word ``intelligence'' is being used as a North Star for the development of this technology, with human cognition viewed as a baseline. This view neglects the fact that humans are social animals and that much of our intelligence is social and cultural in origin. Moreover, failing to properly situate aspects of intelligence at the social level contributes to the treatment of the societal consequences of technology as an afterthought. The path forward is not merely more data and compute, and not merely more attention paid to cognitive or symbolic representations, but a thorough blending of economic and social concepts with computational and inferential concepts at the level of algorithm design.
Updated: 2025-12-15 10:39:01
Categories: cs.CY,cs.AI,stat.ML
Carrot, stick, or both? Price incentives for sustainable food choice in competitive environments
Meat consumption is a major driver of global greenhouse gas emissions. While pricing interventions have shown potential to reduce meat intake, previous studies have focused on highly constrained environments with limited consumer choice. Here, we present the first large-scale field experiment to evaluate multiple pricing interventions in a real-world, competitive setting. Using a sequential crossover design with matched menus in a Swiss university campus, we systematically compared vegetarian-meal discounts (-2.5 CHF), meat surcharges (+2.5 CHF), and a combined scheme (-1.2 CHF / +1.2 CHF) across four campus cafeterias. Only the surcharge and combined interventions led to significant increases in vegetarian meal uptake--by 26.4% and 16.6%, respectively--and reduced CO2 emissions per meal by 7.4% and 11.3%, respectively. The surcharge, while effective, triggered a 12.3% drop in sales at intervention sites and a corresponding 14.9% increase in non-treated locations, hence causing a spillover effect that completely offset environmental gains. In contrast, the combined approach achieved meaningful emission reductions without significant effects on overall sales or revenue, making it both effective and economically viable. Notably, pricing interventions were equally effective for both vegetarian-leaning customers and habitual meat-eaters, stimulating change even within entrenched dietary habits. Our results show that balanced pricing strategies can reduce the carbon footprint of realistic food environments, but require coordinated implementation to maximize climate benefits and avoid unintended spillover effects.
Updated: 2025-12-15 10:35:44
Categories: econ.GN,cs.AI
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to the 100 offline evaluations required by the BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
Updated: 2025-12-15 10:30:40
Categories: cs.RO,cs.LG,eess.SY,math.OC
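A single tuning step in this spirit can be sketched as a norm-optimal-ILC-style update through the pseudoinverse of the empirical sensitivity matrix; the step size `alpha`, the positivity floor `theta_min`, and the plain-pseudoinverse form are illustrative assumptions, not the paper's exact update law:

```python
import numpy as np

def ilc_weight_update(theta, kpi, S, alpha=0.5, theta_min=1e-6):
    """One iterative-tuning step for NMPC weighting parameters.

    theta : current tuning parameters (e.g., diagonal entries of Q and R)
    kpi   : measured performance indicators from the last task repetition
    S     : empirical sensitivity matrix (d kpi / d theta), estimated from
            perturbed repetitions rather than by differentiating through
            the NMPC solver
    """
    step = alpha * np.linalg.pinv(S) @ np.asarray(kpi, float)
    # keep weights strictly positive so the NMPC cost stays well-posed
    return np.maximum(np.asarray(theta, float) - step, theta_min)
```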
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems, including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max. GPT 5.1 Pro spends 48 hours in total yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.
Updated: 2025-12-15 10:28:45
Categories: cs.AI,cs.CE,cs.IR,cs.MA
SACn: Soft Actor-Critic with n-step Returns
Soft Actor-Critic (SAC) is widely used in practical applications and is now one of the most relevant off-policy online model-free reinforcement learning (RL) methods. The technique of n-step returns is known to increase the convergence speed of RL algorithms compared to their 1-step counterparts. However, SAC is notoriously difficult to combine with n-step returns, since the usual combination introduces bias in off-policy algorithms due to changes in the action distribution. While this problem can be solved by importance sampling (a method for estimating expected values under one distribution using samples from another), importance sampling may result in numerical instability. In this work, we combine SAC with n-step returns in a way that overcomes this issue. We present an approach to applying numerically stable importance sampling with simplified hyperparameter selection. Furthermore, we analyze the entropy estimation approach of Soft Actor-Critic in the context of the n-step maximum entropy framework and formulate the $\tau$-sampled entropy estimation to reduce the variance of the learning target. Finally, we formulate the Soft Actor-Critic with n-step returns (SAC$n$) algorithm, which we experimentally verify on MuJoCo simulated environments.
Updated: 2025-12-15 10:23:13
Categories: cs.LG,cs.AI
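The n-step return at the core of the method is the usual backward recursion; this minimal sketch omits the paper's actual contributions, namely the numerically stabilized importance-sampling correction and the $\tau$-sampled entropy estimate:

```python
def n_step_return(rewards, bootstrap, gamma=0.99):
    """Backward recursion for the n-step return
        G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * bootstrap.

    In the soft (maximum-entropy) setting the bootstrap would be the soft
    value of the n-th state, i.e. it includes the policy-entropy bonus.
    """
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `n=1` this reduces to the standard 1-step target `r + gamma * V(s')` used by vanilla SAC.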
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis
The development of clinical-grade artificial intelligence in pathology is limited by the scarcity of diverse, high-quality annotated datasets. Generative models offer a potential solution but suffer from semantic instability and morphological hallucinations that compromise diagnostic reliability. To address this challenge, we introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS), the first generative foundation model for pathology-specific text-to-image synthesis. By leveraging a dual-stage training strategy on approximately 2.8 million image-caption pairs, CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations. Furthermore, CRAFTS-augmented datasets enhance the performance across various clinical tasks, including classification, cross-modal retrieval, self-supervised learning, and visual question answering. In addition, coupling CRAFTS with ControlNet enables precise control over tissue architecture from inputs such as nuclear segmentation masks and fluorescence images. By overcoming the critical barriers of data scarcity and privacy concerns, CRAFTS provides a limitless source of diverse, annotated histology data, effectively unlocking the creation of robust diagnostic tools for rare and complex cancer phenotypes.
Updated: 2025-12-15 10:22:43
Categories: cs.CV,cs.AI
HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting
Transformer-based methods have achieved impressive results in time series forecasting. However, existing Transformers still exhibit limitations in sequence modeling as they tend to overemphasize temporal dependencies. This incurs additional computational overhead without yielding corresponding performance gains. We find that the performance of Transformers is highly dependent on the embedding method used to learn effective representations. To address this issue, we extract multivariate features to augment the effective information captured in the embedding layer, yielding multidimensional embeddings that convey richer and more meaningful sequence representations. These representations enable Transformer-based forecasters to better understand the series. Specifically, we introduce Hybrid Temporal and Multivariate Embeddings (HTME). The HTME extractor integrates a lightweight temporal feature extraction module with a carefully designed multivariate feature extraction module to provide complementary features, thereby achieving a balance between model complexity and performance. By combining HTME with the Transformer architecture, we present HTMformer, leveraging the enhanced feature extraction capability of the HTME extractor to build a lightweight forecaster. Experiments conducted on eight real-world datasets demonstrate that our approach outperforms existing baselines in both accuracy and efficiency.
Updated: 2025-12-15 10:16:46
Categories: cs.LG,cs.AI
Reliability-Adjusted Prioritized Experience Replay
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
Updated: 2025-12-15 10:13:42
Categories: cs.LG,stat.ML
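The sampling distribution of a reliability-adjusted buffer can be sketched as a small variation on PER's priority exponentiation. The concrete TD-error reliability measure is the paper's contribution, so it appears here only as a given per-transition scalar in [0, 1]:

```python
import numpy as np

def sampling_probs(td_errors, reliability, alpha=0.6, eps=1e-6):
    """PER-style sampling distribution with reliability-adjusted priorities.

    Plain PER uses p_i = (|delta_i| + eps)^alpha. Here each TD-error
    magnitude is additionally scaled by a reliability score, so that
    large but unreliable errors are down-weighted. alpha interpolates
    between uniform (0) and fully greedy (1) sampling.
    """
    p = (np.abs(td_errors) * np.asarray(reliability, float) + eps) ** alpha
    return p / p.sum()
```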
SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning
Effective human-agent collaboration is increasingly prevalent in real-world applications. Current trends in such collaborations are predominantly unidirectional, with users providing instructions or posing questions to agents, where agents respond directly without seeking necessary clarifications or confirmations. However, the evolving capabilities of these agents require more proactive engagement, where agents should dynamically participate in conversations to clarify user intents, resolve ambiguities, and adapt to changing circumstances. Existing work under-utilizes the conversational capabilities of language models (LMs), thereby optimizing agents as better followers rather than effective speakers. In this work, we introduce SpeakRL, a reinforcement learning (RL) method that enhances agents' conversational capabilities by rewarding proactive interactions with users, such as asking the right clarification questions when necessary. To support this, we curate SpeakER, a synthetic dataset that includes diverse scenarios from task-oriented dialogues, where tasks are resolved through interactive clarification questions. We present a systematic analysis of reward design for conversational proactivity and propose a principled reward formulation for teaching agents to balance asking with acting. Empirical evaluations demonstrate that our approach achieves a 20.14% absolute improvement in task completion over base models without increasing conversation turns, even surpassing much larger proprietary models, demonstrating the promise of clarification-centric user-agent interactions.
Updated: 2025-12-15 10:08:53
Categories: cs.AI,cs.CL
Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance
Recent work on large language models (LLMs) has increasingly focused on post-training and alignment with datasets curated to enhance instruction following, world knowledge, and specialized skills. However, most post-training datasets used in leading open- and closed-source LLMs remain inaccessible to the public, with limited information about their construction process. This lack of transparency has motivated the recent development of open-source post-training corpora. While training on these open alternatives can yield performance comparable to that of leading models, systematic comparisons remain challenging due to the significant computational cost of conducting them rigorously at scale, and are therefore largely absent. As a result, it remains unclear how specific samples, task types, or curation strategies influence downstream performance when assessing data quality. In this work, we conduct the first comprehensive side-by-side analysis of two prominent open post-training datasets: Tulu-3-SFT-Mix and SmolTalk. Using the Magpie framework, we annotate each sample with detailed quality metrics, including turn structure (single-turn vs. multi-turn), task category, input quality, and response quality, and we derive statistics that reveal structural and qualitative similarities and differences between the two datasets. Based on these insights, we design a principled curation recipe that produces a new data mixture, TuluTalk, which contains 14% fewer samples than either source dataset while matching or exceeding their performance on key benchmarks. Our findings offer actionable insights for constructing more effective post-training datasets that improve model performance within practical resource limits. To support future research, we publicly release both the annotated source datasets and our curated TuluTalk mixture.
Updated: 2025-12-15 10:08:14
Categories: cs.CL,cs.AI
Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
We introduce Intrinsic Image Fusion, a method that reconstructs high-quality physically based materials from multi-view images. Material reconstruction is highly underconstrained and typically relies on analysis-by-synthesis, which requires expensive and noisy path tracing. To better constrain the optimization, we incorporate single-view priors into the reconstruction process. We leverage a diffusion-based material estimator that produces multiple, but often inconsistent, candidate decompositions per view. To reduce the inconsistency, we fit an explicit low-dimensional parametric function to the predictions. We then propose a robust optimization framework using soft per-view prediction selection together with confidence-based soft multi-view inlier set to fuse the most consistent predictions of the most confident views into a consistent parametric material space. Finally, we use inverse path tracing to optimize for the low-dimensional parameters. Our results outperform state-of-the-art methods in material disentanglement on both synthetic and real scenes, producing sharp and clean reconstructions suitable for high-quality relighting.
Updated: 2025-12-15 10:05:59
Categories: cs.CV,cs.AI
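The confidence-based fusion step can be illustrated as a weighted average of per-view candidate decompositions. This is a deliberate simplification, since the full method combines soft per-view prediction selection with a soft multi-view inlier set before fitting the parametric material space:

```python
import numpy as np

def soft_fuse(candidates, confidences):
    """Confidence-weighted fusion of per-view candidate decompositions.

    candidates  : array-like of shape (n_views, ...) holding one candidate
                  material estimate per view
    confidences : one non-negative confidence per view; low-confidence
                  outlier views contribute little to the fused estimate
    """
    w = np.asarray(confidences, float)
    w = w / w.sum()                                   # normalize to a soft selection
    return np.tensordot(w, np.asarray(candidates, float), axes=1)
```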
MAC: A Multi-Agent Framework for Interactive User Clarification in Multi-turn Conversations
Conversational agents often encounter ambiguous user requests, requiring effective clarification to successfully complete tasks. While recent advancements in real-world applications favor multi-agent architectures to manage complex conversational scenarios efficiently, ambiguity resolution remains a critical and underexplored challenge--particularly due to the difficulty of determining which agent should initiate a clarification and how agents should coordinate their actions when faced with uncertain or incomplete user input. The fundamental questions of when to interrupt a user and how to formulate the optimal clarification query in a multi-agent setting remain open. In this paper, we propose MAC (Multi-Agent Clarification), an interactive multi-agent framework specifically optimized to resolve user ambiguities by strategically managing clarification dialogues. We first introduce a novel taxonomy categorizing user ambiguities to systematically guide clarification strategies. Then, we present MAC, which autonomously coordinates multiple agents to interact synergistically with users. Empirical evaluations on MultiWOZ 2.4 demonstrate that enabling clarification at both levels increases the task success rate by 7.8 percentage points (54.5% to 62.3%) and reduces the average number of dialogue turns (6.53 to 4.86) by eliciting all required user information up front and minimizing repetition. Our findings highlight the importance of active user interaction and role-aware clarification for more reliable human-agent communication.
Updated: 2025-12-15 10:02:50
Categories: cs.AI,cs.CL
Enhancing Node-Level Graph Domain Adaptation by Alleviating Local Dependency
Recent years have witnessed significant advancements in machine learning methods on graphs. However, transferring knowledge effectively from one graph to another remains a critical challenge. This highlights the need for algorithms capable of applying information extracted from a source graph to an unlabeled target graph, a task known as unsupervised graph domain adaptation (GDA). One key difficulty in unsupervised GDA is conditional shift, which hinders transferability. In this paper, we show that conditional shift can be observed only if there exist local dependencies among node features. To support this claim, we perform a rigorous analysis and further provide generalization bounds for GDA when dependent node features are modeled using Markov chains. Guided by the theoretical findings, we propose to improve GDA by decorrelating node features, which can be implemented through decorrelated GCN layers and graph transformer layers. Our experimental results demonstrate the effectiveness of this approach, showing not only substantial performance enhancements over baseline GDA methods but also clear visualizations of small intra-class distances in the learned representations. Our code is available at https://github.com/TechnologyAiGroup/DFT
Updated: 2025-12-15 10:00:25
Categories: cs.LG,stat.ML
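Feature decorrelation itself can be illustrated with classical ZCA whitening; note the paper realizes decorrelation inside GCN and graph transformer layers rather than as the standalone preprocessing transform sketched here:

```python
import numpy as np

def zca_decorrelate(X, eps=1e-8):
    """ZCA whitening of node features.

    After the transform the empirical feature covariance is approximately
    the identity, removing linear dependencies among feature dimensions.
    eps regularizes near-zero eigenvalues of the sample covariance.
    """
    Xc = X - X.mean(axis=0)                         # center features
    C = Xc.T @ Xc / X.shape[0]                      # sample covariance
    vals, vecs = np.linalg.eigh(C)                  # eigendecomposition
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W                                   # whitened features
```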
Channel Dependence, Limited Lookback Windows, and the Simplicity of Datasets: How Biased is Time Series Forecasting?
In Time Series Forecasting (TSF), the lookback window (the length of historical data used for prediction) is a critical hyperparameter that is often set arbitrarily, undermining the validity of model evaluations. We argue that the lookback window must be tuned on a per-task basis to ensure fair comparisons. Our empirical results show that failing to do so can invert performance rankings, particularly when comparing univariate and multivariate methods. Experiments on standard benchmarks reposition Channel-Independent (CI) models, such as PatchTST, as state-of-the-art methods. However, we reveal this superior performance is largely an artifact of weak inter-channel correlations and simplicity of patterns within these specific datasets. Using Granger causality analysis and ODE datasets (with implicit channel correlations), we demonstrate that the true strength of multivariate Channel-Dependent (CD) models emerges on datasets with strong, inherent cross-channel dependencies, where they significantly outperform CI models. We conclude with three key recommendations for improving TSF research: (i) treat the lookback window as a critical hyperparameter to be tuned, (ii) use statistical analysis of datasets to inform the choice between CI and CD architectures, and (iii) favor CD models in applications with limited data.
Updated: 2025-12-15 09:52:51
Subjects: cs.LG
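The paper's first recommendation, treating the lookback window as a hyperparameter to tune per task, can be sketched with a toy one-step-ahead autoregressive forecaster on a synthetic series. Everything here (the AR model, the candidate windows, the data) is an illustrative assumption, not the paper's experimental setup.

```python
import numpy as np

def fit_ar(series, lookback):
    """Least-squares autoregressive model over the last `lookback` values."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def validation_mae(train, val, lookback):
    """One-step-ahead mean absolute error on a held-out validation split."""
    coef = fit_ar(train, lookback)
    history = np.concatenate([train[-lookback:], val])
    preds = np.array([history[i:i + lookback] @ coef for i in range(len(val))])
    return float(np.mean(np.abs(preds - val)))

rng = np.random.default_rng(1)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
train, val = series[:480], series[480:]

# Tune the lookback window on validation error instead of fixing it a priori
scores = {L: validation_mae(train, val, L) for L in (4, 12, 24, 48, 96)}
best = min(scores, key=scores.get)
```

The same selection loop, run per dataset, is what makes comparisons between CI and CD architectures fair in the sense the abstract argues for.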
Weight Space Correlation Analysis: Quantifying Feature Utilization in Deep Learning Models
Deep learning models in medical imaging are susceptible to shortcut learning, relying on confounding metadata (e.g., scanner model) that is often encoded in image embeddings. The crucial question is whether the model actively utilizes this encoded information for its final prediction. We introduce Weight Space Correlation Analysis, an interpretable methodology that quantifies feature utilization by measuring the alignment between the classification heads of a primary clinical task and auxiliary metadata tasks. We first validate our method by successfully detecting artificially induced shortcut learning. We then apply it to probe the feature utilization of an SA-SonoNet model trained for Spontaneous Preterm Birth (sPTB) prediction. Our analysis confirmed that while the embeddings contain substantial metadata, the sPTB classifier's weight vectors were highly correlated with clinically relevant factors (e.g., birth weight) but decoupled from clinically irrelevant acquisition factors (e.g., scanner). Our methodology provides a tool to verify model trustworthiness, demonstrating that, in the absence of induced bias, the clinical model selectively utilizes features related to the genuine clinical signal.
Updated: 2025-12-15 09:52:46
Subjects: cs.CV,cs.LG,eess.IV
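The core measurement, alignment between classification-head weight vectors of a primary task and auxiliary metadata tasks, reduces to a cosine similarity in weight space. A minimal sketch with synthetic head vectors (the variable names and the 0.3 noise scale are illustrative assumptions):

```python
import numpy as np

def head_correlation(w_task, w_meta):
    """Cosine similarity between two classification-head weight vectors."""
    return float(w_task @ w_meta /
                 (np.linalg.norm(w_task) * np.linalg.norm(w_meta)))

rng = np.random.default_rng(2)
d = 64
w_clinical = rng.normal(size=d)                          # primary-task head
w_birth_weight = w_clinical + 0.3 * rng.normal(size=d)   # shares its direction
w_scanner = rng.normal(size=d)                           # independent direction

print(head_correlation(w_clinical, w_birth_weight))      # high
print(head_correlation(w_clinical, w_scanner))           # near zero
```

A high correlation with a metadata head would flag that the primary classifier reuses the direction in embedding space that encodes the confound.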
Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels
As large language models increasingly mediate stigmatized health decisions, their capacity to genuinely understand complex psychological and physiological phenomena remains poorly evaluated. Can AI understand what we cannot say? We investigate whether LLMs coherently represent abortion stigma across the cognitive, interpersonal, and structural levels where it operates. We systematically tested 627 demographically diverse personas across five leading LLMs using the validated Individual Level Abortion Stigma Scale (ILAS). Our multilevel analysis examined whether models coherently represent stigma at the cognitive level (self-judgment), interpersonal level (anticipated judgment and isolation), and structural level (community condemnation and disclosure patterns), as well as overall stigma. Models fail tests of genuine understanding across all levels. They overestimate interpersonal stigma while underestimating cognitive stigma, assume uniform community condemnation, introduce demographic biases absent from human validation data, miss the empirically validated stigma-secrecy relationship, and contradict themselves within theoretical constructs. These patterns reveal that current alignment approaches ensure appropriate language but not coherent multilevel understanding. This work provides empirical evidence that current LLMs lack coherent multilevel understanding of psychological and physiological constructs. AI safety in high-stakes contexts demands new approaches to design (multilevel coherence), evaluation (continuous auditing), governance and regulation (mandatory audits, accountability, deployment restrictions), and AI literacy in domains where understanding what people cannot say determines whether support helps or harms.
Updated: 2025-12-15 09:50:00
Subjects: cs.AI,cs.HC
Towards Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning
Generating 3D-based body movements from speech shows great potential in extensive downstream applications, while it still suffers challenges in imitating realistic human movements. Predominant research efforts focus on end-to-end generation schemes to generate co-speech gestures, spanning GANs, VQ-VAE, and recent diffusion models. As an ill-posed problem, in this paper, we argue that these prevailing learning schemes fail to model crucial inter- and intra-correlations across different motion units, i.e., head, body, and hands, thus leading to unnatural movements and poor coordination. To delve into these intrinsic correlations, we propose a unified Hierarchical Implicit Periodicity (HIP) learning approach for audio-inspired 3D gesture generation. Different from predominant research, our approach models this multi-modal implicit relationship by two explicit technique insights: i) To disentangle the complicated gesture movements, we first explore the gesture motion phase manifolds with periodic autoencoders to imitate natural human motion from realistic distributions, while incorporating non-periodic components from current latent states for instance-level diversity. ii) To model the hierarchical relationship among face motions, body gestures, and hand movements, we drive the animation with cascaded guidance during learning. We demonstrate our approach on 3D avatars, and extensive experiments show that our method outperforms the state-of-the-art co-speech gesture generation methods in both quantitative and qualitative evaluations. Code and models will be publicly available.
Updated: 2025-12-15 09:43:08
Subjects: cs.AI,cs.CV,cs.GR,cs.MM,cs.SD
We Can Always Catch You: Detecting Adversarial Patched Objects WITH or WITHOUT Signature
Recently, object detection has proven vulnerable to adversarial patch attacks. An attacker holding a specially crafted patch can hide from state-of-the-art detectors, e.g., YOLO, even in the physical world. Such attacks can pose serious security threats, such as escaping from surveillance cameras. How to effectively detect this kind of adversarial example to catch potential attacks has become an important problem. In this paper, we propose two detection methods: a signature-based method and a signature-independent method. First, we identify two signatures of existing adversarial patches that can be utilized to precisely locate patches within adversarial examples. By employing these signatures, a fast signature-based method is developed to detect adversarial objects. Second, we present a robust signature-independent method based on the "content semantics consistency" of model outputs. Adversarial objects violate this consistency, appearing locally but disappearing globally, while benign ones remain consistently present. Experiments demonstrate that the two proposed methods can effectively detect attacks in both the digital and physical world. Each method offers distinct advantages: the signature-based method is capable of real-time detection, while the signature-independent method can detect unknown adversarial patch attacks and makes defense-aware attacks almost impossible to perform.
Updated: 2025-12-15 09:38:34
Subjects: cs.CV,cs.CR
A Multi-Component AI Framework for Computational Psychology: From Robust Predictive Modeling to Deployed Generative Dialogue
The confluence of Artificial Intelligence and Computational Psychology presents an opportunity to model, understand, and interact with complex human psychological states through computational means. This paper presents a comprehensive, multi-faceted framework designed to bridge the gap between isolated predictive modeling and an interactive system for psychological analysis. The methodology encompasses a rigorous, end-to-end development lifecycle. First, foundational performance benchmarks were established on four diverse psychological datasets using classical machine learning techniques. Second, state-of-the-art transformer models were fine-tuned, a process that necessitated the development of effective solutions to overcome critical engineering challenges, including the resolution of numerical instability in regression tasks and the creation of a systematic workflow for conducting large-scale training under severe resource constraints. Third, a generative large language model (LLM) was fine-tuned using parameter-efficient techniques to function as an interactive "Personality Brain." Finally, the entire suite of predictive and generative models was architected and deployed as a robust, scalable microservices ecosystem. Key findings include the successful stabilization of transformer-based regression models for affective computing, showing meaningful predictive performance where standard approaches failed, and the development of a replicable methodology for democratizing large-scale AI research. The significance of this work lies in its holistic approach, demonstrating a complete research-to-deployment pipeline that integrates predictive analysis with generative dialogue, thereby providing a practical model for future research in computational psychology and human-AI interaction.
Updated: 2025-12-15 09:34:04
Subjects: cs.AI,cs.HC,cs.LG
Quanvolutional Neural Networks for Spectrum Peak-Finding
The analysis of spectra, such as Nuclear Magnetic Resonance (NMR) spectra, for the comprehensive characterization of peaks is a challenging task for both experts and machines, especially with complex molecules. This process, also known as deconvolution, involves identifying and quantifying the peaks in the spectrum. Machine learning techniques have shown promising results in automating this process. With the advent of quantum computing, there is potential to further enhance these techniques. In this work, inspired by the success of classical Convolutional Neural Networks (CNNs), we explore the use of Quanvolutional Neural Networks (QuanvNNs) for the multi-task peak finding problem, involving both peak counting and position estimation. We implement a simple and interpretable QuanvNN architecture that can be directly compared to its classical CNN counterpart, and evaluate its performance on a synthetic NMR-inspired dataset. Our results demonstrate that QuanvNNs outperform classical CNNs on challenging spectra, achieving an 11% improvement in F1 score and a 30% reduction in mean absolute error for peak position estimation. Additionally, QuanvNNs appear to exhibit better convergence stability for harder problems.
Updated: 2025-12-15 09:33:54
Subjects: cs.LG
Efficient Low-Tubal-Rank Tensor Estimation via Alternating Preconditioned Gradient Descent
The problem of low-tubal-rank tensor estimation is a fundamental task with wide applications across high-dimensional signal processing, machine learning, and image science. Traditional approaches tackle such a problem by performing tensor singular value decomposition, which is computationally expensive and becomes infeasible for large-scale tensors. Recent approaches address this issue by factorizing the tensor into two smaller factor tensors and solving the resulting problem using gradient descent. However, this kind of approach requires an accurate estimate of the tensor rank, and when the rank is overestimated, the convergence of gradient descent and its variants slows down significantly or even diverges. To address this problem, we propose an Alternating Preconditioned Gradient Descent (APGD) algorithm, which accelerates convergence in the over-parameterized setting by adding a preconditioning term to the original gradient and updating these two factors alternately. Based on certain geometric assumptions on the objective function, we establish linear convergence guarantees for more general low-tubal-rank tensor estimation problems. Then we further analyze the specific cases of low-tubal-rank tensor factorization and low-tubal-rank tensor recovery. Our theoretical results show that APGD achieves linear convergence even under over-parameterization, and the convergence rate is independent of the tensor condition number. Extensive simulations on synthetic data are carried out to validate our theoretical assertions.
Updated: 2025-12-15 09:28:22
Subjects: cs.LG,math.OC
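The alternating preconditioned update can be sketched in the matrix (rather than tubal-rank tensor) setting: each factor's gradient step is preconditioned by the inverse Gram matrix of the other factor, and the two factors are updated alternately. This is a simplified illustration under assumed step size, damping, and initialization; the paper's APGD additionally covers the over-parameterized tensor case.

```python
import numpy as np

def apgd_factorize(M, rank, steps=200, lr=0.5, damping=1e-8):
    """Alternating preconditioned GD for M ~ L @ R.T. The L-step is
    preconditioned by (R^T R)^{-1} and the R-step by (L^T L)^{-1}."""
    rng = np.random.default_rng(0)
    L = rng.normal(scale=0.1, size=(M.shape[0], rank))
    R = rng.normal(scale=0.1, size=(M.shape[1], rank))
    I = np.eye(rank)
    for _ in range(steps):
        E = L @ R.T - M                                    # current residual
        L = L - lr * E @ R @ np.linalg.inv(R.T @ R + damping * I)
        E = L @ R.T - M                                    # refresh after L-step
        R = R - lr * E.T @ L @ np.linalg.inv(L.T @ L + damping * I)
    return L, R

rng = np.random.default_rng(3)
M = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))    # exact rank-2 target
L, R = apgd_factorize(M, rank=2)
rel_err = np.linalg.norm(L @ R.T - M) / np.linalg.norm(M)
```

The preconditioners rescale the step per factor direction, which is what makes the convergence rate insensitive to the conditioning of the target.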
Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences
We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-\alpha$ and is almost surely finite under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.
Updated: 2025-12-15 09:26:45
Subjects: math.OC,cs.LG,math.ST,stat.ML
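The nonnegative-supermartingale machinery behind such time-uniform bounds can be illustrated in one dimension: for bounded centered increments, the Hoeffding supermartingale plus Ville's inequality yields a bound that holds for all times simultaneously. This is a generic ingredient shown for intuition, not the paper's construction for projected SGD.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, lam, T = 0.05, 1.0, 2000
x = rng.uniform(0, 1, size=T)            # bounded observations with mean 0.5
S = np.cumsum(x - 0.5)                   # centered partial sums
t = np.arange(1, T + 1)

# Hoeffding supermartingale: M_t = exp(lam * S_t - lam**2 * t / 8) has
# expectation <= 1 for increments of range 1. Ville's inequality gives
# P(exists t: M_t >= 1/alpha) <= alpha, i.e. the time-uniform bound below
# holds for ALL t simultaneously with probability at least 1 - alpha.
bound = np.log(1 / alpha) / lam + lam * t / 8
violated = bool(np.any(S > bound))
```

A stopping rule then simply halts at the first time the anytime-valid upper bound on the quantity of interest drops below the target $\varepsilon$.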
DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering to input frames, constraining their flexibility and applicability. Additionally, recent advances have successfully enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring opportunities for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, substantially enhancing its adaptability and efficiency, which is especially important in dynamic environments with rapid changes. We validate DePT3R on several challenging benchmarks involving dynamic scenes, demonstrating strong performance and significant improvements in memory efficiency over existing state-of-the-art methods. Data and codes are available via the open repository: https://github.com/StructuresComp/DePT3R
Updated: 2025-12-15 09:21:28
Subjects: cs.CV,cs.AI
Towards Practical Large-scale Dynamical Heterogeneous Graph Embedding: Cold-start Resilient Recommendation
Deploying dynamic heterogeneous graph embeddings in production faces key challenges of scalability, data freshness, and cold-start. This paper introduces a practical, two-stage solution that balances deep graph representation with low-latency incremental updates. Our framework combines HetSGFormer, a scalable graph transformer for static learning, with Incremental Locally Linear Embedding (ILLE), a lightweight, CPU-based algorithm for real-time updates. HetSGFormer captures global structure with linear scalability, while ILLE provides rapid, targeted updates to incorporate new data, thus avoiding costly full retraining. This dual approach is cold-start resilient, leveraging the graph to create meaningful embeddings from sparse data. On billion-scale graphs, A/B tests show HetSGFormer achieved up to a 6.11% lift in Advertiser Value over previous methods, while the ILLE module added another 3.22% lift and improved embedding refresh timeliness by 83.2%. Our work provides a validated framework for deploying dynamic graph learning in production environments.
Updated: 2025-12-15 09:19:23
Subjects: cs.IR,cs.LG
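The incremental step in locally linear embedding can be sketched as follows: a new node is embedded by computing its LLE reconstruction weights over its k nearest neighbors in feature space and reusing those weights over the neighbors' existing embeddings, with no retraining. This is a textbook-style out-of-sample LLE sketch, not the production ILLE algorithm; names and the regularization constant are illustrative.

```python
import numpy as np

def lle_embed_new_point(x_new, neighbors_X, neighbors_Y, reg=1e-3):
    """Embed a new point via its LLE reconstruction weights over k neighbors,
    without recomputing the embedding of the rest of the graph."""
    Z = neighbors_X - x_new                        # shift neighbors to origin
    C = Z @ Z.T                                    # local k x k Gram matrix
    C = C + reg * np.trace(C) * np.eye(len(C))     # regularize for stability
    w = np.linalg.solve(C, np.ones(len(C)))
    w = w / w.sum()                                # reconstruction weights, sum 1
    return w @ neighbors_Y                         # reuse weights in embed space

rng = np.random.default_rng(5)
neighbors_X = rng.normal(size=(6, 10))             # k=6 neighbors, 10-d features
neighbors_Y = rng.normal(size=(6, 2))              # their existing 2-d embeddings
x_new = neighbors_X.mean(axis=0)                   # new point at the centroid
y_new = lle_embed_new_point(x_new, neighbors_X, neighbors_Y)
```

Because each update touches only a k x k linear system, this kind of step runs comfortably on CPU, which is what makes the low-latency freshness path feasible.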
Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks
Most adversarial attacks on point clouds perturb a large number of points, causing widespread geometric changes and limiting applicability in real-world scenarios. While recent works explore sparse attacks by modifying only a few points, such approaches often struggle to maintain effectiveness due to the limited influence of individual perturbations. In this paper, we propose SCP, a sparse and cooperative perturbation framework that selects and leverages a compact subset of points whose joint perturbations produce amplified adversarial effects. Specifically, SCP identifies the subset where the misclassification loss is locally convex with respect to their joint perturbations, determined by checking the positive-definiteness of the corresponding Hessian block. The selected subset is then optimized to generate high-impact adversarial examples with minimal modifications. Extensive experiments show that SCP achieves 100% attack success rates, surpassing state-of-the-art sparse attacks, and delivers superior imperceptibility compared to dense attacks, with far fewer modifications.
Updated: 2025-12-15 09:18:27
Subjects: cs.CR
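The selection criterion above, checking whether the Hessian block for a candidate point subset is positive-definite, reduces to an eigenvalue test on a small symmetric matrix. A minimal sketch with toy 2x2 blocks (the matrices are illustrative, not loss Hessians from a real point-cloud model):

```python
import numpy as np

def is_positive_definite(H_block, tol=1e-10):
    """True if a symmetric Hessian block has all eigenvalues above tol."""
    return bool(np.linalg.eigvalsh(H_block).min() > tol)

# Toy Hessian blocks for two candidate point subsets
H_convex = np.array([[4.0, 0.6], [0.6, 2.0]])   # both eigenvalues > 0
H_saddle = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1

print(is_positive_definite(H_convex))   # True: loss locally convex, keep subset
print(is_positive_definite(H_saddle))   # False: indefinite, reject subset
```

In practice one would only instantiate the small block corresponding to the candidate subset's coordinates, keeping the test cheap even for large point clouds.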
SoK: Preconfirmations
In recent years, significant research efforts have focused on improving blockchain throughput and confirmation speeds without compromising security. While decreasing the time it takes for a transaction to be included in the blockchain ledger enhances user experience, a fundamental delay still remains between when a transaction is issued by a user and when its inclusion is confirmed in the blockchain ledger. This delay limits user experience gains through the confirmation uncertainty it brings for users. This inherent delay in conventional blockchain protocols has led to the emergence of preconfirmation protocols -- protocols that provide users with early guarantees of eventual transaction confirmation. This article presents a Systematization of Knowledge (SoK) on preconfirmations. We present the core terms and definitions needed to understand preconfirmations, outline a general framework for preconfirmation protocols, and explore the economics and risks of preconfirmations. Finally, we survey and apply our framework to several implementations of real-world preconfirmation protocols, bridging the gap between theory and practice.
Updated: 2025-12-15 09:10:48
Subjects: cs.CR,cs.NI
From Overfitting to Reliability: Introducing the Hierarchical Approximate Bayesian Neural Network
In recent years, neural networks have revolutionized various domains, yet challenges such as hyperparameter tuning and overfitting remain significant hurdles. Bayesian neural networks offer a framework to address these challenges by incorporating uncertainty directly into the model, yielding more reliable predictions, particularly for out-of-distribution data. This paper presents Hierarchical Approximate Bayesian Neural Network, a novel approach that uses a Gaussian-inverse-Wishart distribution as a hyperprior of the network's weights to increase both the robustness and performance of the model. We provide analytical representations for the predictive distribution and weight posterior, which amount to the calculation of the parameters of Student's t-distributions in closed form with linear complexity with respect to the number of weights. Our method demonstrates robust performance, effectively addressing issues of overfitting and providing reliable uncertainty estimates, particularly for out-of-distribution tasks. Experimental results indicate that HABNN not only matches but often outperforms state-of-the-art models, suggesting a promising direction for future applications in safety-critical environments.
Updated: 2025-12-15 09:08:42
Subjects: cs.LG,cs.AI
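The closed-form Student's t predictive mentioned above can be illustrated in the scalar case: under a normal-inverse-gamma prior (the 1-D analogue of a Gaussian-inverse-Wishart), the posterior predictive is a Student's t whose parameters are computable in closed form. This is the standard conjugacy result shown for intuition, not HABNN's weight-space computation.

```python
import numpy as np

def student_t_predictive(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Closed-form posterior predictive (a Student's t) for scalar data under
    a normal-inverse-gamma prior NIG(mu0, kappa0, alpha0, beta0)."""
    n, xbar = len(x), float(np.mean(x))
    ss = float(np.sum((x - xbar) ** 2))
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    df = 2 * alpha_n                                   # degrees of freedom
    scale = np.sqrt(beta_n * (kappa_n + 1) / (alpha_n * kappa_n))
    return df, mu_n, scale

rng = np.random.default_rng(7)
x = rng.normal(loc=3.0, scale=2.0, size=5000)
df, loc, scale = student_t_predictive(x)               # loc ~ 3, scale ~ 2
```

The heavier-than-Gaussian tails of the t predictive are what give such models their more honest uncertainty on out-of-distribution inputs.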
Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing
Large language models (LLMs) have demonstrated strong performance on a variety of natural language processing (NLP) tasks. However, they often struggle with long-text sequences due to the "lost in the middle" phenomenon. This issue has been shown to arise from a U-shaped attention bias, where attention is disproportionately focused on the beginning and end of a text, leaving the middle section underrepresented. While previous studies have attributed this bias to position encoding, our research first identifies an additional factor: initial saliency. It means that in the attention computation for each token, tokens with higher attention weights relative to the initial token tend to receive more attention in the prediction of the next token. We further find that utilizing this property by scaling the attention weight between the initial token and others improves the model's ability to process long contexts, achieving a maximum improvement of 3.6% on the MDQA dataset. Moreover, combining this approach with existing methods to reduce position encoding bias further enhances performance, achieving a maximum improvement of 3.4% on KV-Retrieval tasks.
Updated: 2025-12-15 09:04:06
Subjects: cs.CL,cs.AI
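The mechanism of rescaling attention weight between the initial token and the rest can be sketched on a toy attention matrix: multiply the first key position's weight by a factor gamma and renormalize, which shifts probability mass between position 0 and all later positions. This is an illustration of the operation, not the paper's exact scaling rule or choice of gamma.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scale_initial_token(attn, gamma):
    """Rescale the initial token's attention weight by gamma, then
    renormalize each row back to a probability distribution."""
    w = attn.copy()
    w[..., 0] *= gamma
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(6)
scores = rng.normal(size=(4, 8))            # 4 query positions, 8 key positions
base = softmax(scores)
adj = scale_initial_token(base, gamma=0.5)  # gamma < 1 demotes the first token
```

With gamma below one, every row strictly reduces the initial token's share and redistributes it to later positions, which is the lever the abstract describes for long-context processing.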
Diffusion-Based Restoration for Multi-Modal 3D Object Detection in Adverse Weather
Multi-modal 3D object detection is important for reliable perception in robotics and autonomous driving. However, its effectiveness remains limited under adverse weather conditions due to weather-induced distortions and misalignment between different data modalities. In this work, we propose DiffFusion, a novel framework designed to enhance robustness in challenging weather through diffusion-based restoration and adaptive cross-modal fusion. Our key insight is that diffusion models possess strong capabilities for denoising and generating data that can adapt to various weather conditions. Building on this, DiffFusion introduces Diffusion-IR, which restores images degraded by weather effects, and Point Cloud Restoration (PCR), which compensates for corrupted LiDAR data using image object cues. To tackle misalignments between the two modalities, we develop the Bidirectional Adaptive Fusion and Alignment Module (BAFAM). It enables dynamic multi-modal fusion and bidirectional bird's-eye view (BEV) alignment to maintain consistent spatial correspondence. Extensive experiments on three public datasets show that DiffFusion achieves state-of-the-art robustness under adverse weather while preserving strong clean-data performance. Zero-shot results on the real-world DENSE dataset further validate its generalization. The implementation of our DiffFusion will be released as open-source.
Updated: 2025-12-15 09:03:46
Subjects: cs.CV,cs.AI
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
Reinforcement learning with verifiable rewards (RLVR) has proven effective in training large reasoning models (LRMs) by leveraging answer-verifiable signals to guide policy optimization, which, however, suffers from high annotation costs. To alleviate this problem, recent work has explored unsupervised RLVR methods that derive rewards solely from the model's internal consistency, such as through entropy and majority voting. While seemingly promising, these methods often suffer from model collapse in the later stages of training, which may arise from the reinforcement of incorrect reasoning patterns in the absence of external supervision. In this work, we investigate a novel semi-supervised RLVR paradigm that utilizes a small labeled set to guide RLVR training on unlabeled samples. Our key insight is that supervised rewards are essential for stabilizing consistency-based training on unlabeled samples, ensuring that only reasoning patterns verified on labeled instances are incorporated into RL training. Technically, we propose an effective policy optimization algorithm, TraPO, that identifies reliable unlabeled samples by matching their learning trajectory similarity to labeled ones. Building on this, TraPO achieves remarkable data efficiency and strong generalization on six widely used mathematical reasoning benchmarks (AIME24/25, AMC, MATH-500, Minerva, and Olympiad) and three out-of-distribution tasks (ARC-c, GPQA-diamond, and MMLU-pro). With only 1K labeled and 3K unlabeled samples, TraPO reaches 42.6% average accuracy, surpassing the best unsupervised method trained on 45K unlabeled samples (38.3%). Notably, when using 4K labeled and 12K unlabeled samples, TraPO even outperforms the fully supervised model trained on the full 45K labeled samples on all benchmarks, while using only 10% of the labeled data. The code is available via https://github.com/ShenzhiYang2000/TRAPO.
Updated: 2025-12-15 09:03:45
Categories: cs.LG,cs.AI
Socratic Students: Teaching Language Models to Learn by Asking Questions
Large Language Models (LLMs) excel at static interactions, where they answer user queries by retrieving knowledge encoded in their parameters. However, in many real-world settings, such as educational tutoring or medical assistance, relevant information is not directly available and must be actively acquired through dynamic interactions. An interactive agent would recognize its own uncertainty, ask targeted questions, and retain new knowledge efficiently. Prior work has primarily explored effective ways for a teacher to instruct the student, where the teacher identifies student gaps and provides guidance. In this work, we shift the focus to the student and investigate effective strategies to actively query the teacher in seeking useful information. Across math and coding benchmarks, where baseline student models begin with near-zero performance, we show that student-led approaches consistently yield absolute Pass@k improvements of at least 0.5 over static baselines. To improve question quality, we train students using Direct Preference Optimization (DPO) with guidance from either self or stronger students. We find that this guided training enables smaller models to learn how to ask better questions, further enhancing learning efficiency.
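Pass@k is the headline metric here; a sketch of the standard unbiased estimator from n sampled generations with c correct, under the assumption that this common estimator is the one used (the abstract does not say):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k draws
    (without replacement) from n generations with c correct passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect generations exist
    return 1.0 - comb(n - c, k) / comb(n, k)

print(round(pass_at_k(10, 3, 1), 4))  # -> 0.3
```

With baseline students near zero, an absolute improvement of 0.5 on this scale is substantial.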
Updated: 2025-12-15 08:59:19
Categories: cs.AI
Chasing Shadows: Pitfalls in LLM Security Research
Large language models (LLMs) are increasingly prevalent in security research. Their unique characteristics, however, introduce challenges that undermine established paradigms of reproducibility, rigor, and evaluation. Prior work has identified common pitfalls in traditional machine learning research, but these studies predate the advent of LLMs. In this paper, we identify nine common pitfalls that have become (more) relevant with the emergence of LLMs and that can compromise the validity of research involving them. These pitfalls span the entire computation process, from data collection, pre-training, and fine-tuning to prompting and evaluation. We assess the prevalence of these pitfalls across all 72 peer-reviewed papers published at leading Security and Software Engineering venues between 2023 and 2024. We find that every paper contains at least one pitfall, and each pitfall appears in multiple papers. Yet only 15.7% of the present pitfalls were explicitly discussed, suggesting that the majority remain unrecognized. To understand their practical impact, we conduct four empirical case studies showing how individual pitfalls can mislead evaluation, inflate performance, or impair reproducibility. Based on our findings, we offer actionable guidelines to support the community in future work.
Updated: 2025-12-15 08:58:27
Categories: cs.CR
Harmonizing Generalization and Specialization: Uncertainty-Informed Collaborative Learning for Semi-supervised Medical Image Segmentation
Vision foundation models have demonstrated strong generalization in medical image segmentation by leveraging large-scale, heterogeneous pretraining. However, they often struggle to generalize to specialized clinical tasks under limited annotations or rare pathological variations, due to a mismatch between general priors and task-specific requirements. To address this, we propose Uncertainty-informed Collaborative Learning (UnCoL), a dual-teacher framework that harmonizes generalization and specialization in semi-supervised medical image segmentation. Specifically, UnCoL distills both visual and semantic representations from a frozen foundation model to transfer general knowledge, while concurrently maintaining a progressively adapting teacher to capture fine-grained and task-specific representations. To balance guidance from both teachers, pseudo-label learning in UnCoL is adaptively regulated by predictive uncertainty, which selectively suppresses unreliable supervision and stabilizes learning in ambiguous regions. Experiments on diverse 2D and 3D segmentation benchmarks show that UnCoL consistently outperforms state-of-the-art semi-supervised methods and foundation model baselines. Moreover, our model delivers near fully supervised performance with markedly reduced annotation requirements.
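The uncertainty-regulated pseudo-label learning can be illustrated with a toy per-pixel weighting; the entropy-normalized weight below is an assumed form, since the abstract does not give the exact regulation rule:

```python
import math

def entropy(p):
    # Predictive entropy of a per-pixel class distribution.
    return -sum(q * math.log(q) for q in p if q > 0)

def uncertainty_weights(prob_maps, num_classes=2):
    """Down-weight pseudo-label supervision where the teacher is uncertain:
    weight = 1 - H(p) / log(C), so confident pixels keep weight near 1."""
    h_max = math.log(num_classes)
    return [max(0.0, 1.0 - entropy(p) / h_max) for p in prob_maps]

pixels = [[0.99, 0.01],   # confident prediction -> weight near 1
          [0.5, 0.5]]     # maximally uncertain -> weight 0
print([round(x, 2) for x in uncertainty_weights(pixels)])  # -> [0.92, 0.0]
```

Ambiguous regions thus contribute little gradient, which is the stabilizing effect the abstract describes.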
Updated: 2025-12-15 08:57:49
Categories: cs.CV,cs.AI,cs.LG
OXE-AugE: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning
Large and diverse datasets are needed for training generalist robot policies that have potential to control a variety of robot embodiments -- robot arm and gripper combinations -- across diverse tasks and environments. As re-collecting demonstrations and retraining for each new hardware platform are prohibitively costly, we show that existing robot data can be augmented for transfer and generalization. The Open X-Embodiment (OXE) dataset, which aggregates demonstrations from over 60 robot datasets, has been widely used as the foundation for training generalist policies. However, it is highly imbalanced: the top four robot types account for over 85% of its real data, which risks overfitting to robot-scene combinations. We present AugE-Toolkit, a scalable robot augmentation pipeline, and OXE-AugE, a high-quality open-source dataset that augments OXE with 9 different robot embodiments. OXE-AugE provides over 4.4 million trajectories, more than triple the size of the original OXE. We conduct a systematic study of how scaling robot augmentation impacts cross-embodiment learning. Results suggest that augmenting datasets with diverse arms and grippers improves policy performance not only on the augmented robots, but also on unseen robots and even the original robots under distribution shifts. In physical experiments, we demonstrate that state-of-the-art generalist policies such as OpenVLA and $π_0$ benefit from fine-tuning on OXE-AugE, improving success rates by 24-45% on previously unseen robot-gripper combinations across four real-world manipulation tasks. Project website: https://OXE-AugE.github.io/.
Updated: 2025-12-15 08:57:15
Categories: cs.RO,cs.AI
ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning
To combine the advantages of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), recent methods have integrated ''hints'' into post-training, which are prefix segments of complete reasoning trajectories, aiming for powerful knowledge expansion and reasoning generalization. However, existing hint-based RL methods typically ignore difficulty when scheduling hint ratios and estimating relative advantages, leading to unstable learning and excessive imitation of off-policy hints. In this work, we propose ADHint, which treats difficulty as a key factor in both hint-ratio schedule and relative-advantage estimation to achieve a better trade-off between exploration and imitation. Specifically, we propose Adaptive Hint with Sample Difficulty Prior, which evaluates each sample's difficulty under the policy model and accordingly schedules an appropriate hint ratio to guide its rollouts. We also introduce Consistency-based Gradient Modulation and Selective Masking for Hint Preservation to modulate token-level gradients within hints, preventing biased and destructive updates. Additionally, we propose Advantage Estimation with Rollout Difficulty Posterior, which leverages the relative difficulty of rollouts with and without hints to estimate their respective advantages, thereby achieving more balanced updates. Extensive experiments across diverse modalities, model scales, and domains demonstrate that ADHint delivers superior reasoning ability and out-of-distribution generalization, consistently surpassing existing methods in both pass@1 and avg@8. Our code and dataset will be made publicly available upon paper acceptance.
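The difficulty-prior hint schedule can be sketched in a few lines; the linear mapping from policy pass rate to hint ratio is an illustrative assumption (the paper's actual schedule is not specified in the abstract):

```python
def hint_ratio(pass_rate, min_ratio=0.0, max_ratio=0.8):
    """Schedule the fraction of the expert trajectory revealed as a hint:
    harder samples (low pass rate under the current policy) get longer hints."""
    difficulty = 1.0 - pass_rate
    return min_ratio + (max_ratio - min_ratio) * difficulty

def take_hint(trajectory_tokens, pass_rate):
    # Reveal only the scheduled prefix of the full reasoning trajectory.
    k = int(len(trajectory_tokens) * hint_ratio(pass_rate))
    return trajectory_tokens[:k]

traj = list(range(10))
print(take_hint(traj, pass_rate=1.0))   # easy sample -> no hint: []
print(take_hint(traj, pass_rate=0.0))   # hard sample -> 80% prefix
```

Rollouts then continue from the revealed prefix, balancing exploration on easy samples against imitation on hard ones.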
Updated: 2025-12-15 08:53:47
Categories: cs.CV,cs.LG
Cost-aware LLM-based Online Dataset Annotation
Recent advances in large language models (LLMs) have enabled automated dataset labeling with minimal human supervision. While majority voting across multiple LLMs can improve label reliability by mitigating individual model biases, it incurs high computational costs due to repeated querying. In this work, we propose a novel online framework, Cost-aware Majority Voting (CaMVo), for efficient and accurate LLM-based dataset annotation. CaMVo adaptively selects a subset of LLMs for each data instance based on contextual embeddings, balancing confidence and cost without requiring pre-training or ground-truth labels. Leveraging a LinUCB-based selection mechanism and a Bayesian estimator over confidence scores, CaMVo estimates a lower bound on labeling accuracy for each LLM and aggregates responses through weighted majority voting. Our empirical evaluation on the MMLU and IMDB Movie Review datasets demonstrates that CaMVo achieves comparable or superior accuracy to full majority voting while significantly reducing labeling costs. This establishes CaMVo as a practical and robust solution for cost-efficient annotation in dynamic labeling environments.
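The final aggregation step described above is a weighted majority vote; a minimal sketch, where the per-model weights are hypothetical stand-ins for CaMVo's estimated accuracy lower bounds:

```python
from collections import defaultdict

def weighted_majority(votes, weights):
    """Aggregate labels from several LLMs, each vote weighted by that
    model's estimated labeling-accuracy lower bound."""
    tally = defaultdict(float)
    for label, w in zip(votes, weights):
        tally[label] += w
    return max(tally, key=tally.get)

votes   = ["positive", "negative", "positive"]
weights = [0.9, 0.6, 0.7]   # illustrative accuracy lower bounds
print(weighted_majority(votes, weights))  # -> positive
```

The LinUCB-based selection decides which models vote at all, so the cost saving comes from querying only this subset.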
Updated: 2025-12-15 08:51:21
Categories: cs.LG,cs.CL,cs.IT
Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. While IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors, its performance degrades unexpectedly in closed-loop due to the gradual accumulation of small, often imperceptible errors over time. Over successive planning cycles, these errors compound, potentially resulting in severe failures. Current research efforts predominantly rely on increasingly sophisticated network architectures or high-fidelity training datasets to enhance the robustness of IL planners against error accumulation, focusing on state-level robustness at a single time point. However, autonomous driving is inherently a continuous-time process, and leveraging the temporal scale to enhance robustness may provide a new perspective for addressing this issue. To this end, we propose a method termed Sequence of Experts (SoE), a temporal alternation policy that enhances closed-loop performance without increasing model size or data requirements. Our experiments on the large-scale autonomous driving benchmark nuPlan demonstrate that the SoE method consistently and significantly improves the performance of all evaluated models and achieves state-of-the-art performance. This module may provide key and widely applicable support for improving the training efficiency of autonomous driving models.
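The "temporal alternation" idea can be pictured as switching among several planners across planning cycles; the round-robin schedule below is purely illustrative (the abstract does not disclose the actual alternation rule):

```python
def soe_plan(experts, horizon, state):
    """Sequence-of-Experts sketch: alternate among several imitation
    planners over successive planning cycles (round-robin here; the
    paper's actual schedule may differ)."""
    actions = []
    for t in range(horizon):
        expert = experts[t % len(experts)]  # pick this cycle's planner
        actions.append(expert(state, t))
    return actions

planner_a = lambda s, t: ("A", t)   # stand-in planners
planner_b = lambda s, t: ("B", t)
print(soe_plan([planner_a, planner_b], horizon=4, state=None))
# -> [('A', 0), ('B', 1), ('A', 2), ('B', 3)]
```

Alternating planners decorrelates the per-cycle errors that would otherwise compound in closed loop.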
Updated: 2025-12-15 08:50:23
Categories: cs.RO,cs.AI
PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations
Achieving efficient and robust whole-body control (WBC) is essential for enabling humanoid robots to perform complex tasks in dynamic environments. Despite the success of reinforcement learning (RL) in this domain, its sample inefficiency remains a significant challenge due to the intricate dynamics and partial observability of humanoid robots. To address this limitation, we propose PvP, a Proprioceptive-Privileged contrastive learning framework that leverages the intrinsic complementarity between proprioceptive and privileged states. PvP learns compact and task-relevant latent representations without requiring hand-crafted data augmentations, enabling faster and more stable policy learning. To support systematic evaluation, we develop SRL4Humanoid, the first unified and modular framework that provides high-quality implementations of representative state representation learning (SRL) methods for humanoid robot learning. Extensive experiments on the LimX Oli robot across velocity tracking and motion imitation tasks demonstrate that PvP significantly improves sample efficiency and final performance compared to baseline SRL methods. Our study further provides practical insights into integrating SRL with RL for humanoid WBC, offering valuable guidance for data-efficient humanoid robot learning.
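The proprioceptive-privileged pairing above is naturally expressed as a contrastive objective; a one-direction InfoNCE sketch under the assumption that embeddings from the same timestep form positive pairs (the abstract does not specify the loss form):

```python
import math

def info_nce(z_proprio, z_priv, temperature=0.1):
    """Each proprioceptive embedding should match the privileged
    embedding from the same timestep; other timesteps are negatives."""
    def dot(u, v): return sum(a * b for a, b in zip(u, v))
    def norm(u):
        n = math.sqrt(dot(u, u))
        return [a / n for a in u]
    zp = [norm(z) for z in z_proprio]
    zq = [norm(z) for z in z_priv]
    loss = 0.0
    for i, u in enumerate(zp):
        logits = [dot(u, v) / temperature for v in zq]
        m = max(logits)  # log-sum-exp with max-shift for stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_z)  # cross-entropy, positive = same index
    return loss / len(zp)

a = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(a, a) < info_nce(a, [a[1], a[0]]))  # aligned pairs -> lower loss: True
```

No hand-crafted augmentations are needed because the two state views are intrinsically complementary.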
Updated: 2025-12-15 08:50:20
Categories: cs.RO,cs.LG
UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era
Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervised learning, making performance strongly dataset-dependent and incurring high annotation costs; they typically focus on a few predefined categories and generalize poorly to diverse scenes. With the rise of vision foundation models such as SAM2 and CLIP, new opportunities have emerged to relax these constraints. We propose Unified Open-Vocabulary Change Detection (UniVCD), an unsupervised, open-vocabulary change detection method built on frozen SAM2 and CLIP. UniVCD detects category-agnostic changes across diverse scenes and imaging geometries without any labeled data or paired change images. A lightweight feature alignment module is introduced to bridge the spatially detailed representations from SAM2 and the semantic priors from CLIP, enabling high-resolution, semantically aware change estimation while keeping the number of trainable parameters small. On top of this, a streamlined post-processing pipeline is further introduced to suppress noise and pseudo-changes, improving the detection accuracy for objects with well-defined boundaries. Experiments on several public BCD (Binary Change Detection) and SCD (Semantic Change Detection) benchmarks show that UniVCD achieves consistently strong performance and matches or surpasses existing open-vocabulary CD methods in key metrics such as F1 and IoU. The results demonstrate that unsupervised change detection with frozen vision foundation models and lightweight multi-modal alignment is a practical and effective paradigm for open-vocabulary CD. Code and pretrained models will be released at https://github.com/Die-Xie/UniVCD.
Updated: 2025-12-15 08:42:23
Categories: cs.CV,cs.AI
Adaptive-lambda Subtracted Importance Sampled Scores in Machine Unlearning for DDPMs and VAEs
Machine Unlearning is essential for large generative models (VAEs, DDPMs) to comply with the right to be forgotten and prevent undesired content generation without costly retraining. Existing approaches, such as Static-lambda SISS for diffusion models, rely on a fixed mixing weight lambda, which is suboptimal because the required unlearning strength varies across samples and training stages. We propose Adaptive-lambda SISS, a principled extension that turns lambda into a latent variable dynamically inferred at each training step. A lightweight inference network parameterizes an adaptive posterior over lambda, conditioned on contextual features derived from the instantaneous SISS loss terms (retain/forget losses and their gradients). This enables joint optimization of the diffusion model and the lambda-inference mechanism via a variational objective, yielding significantly better trade-offs. We further extend the adaptive-lambda principle to score-based unlearning and introduce a multi-class variant of Score Forgetting Distillation. In addition, we present two new directions: (i) a hybrid objective combining the data-free efficiency of Score Forgetting Distillation with the direct gradient control of SISS, and (ii) a Reinforcement Learning formulation that treats unlearning as a sequential decision process, learning an optimal policy over a state space defined by the model's current memory of the forget set. Experiments on an augmented MNIST benchmark show that Adaptive-lambda SISS substantially outperforms the original static-lambda SISS, achieving stronger removal of forgotten classes while better preserving generation quality on the retain set.
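The lambda-inference network can be pictured as a tiny function from instantaneous loss context to a mixing weight; the sigmoid form and the weights below are illustrative assumptions, not the paper's learned parameterization:

```python
import math

def adaptive_lambda(retain_loss, forget_loss, w=(1.0, -1.0), b=0.0):
    """Stand-in for the lambda-inference network: map the instantaneous
    retain/forget losses to a mixing weight in (0, 1) via a sigmoid
    (weights w, b are illustrative, not learned here)."""
    z = w[0] * retain_loss + w[1] * forget_loss + b
    return 1.0 / (1.0 + math.exp(-z))

# When the forget loss dominates, lambda shifts toward the unlearning term.
print(round(adaptive_lambda(0.5, 2.0), 3))  # -> 0.182
print(round(adaptive_lambda(2.0, 0.5), 3))  # -> 0.818
```

Inferring lambda per step is what lets the unlearning strength vary across samples and training stages, unlike the static-lambda baseline.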
Updated: 2025-12-15 08:37:27
Categories: cs.LG,cs.AI
DiRe: Diversity-promoting Regularization for Dataset Condensation
In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with significant redundancy, so there is a dire need to reduce redundancy and improve the diversity of the synthesized datasets. To tackle this, we propose an intuitive Diversity Regularizer (DiRe) composed of cosine similarity and Euclidean distance, which can be applied off-the-shelf to various state-of-the-art condensation methods. Through extensive experiments, we demonstrate that the addition of our regularizer improves state-of-the-art condensation methods on various benchmark datasets from CIFAR-10 to ImageNet-1K with respect to generalization and diversity metrics.
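The abstract names the two ingredients of DiRe (pairwise cosine similarity and Euclidean distance); a minimal sketch, where the exact way the two terms are combined is an assumption:

```python
import math

def dire_penalty(samples, alpha=1.0, beta=1.0):
    """Diversity regularizer sketch: penalize high pairwise cosine
    similarity and reward large pairwise Euclidean distance between
    synthetic samples (alpha/beta combination is assumed)."""
    def dot(u, v): return sum(a * b for a, b in zip(u, v))
    n, penalty = len(samples), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            u, v = samples[i], samples[j]
            cos = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
            penalty += alpha * cos - beta * dist
    return penalty / (n * (n - 1) / 2)

redundant = [[1.0, 0.0], [1.0, 0.01]]   # near-duplicates
diverse   = [[1.0, 0.0], [0.0, 1.0]]    # orthogonal, well separated
print(dire_penalty(redundant) > dire_penalty(diverse))  # -> True
```

Added to a condensation objective, this term pushes synthetic samples apart and reduces the redundancy the abstract complains about.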
Updated: 2025-12-15 08:33:44
Categories: cs.CV,cs.LG
FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle
Predicting wildfire risk is a reasoning-intensive spatial problem that requires the integration of visual, climatic, and geographic factors to infer continuous risk maps. Existing methods lack the causal reasoning and multimodal understanding required for reliable generalization. We introduce $\textbf{FireScope-Bench}$, a large-scale dataset and benchmark that couples Sentinel-2 imagery and climate data with expert-defined risk rasters across the USA, and real wildfire events in Europe for cross-continental evaluation. Building on this dataset, we propose $\textbf{FireScope}$, a VLM-based reasoning-to-generation framework that learns from both reinforcement learning and visual supervision to predict risk rasters with complementary reasoning traces. When trained in the USA and tested in Europe, $\textbf{FireScope}$ achieves substantial performance gains, while expert feedback and automated analysis confirm that its reasoning traces are faithful and semantically meaningful. Our findings demonstrate that reasoning can ground raster prediction models, improving both generalization and interpretability. To our knowledge, this is the first framework to (1) demonstrate that language-based reasoning can improve generalization in visual generation, (2) propose a high-resolution wildfire risk model that can be applied across continents, and (3) enable systematic studies of robust cross-continental generalization for multimodal fire risk models. We believe that $\textbf{FireScope-Bench}$ has the potential to serve as a foundation for advancing reasoning-driven, interpretable and generalizable spatial modeling. Data and source code will be made publicly available.
Updated: 2025-12-15 08:25:07
Categories: cs.CV,cs.LG
LikeBench: Evaluating Subjective Likability in LLMs for Personalization
A personalized LLM should remember user facts, apply them correctly, and adapt over time to provide responses that the user prefers. Existing LLM personalization benchmarks are largely centered on two axes: accurately recalling user information and accurately applying remembered information in downstream tasks. We argue that a third axis, likability, is both subjective and central to user experience, yet under-measured by current benchmarks. To measure likability holistically, we introduce LikeBench, a multi-session, dynamic evaluation framework that measures likability across multiple dimensions by how much an LLM can adapt over time to a user's preferences to provide more likable responses. In LikeBench, the LLMs engage in conversation with a simulated user and learn preferences only from the ongoing dialogue. As the interaction unfolds, models try to adapt to responses, and after each turn, they are evaluated for likability across seven dimensions by the same simulated user. To the best of our knowledge, we are the first to decompose likability into multiple diagnostic metrics: emotional adaptation, formality matching, knowledge adaptation, reference understanding, conversation length fit, humor fit, and callback, which makes it easier to pinpoint where a model falls short. To make the simulated user more realistic and discriminative, LikeBench uses fine-grained, psychologically grounded descriptive personas rather than the coarse high/low trait rating based personas used in prior work. Our benchmark shows that strong memory performance does not guarantee high likability: DeepSeek R1, with lower memory accuracy (86%, 17 facts/profile), outperformed Qwen3 by 28% on likability score despite Qwen3's higher memory accuracy (93%, 43 facts/profile). Even SOTA models like GPT-5 adapt well in short exchanges but show only limited robustness in longer, noisier interactions.
Updated: 2025-12-15 08:18:42
Categories: cs.LG
A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval
Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. Its core relies on a coarse-to-fine hierarchical architecture that enables rapid candidate selection and precise semantic matching, achieving millisecond-level response over billion-scale corpora. This capability makes it essential not only in traditional search and recommendation scenarios but also in the emerging paradigm of generative recommendation driven by large language models, where semantic IDs-themselves a form of coarse-to-fine representation-play a foundational role. However, the widely adopted dual-tower encoding architecture introduces inherent challenges, primarily representational space misalignment and retrieval index inconsistency, which degrade matching accuracy, retrieval stability, and performance on long-tail queries. These issues are further magnified in semantic ID generation, ultimately limiting the performance ceiling of downstream generative models. To address these challenges, this paper proposes a simple and effective framework named SCI comprising two synergistic modules: a symmetric representation alignment module that employs an innovative input-swapping mechanism to unify the dual-tower representation space without adding parameters, and a consistent indexing with dual-tower synergy module that redesigns retrieval paths using a dual-view indexing strategy to maintain consistency from training to inference. The framework is systematic, lightweight, and engineering-friendly, requiring minimal overhead while fully supporting billion-scale deployment. We provide theoretical guarantees for our approach, with its effectiveness validated by results across public datasets and real-world e-commerce datasets.
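The input-swapping mechanism can be illustrated with toy towers: score the same (query, item) pair through both tower assignments and pull the two scores together. This is a heavily simplified reading of the abstract; the towers, the scoring function, and the squared-difference loss are all illustrative assumptions:

```python
def score(tower_a, tower_b, x, y):
    # Toy "towers": each is a function mapping raw input to an embedding.
    u, v = tower_a(x), tower_b(y)
    return sum(a * b for a, b in zip(u, v))

def symmetric_alignment_loss(query_tower, item_tower, q, d):
    """Input-swapping sketch: route the inputs through both tower
    assignments and penalize the score gap, nudging the towers toward
    one shared representation space without adding parameters."""
    s_normal  = score(query_tower, item_tower, q, d)
    s_swapped = score(item_tower, query_tower, q, d)  # inputs swapped
    return (s_normal - s_swapped) ** 2

qt = lambda x: [t + 1.0 for t in x]   # stand-in nonlinear encoders
it = lambda x: [t * t for t in x]
print(symmetric_alignment_loss(qt, it, [1.0, 0.0], [2.0, 1.0]))  # -> 36.0
```

Identical towers give zero loss by construction, so the term vanishes exactly when the two representation spaces agree.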
Updated: 2025-12-15 08:11:24
Categories: cs.IR,cs.AI
SpectrumFM: A Foundation Model for Intelligent Spectrum Management
Intelligent spectrum management is crucial for improving spectrum efficiency and achieving secure utilization of spectrum resources. However, existing intelligent spectrum management methods, typically based on small-scale models, suffer from notable limitations in recognition accuracy, convergence speed, and generalization, particularly in the complex and dynamic spectrum environments. To address these challenges, this paper proposes a novel spectrum foundation model, termed SpectrumFM, establishing a new paradigm for spectrum management. SpectrumFM features an innovative encoder architecture that synergistically exploits the convolutional neural networks and the multi-head self-attention mechanisms to enhance feature extraction and enable robust representation learning. The model is pre-trained via two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, which leverage large-scale in-phase and quadrature (IQ) data to achieve comprehensive and transferable spectrum representations. Furthermore, a parameter-efficient fine-tuning strategy is proposed to enable SpectrumFM to adapt to various downstream spectrum management tasks, including automatic modulation classification (AMC), wireless technology classification (WTC), spectrum sensing (SS), and anomaly detection (AD). Extensive experiments demonstrate that SpectrumFM achieves superior performance in terms of accuracy, robustness, adaptability, few-shot learning efficiency, and convergence speed, consistently outperforming conventional methods across multiple benchmarks. Specifically, SpectrumFM improves AMC accuracy by up to 12.1% and WTC accuracy by 9.3%, achieves an area under the curve (AUC) of 0.97 in SS at -4 dB signal-to-noise ratio (SNR), and enhances AD performance by over 10%.
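The masked-reconstruction pretraining task can be sketched as corrupting a fraction of the IQ sequence; the mask ratio and zero-fill corruption below are assumptions, since the abstract only names the task:

```python
import random

def mask_iq(sequence, mask_ratio=0.3, seed=0):
    """Masked-reconstruction sketch: hide a fraction of IQ samples and
    return (corrupted input, boolean mask); the model is then trained
    to reconstruct the hidden values."""
    rng = random.Random(seed)
    masked = list(sequence)
    mask = [False] * len(sequence)
    for i in rng.sample(range(len(sequence)), int(len(sequence) * mask_ratio)):
        masked[i] = (0.0, 0.0)   # zero out both I and Q components
        mask[i] = True
    return masked, mask

iq = [(0.1, 0.2), (0.3, -0.1), (0.5, 0.4), (-0.2, 0.0), (0.0, 0.7)]
corrupted, mask = mask_iq(iq)
print(sum(mask))  # int(5 * 0.3) = 1 masked position
```

Next-slot prediction, the second pretraining task, would instead hide only the final time slot and ask the model to extrapolate it.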
Updated: 2025-12-15 08:08:20
Categories: eess.SP,cs.AI
M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
Self-supervised reinforcement learning (RL) presents a promising approach for enhancing the reasoning capabilities of Large Language Models (LLMs) without reliance on expensive human-annotated data. However, we find that existing methods suffer from a critical failure mode under long-horizon training: a "policy collapse" where performance precipitously degrades. We diagnose this instability and demonstrate that simply scaling the number of rollouts -- a common strategy to improve performance -- only delays, but does not prevent, this collapse. To counteract this instability, we first introduce M-GRPO (Momentum-Anchored Group Relative Policy Optimization), a framework that leverages a slowly evolving momentum model to provide a stable training target. In addition, we identify that this process is often accompanied by a rapid collapse in policy entropy, resulting in a prematurely confident and suboptimal policy. To specifically address this issue, we propose a second contribution: an adaptive filtering method based on the interquartile range (IQR) that dynamically prunes low-entropy trajectories, preserving essential policy diversity. Our extensive experiments on multiple reasoning benchmarks demonstrate that M-GRPO stabilizes the training process while the IQR filter prevents premature convergence. The combination of these two innovations leads to superior training stability and state-of-the-art performance.
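The two ingredients, a momentum-anchored teacher and IQR-based entropy filtering, can be sketched as follows; the EMA decay, the IQR multiplier, and the toy entropy values are assumed for illustration.

```python
import numpy as np

# --- Momentum anchor: exponential moving average of the policy weights ---
def ema_update(teacher, student, decay=0.99):  # decay value is an assumption
    return {k: decay * teacher[k] + (1 - decay) * student[k] for k in teacher}

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student)  # slowly tracks the student policy

# --- IQR filter: prune trajectories with outlier-low policy entropy ---
def iqr_keep_mask(entropies, k=1.5):
    q1, q3 = np.percentile(entropies, [25, 75])
    lower = q1 - k * (q3 - q1)
    return entropies >= lower  # drop only abnormally low-entropy rollouts

entropies = np.array([0.9, 1.1, 1.0, 1.2, 0.05])  # last rollout has collapsed
keep = iqr_keep_mask(entropies)
print(keep)
```

The slowly evolving teacher provides the stable training target, while the filter preserves policy diversity by discarding prematurely confident trajectories.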
Updated: 2025-12-15 08:07:23
Categories: cs.AI,cs.CL
Multi-fidelity aerodynamic data fusion by autoencoder transfer learning
Accurate aerodynamic prediction often relies on high-fidelity simulations; however, their prohibitive computational costs severely limit their applicability in data-driven modeling. This limitation motivates the development of multi-fidelity strategies that leverage inexpensive low-fidelity information without compromising accuracy. Addressing this challenge, this work presents a multi-fidelity deep learning framework that combines autoencoder-based transfer learning with a newly developed Multi-Split Conformal Prediction (MSCP) strategy to achieve uncertainty-aware aerodynamic data fusion under extreme data scarcity. The methodology leverages abundant Low-Fidelity (LF) data to learn a compact latent physics representation, which acts as a frozen knowledge base for a decoder that is subsequently fine-tuned using scarce High-Fidelity (HF) samples. Tested on surface-pressure databases for NACA airfoils (2D) and a transonic wing (3D), the model successfully corrects LF deviations and achieves high-accuracy pressure predictions using minimal HF training data. Furthermore, the MSCP framework produces robust, actionable uncertainty bands with pointwise coverage exceeding 95%. By combining extreme data efficiency with uncertainty quantification, this work offers a scalable and reliable solution for aerodynamic regression in data-scarce environments.
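A minimal sketch of split conformal calibration, the building block behind MSCP; the number of splits, the median aggregation across splits, and the synthetic residual pool are assumptions, as the paper's exact multi-split rule is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_conformal_radius(residuals, alpha=0.1):
    """Quantile radius q such that |y - y_hat| <= q holds with ~(1-alpha) coverage."""
    n = len(residuals)
    rank = int(np.ceil((n + 1) * (1 - alpha)))  # conservative finite-sample rank
    return np.sort(residuals)[min(rank, n) - 1]

# Calibration residuals drawn from several random splits; the final band here
# uses the median radius across splits (the paper's aggregation may differ).
abs_err = np.abs(rng.standard_normal(500))  # stand-in pool of |y - y_hat|
radii = []
for _ in range(5):
    calib = rng.choice(abs_err, size=200, replace=False)
    radii.append(split_conformal_radius(calib, alpha=0.05))
band = float(np.median(radii))
coverage = float(np.mean(abs_err <= band))
print(round(band, 3), round(coverage, 3))
```

In the paper's setting the residuals would be pointwise surface-pressure errors of the fine-tuned decoder on held-out HF samples.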
Updated: 2025-12-15 08:06:52
Categories: cs.LG,physics.flu-dyn,stat.ML
Latent-Autoregressive GP-VAE Language Model
We investigate a fully Latent AutoRegressive scheme based on a Gaussian Process (GP) integrated into a Variational Autoencoder (VAE). In this setting, sequential dynamics are transferred from the observation space to a continuous latent space, while linguistic generation remains parallel through a non-autoregressive decoder. We present a complete methodological formulation, including a causal GP prior, a structured amortized posterior, and a training protocol based on a regularized ELBO. Empirical evaluation, conducted within a deliberately constrained proof-of-concept (POC) framework, shows that the model can be trained stably and that the sequential and parallel sampling variants exhibit consistent behavior. Overall, the results suggest that part of the temporal structure in a language model can be supported by the probabilistic geometry of the latent space rather than by explicit neural operations.
Updated: 2025-12-15 08:05:00
Categories: cs.LG
End-to-End Reinforcement Learning of Koopman Models for eNMPC of an Air Separation Unit
With our recently proposed method based on reinforcement learning (Mayfrank et al. (2024), Comput. Chem. Eng. 190), Koopman surrogate models can be trained for optimal performance in specific (economic) nonlinear model predictive control ((e)NMPC) applications. So far, our method has exclusively been demonstrated on a small-scale case study. Herein, we show that our method scales well to a more challenging demand response case study built on a large-scale model of a single-product (nitrogen) air separation unit. Across all numerical experiments, we assume observability of only a few realistically measurable plant variables. Compared to a purely system identification-based Koopman eNMPC, which generates small economic savings but frequently violates constraints, our method delivers similar economic performance while avoiding constraint violations.
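The defining property of a Koopman surrogate, linear dynamics in a lifted latent space, can be sketched as below; the lifting dictionary and the random A, B, C matrices are stand-ins, whereas the paper learns them end-to-end via reinforcement learning for eNMPC performance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Koopman surrogate: lift state x -> z, then evolve LINEARLY: z' = A z + B u.
def lift(x):
    return np.concatenate([x, x**2])  # toy dictionary of observables

nx, nu = 3, 2
A = 0.9 * np.eye(2 * nx) + 0.01 * rng.standard_normal((2 * nx, 2 * nx))
B = 0.1 * rng.standard_normal((2 * nx, nu))
C = rng.standard_normal((nx, 2 * nx))  # map latent back to measured outputs

z = lift(rng.standard_normal(nx))
traj = []
for t in range(10):                    # rollout under zero input
    z = A @ z + B @ np.zeros(nu)
    traj.append(C @ z)
traj = np.array(traj)
print(traj.shape)
```

The linearity of the latent rollout is what makes the downstream (e)NMPC problem tractable, since predictions over the horizon reduce to matrix products.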
Updated: 2025-12-15 07:55:20
Categories: cs.LG,math.OC
LLM Rationalis? Measuring Bargaining Capabilities of AI Negotiators
Bilateral negotiation is a complex, context-sensitive task in which human negotiators dynamically adjust anchors, pacing, and flexibility to exploit power asymmetries and informal cues. We introduce a unified mathematical framework for modeling concession dynamics based on a hyperbolic tangent curve, and propose two metrics, burstiness (tau) and the Concession-Rigidity Index (CRI), to quantify the timing and rigidity of offer trajectories. We conduct a large-scale empirical comparison between human negotiators and four state-of-the-art large language models (LLMs) across natural-language and numeric-offers settings, with and without rich market context, as well as six controlled power-asymmetry scenarios. Our results reveal that, unlike humans, who smoothly adapt to situations and infer the opponent's position and strategies, LLMs systematically anchor at extremes of the possible agreement zone and optimize for fixed points irrespective of leverage or context. Qualitative analysis further shows limited strategy diversity and occasional deceptive tactics used by LLMs. Moreover, the ability of LLMs to negotiate does not improve with better models. These findings highlight fundamental limitations in current LLM negotiation capabilities and point to the need for models that better internalize opponent reasoning and context-dependent strategy.
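A tanh-shaped concession schedule, together with the standard Goh-Barabasi burstiness coefficient, can be sketched as follows; the curve parameters k and t0 are assumed, and the paper's tau and CRI definitions may differ in detail.

```python
import numpy as np

def tanh_concession(t, open_offer, reserve, k=4.0, t0=0.5):
    """Offer trajectory easing from open_offer toward reserve; k, t0 assumed."""
    s = (np.tanh(k * (t - t0)) - np.tanh(-k * t0)) / (
        np.tanh(k * (1 - t0)) - np.tanh(-k * t0)
    )  # normalized so s(0)=0 and s(1)=1
    return open_offer + (reserve - open_offer) * s

t = np.linspace(0, 1, 11)               # 11 negotiation rounds
offers = tanh_concession(t, open_offer=100.0, reserve=60.0)

def burstiness(intervals):
    """Goh-Barabasi burstiness in [-1, 1]; 0 for a Poisson-like process."""
    mu, sigma = np.mean(intervals), np.std(intervals)
    return (sigma - mu) / (sigma + mu)

gaps = -np.diff(offers)                  # concession sizes between rounds
print(round(float(offers[0]), 1), round(float(offers[-1]), 1))
```

Perfectly regular concessions give burstiness -1; human trajectories tend to be burstier, which is one axis along which the paper contrasts humans with LLM negotiators.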
Updated: 2025-12-15 07:50:09
Categories: cs.CL,cs.AI
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, modular or end-to-end, are deficient in manipulation-aware locomotion. This confines the robot to a limited workspace, preventing it from performing large-space loco-manipulation. We attribute this to: (1) the challenge of acquiring loco-manipulation knowledge due to the scarcity of humanoid teleoperation data, and (2) the difficulty of faithfully and reliably executing locomotion commands, stemming from the limited precision and stability of existing RL controllers. To acquire richer loco-manipulation knowledge, we propose a unified latent learning framework that enables the Vision-Language-Action (VLA) system to learn from low-cost, action-free egocentric videos. Moreover, an efficient human data collection pipeline is devised to augment the dataset and scale the benefits. To execute the desired locomotion commands more precisely, we present a loco-manipulation-oriented (LMO) RL policy specifically tailored for accurate and stable core loco-manipulation movements, such as advancing, turning, and squatting. Building on these components, we introduce WholeBodyVLA, a unified framework for humanoid loco-manipulation. To the best of our knowledge, WholeBodyVLA is one of the first frameworks to enable large-space humanoid loco-manipulation. It is verified via comprehensive experiments on the AgiBot X2 humanoid, outperforming the prior baseline by 21.3%. It also demonstrates strong generalization and high extensibility across a broad range of tasks.
Updated: 2025-12-15 07:46:35
Categories: cs.RO,cs.AI,cs.CV
Why quantum state verification cannot be both efficient and secure: a categorical approach
The advantage of quantum protocols lies in the inherent properties of the shared quantum states. These states are sometimes provided by sources that are not trusted, and therefore need to be verified. Finding secure and efficient quantum state verification protocols remains a major challenge, and recent works illustrate trade-offs between efficiency and security for different groups of states in restricted settings. However, whether a universal trade-off exists for all quantum states and all verification strategies remains unknown. In this work, we instantiate the categorical composable cryptography framework to show a fundamental limit on quantum state verification for all cut-and-choose approaches used to verify arbitrary quantum states. Our findings show that the prevailing cut-and-choose techniques cannot lead to quantum state verification protocols that are both efficient and secure.
Updated: 2025-12-15 07:41:49
Categories: quant-ph,cs.CR,math.CT
Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments
This paper addresses the challenges of low scheduling efficiency, unbalanced resource allocation, and poor adaptability in ETL (Extract-Transform-Load) processes under heterogeneous data environments by proposing an intelligent scheduling optimization framework based on deep Q-learning. The framework formalizes the ETL scheduling process as a Markov Decision Process and enables adaptive decision-making by a reinforcement learning agent in high-dimensional state spaces to dynamically optimize task allocation and resource scheduling. The model consists of a state representation module, a feature embedding network, a Q-value estimator, and a reward evaluation mechanism, which collectively consider task dependencies, node load states, and data flow characteristics to derive the optimal scheduling strategy in complex environments. A multi-objective reward function is designed to balance key performance indicators such as average scheduling delay, task completion rate, throughput, and resource utilization. Sensitivity experiments further verify the model's robustness under changes in hyperparameters, environmental dynamics, and data scale. Experimental results show that the proposed deep Q-learning scheduling framework significantly reduces scheduling delay, improves system throughput, and enhances execution stability under multi-source heterogeneous task conditions, demonstrating the strong potential of reinforcement learning in complex data scheduling and resource management, and providing an efficient and scalable optimization strategy for intelligent data pipeline construction.
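A tabular stand-in illustrates the Q-learning update with a multi-objective scheduling reward; the reward weights and the toy state/action indices are assumptions (the paper uses a deep Q-network over a learned state representation).

```python
import numpy as np

# Tabular stand-in for the deep Q-network: Q[state, action].
Q = np.zeros((4, 3))
alpha, gamma = 0.5, 0.9  # learning rate and discount (assumed values)

def multi_objective_reward(delay, completion, throughput, utilization,
                           w=(0.4, 0.3, 0.2, 0.1)):  # weights are assumed
    """Weighted trade-off of the four indicators named in the abstract."""
    return -w[0] * delay + w[1] * completion + w[2] * throughput + w[3] * utilization

# One Q-learning step for a scheduling decision (state=1, action=2 -> state=3).
s, a, s_next = 1, 2, 3
r = multi_objective_reward(delay=0.2, completion=1.0, throughput=0.8, utilization=0.7)
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
print(round(float(Q[s, a]), 3))
```

Delay enters with a negative weight so that the agent is penalized for slow schedules while being rewarded for completion rate, throughput, and utilization.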
Updated: 2025-12-15 07:38:47
Categories: cs.LG
EmeraldMind: A Knowledge Graph-Augmented Framework for Greenwashing Detection
As AI and web agents become pervasive in decision-making, it is critical to design intelligent systems that not only support sustainability efforts but also guard against misinformation. Greenwashing, i.e., misleading corporate sustainability claims, poses a major challenge to environmental progress. To address this challenge, we introduce EmeraldMind, a fact-centric framework integrating a domain-specific knowledge graph with retrieval-augmented generation to automate greenwashing detection. EmeraldMind builds the EmeraldGraph from diverse corporate ESG (environmental, social, and governance) reports, surfacing verifiable evidence, often missing in generic knowledge bases, and supporting large language models in claim assessment. The framework delivers justification-centric classifications, presenting transparent, evidence-backed verdicts and abstaining responsibly when claims cannot be verified. Experiments on a new greenwashing claims dataset demonstrate that EmeraldMind achieves competitive accuracy, greater coverage, and superior explanation quality compared to generic LLMs, without the need for fine-tuning or retraining.
Updated: 2025-12-15 07:14:59
Categories: cs.AI
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
Multi-turn reinforcement learning (RL) for multi-modal agents built upon vision-language models (VLMs) is hampered by sparse rewards and long-horizon credit assignment. Recent methods densify the reward by querying a teacher that provides step-level feedback, e.g., Guided Thought Reinforcement (GTR) and On-Policy Distillation, but rely on costly, often privileged models as the teacher, limiting practicality and reproducibility. We introduce GTR-Turbo, a highly efficient upgrade to GTR that matches its performance without training or querying an expensive teacher model. Specifically, GTR-Turbo merges the weights of checkpoints produced during the ongoing RL training, and then uses this merged model as a "free" teacher to guide the subsequent RL via supervised fine-tuning or soft logit distillation. This design removes dependence on privileged VLMs (e.g., GPT or Gemini), mitigates the "entropy collapse" observed in prior work, and keeps training stable. Across diverse visual agentic tasks, GTR-Turbo improves the accuracy of the baseline model by 10-30% while reducing wall-clock training time by 50% and compute cost by 60% relative to GTR.
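The checkpoint-merging step at the heart of GTR-Turbo can be sketched as a parameter-wise average of saved state dicts; the uniform weighting is an assumption.

```python
import numpy as np

def merge_checkpoints(checkpoints):
    """Uniform parameter-wise average of checkpoint state dicts (assumed scheme)."""
    keys = checkpoints[0].keys()
    return {k: np.mean([c[k] for c in checkpoints], axis=0) for k in keys}

# Checkpoints saved at different points of the ongoing RL run.
ckpts = [
    {"layer.w": np.array([1.0, 2.0]), "layer.b": np.array([0.0])},
    {"layer.w": np.array([3.0, 4.0]), "layer.b": np.array([1.0])},
]
teacher = merge_checkpoints(ckpts)
print(teacher["layer.w"], teacher["layer.b"])
```

The merged model then serves as the "free" teacher, providing step-level guidance to the ongoing RL run via supervised fine-tuning or soft logit distillation.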
Updated: 2025-12-15 07:11:56
Categories: cs.CV,cs.AI
Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection
Detecting fraud in financial transactions typically relies on tabular models that demand heavy feature engineering to handle high-dimensional data and offer limited interpretability, making it difficult for humans to understand predictions. Large Language Models (LLMs), in contrast, can produce human-readable explanations and facilitate feature analysis, potentially reducing the manual workload of fraud analysts and informing system refinements. However, they perform poorly when applied directly to tabular fraud detection due to the difficulty of reasoning over many features, the extreme class imbalance, and the absence of contextual information. To bridge this gap, we introduce FinFRE-RAG, a two-stage approach that applies importance-guided feature reduction to serialize a compact subset of numeric/categorical attributes into natural language and performs retrieval-augmented in-context learning over label-aware, instance-level exemplars. Across four public fraud datasets and three families of open-weight LLMs, FinFRE-RAG substantially improves F1/MCC over direct prompting and is competitive with strong tabular baselines in several settings. Although these LLMs still lag behind specialized classifiers, they narrow the performance gap and provide interpretable rationales, highlighting their value as assistive tools in fraud analysis.
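Importance-guided feature reduction plus natural-language serialization, the first stage of FinFRE-RAG, can be sketched as follows; the feature names, importance scores, and template wording are hypothetical.

```python
import numpy as np

def serialize_top_features(row, importances, names, k=3):
    """Keep the k most important attributes and render them as natural language.
    Importance scores would come from, e.g., a tree model; here they are given."""
    top = np.argsort(importances)[::-1][:k]
    parts = [f"{names[i]} is {row[i]}" for i in top]
    return "Transaction where " + ", ".join(parts) + "."

names = ["amount", "hour", "country", "merchant_age_days", "n_prev_tx"]
row = [920.5, 3, "US", 12, 0]
importances = np.array([0.45, 0.25, 0.05, 0.15, 0.10])

text = serialize_top_features(row, importances, names)
print(text)
```

The compact serialization is what makes the claim tractable for an LLM: instead of reasoning over dozens of raw columns, the model sees a short sentence over the most discriminative attributes, later augmented with retrieved label-aware exemplars.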
Updated: 2025-12-15 07:09:11
Categories: cs.LG,cs.CL
Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models
Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. However, existing removal methods typically adopt a unidirectional erasure strategy by either suppressing the target concept or reinforcing safe alternatives, making it difficult to achieve a balanced trade-off between concept removal and generation quality. To address this limitation, we propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously. Specifically, based on the joint representation of text prompts and corresponding images, Bi-Erasing introduces two decoupled image branches: a negative branch responsible for suppressing harmful semantics and a positive branch providing visual guidance for safe alternatives. By jointly optimizing these complementary directions, our approach achieves a balance between erasure efficacy and generation usability. In addition, we apply mask-based filtering to the image branches to prevent interference from irrelevant content during the erasure process. Across extensive experimental evaluations, the proposed Bi-Erasing outperforms baseline methods in balancing concept removal effectiveness and visual fidelity.
Updated: 2025-12-15 07:08:35
Categories: cs.CV,cs.CR
Progressive Refinement of E-commerce Search Ranking Based on Short-Term Activities of the Buyer
In e-commerce shopping, aligning search results with a buyer's immediate needs and preferences presents a significant challenge, particularly in adapting search results throughout the buyer's shopping journey as they move from the initial stages of browsing to making a purchase decision or shift from one intent to another. This study presents a systematic approach to adapting e-commerce search results based on the current context. We start with basic methods and incrementally incorporate more contextual information and state-of-the-art techniques to improve the search outcomes. By applying this evolving contextual framework to items displayed on the search engine results page (SERP), we progressively align search outcomes more closely with the buyer's interests and current search intentions. Our findings demonstrate that this incremental enhancement, from simple heuristic autoregressive features to advanced sequence models, significantly improves ranker performance. The integration of contextual techniques enhances the performance of our production ranker, leading to improved search results in both offline and online A/B testing in terms of Mean Reciprocal Rank (MRR). Overall, the paper details iterative methodologies and their substantial contributions to search result contextualization on e-commerce platforms.
Updated: 2025-12-15 07:07:32
Categories: cs.IR,cs.LG
Alada: Alternating Adaptation of Momentum Method for Memory-Efficient Matrix Optimization
This work proposes Alada, an adaptive momentum method for stochastic optimization over large-scale matrices. Alada employs a rank-one factorization approach to estimate the second moment of gradients, where the factors are updated alternately to minimize the estimation error. Alada achieves sublinear memory overheads and can be readily extended to optimizing tensor-shaped variables. We also equip Alada with a first-moment estimation rule, which enhances the algorithm's robustness without incurring additional memory overheads. The theoretical performance of Alada aligns with that of traditional methods such as Adam. Numerical studies conducted on several natural language processing tasks demonstrate the reduction in memory overheads and the robustness in training large models relative to Adam and its variants.
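The rank-one second-moment estimate can be sketched with alternating least-squares updates that minimize the Frobenius fitting error; the matrix size and iteration count are assumptions, and Alada's exact update rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Second moment of the gradient of an (m x n) matrix variable, approximated by
# a rank-one factorization r c^T: memory is m + n instead of m * n.
M = rng.random((8, 5)) ** 2  # stand-in for an EMA of squared gradients

r = np.ones(8)
c = np.ones(5)
for _ in range(10):          # alternating least-squares updates of the factors
    r = M @ c / (c @ c)      # best r for fixed c
    c = M.T @ r / (r @ r)    # best c for fixed r

approx = np.outer(r, c)
err = np.linalg.norm(M - approx) / np.linalg.norm(M)
print(round(float(err), 3))
```

Each factor update is a closed-form least-squares solve, so alternating them drives the rank-one approximation toward the best achievable fit while storing only the two factor vectors.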
Updated: 2025-12-15 07:04:51
Categories: cs.LG
Scaling Bidirectional Spans and Span Violations in Attention Mechanism
The canonical $O(N^2)$ Transformer remains the empirical performance frontier in sequence modeling, and its training can be further optimized by addressing geometric inefficiency. We propose an optimization framework that leverages an asymmetric projection to decompose the backward-pass gradients into parallel spans and orthogonal violations, while keeping the canonical forward-pass $QKV$ structure intact. Through consistent experimental validation across various decomposition and projection setups, we provide strong empirical evidence that the standard attention gradient is suboptimal. We demonstrate that selectively scaling these components, focusing primarily on $0^{th}$-order bidirectional parallel spans, yields the most effective learning signal. On the limited WikiText-2 dataset, and using a crude configuration, this method achieved a $0.56\%$ reduction in validation loss, confirming the framework's fundamental validity and suggesting significant potential gains on larger datasets and deeper training regimes.
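The underlying projection splits a gradient into a component in the span of a direction vector and an orthogonal violation; the span direction and the scaling factors below are illustrative assumptions.

```python
import numpy as np

def decompose(g, v, eps=1e-12):
    """Split gradient g into its span component along v and the orthogonal
    violation; selectively scaling each part reshapes the learning signal."""
    parallel = (g @ v) / (v @ v + eps) * v
    violation = g - parallel
    return parallel, violation

g = np.array([3.0, 4.0, 0.0])   # toy backward-pass gradient
v = np.array([1.0, 0.0, 0.0])   # stand-in span direction

span, viol = decompose(g, v)

# Scaled learning signal: emphasize the parallel span (scales are assumed).
g_scaled = 1.1 * span + 0.9 * viol
print(span, viol)
```

By construction the two components are orthogonal, so scaling one never perturbs the other, which is what makes selective scaling of spans versus violations well-defined.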
Updated: 2025-12-15 07:03:24
Categories: cs.LG,cs.AI,cs.CL
HACTS: a Human-As-Copilot Teleoperation System for Robot Learning
Teleoperation is essential for autonomous robot learning, especially in manipulation tasks that require human demonstrations or corrections. However, most existing systems only offer unilateral robot control and lack the ability to synchronize the robot's status with the teleoperation hardware, preventing real-time, flexible intervention. In this work, we introduce HACTS (Human-As-Copilot Teleoperation System), a novel system that establishes bilateral, real-time joint synchronization between a robot arm and teleoperation hardware. This simple yet effective feedback mechanism, akin to a steering wheel in autonomous vehicles, enables the human copilot to intervene seamlessly while collecting action-correction data for future learning. Implemented using 3D-printed components and low-cost, off-the-shelf motors, HACTS is both accessible and scalable. Our experiments show that HACTS significantly enhances performance in imitation learning (IL) and reinforcement learning (RL) tasks, boosting IL recovery capabilities and data efficiency, and facilitating human-in-the-loop RL. HACTS paves the way for more effective and interactive human-robot collaboration and data-collection, advancing the capabilities of robot manipulation.
Updated: 2025-12-15 07:01:07
Categories: cs.RO,cs.LG
Trojan Cleansing with Neural Collapse
Trojan attacks are sophisticated training-time attacks on neural networks that embed backdoor triggers which force the network to produce a specific output on any input which includes the trigger. With the increasing relevance of deep networks which are too large to train with personal resources and which are trained on data too large to thoroughly audit, these training-time attacks pose a significant risk. In this work, we connect trojan attacks to Neural Collapse, a phenomenon wherein the final feature representations of over-parameterized neural networks converge to a simple geometric structure. We provide experimental evidence that trojan attacks disrupt this convergence for a variety of datasets and architectures. We then use this disruption to design a lightweight, broadly generalizable mechanism for cleansing trojan attacks from a wide variety of different network architectures and experimentally demonstrate its efficacy.
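Neural Collapse convergence is commonly measured by comparing within-class to between-class feature variability; the sketch below uses a standard tr(S_W)/tr(S_B)-style ratio on synthetic features, which is an assumption about the paper's exact diagnostic.

```python
import numpy as np

rng = np.random.default_rng(0)

def nc1(features, labels):
    """Within-class vs between-class variability; values near zero indicate
    collapse of final-layer features onto their class means."""
    mu_g = features.mean(axis=0)
    sw = sb = 0.0
    for c in np.unique(labels):
        f = features[labels == c]
        mu_c = f.mean(axis=0)
        sw += ((f - mu_c) ** 2).sum()              # within-class scatter
        sb += len(f) * ((mu_c - mu_g) ** 2).sum()  # between-class scatter
    return sw / sb

labels = np.repeat([0, 1, 2], 50)
means = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
collapsed = means[labels] + 0.01 * rng.standard_normal((150, 2))  # clean net
noisy = means[labels] + 3.0 * rng.standard_normal((150, 2))       # disrupted
print(nc1(collapsed, labels) < nc1(noisy, labels))
```

A trojaned network would show the disrupted (high-ratio) pattern, which is the deviation the proposed cleansing mechanism exploits.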
Updated: 2025-12-15 06:59:52
Categories: cs.LG,cs.CR
Motus: A Unified Latent Action World Model
While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents unifying multimodal generative capabilities and hinders learning from large-scale, heterogeneous data. In this paper, we propose Motus, a unified latent action world model that leverages existing general pretrained models and rich, sharable motion information. Motus introduces a Mixture-of-Transformer (MoT) architecture to integrate three experts (i.e., understanding, video generation, and action) and adopts a UniDiffuser-style scheduler to enable flexible switching between different modeling modes (i.e., world models, vision-language-action models, inverse dynamics models, video generation models, and video-action joint prediction models). Motus further leverages optical flow to learn latent actions and adopts a recipe with a three-phase training pipeline and a six-layer data pyramid, thereby extracting pixel-level "delta actions" and enabling large-scale action pretraining. Experiments show that Motus achieves superior performance against state-of-the-art methods in both simulation (a +15% improvement over X-VLA and a +45% improvement over Pi0.5) and real-world scenarios (improved by 11-48%), demonstrating that unified modeling of all functionalities and priors significantly benefits downstream robotic tasks.
Updated: 2025-12-15 06:58:40
Categories: cs.CV,cs.LG,cs.RO
Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning
Federated Learning (FL) enables collaborative training while preserving privacy, yet Gradient Inversion Attacks (GIAs) pose severe threats by reconstructing private data from shared gradients. In realistic FedAvg scenarios with multi-step updates, existing surrogate methods like SME rely on linear interpolation to approximate client trajectories for privacy leakage. However, we demonstrate that linear assumptions, with only one-dimensional expressiveness, fundamentally underestimate SGD's nonlinear complexity and encounter irreducible approximation barriers in non-convex landscapes. We propose Non-Linear Surrogate Model Extension (NL-SME), the first framework introducing learnable quadratic Bézier curves for trajectory modeling in GIAs against FL. NL-SME leverages $|w|+1$-dimensional control point parameterization combined with dvec scaling and regularization mechanisms to achieve superior approximation accuracy. Extensive experiments on CIFAR-100 and FEMNIST demonstrate that NL-SME significantly outperforms baselines across all metrics, achieving 94\%--98\% performance gaps and order-of-magnitude improvements in cosine similarity loss while maintaining computational efficiency. This work exposes critical privacy vulnerabilities in FL's multi-step paradigm and provides insights for robust defense development.
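The quadratic Bézier surrogate trajectory is straightforward to write down; the endpoints and the control point below are hypothetical, whereas NL-SME learns the control point during the attack.

```python
import numpy as np

def bezier_quadratic(t, p0, p1, p2):
    """B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2; p1 is the learnable control
    point that bends the surrogate trajectory away from linear interpolation."""
    t = np.asarray(t)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

# Endpoints = observed client weights before/after the local steps (toy 2-D).
p0 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
p1 = np.array([0.8, 0.2])   # hypothetical learned control point

t = np.linspace(0, 1, 5)
curve = bezier_quadratic(t, p0, p1, p2)
linear = (1 - t)[:, None] * p0 + t[:, None] * p2  # SME's linear interpolation
print(curve[0], curve[-1])
```

The curve matches the observed endpoints exactly while the interior bends toward the control point, giving the surrogate the extra expressiveness that linear interpolation lacks.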
Updated: 2025-12-15 06:52:47
Subjects: cs.LG,cs.CR
Optimal Convergence Analysis of DDPM for General Distributions
Score-based diffusion models have achieved remarkable empirical success in generating high-quality samples from target data distributions. Among them, the Denoising Diffusion Probabilistic Model (DDPM) is one of the most widely used samplers, generating samples via estimated score functions. Despite its empirical success, a tight theoretical understanding of DDPM -- especially its convergence properties -- remains limited. In this paper, we provide a refined convergence analysis of the DDPM sampler and establish near-optimal convergence rates under general distributional assumptions. Specifically, we introduce a relaxed smoothness condition parameterized by a constant $L$, which is small for many practical distributions (e.g., Gaussian mixture models). We prove that the DDPM sampler with accurate score estimates achieves a convergence rate of $$\widetilde{O}\left(\frac{d\min\{d,L^2\}}{T^2}\right)~\text{in Kullback-Leibler divergence},$$ where $d$ is the data dimension, $T$ is the number of iterations, and $\widetilde{O}$ hides polylogarithmic factors in $T$. This result substantially improves upon the best-known $d^2/T^2$ rate when $L < \sqrt{d}$. By establishing a matching lower bound, we show that our convergence analysis is tight for a wide array of target distributions. Moreover, it reveals that DDPM and DDIM share the same dependence on $d$, raising an interesting question of why DDIM often appears empirically faster.
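For reference, the sampler analyzed here performs the standard ancestral DDPM update; a minimal sketch with toy dimensions and a zero noise estimate for illustration:

```python
import numpy as np

def ddpm_step(x_t, eps_hat, alpha_t, alpha_bar_t, sigma_t, rng):
    """One reverse (ancestral) DDPM update given a noise estimate
    eps_hat at the current iteration."""
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    return mean + sigma_t * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)  # toy sample with d = 4
# With eps_hat = 0 and sigma_t = 0 the update reduces to a rescaling.
x_prev = ddpm_step(x, np.zeros(4), alpha_t=0.99, alpha_bar_t=0.5, sigma_t=0.0, rng=rng)
```

The convergence analysis in the abstract concerns how the error of $T$ such steps accumulates under the stated smoothness condition.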
Updated: 2025-12-15 06:50:03
Subjects: stat.ML,cs.LG,math.ST
Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws
Neural scaling laws and double-descent phenomena suggest that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. We derive such structure directly from gradient descent in function space. For mean-squared error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{θ(t)}J_{θ(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. Using Kato perturbation theory, we obtain an exact system of coupled modewise ODEs in the instantaneous eigenbasis of $M(t)$. To extract macroscopic behavior, we introduce a logarithmic spectral-shell coarse-graining and track quadratic error energy across shells. Microscopic interactions within each shell cancel identically at the energy level, so shell energies evolve only through dissipation and external inter-shell interactions. We formalize this via a \emph{renormalizable shell-dynamics} assumption, under which cumulative microscopic effects reduce to a controlled net flux across shell boundaries. Assuming an effective power-law spectral transport in a relevant resolution range, the shell dynamics admits a self-similar solution with a moving resolution frontier and explicit scaling exponents. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
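The logarithmic spectral-shell coarse-graining can be illustrated on a toy spectrum; the shell count and power-law eigenvalues below are assumptions for illustration only:

```python
import numpy as np

def shell_energies(eigvals, mode_errors, n_shells=4):
    """Coarse-grain modewise error energy e_k^2 into logarithmic
    spectral shells over the eigenvalues of M(t)."""
    edges = np.logspace(np.log10(eigvals.min()), np.log10(eigvals.max()),
                        n_shells + 1)
    edges[-1] *= 1 + 1e-12  # make the top edge inclusive
    idx = np.searchsorted(edges, eigvals, side="right") - 1
    E = np.zeros(n_shells)
    np.add.at(E, np.clip(idx, 0, n_shells - 1), mode_errors ** 2)
    return E

eigvals = np.logspace(-3, 0, 32)  # toy power-law spectrum of M(t)
errors = np.ones(32)              # toy modewise errors e_k
E = shell_energies(eigvals, errors)
```

Tracking `E` over training steps is the macroscopic observable whose transport across shell boundaries the renormalizable shell dynamics describes.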
Updated: 2025-12-15 06:45:10
Subjects: cs.LG
Fast Wrong-way Cycling Detection in CCTV Videos: Sparse Sampling is All You Need
Effective monitoring of unusual transportation behaviors, such as wrong-way cycling (i.e., riding a bicycle or e-bike against designated traffic flow), is crucial for optimizing law enforcement deployment and traffic planning. However, accurately recording all wrong-way cycling events is both unnecessary and infeasible in resource-constrained environments, as it requires high-resolution cameras for evidence collection and event detection. To address this challenge, we propose WWC-Predictor, a novel method for efficiently estimating the wrong-way cycling ratio, defined as the proportion of wrong-way cycling events relative to the total number of cycling movements over a given time period. The core innovation of our method lies in accurately detecting wrong-way cycling events in sparsely sampled frames using a lightweight detector, then estimating the overall ratio using an autoregressive moving average model. To evaluate the effectiveness of our method, we construct a benchmark dataset consisting of 35 minutes of video sequences with minute-level annotations. Our method achieves an average error rate of a mere 1.475% while consuming only 19.12% of the GPU time required by conventional tracking methods, validating its effectiveness in estimating the wrong-way cycling ratio. Our source code is publicly available at: https://github.com/VICA-Lab-HKUST-GZ/WWC-Predictor.
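The sparse-sampling idea behind the ratio estimate can be sketched as a plain proportion over sampled frames; the paper's detector and its autoregressive moving average refinement are omitted here, and the per-frame counts are hypothetical:

```python
def wwc_ratio(events, stride=30):
    """Estimate the wrong-way cycling ratio from sparsely sampled
    frames: events[i] is a (n_wrong, n_total) pair of detections in
    frame i; only every `stride`-th frame is processed."""
    sampled = events[::stride]
    wrong = sum(w for w, _ in sampled)
    total = sum(t for _, t in sampled)
    return wrong / total if total else 0.0

# Hypothetical per-frame detections over 300 frames of footage.
frames = [(1, 10) if i % 2 == 0 else (0, 10) for i in range(300)]
ratio = wwc_ratio(frames, stride=30)
```

Because only a fraction of frames is ever run through the detector, the GPU cost scales with `1/stride`, which is the source of the reported efficiency gain.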
Updated: 2025-12-15 06:34:41
Subjects: cs.CV,cs.AI
SA$^{2}$GFM: Enhancing Robust Graph Foundation Models with Structure-Aware Semantic Augmentation
Graph Foundation Models (GFMs) have made significant progress on various tasks, but their robustness against domain noise, structural perturbations, and adversarial attacks remains underexplored. A key limitation is the insufficient modeling of hierarchical structural semantics, which are crucial for generalization. In this paper, we propose SA$^{2}$GFM, a robust GFM framework that improves domain-adaptive representations through Structure-Aware Semantic Augmentation. First, we encode hierarchical structural priors by transforming entropy-based encoding trees into structure-aware textual prompts for feature augmentation. The enhanced inputs are processed by a self-supervised Information Bottleneck mechanism that distills robust, transferable representations via structure-guided compression. To address negative transfer in cross-domain adaptation, we introduce an expert adaptive routing mechanism, combining a mixture-of-experts architecture with a null expert design. For efficient downstream adaptation, we propose a fine-tuning module that optimizes hierarchical structures through joint intra- and inter-community structure learning. Extensive experiments demonstrate that SA$^{2}$GFM outperforms 9 state-of-the-art baselines in terms of effectiveness and robustness against random noise and adversarial perturbations for node and graph classification.
Updated: 2025-12-15 06:32:08
Subjects: cs.LG
Comprehensive Deployment-Oriented Assessment for Cross-Environment Generalization in Deep Learning-Based mmWave Radar Sensing
This study presents the first comprehensive evaluation of spatial generalization techniques, which are essential for the practical deployment of deep learning-based radio-frequency (RF) sensing. Focusing on people counting in indoor environments using frequency-modulated continuous-wave (FMCW) multiple-input multiple-output (MIMO) radar, we systematically investigate a broad set of approaches, including amplitude-based statistical preprocessing (sigmoid weighting and threshold zeroing), frequency-domain filtering, autoencoder-based background suppression, data augmentation strategies, and transfer learning. Experimental results collected across two environments with different layouts demonstrate that sigmoid-based amplitude weighting consistently achieves superior cross-environment performance, yielding 50.1% and 55.2% reductions in root-mean-square error (RMSE) and mean absolute error (MAE), respectively, compared with baseline methods. Data augmentation provides additional though modest benefits, with improvements up to 8.8% in MAE. By contrast, transfer learning proves indispensable for large spatial shifts, achieving 82.1% and 91.3% reductions in RMSE and MAE, respectively, with 540 target-domain samples. Taken together, these findings establish a highly practical direction for developing radar sensing systems capable of maintaining robust accuracy under spatial variations by integrating deep learning models with amplitude-based preprocessing and efficient transfer learning.
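Sigmoid-based amplitude weighting of radar bins can be sketched as below; the threshold `tau` and `scale` values are hypothetical placeholders, not the paper's tuned settings:

```python
import numpy as np

def sigmoid_weight(amplitude, tau=0.5, scale=0.1):
    """Sigmoid-based amplitude weighting: softly suppresses
    low-amplitude (background) radar bins, in contrast to hard
    threshold zeroing, which discards them outright."""
    w = 1.0 / (1.0 + np.exp(-(amplitude - tau) / scale))
    return w * amplitude

a = np.array([0.1, 0.5, 0.9])  # toy normalized range-Doppler amplitudes
out = sigmoid_weight(a)
```

The soft transition is what makes the preprocessing transfer across rooms: environment-specific clutter near the threshold is attenuated rather than clipped, so the downstream network sees a smoother, layout-independent input.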
Updated: 2025-12-15 06:29:51
Subjects: cs.CV,cs.LG
ALIGN: Word Association Learning for Cultural Alignment in Large Language Models
Large language models (LLMs) exhibit cultural bias from overrepresented viewpoints in training data, yet cultural alignment remains a challenge due to limited cultural knowledge and a lack of exploration into effective learning approaches. We introduce a cost-efficient and cognitively grounded method: fine-tuning LLMs on native speakers' word-association norms, leveraging cognitive psychology findings that such associations capture cultural knowledge. Using word association datasets from native speakers in the US (English) and China (Mandarin), we train Llama-3.1-8B and Qwen-2.5-7B via supervised fine-tuning and preference optimization. We evaluate models' cultural alignment through a two-tier evaluation framework that spans lexical associations and cultural value alignment using the World Values Survey. Results show significant improvements in lexical alignment (16-20% English, 43-165% Mandarin on Precision@5) and high-level cultural value shifts. On a subset of 50 questions where US and Chinese respondents diverge most, fine-tuned Qwen nearly doubles its response alignment with Chinese values (13 to 25). Remarkably, our trained 7-8B models match or exceed vanilla 70B baselines, demonstrating that a few million culture-grounded associations achieve value alignment without expensive retraining. Our work highlights both the promise and the need for future research grounded in human cognition in improving cultural alignment in AI models.
Updated: 2025-12-15 06:22:52
Subjects: cs.CL,cs.AI
Enhancing Physics-Informed Neural Networks Through Feature Engineering
Physics-Informed Neural Networks (PINNs) seek to solve partial differential equations (PDEs) with deep learning. Mainstream approaches that deploy fully-connected multi-layer deep learning architectures require prolonged training to achieve even moderate accuracy, while recent work on feature engineering allows higher accuracy and faster convergence. This paper introduces SAFE-NET, a Single-layered Adaptive Feature Engineering NETwork that achieves orders-of-magnitude lower errors with far fewer parameters than baseline feature engineering methods. SAFE-NET returns to basic ideas in machine learning, using Fourier features, a simplified single hidden layer network architecture, and an effective optimizer that improves the conditioning of the PINN optimization problem. Numerical results show that SAFE-NET converges faster and typically outperforms deeper networks and more complex architectures. It consistently uses fewer parameters -- on average, 65% fewer than the competing feature engineering methods -- while achieving comparable accuracy in less than 30% of the training epochs. Moreover, each SAFE-NET epoch is 95% faster than those of competing feature engineering approaches. These findings challenge the prevailing belief that modern PINNs effectively learn features in these scientific applications and highlight the efficiency gains possible through feature engineering.
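A minimal Fourier-feature embedding of the kind SAFE-NET builds on might look like this; the frequencies are chosen arbitrarily for illustration rather than learned adaptively as in the paper:

```python
import numpy as np

def fourier_features(x, freqs):
    """Map coordinates x of shape (n, d) to the classic
    [sin(2*pi*f*x), cos(2*pi*f*x)] spectral input embedding."""
    proj = 2 * np.pi * x[:, None, :] * freqs[None, :, None]  # (n, k, d)
    feats = np.concatenate([np.sin(proj), np.cos(proj)], axis=1)
    return feats.reshape(x.shape[0], -1)

x = np.array([[0.0], [0.25]])  # two toy 1-D PDE collocation points
phi = fourier_features(x, freqs=np.array([1.0, 2.0]))
```

Feeding `phi` into a single hidden layer, rather than stacking many raw-coordinate layers, is the feature-engineering simplification the abstract credits with better conditioning and faster epochs.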
Updated: 2025-12-15 06:21:54
Subjects: cs.LG
Deep Learning-Driven Inversion Framework for Shear Modulus Estimation in Magnetic Resonance Elastography (DIME)
The Multimodal Direct Inversion (MMDI) algorithm is widely used in Magnetic Resonance Elastography (MRE) to estimate tissue shear stiffness. However, MMDI relies on the Helmholtz equation, which assumes wave propagation in a uniform, homogeneous, and infinite medium. Furthermore, the use of the Laplacian operator makes MMDI highly sensitive to noise, which compromises the accuracy and reliability of stiffness estimates. In this study, we propose the Deep-Learning driven Inversion Framework for Shear Modulus Estimation in MRE (DIME), aimed at enhancing the robustness of inversion. DIME is trained on the displacement fields-stiffness maps pair generated through Finite Element Modelling (FEM) simulations. To capture local wave behavior and improve robustness to global image variations, DIME is trained on small image patches. We first validated DIME using homogeneous and heterogeneous datasets simulated with FEM, where DIME produced stiffness maps with low inter-pixel variability, accurate boundary delineation, and higher correlation with ground truth (GT) compared to MMDI. Next, DIME was evaluated in a realistic anatomy-informed simulated liver dataset with known GT and compared directly to MMDI. DIME reproduced ground-truth stiffness patterns with high fidelity (r = 0.99, R^2 = 0.98), while MMDI showed greater underestimation. After validating DIME on synthetic data, we tested the model in in vivo liver MRE data from eight healthy and seven fibrotic subjects. DIME preserved physiologically consistent stiffness patterns and closely matched MMDI, which showed directional bias. Overall, DIME showed higher correlation with ground truth and visually similar stiffness patterns, whereas MMDI displayed a larger bias that can potentially be attributed to directional filtering. These preliminary results highlight the feasibility of DIME for clinical applications in MRE.
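The Helmholtz-based direct inversion that MMDI relies on can be sketched for a 1-D plane shear wave; the density, vibration frequency, and wavenumber below are illustrative values, not clinical settings:

```python
import numpy as np

def helmholtz_stiffness(u, dx, rho=1000.0, freq=60.0):
    """Direct Helmholtz inversion: mu = -rho * omega^2 * u / laplacian(u).
    Valid only under the homogeneous, infinite-medium assumption the
    abstract criticizes; the Laplacian is what makes it noise-sensitive."""
    omega = 2 * np.pi * freq
    lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx ** 2  # periodic stencil
    return -rho * omega ** 2 * u / lap

# Plane shear wave u = cos(k x): laplacian = -k^2 u, so mu = rho * omega^2 / k^2.
k = 40.0
x = np.linspace(0.0, 2 * np.pi / k, 256, endpoint=False)
mu = helmholtz_stiffness(np.cos(k * x), dx=x[1] - x[0])
```

Any measurement noise in `u` is amplified twice by the finite-difference Laplacian before the division, which is the sensitivity DIME's learned inversion is designed to avoid.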
Updated: 2025-12-15 06:13:25
Subjects: cs.LG,q-bio.TO
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
The widespread adoption of large language models (LLMs) across industries has increased the demand for high-quality and customizable outputs. However, traditional alignment methods often require retraining large pretrained models, making it difficult to quickly adapt and optimize LLMs for diverse applications. To address this limitation, we propose a novel \textit{Residual Alignment Model} (\textit{RAM}) that formalizes the alignment process as a type of importance sampling. In this framework, the unaligned upstream model serves as the proposal distribution, while the alignment process is framed as secondary sampling based on an autoregressive alignment module that acts as an estimator of the importance weights. This design enables a natural detachment of the alignment module from the target aligned model, improving flexibility and scalability. Based on this model, we derive an efficient sequence-level training strategy for the alignment module, which operates independently of the proposal module. Additionally, we develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods. Experimental evaluations on two leading open-source LLMs across diverse tasks, including instruction following, domain adaptation, and preference optimization, demonstrate that our approach consistently outperforms baseline models.
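The secondary-sampling step can be sketched as importance-weighted resampling over upstream proposals; the candidate strings and log-weights below are hypothetical, and the paper's autoregressive weight estimator and token-level decoding are abstracted away:

```python
import numpy as np

def align_by_resampling(candidates, log_w, rng):
    """Secondary sampling: draw from upstream-model proposals with
    probabilities proportional to the alignment module's estimated
    importance weights (log-domain for numerical stability)."""
    w = np.exp(log_w - log_w.max())
    p = w / w.sum()
    return candidates[rng.choice(len(candidates), p=p)]

rng = np.random.default_rng(0)
cands = np.array(["draft_a", "draft_b", "draft_c"])  # hypothetical proposals
log_w = np.array([0.0, 5.0, 0.0])                    # alignment favors draft_b
picked = align_by_resampling(cands, log_w, rng)
```

Because the weight estimator is a separate module, the upstream (proposal) model never needs retraining, which is the detachment the title refers to.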
Updated: 2025-12-15 06:09:56
Subjects: cs.CL,cs.AI
Data Collaboration Analysis with Orthonormal Basis Selection and Alignment
Data Collaboration (DC) enables multiple parties to jointly train a model by sharing only linear projections of their private datasets. The core challenge in DC is to align the bases of these projections without revealing each party's secret basis. While existing theory suggests that any target basis spanning the common subspace should suffice, in practice, the choice of basis can substantially affect both accuracy and numerical stability. We introduce Orthonormal Data Collaboration (ODC), which enforces orthonormal secret and target bases, thereby reducing alignment to the classical Orthogonal Procrustes problem, which admits a closed-form solution. We prove that the resulting change-of-basis matrices achieve \emph{orthogonal concordance}, aligning all parties' representations up to a shared orthogonal transform and rendering downstream performance invariant to the target basis. Computationally, ODC reduces the alignment complexity from $O(\min\{a(cl)^2, a^2c\})$ to $O(acl^2)$, and empirical evaluations show up to $100\times$ speed-ups with equal or better accuracy across benchmarks. ODC preserves DC's one-round communication pattern and privacy assumptions, providing a simple and efficient drop-in improvement to existing DC pipelines.
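The closed-form Orthogonal Procrustes solution that ODC reduces alignment to is the classic SVD construction; the toy shapes below are illustrative:

```python
import numpy as np

def procrustes_align(A, B):
    """Closed-form minimizer of ||A R - B||_F over orthogonal R
    (the Orthogonal Procrustes problem), via the SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Toy check: B is A rotated by a known orthogonal matrix, which the
# closed form recovers exactly (up to floating point).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
R_true, _ = np.linalg.qr(rng.standard_normal((5, 5)))
B = A @ R_true
R = procrustes_align(A, B)
```

Since the solution is one SVD of a small matrix, alignment needs no iterative optimization, which is where the reported speed-ups come from.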
Updated: 2025-12-15 06:06:06
Subjects: cs.LG,math.OC
Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory
High-capacity kernel Hopfield networks exhibit a \textit{Ridge of Optimization} characterized by extreme stability. While previously linked to \textit{Spectral Concentration}, its origin remains elusive. Here, we analyze the network dynamics on a statistical manifold, revealing that the Ridge corresponds to the Edge of Stability, a critical boundary where the Fisher Information Matrix becomes singular. We demonstrate that the apparent Euclidean force antagonism is a manifestation of \textit{Dual Equilibrium} in the Riemannian space. This unifies learning dynamics and capacity via the Minimum Description Length principle, offering a geometric theory of self-organized criticality.
Updated: 2025-12-15 06:01:17
Subjects: cs.LG,cs.NE,stat.ML
General OOD Detection via Model-aware and Subspace-aware Variable Priority
Out-of-distribution (OOD) detection is essential for determining when a supervised model encounters inputs that differ meaningfully from its training distribution. While widely studied in classification, OOD detection for regression and survival analysis remains limited due to the absence of discrete labels and the challenge of quantifying predictive uncertainty. We introduce a framework for OOD detection that is simultaneously model aware and subspace aware, and that embeds variable prioritization directly into the detection step. The method uses the fitted predictor to construct localized neighborhoods around each test case that emphasize the features driving the model's learned relationship and downweight directions that are less relevant to prediction. It produces OOD scores without relying on global distance metrics or estimating the full feature density. The framework is applicable across outcome types, and in our implementation we use random forests, where the rule structure yields transparent neighborhoods and effective scoring. Experiments on synthetic and real data benchmarks designed to isolate functional shifts show consistent improvements over existing methods. We further demonstrate the approach in an esophageal cancer survival study, where distribution shifts related to lymphadenectomy identify patterns relevant to surgical guidelines.
Updated: 2025-12-15 05:55:35
Subjects: stat.ML,cs.LG
Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models
Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered nature of SSL encoders. To address this limitation, we propose the layer-aware time-delay neural network (L-TDNN), which directly performs layer/frame-wise processing on the layer-wise hidden state outputs from pre-trained models, extracting fixed-size speaker vectors. L-TDNN comprises a layer-aware convolutional network, a frame-adaptive layer aggregation, and attentive statistic pooling, explicitly modeling of the recognition and processing of previously overlooked layer dimension. We evaluated L-TDNN across multiple speech SSL Transformers and diverse speech-speaker corpora against other approaches for leveraging pre-trained encoders. L-TDNN consistently demonstrated robust verification performance, achieving the lowest error rates throughout the experiments. Concurrently, it stood out in terms of model compactness and exhibited inference efficiency comparable to the existing systems. These results highlight the advantages derived from the proposed layer-aware processing approach. Future work includes exploring joint training with SSL frontends and the incorporation of score calibration to further enhance state-of-the-art verification performance.
Updated: 2025-12-15 05:44:38
Subjects: eess.AS,cs.AI
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Previous work of adversarial fine-tuning largely focuses on matching the predicted logits between clean and adversarial examples, which overlooks uncertainty calibration and may degrade the zero-shot generalization. A common expectation in reliable uncertainty estimation is that predictive uncertainty should increase as inputs become more difficult or shift away from the training distribution. However, we frequently observe the opposite in the adversarial setting: perturbations not only degrade accuracy but also suppress uncertainty, leading to severe miscalibration and unreliable over-confidence. This overlooked phenomenon highlights a critical reliability gap beyond robustness. To bridge this gap, we propose a novel adversarial fine-tuning objective for CLIP considering both prediction accuracy and uncertainty alignments. By reparameterizing the output of CLIP as the concentration parameter of a Dirichlet distribution, we propose a unified representation that captures relative semantic structure and the magnitude of predictive confidence. Our objective aligns these distributions holistically under perturbations, moving beyond single-logit anchoring and restoring calibrated uncertainty. Experiments on multiple zero-shot classification benchmarks demonstrate that our approach effectively restores calibrated uncertainty and achieves competitive adversarial robustness while maintaining clean accuracy.
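Reading network outputs as Dirichlet concentrations can be sketched as below; the `exp` link and toy logits are illustrative assumptions, not the paper's exact reparameterization:

```python
import numpy as np

def dirichlet_stats(logits):
    """Interpret exp(logits) as a Dirichlet concentration alpha: the
    mean alpha / alpha0 gives class probabilities, while the total
    strength alpha0 tracks the magnitude of predictive confidence."""
    alpha = np.exp(logits)
    alpha0 = alpha.sum()
    return alpha / alpha0, alpha0

# Toy 3-class logits: the mean matches softmax, but alpha0 carries an
# extra confidence signal that adversarial perturbations can suppress.
probs, strength = dirichlet_stats(np.array([2.0, 0.0, 0.0]))
```

Aligning both the mean (relative semantic structure) and the strength (confidence magnitude) between clean and perturbed inputs is what moves the objective beyond single-logit anchoring.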
Updated: 2025-12-15 05:41:08
Subjects: cs.CV,cs.AI,cs.LG
The Beginner's Textbook for Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) is a cryptographic scheme that enables computations to be performed directly on encrypted data, as if the data were in plaintext. After all computations are performed on the encrypted data, it can be decrypted to reveal the result. The decrypted value matches the result that would have been obtained if the same computations were applied to the plaintext data. FHE supports basic operations such as addition and multiplication on encrypted numbers. Using these fundamental operations, more complex computations can be constructed, including subtraction, division, logic gates (e.g., AND, OR, XOR, NAND, MUX), and even advanced mathematical functions such as ReLU, sigmoid, and trigonometric functions (e.g., sin, cos). These functions can be implemented either as exact formulas or as approximations, depending on the trade-off between computational efficiency and accuracy. FHE enables privacy-preserving machine learning by allowing a server to process the client's data in its encrypted form through an ML model. With FHE, the server learns neither the plaintext version of the input features nor the inference results. Only the client, using their secret key, can decrypt and access the results at the end of the service protocol. FHE can also be applied to confidential blockchain services, ensuring that sensitive data in smart contracts remains encrypted and confidential while maintaining the transparency and integrity of the execution process. Other applications of FHE include secure outsourcing of data analytics, encrypted database queries, privacy-preserving searches, efficient multi-party computation for digital signatures, and more. A dynamic website version is available at (https://fhetextbook.github.io). Please report any bugs or errors to the Github issues board.
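The homomorphic-addition property described above can be demonstrated with a toy Paillier instance; note Paillier is only additively homomorphic (not fully homomorphic), and this key size is hopelessly insecure, so the snippet is purely illustrative:

```python
from math import gcd

# Toy Paillier cryptosystem: E(m1) * E(m2) mod n^2 decrypts to m1 + m2.
p, q = 47, 59                                  # tiny primes -- NOT secure
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1
# mu = L(g^lam mod n^2)^-1 mod n, with L(x) = (x - 1) // n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m, r):
    """Encrypt m with randomness r coprime to n."""
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Multiplying ciphertexts adds the underlying plaintexts: 20 + 22 = 42.
c = (enc(20, 7) * enc(22, 11)) % n2
```

Full FHE schemes extend this idea to support both addition and multiplication on ciphertexts, which is what enables the arbitrary circuits described in the text.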
Updated: 2025-12-15 05:37:29
Subjects: cs.CR,cs.DM
On the physics of nested Markov models: a generalized probabilistic theory perspective
Determining potential probability distributions with a given causal graph is vital for causality studies. To bypass the difficulty in characterizing latent variables in a Bayesian network, the nested Markov model provides an elegant algebraic approach by listing exactly all the equality constraints on the observed variables. However, this algebraically motivated causal model comprises distributions outside Bayesian networks, and its physical interpretation remains vague. In this work, we inspect the nested Markov model through the lens of generalized probabilistic theory, an axiomatic framework to describe general physical theories. We prove that all the equality constraints defining the nested Markov model are valid theory-independently. At the same time, not every distribution within the nested Markov model is implementable, not even via exotic physical theories associated with generalized probability theories (GPTs). To interpret the origin of such a gap, we study three causal models standing between the nested Markov model and the set of all distributions admitting some GPT realization. Each of the successive three models gives a strictly tighter characterization of the physically implementable distribution set; that is, each successive model manifests new types of GPT-inviolable constraints. We further demonstrate each gap through a specially chosen illustrative causal structure. We anticipate our results will enlighten further explorations on the unification of algebraic and physical perspectives of causality.
Updated: 2025-12-15 05:28:06
Fields: quant-ph,cs.LG,math.ST
Quantigence: A Multi-Agent AI Framework for Quantum Security Research
Cryptographically Relevant Quantum Computers (CRQCs) pose a structural threat to the global digital economy. Algorithms like Shor's factoring and Grover's search threaten to dismantle the public-key infrastructure (PKI) securing sovereign communications and financial transactions. While the timeline for fault-tolerant CRQCs remains probabilistic, the "Store-Now, Decrypt-Later" (SNDL) model necessitates immediate migration to Post-Quantum Cryptography (PQC). This transition is hindered by the velocity of research, evolving NIST standards, and heterogeneous deployment environments. To address this, we present Quantigence, a theory-driven multi-agent AI framework for structured quantum-security analysis. Quantigence decomposes research objectives into specialized roles - Cryptographic Analyst, Threat Modeler, Standards Specialist, and Risk Assessor - coordinated by a supervisory agent. Using "cognitive parallelism," agents reason independently to maintain context purity while execution is serialized on resource-constrained hardware (e.g., NVIDIA RTX 2060). The framework integrates external knowledge via the Model Context Protocol (MCP) and prioritizes vulnerabilities using the Quantum-Adjusted Risk Score (QARS), a formal extension of Mosca's Theorem. Empirical validation shows Quantigence achieves a 67% reduction in research turnaround time and superior literature coverage compared to manual workflows, democratizing access to high-fidelity quantum risk assessment.
Updated: 2025-12-15 05:27:10
Fields: cs.MA,cs.CR
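QARS itself is defined in the paper, but the Mosca's Theorem it extends reduces to a simple timeline inequality, sketched below with hypothetical variable names: data is already at risk via Store-Now, Decrypt-Later if its required secrecy lifetime x plus the migration time y exceeds the estimated time z until a CRQC exists.

```python
def mosca_at_risk(shelf_life_years, migration_years, years_to_crqc):
    # Mosca's inequality: risk exists when x + y > z
    return shelf_life_years + migration_years > years_to_crqc
```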
Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning
This paper proposes two new algorithms for the lane keeping system (LKS) in autonomous vehicles (AVs) operating under snowy road conditions. These algorithms use deep reinforcement learning (DRL) to handle uncertainties and slippage. They include Action-Robust Recurrent Deep Deterministic Policy Gradient (AR-RDPG) and end-to-end Action-Robust convolutional neural network Attention Deterministic Policy Gradient (AR-CADPG), two action-robust approaches for decision-making. In the AR-RDPG method, within the perception layer, camera images are first denoised using multi-scale neural networks. Then, the centerline coefficients are extracted by a pre-trained deep convolutional neural network (DCNN). These coefficients, concatenated with the driving characteristics, are used as input to the control layer. The AR-CADPG method presents an end-to-end approach in which a convolutional neural network (CNN) and an attention mechanism are integrated within a DRL framework. Both methods are first trained in the CARLA simulator and validated under various snowy scenarios. Real-world experiments on a Jetson Nano-based autonomous vehicle confirm the feasibility and stability of the learned policies. Among the two models, the AR-CADPG approach demonstrates superior path-tracking accuracy and robustness, highlighting the effectiveness of combining temporal memory, adversarial resilience, and attention mechanisms in AVs.
Updated: 2025-12-15 05:23:23
Fields: cs.RO,cs.AI,cs.CV,cs.LG
De novo generation of functional terpene synthases using TpsGPT
Terpene synthases (TPS) are a key family of enzymes responsible for generating the diverse terpene scaffolds that underpin many natural products, including front-line anticancer drugs such as Taxol. However, de novo TPS design through directed evolution is costly and slow. We introduce TpsGPT, a generative model for scalable TPS protein design, built by fine-tuning the protein language model ProtGPT2 on 79k TPS sequences mined from UniProt. TpsGPT generated de novo enzyme candidates in silico and we evaluated them using multiple validation metrics, including EnzymeExplorer classification, ESMFold structural confidence (pLDDT), sequence diversity, CLEAN classification, InterPro domain detection, and Foldseek structure alignment. From an initial pool of 28k generated sequences, we identified seven putative TPS enzymes that satisfied all validation criteria. Experimental validation confirmed TPS enzymatic activity in at least two of these sequences. Our results show that fine-tuning of a protein language model on a carefully curated, enzyme-class-specific dataset, combined with rigorous filtering, can enable the de novo generation of functional, evolutionarily distant enzymes.
Updated: 2025-12-15 05:09:36
Fields: cs.LG
VoroLight: Learning Quality Volumetric Voronoi Meshes from General Inputs
We present VoroLight, a differentiable framework for 3D shape reconstruction based on Voronoi meshing. Our approach generates smooth, watertight surfaces and topologically consistent volumetric meshes directly from diverse inputs, including images, implicit shape level-set fields, point clouds and meshes. VoroLight operates in three stages: it first initializes a surface using a differentiable Voronoi formulation, then refines surface quality through a polygon-face sphere training stage, and finally reuses the differentiable Voronoi formulation for volumetric optimization with additional interior generator points. Project page: https://jiayinlu19960224.github.io/vorolight/
Updated: 2025-12-15 05:01:59
Fields: cs.CG,cs.CV,cs.GR,cs.LG,math.OC
Certifying Robustness of Graph Convolutional Networks for Node Perturbation with Polyhedra Abstract Interpretation
Graph convolutional neural networks (GCNs) are powerful tools for learning graph-based knowledge representations from training data. However, they are vulnerable to small perturbations in the input graph, which makes them susceptible to input faults or adversarial attacks. This poses a significant problem for GCNs intended to be used in critical applications, which need to provide certifiably robust services even in the presence of adversarial perturbations. We propose an improved GCN robustness certification technique for node classification in the presence of node feature perturbations. We introduce a novel polyhedra-based abstract interpretation approach to tackle specific challenges of graph data and provide tight upper and lower bounds for the robustness of the GCN. Experiments show that our approach simultaneously improves the tightness of robustness bounds as well as the runtime performance of certification. Moreover, our method can be used during training to further improve the robustness of GCNs.
Updated: 2025-12-15 04:57:56
Fields: cs.LG,cs.FL
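The polyhedra domain used in the paper is more precise than can be sketched briefly, but the simpler interval domain below (a generic sketch, not the paper's method) illustrates how abstract interpretation propagates sound lower/upper bounds through a linear layer and a ReLU:

```python
import numpy as np

def interval_linear(W, b, lo, hi):
    # Sound bounds for y = W @ x + b when x lies in [lo, hi] elementwise:
    # positive weights pull from the same bound, negative from the opposite one.
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    y_lo = W_pos @ lo + W_neg @ hi + b
    y_hi = W_pos @ hi + W_neg @ lo + b
    return y_lo, y_hi

def interval_relu(lo, hi):
    # ReLU is monotone, so it maps bounds to bounds directly
    return np.maximum(lo, 0), np.maximum(hi, 0)
```

Every concrete input within the box is guaranteed to land inside the propagated bounds; the polyhedra domain of the paper tightens these boxes by tracking linear relations between neurons.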
Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations
$\rm{SO}(3)$-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the Clebsch-Gordan (CG) tensor product, which is computationally expensive. To accelerate the computation, we develop tensor decomposition networks (TDNs) as a class of approximately equivariant networks in which CG tensor products are replaced by low-rank tensor decompositions, such as the CANDECOMP/PARAFAC (CP) decomposition. With the CP decomposition, we prove (i) a uniform bound on the induced error of $\rm{SO}(3)$-equivariance, and (ii) the universality of approximating any equivariant bilinear map. To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the $\mathcal{O}(L^3)$ CG paths into a single shared parameter set without compromising equivariance, where $L$ is the maximum angular degree. The resulting layer acts as a plug-and-play replacement for tensor products in existing networks, and the computational complexity of tensor products is reduced from $\mathcal{O}(L^6)$ to $\mathcal{O}(L^4)$. We evaluate TDNs on PubChemQCR, a newly curated molecular relaxation dataset containing 105 million DFT-calculated snapshots. We also use existing datasets, including OC20, and OC22. Results show that TDNs achieve competitive performance with dramatic speedup in computations. Our code is publicly available as part of the AIRS library (\href{https://github.com/divelab/AIRS/tree/main/OpenMol/TDN}{https://github.com/divelab/AIRS/}).
Updated: 2025-12-15 04:54:23
Fields: cs.LG,physics.comp-ph
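The core idea — replacing a dense bilinear (tensor-product) map with a low-rank CP form — can be sketched generically; this toy ignores the SO(3) structure and path-weight sharing of the actual TDN layer:

```python
import numpy as np

def full_bilinear(T, x, y):
    # z_k = sum_{i,j} T[i,j,k] * x_i * y_j  -- cost grows with the full tensor
    return np.einsum('ijk,i,j->k', T, x, y)

def cp_bilinear(U, V, W, x, y):
    # Rank-R CP form: z = W^T ((U x) * (V y))  -- only matrix-vector products
    return W.T @ ((U @ x) * (V @ y))
```

When the tensor T is exactly the sum of R rank-1 terms built from the factor matrices, the two maps agree; in TDNs the CP factors are learned, trading exact equivariance for speed.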
CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks
While joint pruning--quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression parameters. This reliance adds engineering complexity and hyperparameter tuning, while also lacking a direct data-driven gradient signal, which might result in sub-optimal compression. In this paper, we introduce CoDeQ, a simple, fully differentiable method for joint pruning--quantization. Our approach builds on a key observation: the dead-zone of a scalar quantizer is equivalent to magnitude pruning, and can be used to induce sparsity directly within the quantization operator. Concretely, we parameterize the dead-zone width and learn it via backpropagation, alongside the quantization parameters. This design provides explicit control of sparsity, regularized by a single global hyperparameter, while decoupling sparsity selection from bit-width selection. The result is a method for Compression with Dead-zone Quantizer (CoDeQ) that supports both fixed-precision and mixed-precision quantization (controlled by an optional second hyperparameter). It simultaneously determines the sparsity pattern and quantization parameters in a single end-to-end optimization. Consequently, CoDeQ does not require any auxiliary procedures, making the method architecture-agnostic and straightforward to implement. On ImageNet with ResNet-18, CoDeQ reduces bit operations to ~5% while maintaining close to full precision accuracy in both fixed and mixed-precision regimes.
Updated: 2025-12-15 04:53:32
Fields: cs.LG,stat.ML
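The key observation — that the dead-zone of a scalar quantizer subsumes magnitude pruning — can be sketched with a fixed dead-zone width and step size (a simplification: CoDeQ learns these parameters via backpropagation):

```python
import numpy as np

def dead_zone_quantize(w, delta, step):
    # Values with |w| < delta collapse to 0 (magnitude pruning);
    # the remaining values are uniformly quantized with the given step size.
    pruned = np.abs(w) < delta
    q = np.sign(w) * (delta + step * np.round((np.abs(w) - delta) / step))
    return np.where(pruned, 0.0, q)
```

Widening `delta` directly increases sparsity, which is why a single operator can control both the sparsity pattern and the quantization grid.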
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
Updated: 2025-12-15 04:45:27
Fields: cs.CL,cs.AI,cs.CV,cs.RO
A Physics-Embedded Dual-Learning Imaging Framework for Electrical Impedance Tomography
Electrical Impedance Tomography (EIT) is a promising noninvasive imaging technique that reconstructs the spatial conductivity distribution from boundary voltage measurements. However, it poses a highly nonlinear and ill-posed inverse problem. Traditional regularization-based methods are sensitive to noise and often produce significant artifacts. Physics-Embedded learning frameworks, particularly Physics-Informed Neural Networks (PINNs), have shown success in solving such inverse problems under ideal conditions with abundant internal data. Yet in practical EIT applications, only sparse and noisy boundary measurements are available. Moreover, changing boundary excitations require the simultaneous training of multiple forward networks and one inverse network, which significantly increases computational complexity and hampers convergence. To overcome these limitations, we propose a Physics-Embedded Dual-Learning Imaging Framework for EIT. The dual-learning strategy is composed of a supervised CNN-based forward network, which learns to predict a discrete internal potential distribution under fixed Neumann-to-Dirichlet boundary conditions, and an unsupervised PINN-based inverse network, which reconstructs the conductivity by enforcing the governing PDE through discrete numerical differentiation of the predicted potentials. This decoupled architecture removes the need for smooth conductivity assumptions, reduces the number of forward networks required from $K$ to 1, and improves reconstruction robustness and efficiency under realistic measurement constraints.(https://github.com/XuanxuanYang/CNN-PINNframework.git)
Updated: 2025-12-15 04:42:54
Fields: eess.SP,cs.LG
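The inverse network's physics loss enforces the conductivity equation div(sigma * grad u) = 0 through discrete numerical differentiation of the predicted potentials. A generic five-point finite-volume residual of this kind (a sketch, not the paper's exact discretization) looks like:

```python
import numpy as np

def eit_residual(u, sigma, h=1.0):
    # Discrete residual of div(sigma * grad u) at interior grid points,
    # with sigma averaged onto the east/west/north/south cell faces.
    uc, sc = u[1:-1, 1:-1], sigma[1:-1, 1:-1]
    s_e, s_w = 0.5 * (sc + sigma[1:-1, 2:]), 0.5 * (sc + sigma[1:-1, :-2])
    s_n, s_s = 0.5 * (sc + sigma[2:, 1:-1]), 0.5 * (sc + sigma[:-2, 1:-1])
    return (s_e * (u[1:-1, 2:] - uc) - s_w * (uc - u[1:-1, :-2])
            + s_n * (u[2:, 1:-1] - uc) - s_s * (uc - u[:-2, 1:-1])) / h**2
```

Driving this residual toward zero penalizes conductivity fields inconsistent with the predicted potentials, without requiring any smoothness assumption on sigma.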
Application of Deep Learning in Biological Data Compression
Cryogenic electron microscopy (Cryo-EM) has become an essential tool for capturing high-resolution biological structures. Despite its advantage in visualization, the large storage size of Cryo-EM data files poses significant challenges for researchers and educators. This paper investigates the application of deep learning, specifically implicit neural representation (INR), to compress Cryo-EM biological data. The proposed approach first extracts a binary map from each file according to a density threshold. The density map is highly repetitive, which allows it to be effectively compressed by GZIP. A neural network is then trained to encode the spatial density information, so that only the network parameters and learnable latent vectors need to be stored. To improve reconstruction accuracy, I further incorporate positional encoding to enhance spatial representation and a weighted Mean Squared Error (MSE) loss function to balance density distribution variations. Using this approach, my aim is to provide a practical and efficient biological data compression solution for educational and research purposes, while maintaining a reasonable compression ratio and reconstruction quality from file to file.
Updated: 2025-12-15 04:40:23
Fields: cs.LG,cs.IT
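The first stage — thresholding the density map into a binary mask and exploiting its repetitiveness with GZIP — can be sketched as follows (function names are mine, not the paper's):

```python
import gzip
import numpy as np

def compress_binary_map(density, threshold):
    # Binarize by density threshold, pack to bits, then GZIP the bytes
    mask = density > threshold
    return gzip.compress(np.packbits(mask.ravel()).tobytes()), mask.shape

def decompress_binary_map(blob, shape):
    # Inverse: gunzip, unpack bits, trim padding, restore the 2D/3D mask
    bits = np.unpackbits(np.frombuffer(gzip.decompress(blob), np.uint8))
    return bits[: np.prod(shape)].reshape(shape).astype(bool)
```

Because biological density maps are mostly empty space, the packed mask compresses far below its raw size, leaving the INR to model only the in-threshold densities.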
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
The landscape of scientific peer review is rapidly evolving with the integration of Large Language Models (LLMs). This shift is driven by two parallel trends: the widespread individual adoption of LLMs by reviewers to manage workload (the "Lazy Reviewer" hypothesis) and the formal institutional deployment of AI-powered assessment systems by conferences like AAAI and Stanford's Agents4Science. This study investigates the robustness of these "LLM-as-a-Judge" systems (both illicit and sanctioned) to adversarial PDF manipulation. Unlike general jailbreaks, we focus on a distinct incentive: flipping "Reject" decisions to "Accept," for which we develop a novel evaluation metric which we term as WAVS (Weighted Adversarial Vulnerability Score). We curated a dataset of 200 scientific papers and adapted 15 domain-specific attack strategies to this task, evaluating them across 13 Language Models, including GPT-5, Claude Haiku, and DeepSeek. Our results demonstrate that obfuscation strategies like "Maximum Mark Magyk" successfully manipulate scores, achieving alarming decision flip rates even in large-scale models. We will release our complete dataset and injection framework to facilitate more research on this topic.
Updated: 2025-12-15 04:40:20
Fields: cs.AI,cs.CL,cs.CR
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, in order to remove corrupted or outdated data or respect a user's ``right to be forgotten.'' Certified machine unlearning is a strong theoretical guarantee based on differential privacy that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing works in certified unlearning for convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions, we propose the first, first-order, black-box (i.e., can be applied to models pretrained with vanilla gradient descent) algorithm for unlearning on general nonconvex loss functions, which unlearns by ``rewinding'' to an earlier step during the learning process before performing gradient descent on the loss function of the retained data points. We prove $(\epsilon, \delta)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Lojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.
Updated: 2025-12-15 04:39:24
Fields: cs.LG
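On a toy quadratic loss, the rewind mechanism — checkpointing during training, then resuming gradient descent from an earlier checkpoint on only the retained data — can be sketched as below. This is a simplification of the paper's certified procedure (no noise calibration or privacy accounting):

```python
import numpy as np

def train(data, w0, lr=0.1, steps=50, checkpoint_every=10):
    # Scalar least-squares fit: loss = 0.5 * mean((w - d_i)^2),
    # so the gradient is w - mean(data). Checkpoints saved along the way.
    w, ckpts = w0, {0: w0}
    for t in range(1, steps + 1):
        w = w - lr * (w - np.mean(data))
        if t % checkpoint_every == 0:
            ckpts[t] = w
    return w, ckpts

def unlearn(ckpts, rewind_to, retained, lr=0.1, steps=50):
    # "Rewind" to an earlier checkpoint, then run gradient descent
    # on the loss of the retained data points only
    w = ckpts[rewind_to]
    for _ in range(steps):
        w = w - lr * (w - np.mean(retained))
    return w
```

Rewinding earlier gives stronger erasure of the deleted points' influence at the cost of more post-rewind compute, which is the privacy-utility-complexity tradeoff the paper formalizes.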
Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark
As AI advances toward general intelligence, the focus is shifting from systems optimized for static tasks to creating open-ended agents that learn continuously. In this paper, we introduce Experience-driven Lifelong Learning (ELL), a framework for building self-evolving agents capable of continuous growth through real-world interaction. The framework is built on four core principles: (1) Experience Exploration: Agents learn through continuous, self-motivated interaction with dynamic environments, navigating interdependent tasks and generating rich experiential trajectories. (2) Long-term Memory: Agents preserve and structure historical knowledge, including personal experiences, domain expertise, and commonsense reasoning, into a persistent memory system. (3) Skill Learning: Agents autonomously improve by abstracting recurring patterns from experience into reusable skills, which are actively refined and validated for application in new tasks. (4) Knowledge Internalization: Agents internalize explicit and discrete experiences into implicit and intuitive capabilities as "second nature". We also introduce StuLife, a benchmark dataset for ELL that simulates a student's holistic college journey, from enrollment to academic and personal development, across three core phases and ten detailed sub-scenarios. StuLife is designed around three key paradigm
Updated: 2025-12-15 04:36:06
Fields: cs.AI,cs.CL
Learning to Optimize for Mixed-Integer Non-linear Programming with Feasibility Guarantees
Mixed-integer nonlinear programs (MINLPs) arise in domains such as energy systems, process engineering, and transportation, and are notoriously difficult to solve at scale due to the interplay of discrete decisions and nonlinear constraints. In many practical settings, these problems appear in parametric form, where objectives and constraints depend on instance-specific parameters, creating the need for fast and reliable solutions across related instances. While learning-to-optimize (L2O) methods have shown strong performance in continuous optimization, extending them to MINLPs requires enforcing both feasibility and integrality within a data-driven framework. We propose an L2O approach tailored to parametric MINLPs that generates instance-specific solutions using integer correction layers to enforce integrality and a gradient-based projection to ensure feasibility of the inequality constraints. Theoretically, we provide asymptotic and non-asymptotic convergence guarantees of the projection step. Empirically, the framework scales to MINLPs with tens of thousands of variables and produces feasible high-quality solutions within milliseconds, often outperforming traditional solvers and heuristic baselines in repeated-solve settings.
Updated: 2025-12-15 04:35:41
Fields: cs.LG,math.OC
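The two output-side mechanisms can be sketched on a toy problem: round the integer-flagged entries of a relaxed solution, then drive the squared violation of an inequality constraint g(x) <= 0 to zero by gradient descent on the continuous entries only. This is a simplification of the paper's layers, which come with convergence guarantees:

```python
import numpy as np

def integer_correct(x, int_mask):
    # Round only the entries flagged as integer variables
    x = x.copy()
    x[int_mask] = np.round(x[int_mask])
    return x

def project_feasible(x, int_mask, g, grad_g, lr=0.1, iters=500):
    # Gradient descent on max(g(x), 0)^2, touching continuous entries only
    # so that the already-corrected integer entries stay fixed
    for _ in range(iters):
        v = max(g(x), 0.0)
        if v <= 1e-9:
            break
        step = 2.0 * v * grad_g(x)
        step[int_mask] = 0.0
        x = x - lr * step
    return x
```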
Adaptive Information Routing for Multimodal Time Series Forecasting
Time series forecasting is a critical task for artificial intelligence with numerous real-world applications. Traditional approaches primarily rely on historical time series data to predict the future values. However, in practical scenarios, this is often insufficient for accurate predictions due to the limited information available. To address this challenge, multimodal time series forecasting methods which incorporate additional data modalities, mainly text data, alongside time series data have been explored. In this work, we introduce the Adaptive Information Routing (AIR) framework, a novel approach for multimodal time series forecasting. Unlike existing methods that treat text data on par with time series data as interchangeable auxiliary features for forecasting, AIR leverages text information to dynamically guide the time series model by controlling how and to what extent multivariate time series information should be combined. We also present a text-refinement pipeline that employs a large language model to convert raw text data into a form suitable for multimodal forecasting, and we introduce a benchmark that facilitates multimodal forecasting experiments based on this pipeline. Experiment results with the real world market data such as crude oil price and exchange rates demonstrate that AIR effectively modulates the behavior of the time series model using textual inputs, significantly enhancing forecasting accuracy in various time series forecasting tasks.
Updated: 2025-12-15 04:30:57
Fields: cs.LG,cs.AI
Multi-View Oriented GPLVM: Expressiveness and Efficiency
The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral density and the kernel function. By modeling the spectral density with a bivariate Gaussian mixture, we then derive a generic and expressive kernel termed Next-Gen Spectral Mixture (NG-SM) for MV-GPLVMs. To address the inherent computational inefficiency of the NG-SM kernel, we design a new form of random Fourier feature approximation. Combined with a tailored reparameterization trick, this approximation enables scalable variational inference for both the model and the unified latent representations. Numerical evaluations across a diverse range of multi-view datasets demonstrate that our proposed method consistently outperforms state-of-the-art models in learning meaningful latent representations.
Updated: 2025-12-15 04:25:44
Fields: stat.ML,cs.LG
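The spectral-density/kernel duality invoked here is Bochner's theorem, and the random Fourier feature idea can be sketched for the classic RBF kernel, whose spectral density is a single Gaussian (the NG-SM kernel instead models the density as a bivariate Gaussian mixture):

```python
import numpy as np

def rbf_kernel(x, y, ls=1.0):
    # Stationary RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * ls^2))
    d = x - y
    return np.exp(-(d @ d) / (2 * ls**2))

def rff_features(X, n_feat, ls=1.0, seed=0):
    # Monte Carlo features z(x) with z(x) . z(y) ~ k(x, y):
    # frequencies W are sampled from the kernel's spectral density (Gaussian)
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / ls, size=(n_feat, X.shape[1]))
    b = rng.uniform(0.0, 2 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(X @ W.T + b)
```

The approximation error shrinks like 1/sqrt(n_feat), which is what makes variational inference over the explicit features scalable.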
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications and agents, the input data originates from multiple sources, with each source contributing a segment of the overall input. In these multi-source scenarios, an attacker may control only a subset of the sources and contaminate the corresponding segments, but typically does not know the order in which the segments are arranged within the input. Existing prompt injection attacks either assume that the entire input data comes from a single source under the attacker's control or ignore the uncertainty in the ordering of segments from different sources. As a result, their success is limited in domains involving multi-source data. In this work, we propose ObliInjection, the first prompt injection attack targeting LLM applications and agents with multi-source input data. ObliInjection introduces two key technical innovations: the order-oblivious loss, which quantifies the likelihood that the LLM will complete the attacker-chosen task regardless of how the clean and contaminated segments are ordered; and the orderGCG algorithm, which is tailored to minimize the order-oblivious loss and optimize the contaminated segments. Comprehensive experiments across three datasets spanning diverse application domains and twelve LLMs demonstrate that ObliInjection is highly effective, even when only one out of 6-100 segments in the input data is contaminated. Our code and data are available at: https://github.com/ReachalWang/ObliInjection.
Updated: 2025-12-15 04:21:36
标题: ObliInjection: 面向具有多源数据的LLM代理的无序提示注入攻击
摘要: 提示注入攻击旨在污染LLM的输入数据,以误导它完成攻击者选择的任务,而不是预期的任务。在许多应用程序和代理中,输入数据源自多个来源,每个来源都贡献整体输入的一个部分。在这些多源场景中,攻击者可能仅控制部分来源并污染相应的段,但通常不知道输入中不同来源的段的排列顺序。现有的提示注入攻击要么假定整个输入数据来自攻击者控制的单个来源,要么忽略来自不同来源的段的顺序不确定性。因此,在涉及多源数据的领域中,它们的成功受到限制。 在这项工作中,我们提出了ObliInjection,这是针对具有多源输入数据的LLM应用程序和代理的第一个提示注入攻击。ObliInjection引入了两个关键技术创新:无序损失,它量化无论干净和污染的段如何排序,LLM完成攻击者选择的任务的可能性;以及orderGCG算法,专门设计用于最小化无序损失并优化污染的段。跨越三个不同应用领域的数据集和十二个LLM的全面实验表明,ObliInjection非常有效,即使输入数据的6-100个段中只有一个被污染。我们的代码和数据可在以下网址获得:https://github.com/ReachalWang/ObliInjection。
更新时间: 2025-12-15 04:21:36
领域: cs.CR
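The order-oblivious loss can be pictured as averaging an ordering-dependent loss over every arrangement of the clean and contaminated segments, so the objective no longer depends on any single ordering. The sketch below illustrates only that averaging step, with a made-up per-ordering surrogate (`toy_loss`); the actual attack optimizes the contaminated text with the orderGCG algorithm, which is not reproduced here.

```python
from itertools import permutations

def order_oblivious_loss(clean_segments, adv_segment, per_order_loss):
    """Average a per-ordering loss over every arrangement of the input
    segments, so the objective is oblivious to segment order."""
    segments = clean_segments + [adv_segment]
    orderings = list(permutations(segments))
    return sum(per_order_loss(list(o)) for o in orderings) / len(orderings)

# Toy surrogate loss (purely illustrative): pretend an injection placed
# earlier in the prompt is less likely to override later instructions,
# so earlier placement of the adversarial segment costs more.
def toy_loss(ordered):
    return ordered.index("ADV") / max(len(ordered) - 1, 1)

clean = ["seg_a", "seg_b", "seg_c"]
score = order_oblivious_loss(clean, "ADV", toy_loss)
print(round(score, 3))  # averages toy_loss over all 24 orderings
```

In the real attack this average is taken over losses of an LLM, and the contaminated segment is optimized against it; enumerating all permutations only scales to small segment counts, so sampling would be needed in practice.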
Towards Open Standards for Systemic Complexity in Digital Forensics
The intersection of artificial intelligence (AI) and digital forensics (DF) is becoming increasingly complex, ubiquitous, and pervasive, with overlapping techniques and technologies being adopted in all types of scientific and technical inquiry. Despite remarkable advances, the forensic sciences are not exempt from error and remain fallible. To mitigate the impact of errors in DF, the systemic complexity is identified and addressed through the adoption of human-readable artifacts and open standards. A DF AI model schema based on the state of the art is outlined.
Updated: 2025-12-15 04:18:56
标题: 朝着数字取证中系统复杂性的开放标准前进
摘要: 人工智能(AI)和数字取证(DF)的交叉领域变得越来越复杂、普遍和普及,各类科学和技术调查中采用了相互重叠的技术方法与技术手段。尽管取得了令人瞩目的进展,取证科学仍然难免出错,容易产生谬误。为了减轻数字取证中错误的影响,通过采用人类可读的工件和开放标准来识别和解决系统复杂性。文中概述了基于当前技术水平的数字取证AI模型架构。
更新时间: 2025-12-15 04:18:56
领域: cs.AI,cs.CY
Astra: General Interactive World Model with Autoregressive Denoising
Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose scenarios and various forms of actions. To bridge this gap, we introduce Astra, an interactive general world model that generates real-world futures for diverse scenarios (e.g., autonomous driving, robot grasping) with precise action interactions (e.g., camera motion, robot action). We propose an autoregressive denoising architecture and use temporal causal attention to aggregate past observations and support streaming outputs. We use a noise-augmented history memory to avoid over-reliance on past frames to balance responsiveness with temporal coherence. For precise action control, we introduce an action-aware adapter that directly injects action signals into the denoising process. We further develop a mixture of action experts that dynamically route heterogeneous action modalities, enhancing versatility across diverse real-world tasks such as exploration, manipulation, and camera control. Astra achieves interactive, consistent, and general long-term video prediction and supports various forms of interactions. Experiments across multiple datasets demonstrate the improvements of Astra in fidelity, long-range prediction, and action alignment over existing state-of-the-art world models.
Updated: 2025-12-15 03:59:41
标题: Astra:具有自回归去噪的通用交互式世界模型
摘要: 最近在扩散变换器领域取得的进展使视频生成模型能够从文本或图像生成高质量视频片段。然而,能够根据过去的观测和动作预测长期未来的世界模型仍未得到充分探索,尤其是在通用场景和各种形式的动作方面。为了填补这一空白,我们引入了Astra,一个交互式通用世界模型,可以为不同场景(如自动驾驶、机器人抓取)生成真实世界的未来,并具有精确的动作交互(如摄像机运动、机器人动作)。我们提出了一个自回归去噪架构,并使用时间因果注意力来聚合过去的观测并支持流式输出。我们使用噪声增强的历史记忆来避免对过去帧的过度依赖,以平衡响应性和时间连贯性。为了实现精确的动作控制,我们引入了一个动作感知适配器,直接将动作信号注入去噪过程。我们进一步开发了一个动作专家混合模块,动态路由异构动作模态,增强了在探索、操作和摄像机控制等多种真实世界任务中的多功能性。Astra实现了交互式、一致且通用的长期视频预测,并支持各种形式的交互。在多个数据集上的实验表明,Astra在保真度、长期预测和动作对齐方面优于现有最先进的世界模型。
更新时间: 2025-12-15 03:59:41
领域: cs.CV,cs.AI,cs.LG
SpurLens: Automatic Detection of Spurious Cues in Multimodal LLMs
Unimodal vision models are known to rely on spurious correlations, but it remains unclear to what extent Multimodal Large Language Models (MLLMs) exhibit similar biases despite language supervision. In this paper, we investigate spurious bias in MLLMs and introduce SpurLens, a pipeline that leverages GPT-4 and open-set object detectors to automatically identify spurious visual cues without human supervision. Our findings reveal that spurious correlations cause two major failure modes in MLLMs: (1) over-reliance on spurious cues for object recognition, where removing these cues reduces accuracy, and (2) object hallucination, where spurious cues amplify the hallucination by over 10x. We validate our findings in various MLLMs and datasets. Beyond diagnosing these failures, we explore potential mitigation strategies, such as prompt ensembling and reasoning-based prompting, and conduct ablation studies to examine the root causes of spurious bias in MLLMs. By exposing the persistence of spurious correlations, our study calls for more rigorous evaluation methods and mitigation strategies to enhance the reliability of MLLMs.
Updated: 2025-12-15 03:45:42
标题: SpurLens:多模式LLMs中虚假线索的自动检测
摘要: 单模态视觉模型已知依赖于虚假相关性,但尽管有语言监督,多模态大语言模型(MLLMs)在多大程度上存在类似偏见仍不清楚。在本文中,我们研究了MLLMs中的虚假偏见,并引入了SpurLens,这是一个利用GPT-4和开放集物体检测器的管道,可以在无人监督的情况下自动识别虚假视觉线索。我们的研究发现,虚假相关性导致MLLMs中的两种主要失败模式:(1)物体识别过度依赖虚假线索,去除这些线索会降低准确性;(2)物体幻觉,虚假线索将幻觉放大超过10倍。我们在各种MLLMs和数据集中验证了我们的发现。除了诊断这些失败,我们还探索了潜在的缓解策略,例如提示集成和基于推理的提示,并进行消融研究以检验MLLMs中虚假偏见的根本原因。通过揭示虚假相关性的持续性,我们的研究呼吁采用更严格的评估方法和缓解策略,以提高MLLMs的可靠性。
更新时间: 2025-12-15 03:45:42
领域: cs.CV,cs.CL,cs.LG
Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping
Accurately mapping legal terminology across languages remains a significant challenge, especially for language pairs like Chinese and Japanese, which share a large number of homographs with different meanings. Existing resources and standardized tools for these languages are limited. To address this, we propose a human-AI collaborative approach for building a multilingual legal terminology database, based on a multi-agent framework. This approach integrates advanced large language models and legal domain experts throughout the entire process, from raw document preprocessing and article-level alignment to terminology extraction, mapping, and quality assurance. Unlike a single automated pipeline, our approach places greater emphasis on how human experts participate in this multi-agent system. Humans and AI agents take on different roles: AI agents handle specific, repetitive tasks such as OCR, text segmentation, semantic alignment, and initial terminology extraction, while human experts provide crucial oversight, reviewing and supervising the outputs with contextual knowledge and legal judgment. We tested the effectiveness of this framework using a trilingual parallel corpus comprising 35 key Chinese statutes along with their English and Japanese translations. The experimental results show that this human-in-the-loop, multi-agent workflow not only improves the precision and consistency of multilingual legal terminology mapping but also offers greater scalability compared to traditional manual methods.
Updated: 2025-12-15 03:29:21
标题: 从头开始构建:一个带有人类参与的多代理框架,用于多语言法律术语映射
摘要: 跨语言准确映射法律术语仍然是一个重大挑战,特别是对于中文和日文这样共享大量同形异义词的语言对而言。针对这些语言的现有资源和标准化工具十分有限。为了解决这个问题,我们提出了一种基于多代理框架的人机协作方法来构建多语种法律术语数据库。这种方法整合了先进的大型语言模型和法律领域专家,贯穿整个流程:从原始文档预处理、文章级别对齐,到术语提取、映射和质量保证。与单一自动化流水线不同,我们的方法更加注重人类专家如何参与这个多代理系统。人类和人工智能代理承担不同的角色:人工智能代理处理具体的重复性任务,如OCR、文本分割、语义对齐和初始术语提取,而人类专家提供关键的监督和审查,并结合上下文知识和法律判断把关输出。我们使用包含35部关键中国法规及其英文和日文翻译的三语平行语料库测试了这一框架的有效性。实验结果显示,这种人机协作的多代理工作流程不仅提高了多语种法律术语映射的精度和一致性,而且相比传统手动方法具有更大的可扩展性。
更新时间: 2025-12-15 03:29:21
领域: cs.CL,cs.AI
Understanding When Graph Convolutional Networks Help: A Diagnostic Study on Label Scarcity and Structural Properties
Graph Convolutional Networks (GCNs) have become a standard approach for semi-supervised node classification, yet practitioners lack clear guidance on when GCNs provide meaningful improvements over simpler baselines. We present a diagnostic study using the Amazon Computers co-purchase data to understand when and why GCNs help. Through systematic experiments with simulated label scarcity, feature ablation, and per-class analysis, we find that GCN performance depends critically on the interaction between graph homophily and feature quality. GCNs provide the largest gains under extreme label scarcity, where they leverage neighborhood structure to compensate for limited supervision. Surprisingly, GCNs can match their original performance even when node features are replaced with random noise, suggesting that structure alone carries sufficient signal on highly homophilous graphs. However, GCNs hurt performance when homophily is low and features are already strong, as noisy neighbors corrupt good predictions. Our quadrant analysis reveals that GCNs help in three of four conditions and only hurt when low homophily meets strong features. These findings offer practical guidance for practitioners deciding whether to adopt graph-based methods.
Updated: 2025-12-15 03:23:50
标题: 理解图卷积网络何时有帮助:关于标签稀缺性和结构特性的诊断研究
摘要: 图卷积网络(GCN)已成为半监督节点分类的标准方法,然而实践者缺乏关于何时GCN相比简单基线提供有意义改进的明确指导。我们利用亚马逊计算机共购数据进行诊断研究,以了解GCN何时以及为什么有帮助。通过对模拟标签稀缺性、特征消融和按类别分析的系统实验,我们发现GCN的性能关键取决于图同质性和特征质量之间的交互作用。在极端标签稀缺性下,GCN利用邻域结构来弥补监督的限制,提供了最大的收益。令人惊讶的是,即使节点特征被替换为随机噪声,GCN也可以匹配其原始性能,表明仅仅结构在高度同质的图中已经携带足够的信号。然而,当同质性较低且特征已经强大时,GCN会降低性能,因为嘈杂的邻居会破坏良好的预测。我们的四象限分析显示,GCN在四个条件中有三个条件有帮助,只有在低同质性遇到强特征时才会有损害。这些发现为决定是否采用基于图的方法的实践者提供了实用指导。
更新时间: 2025-12-15 03:23:50
领域: cs.LG,stat.AP,stat.ML
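The quantity driving the study's quadrant analysis, graph homophily, can be estimated directly from edges and labels as the fraction of edges connecting same-class nodes. A minimal sketch (the toy graph and labels are invented for illustration):

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label -- the property
    the study links to whether GCN neighborhood aggregation helps or hurts."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Toy co-purchase graph: 4 products, two categories.
labels = {0: "laptop", 1: "laptop", 2: "mouse", 3: "mouse"}
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(edge_homophily(edges, labels))  # 0.5
```

On a highly homophilous graph (value near 1) the abstract's findings suggest structure alone can carry the signal, while values near 0 combined with strong features are the regime where GCNs hurt.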
Unsupervised Acquisition of Discrete Grammatical Categories
This article presents experiments performed using a computational laboratory environment for language acquisition experiments. It implements a multi-agent system consisting of two agents: a mother language model, representing an adult speaker, and a daughter language model that aims to learn the mother language. Crucially, the daughter agent does not have access to the internal knowledge of the mother language model but only to the language exemplars the mother agent generates. These experiments illustrate how this system can be used to acquire abstract grammatical knowledge. We demonstrate how statistical analyses of patterns in the input data corresponding to grammatical categories yield discrete grammatical rules. These rules are subsequently added to the grammatical knowledge of the daughter language model. To this end, hierarchical agglomerative cluster analysis was applied to the utterances consecutively generated by the mother language model. It is argued that this procedure can be used to acquire structures resembling the grammatical categories proposed by linguists for natural languages. Thus, it is established that non-trivial grammatical knowledge has been acquired. Moreover, the parameter configuration of this computational laboratory environment, determined using training data generated by the mother language model, is validated in a second experiment with a test set, which similarly results in the acquisition of non-trivial categories.
Updated: 2025-12-15 03:17:48
标题: 无监督获取离散语法类别
摘要: 本文介绍了在语言习得实验中使用计算实验室环境进行的实验。它实现了一个由两个代理组成的多代理系统:一个成年语言模型和一个旨在学习母语的女儿语言模型。关键是,女儿代理没有访问母语模型的内部知识,只能访问母代理生成的语言示例。这些实验说明了如何使用这个系统来获得抽象的语法知识。我们展示了如何对与语法类别相对应的输入数据中的模式进行统计分析,从而得出离散的语法规则。这些规则随后被添加到女儿语言模型的语法知识中。为此,使用分级凝聚聚类分析对母语模型连续生成的话语进行了应用。据称,这个过程可以用来获取类似语言学家为自然语言提出的语法类别的结构。因此,已经确定已获得了非平凡的语法知识。此外,使用母语模型生成的训练数据确定的这个计算实验室环境的参数配置在第二个实验中得到验证,测试集同样导致了非平凡类别的获得。
更新时间: 2025-12-15 03:17:48
领域: cs.CL,cs.AI,cs.LG,cs.MA
China Regional 3km Downscaling Based on Residual Corrective Diffusion Model
A fundamental challenge in numerical weather prediction is to efficiently produce high-resolution forecasts. A common solution is applying downscaling methods, which include dynamical downscaling and statistical downscaling, to the outputs of global models. This work focuses on statistical downscaling, which establishes statistical relationships between low-resolution and high-resolution historical data using statistical models. Deep learning has emerged as a powerful tool for this task, giving rise to various high-performance super-resolution models that can be directly applied for downscaling, such as diffusion models and Generative Adversarial Networks. This work builds on a diffusion-based downscaling framework named CorrDiff. In contrast to the original work on CorrDiff, the region considered here is nearly 40 times larger, and we not only consider surface variables as in the original work, but also include upper-air variables (at six pressure levels) as target downscaling variables. In addition, a global residual connection is added to improve accuracy. To generate the 3km forecasts for the China region, we apply our trained models to the 25km global grid forecasts of CMA-GFS, an operational global model of the China Meteorological Administration (CMA), and of SFF, a data-driven deep learning-based weather model developed from Spherical Fourier Neural Operators (SFNO). CMA-MESO, a high-resolution regional model, is chosen as the baseline model. The experimental results demonstrate that the forecasts downscaled by our method generally outperform the direct forecasts of CMA-MESO in terms of MAE for the target variables. Our forecasts of radar composite reflectivity show that CorrDiff, as a generative model, can generate fine-scale details that lead to more realistic predictions compared to the corresponding deterministic regression models.
Updated: 2025-12-15 03:17:09
标题: 中国区域基于残差校正扩散模型的3公里尺度降尺度化
摘要: 数值天气预报中的一个基本挑战是高效地产生高分辨率的预报。一种常见的解决方案是对全球模式的输出应用降尺度方法,包括动力降尺度和统计降尺度。本文关注统计降尺度,它利用统计模型建立低分辨率和高分辨率历史数据之间的统计关系。深度学习已经成为这项任务的强大工具,产生了各种可直接用于降尺度的高性能超分辨率模型,如扩散模型和生成对抗网络。本文基于一个名为CorrDiff的扩散降尺度框架。与CorrDiff的原始工作相比,本文考虑的区域几乎大40倍,并且我们不仅考虑原始工作中的地表变量,还将高空变量(六个气压层)作为目标降尺度变量。此外,我们添加了全局残差连接以提高准确性。为了生成中国地区的3km预报,我们将训练好的模型应用于中国气象局(CMA)业务全球模式CMA-GFS的25km全球网格预报,以及基于球面傅里叶神经算子(SFNO)开发的数据驱动深度学习天气模型SFF的预报。我们选择高分辨率区域模式CMA-MESO作为基准模型。实验结果表明,经我们的方法降尺度的预报在目标变量的MAE方面通常优于CMA-MESO的直接预报。我们的雷达组合反射率预报显示,作为生成模型的CorrDiff能够生成精细尺度的细节,从而产生比相应的确定性回归模型更真实的预测。
更新时间: 2025-12-15 03:17:09
领域: cs.LG,cs.AI,physics.ao-ph
LookAhead Tuning: Safer Language Models via Partial Answer Previews
Fine-tuning enables large language models (LLMs) to adapt to specific domains, but often compromises their previously established safety alignment. To mitigate the degradation of model safety during fine-tuning, we introduce LookAhead Tuning, a lightweight and effective data-driven approach that preserves safety during fine-tuning. The method introduces two simple strategies that modify training data by previewing partial answer prefixes, thereby minimizing perturbations to the model's initial token distributions and maintaining its built-in safety mechanisms. Comprehensive experiments demonstrate that LookAhead Tuning effectively maintains model safety without sacrificing robust performance on downstream tasks. Our findings position LookAhead Tuning as a reliable and efficient solution for the safe and effective adaptation of LLMs.
Updated: 2025-12-15 03:15:38
标题: 向前预测调整:通过部分答案预览提高语言模型的安全性
摘要: Fine-tuning allows large language models (LLMs) to adapt to specific domains, but often compromises their previously established safety alignment. To address the degradation of model safety during fine-tuning, we propose LookAhead Tuning, a lightweight and effective data-driven approach that preserves safety. This method involves two simple strategies that modify training data by previewing partial answer prefixes, minimizing perturbations to the model's initial token distributions and preserving its safety mechanisms. Extensive experiments show that LookAhead Tuning effectively maintains model safety while maintaining robust performance on downstream tasks. Our results suggest that LookAhead Tuning is a reliable and efficient solution for the safe and effective adaptation of LLMs.
更新时间: 2025-12-15 03:15:38
领域: cs.CL,cs.AI,cs.CV,cs.LG,cs.MM
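The core data modification, previewing a partial answer prefix inside the prompt, can be sketched in a few lines. The function name, prompt template, and whitespace tokenization below are illustrative assumptions, not the paper's exact recipe:

```python
def lookahead_preview(instruction, answer, k=3):
    """Build a fine-tuning example whose prompt previews the first k
    answer tokens, so fine-tuning perturbs the model's initial token
    distribution less (a sketch of the idea, not the paper's format)."""
    tokens = answer.split()
    preview, rest = tokens[:k], tokens[k:]
    prompt = instruction + "\nAnswer (begins): " + " ".join(preview)
    target = " ".join(rest)
    return prompt, target

p, t = lookahead_preview(
    "Name three primary colors.",
    "The three primary colors are red, yellow, and blue.",
)
```

The model then only needs to learn the continuation `t` given a prompt that already contains the answer's opening tokens; the abstract describes two such strategies, of which this sketches one plausible variant.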
Continuous Edit Distance, Geodesics and Barycenters of Time-varying Persistence Diagrams
We introduce the Continuous Edit Distance (CED), a geodesic and elastic distance for time-varying persistence diagrams (TVPDs). The CED extends edit-distance ideas to TVPDs by combining local substitution costs with penalized deletions/insertions, controlled by two parameters: \(\alpha\) (trade-off between temporal misalignment and diagram discrepancy) and \(\beta\) (gap penalty). We also provide an explicit construction of CED-geodesics. Building on these ingredients, we present two practical barycenter solvers, one stochastic and one greedy, that monotonically decrease the CED Fréchet energy. Empirically, the CED is robust to additive perturbations (both temporal and spatial), recovers temporal shifts, and supports temporal pattern search. On real-life datasets, the CED achieves clustering performance comparable to or better than standard elastic dissimilarities, while our clustering based on CED-barycenters yields superior classification results. Overall, the CED equips TVPD analysis with a principled distance, interpretable geodesics, and practical barycenters, enabling alignment, comparison, averaging, and clustering directly in the space of TVPDs. A C++ implementation is provided for reproducibility at the following address https://github.com/sebastien-tchitchek/ContinuousEditDistance.
Updated: 2025-12-15 02:57:21
标题: 连续编辑距离、测地线与时变持续图的重心
摘要: 我们引入了连续编辑距离(CED),一种用于时变持续图(TVPDs)的测地和弹性距离。CED通过将局部替换成本与惩罚删除/插入结合,由两个参数控制:\(α\)(时间错位和图表差异之间的折衷)和\(β\)(间隙惩罚)。我们还提供了CED-geodesics的显式构造。基于这些要素,我们提出了两种实用的质心求解器,一种是随机的,一种是贪婪的,它们可以单调地减少CED Frechet能量。实证上,CED对于加性扰动(时间和空间)是稳健的,能够恢复时间偏移,并支持时间模式搜索。在真实数据集上,CED在聚类性能上达到了与标准弹性不相似性相媲美甚至更好的水平,而基于CED质心的聚类产生了更优秀的分类结果。总的来说,CED为TVPD分析提供了一个有原则的距离、可解释的测地线和实用的质心,使得在TVPDs空间中直接进行对齐、比较、平均和聚类成为可能。可在以下地址 https://github.com/sebastien-tchitchek/ContinuousEditDistance 找到用于可重现性的C++实现。
更新时间: 2025-12-15 02:57:21
领域: cs.CG,cs.CV,cs.GR,cs.LG
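The interplay of substitution costs and gap penalties can be illustrated with a Needleman-Wunsch-style dynamic program. The sketch below works on scalar time series rather than persistence diagrams, and is only meant to show how \(\alpha\) trades temporal misalignment against value discrepancy while \(\beta\) prices deletions and insertions; the paper's CED is defined on TVPDs with diagram-level substitution costs:

```python
def elastic_edit_distance(x, y, alpha=0.5, beta=1.0):
    """Edit-distance DP over two sequences of (time, value) pairs.
    Substitutions cost an alpha-weighted mix of temporal misalignment and
    value discrepancy; deletions/insertions cost a flat gap penalty beta.
    Illustrative scalar stand-in for the CED, not the CED itself."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                D[i][j] = min(D[i][j], D[i - 1][j] + beta)      # deletion
            if j > 0:
                D[i][j] = min(D[i][j], D[i][j - 1] + beta)      # insertion
            if i > 0 and j > 0:
                ti, vi = x[i - 1]
                tj, vj = y[j - 1]
                sub = alpha * abs(ti - tj) + (1 - alpha) * abs(vi - vj)
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + sub)   # substitution
    return D[n][m]

x = [(0.0, 1.0), (1.0, 2.0)]
print(elastic_edit_distance(x, x))  # 0.0 for identical sequences
```

Appending one extra point to `y` costs exactly one gap penalty `beta`, which is the "penalized insertion" behavior the abstract describes.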
Adversarial bandit optimization for approximately linear functions
We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for the bandit linear optimization, a special case with no perturbation. We also give a lower bound on the expected regret.
Updated: 2025-12-15 02:55:18
标题: 对于大致线性函数的对抗性赌徒优化
摘要: 我们考虑了一个非凸非光滑函数的赌徒优化问题,其中在每次试验中,损失函数是一个线性函数和在观察到玩家选择后选择的一个小而任意的扰动的总和。我们为这个问题提供了期望和高概率的后悔界限。我们的结果还意味着对于没有扰动的特殊情况——赌徒线性优化问题,我们还提供了改进的高概率后悔界限。我们还给出了期望后悔的下界。
更新时间: 2025-12-15 02:55:18
领域: cs.LG,cs.AI
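Spelled out, the setting in the abstract is a linear loss plus a bounded, adaptively chosen perturbation. The notation below is ours, reconstructed from the abstract's description rather than taken from the paper:

```latex
% Loss revealed at trial t for the player's choice x_t:
f_t(x_t) = \langle \ell_t, x_t \rangle + \varepsilon_t(x_t),
\qquad \sup_{x} |\varepsilon_t(x)| \le \varepsilon,
% where \ell_t is the linear component and the perturbation
% \varepsilon_t is chosen by the adversary after observing x_t.

% Expected regret against the best fixed action in the decision set K:
R_T = \mathbb{E}\!\left[\sum_{t=1}^{T} f_t(x_t)\right]
      - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x).
```

Setting \(\varepsilon = 0\) recovers bandit linear optimization, which is why the paper's bounds specialize to an improved high-probability regret bound for that case.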
Content Adaptive based Motion Alignment Framework for Learned Video Compression
Recent advances in end-to-end video compression have shown promising results owing to their unified end-to-end learning optimization. However, such generalized frameworks often lack content-specific adaptation, leading to suboptimal compression performance. To address this, this paper proposes a content adaptive based motion alignment framework that improves performance by adapting encoding strategies to diverse content characteristics. Specifically, we first introduce a two-stage flow-guided deformable warping mechanism that refines motion compensation with coarse-to-fine offset prediction and mask modulation, enabling precise feature alignment. Second, we propose a multi-reference quality aware strategy that adjusts distortion weights based on reference quality, and applies it to hierarchical training to reduce error propagation. Third, we integrate a training-free module that downsamples frames by motion magnitude and resolution to obtain smooth motion estimation. Experimental results on standard test datasets demonstrate that our framework CAMA achieves significant improvements over state-of-the-art Neural Video Compression models, achieving a 24.95% BD-rate (PSNR) savings over our baseline model DCVC-TCM, while also outperforming reproduced DCVC-DC and traditional codec HM-16.25.
Updated: 2025-12-15 02:51:47
标题: 基于内容自适应的学习视频压缩运动对齐框架
摘要: 最近端到端视频压缩取得了可喜的进展,这得益于其统一的端到端学习优化。然而,这种通用框架通常缺乏针对内容的适应性,导致压缩性能欠佳。为了解决这个问题,本文提出了一个基于内容自适应的运动对齐框架,通过使编码策略适应不同的内容特征来提升性能。具体而言,我们首先引入了一个两阶段的流引导可变形变形(warping)机制,通过由粗到细的偏移预测和掩码调制来改进运动补偿,从而实现精确的特征对齐。其次,我们提出了一个多参考质量感知策略,根据参考质量调整失真权重,并将其应用于分层训练以减少误差传播。第三,我们集成了一个无需训练的模块,根据运动幅度和分辨率对帧进行降采样,以获得平滑的运动估计。在标准测试数据集上的实验结果表明,我们的框架CAMA相比现有的神经视频压缩模型取得了显著改进,相比我们的基准模型DCVC-TCM节省了24.95%的BD率(PSNR),同时也优于复现的DCVC-DC和传统编解码器HM-16.25。
更新时间: 2025-12-15 02:51:47
领域: cs.CV,cs.AI
Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
The exponential growth of video content has created an urgent need for efficient multimodal moment retrieval systems. However, existing approaches face three critical challenges: (1) fixed-weight fusion strategies fail under cross-modal noise and ambiguous queries, (2) temporal modeling struggles to capture coherent event sequences while penalizing unrealistic gaps, and (3) systems require manual modality selection, reducing usability. We propose a unified multimodal moment retrieval system with three key innovations. First, a cascaded dual-embedding pipeline combines BEIT-3 and SigLIP for broad retrieval, refined by BLIP-2-based reranking to balance recall and precision. Second, a temporal-aware scoring mechanism applies exponential decay penalties to large temporal gaps via beam search, constructing coherent event sequences rather than isolated frames. Third, agent-guided query decomposition (GPT-4o) automatically interprets ambiguous queries, decomposes them into modality-specific sub-queries (visual/OCR/ASR), and performs adaptive score fusion, eliminating manual modality selection. Qualitative analysis demonstrates that our system effectively handles ambiguous queries, retrieves temporally coherent sequences, and dynamically adapts fusion strategies, advancing interactive moment search capabilities.
Updated: 2025-12-15 02:50:43
标题: 通过级联嵌入-重新排序和时域感知评分融合的统一交互式多模式时刻检索
摘要: 视频内容的指数增长引发了对高效多模态时刻检索系统的迫切需求。然而,现有方法面临三个关键挑战:(1)固定权重融合策略在跨模态噪声和模糊查询中失败,(2)时间建模难以捕捉连贯的事件序列,同时惩罚不切实际的间隔,(3)系统需要手动模态选择,降低了可用性。我们提出了一种统一的多模态时刻检索系统,具有三项关键创新。首先,级联双嵌入管道结合了BEIT-3和SigLIP进行广泛检索,通过基于BLIP-2的重新排序来平衡召回率和精度。其次,一种时间感知评分机制通过波束搜索对大时间间隔应用指数衰减惩罚,构建连贯的事件序列而不是孤立的帧。第三,代理引导的查询分解(GPT-4o)自动解释模糊查询,将其分解为特定于模态的子查询(视觉/OCR/ASR),并执行自适应评分融合,消除手动模态选择。定性分析表明,我们的系统有效处理模糊查询,检索时间连贯的序列,并动态调整融合策略,推进交互式时刻搜索能力。
更新时间: 2025-12-15 02:50:43
领域: cs.CV,cs.AI,cs.IR
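The temporal-aware scoring idea, exponential decay penalties on gaps between consecutive retrieved moments, can be sketched as a scoring function over (timestamp, similarity) candidates. The `decay` constant and the multiplicative penalty form are illustrative assumptions; the system additionally runs beam search over candidate sequences, which is omitted here:

```python
import math

def temporal_score(candidates, decay=0.5):
    """Score a candidate event sequence: sum of per-moment similarities,
    damped by an exponential penalty on each temporal gap between
    consecutive moments (a sketch of the temporal-aware scoring idea)."""
    sim = sum(s for _, s in candidates)
    penalty = 1.0
    for (t0, _), (t1, _) in zip(candidates, candidates[1:]):
        gap = max(t1 - t0, 0.0)
        penalty *= math.exp(-decay * gap)
    return sim * penalty

tight = [(0.0, 1.0), (1.0, 1.0)]     # moments 1 s apart
gapped = [(0.0, 1.0), (10.0, 1.0)]   # unrealistic 10 s gap
print(temporal_score(tight) > temporal_score(gapped))  # True
```

With equal similarities, the tightly spaced sequence wins, which is the behavior the abstract describes: coherent event sequences are preferred over isolated high-similarity frames separated by unrealistic gaps.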
Investigating Data Pruning for Pretraining Biological Foundation Models at Scale
Biological foundation models (BioFMs), pretrained on large-scale biological sequences, have recently shown strong potential in providing meaningful representations for diverse downstream bioinformatics tasks. However, such models often rely on millions to billions of training sequences and billions of parameters, resulting in prohibitive computational costs and significant barriers to reproducibility and accessibility, particularly for academic labs. To address these challenges, we investigate the feasibility of data pruning for BioFM pretraining and propose a post-hoc influence-guided data pruning framework tailored to biological domains. Our approach introduces a subset-based self-influence formulation that enables efficient estimation of sample importance at low computational cost, and builds upon it two simple yet effective selection strategies, namely Top-k Influence (Top I) and Coverage-Centric Influence (CCI). We empirically validate our method on two representative BioFMs, RNA-FM and ESM-C. For RNA, our framework consistently outperforms random selection baselines under an extreme pruning rate of over 99 percent, demonstrating its effectiveness. Furthermore, we show the generalizability of our framework on protein-related tasks using ESM-C. In particular, our coreset even outperforms random subsets that are ten times larger in both RNA and protein settings, revealing substantial redundancy in biological sequence datasets. These findings underscore the potential of influence-guided data pruning to substantially reduce the computational cost of BioFM pretraining, paving the way for more efficient, accessible, and sustainable biological AI research.
Updated: 2025-12-15 02:42:52
标题: 研究在大规模预训练生物基础模型中的数据修剪
摘要: 生物基础模型(BioFMs)是在大规模生物序列上预训练的模型,最近显示出在各种下游生物信息学任务中提供有意义表示的强大潜力。然而,这些模型通常依赖于数百万到数十亿条训练序列和数十亿参数,导致计算成本高昂,并对可复现性和可获取性构成重大障碍,特别是对学术实验室而言。为了应对这些挑战,我们研究了对BioFM预训练进行数据修剪的可行性,并提出了一个专门针对生物领域的事后影响力引导数据修剪框架。我们的方法引入了一个基于子集的自影响力公式,可以以较低的计算成本有效估计样本重要性,并在此基础上构建了两种简单而有效的选择策略,即Top-k Influence(Top I)和Coverage-Centric Influence(CCI)。我们在两个代表性的BioFM(RNA-FM和ESM-C)上进行了实证验证。对于RNA,我们的框架在超过99%的极端修剪率下始终优于随机选择基线,证明了其有效性。此外,我们使用ESM-C展示了我们的框架在蛋白质相关任务上的泛化能力。特别是,在RNA和蛋白质两种设置中,我们的核心集甚至优于规模大十倍的随机子集,揭示了生物序列数据集中存在大量冗余。这些发现强调了影响力引导数据修剪在大幅降低BioFM预训练计算成本方面的潜力,为更高效、可获取和可持续的生物人工智能研究铺平了道路。
更新时间: 2025-12-15 02:42:52
领域: cs.LG,cs.AI,cs.CE
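The two selection strategies can be caricatured on toy influence scores. Top I simply keeps the highest-scoring samples; the CCI sketch below buckets scores and draws an even share from every bucket so the coreset keeps both easy and hard examples. Both are loose sketches of the ideas named in the abstract, not the paper's algorithms:

```python
def top_k_influence(scores, k):
    """Top-I sketch: keep the k samples with the largest influence scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def coverage_centric(scores, k, n_bins=4):
    """CCI sketch: bucket samples by score and take an even share from
    each bucket, trading pure difficulty for coverage of the score range."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for sample, v in scores.items():
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(sample)
    picked, per_bin = [], max(k // n_bins, 1)
    for b in bins:
        picked.extend(sorted(b, key=scores.get, reverse=True)[:per_bin])
    return set(picked[:k])

# Toy scores for 12 hypothetical sequences.
scores = {f"seq{i}": i / 10 for i in range(12)}
print(top_k_influence(scores, 3))  # the three highest-influence sequences
```

The contrast matters at extreme pruning rates: pure Top-k concentrates on one end of the score distribution, while a coverage-centric pick retains samples from every score range.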
Improved Generalization Bounds for Transductive Learning by Transductive Local Complexity and Its Applications
We introduce Transductive Local Complexity (TLC) as a new tool for analyzing the generalization performance of transductive learning methods. Our work extends the classical Local Rademacher Complexity (LRC) to the transductive setting, incorporating substantial and novel components. Although LRC has been used to obtain sharp generalization bounds and minimax rates for inductive tasks such as classification and nonparametric regression, it has remained an open problem whether a localized Rademacher complexity framework can be effectively adapted to transductive learning to achieve sharp or nearly sharp bounds consistent with inductive results. We provide an affirmative answer via TLC. TLC is constructed by first deriving a new concentration inequality in Theorem 4.1 for the supremum of empirical processes capturing the gap between test and training losses, termed the test-train process, under uniform sampling without replacement, which leverages a novel combinatorial property of the test-train process and a new proof strategy applying the exponential Efron-Stein inequality twice. A subsequent peeling strategy and a new surrogate variance operator then yield excess risk bounds in the transductive setting that are nearly consistent with classical LRC-based inductive bounds up to a logarithmic gap. We further advance transductive learning through two applications: (1) for realizable transductive learning over binary-valued classes with finite VC dimension of $\dVC$ and $u \ge m \ge \dVC$, where $u$ and $m$ are the number of test features and training features, our Theorem 6.1 gives a nearly optimal bound $\Theta(\dVC \log(me/\dVC)/m)$ matching the minimax rate $\Theta(\dVC/m)$ up to $\log m$, resolving a decade-old open question; and (2) Theorem 6.2 presents a sharper excess risk bound for transductive kernel learning compared to the current state-of-the-art.
Updated: 2025-12-15 02:36:34
标题: 通过传导局部复杂性改进的传导学习泛化界限及其应用
摘要: 我们引入Transductive Local Complexity(TLC)作为分析传导学习方法泛化性能的新工具。我们的工作将经典的Local Rademacher Complexity(LRC)扩展到传导设置中,包含了实质性和新颖的组件。虽然LRC已被用来获得归纳任务(如分类和非参数回归)的尖锐泛化界限和极小极大速率,但局部Rademacher复杂性框架能否有效地适应于传导学习,以实现与归纳结果一致的尖锐或几乎尖锐的界限,一直是一个未解决的问题。我们通过TLC提供了一个肯定的答案。TLC的构造首先通过定理4.1导出了一个新的集中不等式,用于刻画测试损失与训练损失之间差距的经验过程(称为测试-训练过程)在无放回均匀采样下的上确界,该定理利用了测试-训练过程的一个新颖组合性质以及两次应用指数Efron-Stein不等式的新证明策略。随后的剥离策略和新的替代方差算子在传导设置中产生了超额风险界限,与经典的基于LRC的归纳界限在至多一个对数差距内几乎一致。我们通过两个应用进一步推进传导学习:(1)对于具有有限VC维度$\dVC$的二值类别上的可实现传导学习,其中$u \ge m \ge \dVC$,$u$和$m$分别是测试特征和训练特征的数量,我们的定理6.1给出了一个几乎最优的界限$\Theta(\dVC \log(me/\dVC)/m)$,在$\log m$因子内匹配极小极大速率$\Theta(\dVC/m)$,解决了一个有十年历史的未解问题;(2)定理6.2给出了比当前最先进结果更尖锐的传导核学习超额风险界限。
更新时间: 2025-12-15 02:36:34
领域: stat.ML,cs.LG
SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision
Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging due to activation outliers, leading to accuracy degradation. Existing methods, such as outlier-handling and group quantization, achieve high accuracy but incur substantial energy consumption. To address this, we propose SeVeDo, an energy-efficient SVD-based heterogeneous accelerator that structurally separates outlier-sensitive components into a high-precision low-rank path, while the remaining computations are executed in a low-bit residual datapath with group quantization. To further enhance efficiency, Hierarchical Group Quantization (HGQ) combines coarse-grained floating-point scaling with fine-grained shifting, effectively reducing dequantization cost. Also, SVD-guided mixed precision (SVD-MP) statically allocates higher bitwidths to precision-sensitive components identified through low-rank decomposition, thereby minimizing floating-point operation cost. Experimental results show that SeVeDo achieves a peak energy efficiency of 13.8TOPS/W, surpassing conventional designs, with 12.7TOPS/W on ViT-Base and 13.4TOPS/W on Llama2-7B benchmarks.
Updated: 2025-12-15 02:29:08
标题: SeVeDo:一种用于低位推断的异构Transformer加速器,通过分层组量化和SVD引导的混合精度
摘要: 低比特量化通过减少计算和内存开销,是一种有望实现高效Transformer推理的技术。然而,由于激活异常值会导致精度下降,激进的位宽降低仍然具有挑战性。现有方法,如异常值处理和组量化,可以实现高精度,但会产生大量能耗。为了解决这个问题,我们提出了SeVeDo,一种基于SVD的节能异构加速器,它在结构上将对异常值敏感的组件分离到高精度低秩路径中,而其余计算则在采用组量化的低比特残差数据通路中执行。为进一步提高效率,分层组量化(HGQ)结合粗粒度浮点缩放和细粒度移位,有效降低了反量化成本。此外,SVD引导的混合精度(SVD-MP)通过低秩分解识别出对精度敏感的组件,并静态地为其分配更高的位宽,从而最小化浮点运算成本。实验结果表明,SeVeDo实现了13.8TOPS/W的峰值能效,超越了传统设计,在ViT-Base基准上达到12.7TOPS/W,在Llama2-7B基准上达到13.4TOPS/W。
更新时间: 2025-12-15 02:29:08
领域: cs.LG,cs.AR
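The HGQ idea, a coarse floating-point scale shared by a large group, refined by cheap power-of-two shifts per small subgroup, can be illustrated with a toy fake-quantizer. The group sizes, shift cap, and bit width below are invented for illustration and say nothing about the actual SeVeDo datapath:

```python
def hgq_quantize(xs, coarse_size=8, fine_size=4, bits=4):
    """Hierarchical group quantization sketch: one floating-point scale
    per coarse group, a per-subgroup power-of-two step refinement, then
    symmetric integer rounding. Returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit symmetric
    out = []
    for c0 in range(0, len(xs), coarse_size):
        group = xs[c0:c0 + coarse_size]
        # Coarse-grained floating-point scale shared by the whole group.
        scale = (max(abs(v) for v in group) / qmax) or 1.0
        for f0 in range(0, len(group), fine_size):
            sub = group[f0:f0 + fine_size]
            # Fine-grained shift: halve the step while the subgroup still
            # fits in the integer range (shift capped for illustration).
            shift = 0
            while shift < 3 and max(abs(v) for v in sub) <= qmax * scale / (1 << (shift + 1)):
                shift += 1
            step = scale / (1 << shift)
            out.extend(round(v / step) * step for v in sub)
    return out
```

A subgroup of small activations gets a finer effective step via the shift alone, without storing another floating-point scale, which is the dequantization-cost saving the abstract attributes to combining coarse FP scaling with fine-grained shifting.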
MADTempo: An Interactive System for Multi-Event Temporal Video Retrieval with Query Augmentation
The rapid expansion of video content across online platforms has accelerated the need for retrieval systems capable of understanding not only isolated visual moments but also the temporal structure of complex events. Existing approaches often fall short in modeling temporal dependencies across multiple events and in handling queries that reference unseen or rare visual concepts. To address these challenges, we introduce MADTempo, a video retrieval framework developed by our team, AIO_Trinh, that unifies temporal search with web-scale visual grounding. Our temporal search mechanism captures event-level continuity by aggregating similarity scores across sequential video segments, enabling coherent retrieval of multi-event queries. Complementarily, a Google Image Search-based fallback module expands query representations with external web imagery, effectively bridging gaps in pretrained visual embeddings and improving robustness against out-of-distribution (OOD) queries. Together, these components advance the temporal reasoning and generalization capabilities of modern video retrieval systems, paving the way for more semantically aware and adaptive retrieval across large-scale video corpora.
Updated: 2025-12-15 02:25:46
标题: MADTempo:一种用于多事件时间视频检索的交互式系统,具有查询增强功能
摘要: 视频内容在在线平台上的迅速扩展加快了对检索系统的需求,这些系统不仅能够理解孤立的视觉时刻,还能理解复杂事件的时间结构。现有方法通常在建模跨多个事件的时间依赖性和处理引用未见或罕见视觉概念的查询方面存在不足。为了解决这些挑战,我们介绍了MADTempo,这是由我们的团队AIO_Trinh开发的视频检索框架,它将时间搜索与大规模视觉定位结合在一起。我们的时间搜索机制通过在连续视频片段之间聚合相似性分数来捕捉事件级别的连续性,从而实现多事件查询的连贯检索。此外,基于Google图像搜索的备用模块利用外部网络图像扩展查询表示,有效地弥合了预训练视觉嵌入中的差距,并提高了对分布外(OOD)查询的鲁棒性。这些组件共同推进了现代视频检索系统的时间推理和泛化能力,为在大规模视频语料库中实现更具语义意识和自适应的检索铺平了道路。
更新时间: 2025-12-15 02:25:46
领域: cs.CV,cs.AI
Joint Hybrid Beamforming and Artificial Noise Design for Secure Multi-UAV ISAC Networks
Integrated sensing and communication (ISAC) emerges as a key enabler for next-generation applications such as smart cities and autonomous systems. Its integration with unmanned aerial vehicles (UAVs) unlocks new potential for reliable communication and precise sensing in dynamic aerial environments. However, existing research predominantly treats UAVs as aerial base stations, overlooking their role as ISAC users, and fails to leverage large-scale antenna arrays at terrestrial base stations to enhance security and spectral efficiency. This paper proposes a secure and spectrally efficient ISAC framework for multi-UAV networks, and a two-stage optimization approach is developed to jointly design hybrid beamforming (HBF), artificial noise (AN) injection, and UAV trajectories. Aiming at maximizing the sum secrecy rate, the first stage employs Proximal Policy Optimization (PPO) to optimize digital beamformers and trajectories, and the second stage decomposes the digital solution into analog and digital components via low-complexity matrix factorization. Simulation results demonstrate the effectiveness of the proposed framework compared to benchmark schemes.
Updated: 2025-12-15 02:23:56
Domains: eess.SP,cs.AI
Integrated Prediction and Multi-period Portfolio Optimization
Multi-period portfolio optimization is important for real portfolio management, as it accounts for transaction costs, path-dependent risks, and the intertemporal structure of trading decisions that single-period models cannot capture. Classical methods usually follow a two-stage framework: machine learning algorithms are employed to produce forecasts that closely fit the realized returns, and the predicted values are then used in a downstream portfolio optimization problem to determine the asset weights. This separation leads to a fundamental misalignment between predictions and decision outcomes, while also ignoring the impact of transaction costs. To bridge this gap, recent studies have proposed the idea of end-to-end learning, integrating the two stages into a single pipeline. This paper introduces IPMO (Integrated Prediction and Multi-period Portfolio Optimization), a model for multi-period mean-variance portfolio optimization with turnover penalties. The predictor generates multi-period return forecasts that parameterize a differentiable convex optimization layer, which in turn drives learning via portfolio performance. For scalability, we introduce a mirror-descent fixed-point (MDFP) differentiation scheme that avoids factorizing the Karush-Kuhn-Tucker (KKT) systems, which thus yields stable implicit gradients and nearly scale-insensitive runtime as the decision horizon grows. In experiments with real market data and two representative time-series prediction models, the IPMO method consistently outperforms the two-stage benchmarks in risk-adjusted performance net of transaction costs and achieves more coherent allocation paths. Our results show that integrating machine learning prediction with optimization in the multi-period setting improves financial outcomes and remains computationally tractable.
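The multi-period mean-variance objective with turnover penalties that the optimization layer targets can be illustrated with a minimal sketch. The risk-aversion weight `gamma` and turnover weight `lam` below are illustrative placeholders, not the paper's notation:

```python
import numpy as np

# Sketch of a multi-period mean-variance utility with an L1 turnover penalty:
# sum over periods of (expected return - risk penalty - transaction cost).

def multi_period_utility(weights, mu, Sigma, w_prev, gamma=5.0, lam=0.01):
    """weights: (T, n) planned allocations; mu: (T, n) forecast returns;
    Sigma: (n, n) covariance; w_prev: (n,) current holdings."""
    total, prev = 0.0, w_prev
    for t in range(weights.shape[0]):
        w = weights[t]
        ret = w @ mu[t]                      # expected return this period
        risk = 0.5 * gamma * w @ Sigma @ w   # quadratic risk penalty
        cost = lam * np.abs(w - prev).sum()  # L1 turnover (transaction) cost
        total += ret - risk - cost
        prev = w
    return total

T, n = 3, 2
mu = np.full((T, n), 0.01)                   # identical forecasts for both assets
Sigma = 0.02 * np.eye(n)
w_hold = np.tile([0.5, 0.5], (T, 1))         # keep the same balanced portfolio
w_churn = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # flip every period
u_hold = multi_period_utility(w_hold, mu, Sigma, np.array([0.5, 0.5]))
u_churn = multi_period_utility(w_churn, mu, Sigma, np.array([0.5, 0.5]))
print(u_hold > u_churn)  # churning pays turnover costs for no extra return
```

In IPMO this kind of objective sits inside a differentiable convex optimization layer; the sketch only evaluates it for fixed allocation paths.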
Updated: 2025-12-15 02:16:24
Domains: cs.CE,cs.LG
Mathematical exploration and discovery at scale
AlphaEvolve (Novikov et al., 2025) is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
Updated: 2025-12-15 02:15:40
Domains: cs.NE,cs.AI,math.CA,math.CO,math.MG
VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
Vision-language models (VLMs) excel at image understanding tasks, but the large number of visual tokens imposes significant computational costs, hindering deployment on mobile devices. Many pruning methods rely solely on token importance and thus overlook inter-token redundancy, retaining numerous duplicated tokens and wasting capacity. Although some redundancy-aware approaches have been proposed, they often ignore the spatial relationships among visual tokens. This can lead to overly sparse selections of retained tokens that fail to adequately cover the regions of target objects. To address these limitations, we propose VLM-Pruner, a training-free token pruning algorithm that explicitly balances redundancy and spatial sparsity. We introduce a centrifugal token pruning paradigm that enables near-to-far selection while prioritizing the preservation of fine-grained object details. Moreover, we design a Buffering for Spatial Sparsity (BSS) criterion that defers the selection of spatially distant tokens. We further adopt a parallel greedy strategy to conduct token selection efficiently. To mitigate information loss from pruning, we selectively fuse salient information from the discarded tokens into the retained ones. Comprehensive comparisons demonstrate that VLM-Pruner consistently outperforms strong baselines across five VLMs with an 88.9% pruning rate, while delivering an end-to-end inference speedup. The code is available at https://github.com/Casey-bit/VLMPruner.
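As a rough illustration of the redundancy-aware greedy selection that such pruning methods build on, the sketch below keeps tokens by importance minus their maximum cosine similarity to tokens already kept. It does not model VLM-Pruner's centrifugal ordering or BSS spatial buffering; all names and data here are illustrative:

```python
import numpy as np

# Redundancy-aware greedy token selection: each step picks the token with the
# highest importance minus its similarity to the already-kept set, so
# near-duplicate tokens are skipped in favor of diverse ones.

def greedy_select(tokens, importance, k, alpha=1.0):
    """tokens: (N, d) features; importance: (N,); returns k kept indices."""
    feats = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    kept = [int(np.argmax(importance))]          # seed with the top token
    while len(kept) < k:
        sim = feats @ feats[kept].T              # cosine sim to kept tokens
        redundancy = sim.max(axis=1)             # worst-case duplication
        score = importance - alpha * redundancy
        score[kept] = -np.inf                    # never re-pick a kept token
        kept.append(int(np.argmax(score)))
    return sorted(kept)

e = np.eye(3, 8)
# Token 1 is a near-duplicate of token 0; tokens 2 and 3 are distinct.
tokens = np.vstack([e[0], e[0] + 1e-3 * e[1], e[1], e[2]])
importance = np.array([1.0, 0.95, 0.5, 0.4])
# Despite its high importance, the duplicate token 1 is pruned.
print(greedy_select(tokens, importance, k=3))
```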
Updated: 2025-12-15 02:13:12
Domains: cs.CV,cs.LG
LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization
In modern financial markets, investors increasingly seek personalized and adaptive portfolio strategies that reflect their individual risk preferences and respond to dynamic market conditions. Traditional rule-based or static optimization approaches often fail to capture the nonlinear interactions among investor behavior, market volatility, and evolving financial objectives. To address these limitations, this paper introduces the LLM-based Personalized Portfolio Recommender, an integrated framework that combines Large Language Models, reinforcement learning, and individualized risk preference modeling to support intelligent investment decision-making.
Updated: 2025-12-15 02:12:53
Domains: cs.LG
Improving Time Series Forecasting via Instance-aware Post-hoc Revision
Time series forecasting plays a vital role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations--stemming from distribution shifts, missing data, and long-tail patterns--often lead to suboptimal forecasts for specific instances, even when overall performance appears strong. To address this issue, we propose a model-agnostic framework, PIR, designed to enhance forecasting performance through Post-forecasting Identification and Revision. Specifically, PIR first identifies biased forecasting instances by estimating their accuracy. Based on this, the framework revises the forecasts using contextual information, including covariates and historical time series, from both local and global perspectives in a post-processing fashion. Extensive experiments on real-world datasets with mainstream forecasting models demonstrate that PIR effectively mitigates instance-level errors and significantly improves forecasting reliability.
Updated: 2025-12-15 02:12:16
Domains: cs.LG
Cisco Integrated AI Security and Safety Framework Report
Artificial intelligence (AI) systems are being readily and rapidly adopted, increasingly permeating critical domains: from consumer platforms and enterprise software to networked systems with embedded agents. While this has unlocked potential for human productivity gains, the attack surface has expanded accordingly: threats now span content safety failures (e.g., harmful or deceptive outputs), model and data integrity compromise (e.g., poisoning, supply-chain tampering), runtime manipulations (e.g., prompt injection, tool and agent misuse), and ecosystem risks (e.g., orchestration abuse, multi-agent collusion). Existing frameworks such as MITRE ATLAS, National Institute of Standards and Technology (NIST) AI 100-2 Adversarial Machine Learning (AML) taxonomy, and OWASP Top 10s for Large Language Models (LLMs) and Agentic AI Applications provide valuable viewpoints, but each covers only slices of this multi-dimensional space. This paper presents Cisco's Integrated AI Security and Safety Framework ("AI Security Framework"), a unified, lifecycle-aware taxonomy and operationalization framework that can be used to classify, integrate, and operationalize the full range of AI risks. It integrates AI security and AI safety across modalities, agents, pipelines, and the broader ecosystem. The AI Security Framework is designed to be practical for threat identification, red-teaming, risk prioritization, and it is comprehensive in scope and can be extensible to emerging deployments in multimodal contexts, humanoids, wearables, and sensory infrastructures. We analyze gaps in prevailing frameworks, discuss design principles for our framework, and demonstrate how the taxonomy provides structure for understanding how modern AI systems fail, how adversaries exploit these failures, and how organizations can build defenses across the AI lifecycle that evolve alongside capability advancements.
Updated: 2025-12-15 02:12:12
Domains: cs.CR,cs.AI
Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions
Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and `multimodal' product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.
Updated: 2025-12-15 02:11:44
Domains: stat.ML,cs.LG
The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection
Ensuring that Large Language Models (LLMs) generate summaries faithful to a given source document is essential for real-world applications. While prior research has explored LLM faithfulness, existing benchmarks suffer from annotation ambiguity, primarily due to the ill-defined boundary of permissible external knowledge in generated outputs. For instance, common sense is often incorporated into responses and labeled as "faithful", yet the acceptable extent of such knowledge remains unspecified, leading to inconsistent annotations. To address this issue, we propose a novel faithfulness annotation framework, which introduces an intermediate category, Out-Dependent, to classify cases where external knowledge is required for verification. Using this framework, we construct VeriGray (Verification with the Gray Zone) -- a new unfaithfulness detection benchmark in summarization. Statistics reveal that even SOTA LLMs, such as GPT-5, exhibit hallucinations ($\sim 6\%$ of sentences) in summarization tasks. Moreover, a substantial proportion ($\sim 8\%$ on average of models) of generated sentences fall into the Out-Dependent category, underscoring the importance of resolving annotation ambiguity in unfaithfulness detection benchmarks. Experiments demonstrate that our benchmark poses significant challenges to multiple baseline methods, indicating considerable room for future improvement.
Updated: 2025-12-15 02:10:04
Domains: cs.CL,cs.AI
Measuring What Matters: The AI Pluralism Index
Artificial intelligence systems increasingly mediate knowledge, communication, and decision making. Development and governance remain concentrated within a small set of firms and states, raising concerns that technologies may encode narrow interests and limit public agency. Capability benchmarks for language, vision, and coding are common, yet public, auditable measures of pluralistic governance are rare. We define AI pluralism as the degree to which affected stakeholders can shape objectives, data practices, safeguards, and deployment. We present the AI Pluralism Index (AIPI), a transparent, evidence-based instrument that evaluates producers and system families across four pillars: participatory governance, inclusivity and diversity, transparency, and accountability. AIPI codes verifiable practices from public artifacts and independent evaluations, explicitly handling "Unknown" evidence to report both lower-bound ("evidence") and known-only scores with coverage. We formalize the measurement model; implement a reproducible pipeline that integrates structured web and repository analysis, external assessments, and expert interviews; and assess reliability with inter-rater agreement, coverage reporting, cross-index correlations, and sensitivity analysis. The protocol, codebook, scoring scripts, and evidence graph are maintained openly with versioned releases and a public adjudication process. We report pilot provider results and situate AIPI relative to adjacent transparency, safety, and governance frameworks. The index aims to steer incentives toward pluralistic practice and to equip policymakers, procurers, and the public with comparable evidence.
Updated: 2025-12-15 02:08:45
Domains: cs.AI
Satisfiability Modulo Theory Meets Inductive Logic Programming
Inductive Logic Programming (ILP) provides interpretable rule learning in relational domains, yet remains limited in its ability to induce and reason with numerical constraints. Classical ILP systems operate over discrete predicates and typically rely on discretisation or hand-crafted numerical predicates, making it difficult to infer thresholds or arithmetic relations that must hold jointly across examples. Recent work has begun to address these limitations through tighter integrations of ILP with Satisfiability Modulo Theories (SMT) or specialised numerical inference mechanisms. In this paper we investigate a modular alternative that couples the ILP system PyGol with the SMT solver Z3. Candidate clauses proposed by PyGol are interpreted as quantifier-free formulas over background theories such as linear or nonlinear real arithmetic, allowing numerical parameters to be instantiated and verified by the SMT solver while preserving ILP's declarative relational bias. This supports the induction of hybrid rules that combine symbolic predicates with learned numerical constraints, including thresholds, intervals, and multi-literal arithmetic relations. We formalise this SMT-ILP setting and evaluate it on a suite of synthetic datasets designed to probe linear, relational, nonlinear, and multi-hop reasoning. The results illustrate how a modular SMT-ILP architecture can extend the expressivity of symbolic rule learning, complementing prior numerical ILP approaches while providing a flexible basis for future extensions toward richer theory-aware induction.
Updated: 2025-12-15 02:08:32
Domains: cs.AI
Efficient Quantum-resistant Delegable Data Analysis Scheme with Revocation and Keyword Search in Mobile Cloud Computing
With the rapid growth of smart devices and mobile internet, large-scale data processing is becoming increasingly important, while mobile devices remain resource-constrained. Mobile Cloud Computing (MCC) addresses this limitation by offloading tasks to the cloud. Nevertheless, the widespread adoption of MCC also raises challenges such as data privacy, selective computation, efficient revocation, and keyword search. Additionally, the development of quantum computers also threatens data security in MCC. To address these challenges, we propose an efficient quantum-resistant delegable data analysis scheme with revocation and keyword search (EQDDA-RKS) for MCC. In the proposed scheme, an authorised mobile device can perform keyword searches and compute inner product values over encrypted data without disclosing any additional information. Meanwhile, if a user's function key is compromised, it can be revoked. To alleviate the burden on mobile devices, most of the computation which should be executed by the mobile device is outsourced to a cloud server, and the mobile device only needs to interact with a central authority once. Furthermore, an authorised mobile device can temporarily delegate its keyword search and function computation rights to a delegatee in case the device becomes unavailable due to power depletion, going offline, etc. Our scheme is formally proven secure in the standard model against quantum attacks, chosen plaintext attacks, chosen keyword attacks, and outside keyword guessing attacks. Furthermore, the analysis demonstrates that the number of interactions between a mobile device and the central authority is $O(1)$ in our scheme, rather than growing linearly with the number of functions, which is well-suited for MCC scenarios.
Updated: 2025-12-15 02:07:52
Domains: cs.CR
CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs
Large Language Models (LLMs) are often fine-tuned to adapt their general-purpose knowledge to specific tasks and domains such as cyber threat intelligence (CTI). Fine-tuning is mostly done through proprietary datasets that may contain sensitive information. Owners expect their fine-tuned model to not inadvertently leak this information to potentially adversarial end users. Using CTI as a use case, we demonstrate that data-extraction attacks can recover sensitive information from fine-tuned models on CTI reports, underscoring the need for mitigation. Retraining the full model to eliminate this leakage is computationally expensive and impractical. We propose an alternative approach, which we call privacy alignment, inspired by safety alignment in LLMs. Just like safety alignment teaches the model to abide by safety constraints through a few examples, we enforce privacy alignment through few-shot supervision, integrating a privacy classifier and a privacy redactor, both handled by the same underlying LLM. We evaluate our system, called CTIGuardian, using GPT-4o mini and Mistral-7B Instruct models, benchmarking against Presidio, a named entity recognition (NER) baseline. Results show that CTIGuardian provides a better privacy-utility trade-off than NER based models. While we demonstrate its effectiveness on a CTI use case, the framework is generic enough to be applicable to other sensitive domains.
Updated: 2025-12-15 01:59:14
Domains: cs.CR,cs.AI,cs.LG
Solving Inequality Proofs with Large Language Models
Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, synthetic, or rigidly formal. We address this by proposing an informal yet verifiable task formulation, recasting inequality proving into two automatically checkable subtasks: bound estimation and relation prediction. Building on this, we release IneqMath, an expert-curated dataset of Olympiad-level inequalities, including a test set and training corpus enriched with step-wise solutions and theorem annotations. We also develop a novel LLM-as-judge evaluation framework, combining a final-answer judge with four step-wise judges designed to detect common reasoning flaws. A systematic evaluation of 29 leading LLMs on IneqMath reveals a surprising reality: even top models like o1 achieve less than 10% overall accuracy under step-wise scrutiny; this is a drop of up to 65.5% from their accuracy considering only final answer equivalence. This discrepancy exposes fragile deductive chains and a critical gap for current LLMs between merely finding an answer and constructing a rigorous proof. Scaling model size and increasing test-time computation yield limited gains in overall proof correctness. Instead, our findings highlight promising research directions such as theorem-guided reasoning and self-refinement. Code and data are available at https://ineqmath.github.io/.
Updated: 2025-12-15 01:49:41
Domains: cs.AI,cs.CL,cs.LG
Evaluating Singular Value Thresholds for DNN Weight Matrices based on Random Matrix Theory
This study evaluates thresholds for removing singular values from singular value decomposition-based low-rank approximations of deep neural network weight matrices. Each weight matrix is modeled as the sum of signal and noise matrices. The low-rank approximation is obtained by removing noise-related singular values using a threshold based on random matrix theory. To assess the adequacy of this threshold, we propose an evaluation metric based on the cosine similarity between the singular vectors of the signal and original weight matrices. The proposed metric is used in numerical experiments to compare two threshold estimation methods.
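A minimal sketch of the signal-plus-noise setup, assuming i.i.d. Gaussian noise with known standard deviation `sigma`: random matrix theory places the largest noise singular value near `sigma * (sqrt(m) + sqrt(n))`, so singular values above that edge are retained as signal. The paper's threshold estimators and its cosine-similarity metric may differ in detail:

```python
import numpy as np

# Low-rank denoising of a weight matrix by thresholding singular values at
# the random-matrix-theory noise edge sigma * (sqrt(m) + sqrt(n)).

def lowrank_denoise(W, sigma):
    m, n = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    tau = sigma * (np.sqrt(m) + np.sqrt(n))   # RMT noise edge
    r = int((s > tau).sum())                  # retained rank
    return (U[:, :r] * s[:r]) @ Vt[:r], r

rng = np.random.default_rng(1)
m, n, sigma = 200, 100, 0.1
# Rank-1 "signal" with top singular value around 10, well above the edge.
signal = 10.0 * np.outer(rng.normal(size=m), rng.normal(size=n)) / np.sqrt(m * n)
noisy = signal + sigma * rng.normal(size=(m, n))
W_hat, rank = lowrank_denoise(noisy, sigma)

# Evaluation in the spirit of the proposed metric: cosine similarity between
# the top singular vectors of the clean signal and of the noisy matrix.
u_clean = np.linalg.svd(signal, full_matrices=False)[0][:, 0]
u_hat = np.linalg.svd(W_hat, full_matrices=False)[0][:, 0]
cos = abs(u_clean @ u_hat)
print(rank, round(cos, 2))  # the rank-1 signal survives, well aligned
```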
Updated: 2025-12-15 01:49:20
Domains: stat.ML,cs.LG
Learning in Focus: Detecting Behavioral and Collaborative Engagement Using Vision Transformers
In early childhood education, accurately detecting collaborative and behavioral engagement is essential to foster meaningful learning experiences. This paper presents an AI-driven approach that leverages Vision Transformers (ViTs) to automatically classify children's engagement using visual cues such as gaze direction, interaction, and peer collaboration. Utilizing the ChildPlay gaze dataset, our method is trained on annotated video segments to classify behavioral and collaborative engagement states (e.g., engaged, not engaged, collaborative, not collaborative). We evaluated six state-of-the-art transformer models: Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), Swin Transformer, VitGaze, APVit, and GazeTR. Among these, the Swin Transformer achieved the highest classification performance with an accuracy of 97.58 percent, demonstrating its effectiveness in modeling local and global attention. Our results highlight the potential of transformer-based architectures for scalable, automated engagement analysis in real-world educational settings.
Updated: 2025-12-15 01:43:33
Domains: q-bio.NC,cs.AI
Robust Simultaneous Multislice MRI Reconstruction Using Slice-Wise Learned Generative Diffusion Priors
Simultaneous multislice (SMS) imaging is a powerful technique for accelerating magnetic resonance imaging (MRI) acquisitions. However, SMS reconstruction remains challenging due to complex signal interactions between and within the excited slices. In this study, we introduce ROGER, a robust SMS MRI reconstruction method based on deep generative priors. Utilizing denoising diffusion probabilistic models (DDPM), ROGER begins with Gaussian noise and gradually recovers individual slices through reverse diffusion iterations while enforcing data consistency from measured k-space data within the readout concatenation framework. The posterior sampling procedure is designed such that the DDPM training can be performed on single-slice images without requiring modifications for SMS tasks. Additionally, our method incorporates a low-frequency enhancement (LFE) module to address the practical issue that SMS-accelerated fast spin echo (FSE) and echo planar imaging (EPI) sequences cannot easily embed fully-sampled autocalibration signals. Extensive experiments on both retrospectively and prospectively accelerated datasets demonstrate that ROGER consistently outperforms existing methods, enhancing both anatomical and functional imaging with strong out-of-distribution generalization. The source code and sample data for ROGER are available at https://github.com/Solor-pikachu/ROGER.
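The data-consistency step that such posterior sampling enforces on measured k-space can be sketched for a single 2D slice as follows. ROGER's readout-concatenation SMS operator is more involved; `data_consistency` here is an illustrative single-coil, single-slice simplification:

```python
import numpy as np

# After each reverse-diffusion iteration, re-impose the measured k-space
# samples on the current image estimate and keep the model's prediction
# only at unmeasured frequencies.

def data_consistency(x, y, mask):
    """x: current image estimate; y: measured k-space; mask: sampled lines."""
    k = np.fft.fft2(x)
    k = np.where(mask, y, k)   # trust measurements where they exist
    return np.fft.ifft2(k)

rng = np.random.default_rng(0)
truth = rng.normal(size=(8, 8))
mask = rng.random((8, 8)) < 0.5            # 50% of k-space measured
y = np.where(mask, np.fft.fft2(truth), 0)  # retrospectively undersampled data
x0 = np.zeros((8, 8))                      # a (bad) model estimate
x1 = data_consistency(x0, y, mask)
# On the measured frequencies, the update matches the ground truth exactly.
err0 = np.abs(np.fft.fft2(x0) - np.fft.fft2(truth))[mask].max()
err1 = np.abs(np.fft.fft2(x1) - np.fft.fft2(truth))[mask].max()
print(err1 < err0)
```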
Updated: 2025-12-15 01:42:28
标题: 使用基于切片学习的生成扩散先验的鲁棒的同时多层切片MRI重建
摘要: 多重切片同时成像(SMS)是加速磁共振成像(MRI)采集的强大技术。然而,由于激发切片之间和内部复杂信号交互,SMS重建仍然具有挑战性。在本研究中,我们介绍了一种基于深度生成先验的鲁棒SMS MRI重建方法ROGER。利用去噪扩散概率模型(DDPM),ROGER从高斯噪声开始,并通过反向扩散迭代逐渐恢复单个切片,同时在读出串联框架内强制执行来自测得k空间数据的数据一致性。后验采样过程设计得使得DDPM训练可以在不需要为SMS任务进行修改的单切片图像上进行。此外,我们的方法还整合了一个低频增强(LFE)模块,以解决SMS加速的快速自旋回波(FSE)和回波平面成像(EPI)序列无法轻松嵌入完全采样的自校准信号的实际问题。对回顾性和前瞻性加速数据集的大量实验表明,ROGER始终优于现有方法,增强了解剖和功能成像,并具有强大的超出分布泛化能力。ROGER的源代码和示例数据可在https://github.com/Solor-pikachu/ROGER上获得。
更新时间: 2025-12-15 01:42:28
领域: eess.IV,cs.AI,cs.CV,eess.SP,physics.med-ph
Development and Benchmarking of a Blended Human-AI Qualitative Research Assistant
Qualitative research emphasizes constructing meaning through iterative engagement with textual data. Traditionally, this human-driven process requires navigating coder fatigue and interpretative drift, thus posing challenges when scaling analysis to larger, more complex datasets. Computational approaches to augment qualitative research have been met with skepticism, partly due to their inability to replicate the nuance, context-awareness, and sophistication of human analysis. Large language models, however, present new opportunities to automate aspects of qualitative analysis while upholding rigor and research quality in important ways. To assess their benefits and limitations - and build trust among qualitative researchers - these approaches must be rigorously benchmarked against human-generated datasets. In this work, we benchmark Muse, an interactive, AI-powered qualitative research system that allows researchers to identify themes and annotate datasets, finding an inter-rater reliability between Muse and humans of Cohen's $\kappa$ = 0.71 for well-specified codes. We also conduct robust error analysis to identify failure modes, guide future improvements, and demonstrate the capacity to correct for human bias.
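Cohen's $\kappa$, the agreement statistic reported above, can be computed from two annotators' label sequences as follows. This is a minimal stdlib sketch of the metric itself, not Muse's evaluation code:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items.
    Assumes equal-length, nonempty label sequences and expected agreement < 1."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently with their marginals.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, two raters who agree on half the items while each using two labels equally often get $\kappa = 0$, reflecting chance-level agreement.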
Updated: 2025-12-15 01:41:02
Categories: cs.HC,cs.AI
Machine Learning Architectures for the Estimation of Predicted Occupancy Grids in Road Traffic
This paper introduces a novel machine learning architecture for an efficient estimation of the probabilistic space-time representation of complex traffic scenarios. A detailed representation of the future traffic scenario is of significant importance for autonomous driving and for all active safety systems. In order to predict the future space-time representation of the traffic scenario, first the type of traffic scenario is identified and then the machine learning algorithm maps the current state of the scenario to possible future states. The input to the machine learning algorithms is the current state representation of a traffic scenario, termed as the Augmented Occupancy Grid (AOG). The output is the probabilistic space-time representation which includes uncertainties regarding the behaviour of the traffic participants and is termed as the Predicted Occupancy Grid (POG). The novel architecture consists of two Stacked Denoising Autoencoders (SDAs) and a set of Random Forests. It is then compared with two other existing architectures that comprise SDAs and a DeconvNet. The architectures are validated in simulations, and the comparisons are made both in terms of accuracy and computational time. Also, a brief overview of the applications of POGs in the field of active safety is presented.
Updated: 2025-12-15 01:24:02
Categories: cs.LG
GRADE: Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation
Retrieval-Augmented Generation (RAG) systems are widely adopted in knowledge-intensive NLP tasks, but current evaluations often overlook the structural complexity and multi-step reasoning required in real-world scenarios, as well as key factors such as the interaction between retrieval difficulty and reasoning depth. To address this gap, we propose GRADE, a novel evaluation framework that models task difficulty along two orthogonal dimensions: (1) reasoning depth, defined by the number of inference steps (hops), and (2) semantic distance between the query and its supporting evidence. We construct a synthetic multi-hop QA dataset from factual news articles by extracting knowledge graphs and augmenting them through semantic clustering to recover missing links, allowing us to generate diverse and difficulty-controlled queries. Central to our framework is a 2D difficulty matrix that combines generator-side and retriever-side difficulty. Experiments across multiple domains and models show that error rates strongly correlate with our difficulty measures, validating their diagnostic utility. GRADE enables fine-grained analysis of RAG performance and provides a scalable foundation for evaluating and improving multi-hop reasoning in real-world applications.
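The 2D difficulty matrix can be illustrated with a minimal sketch that buckets queries by hop count and a binned semantic distance, recording per-cell error rates. The item format and bin choices here are illustrative assumptions, not GRADE's actual implementation:

```python
def difficulty_matrix(items, max_hops=4, n_dist_bins=3):
    """Aggregate per-query error rates into a (reasoning depth x semantic
    distance) grid. Each item is (hops >= 1, semantic_distance in [0, 1),
    correct: bool); cells with no queries are reported as None."""
    errors = [[0] * n_dist_bins for _ in range(max_hops)]
    totals = [[0] * n_dist_bins for _ in range(max_hops)]
    for hops, dist, correct in items:
        i = min(hops, max_hops) - 1                         # rows: 1..max_hops hops
        j = min(int(dist * n_dist_bins), n_dist_bins - 1)   # cols: distance bins
        totals[i][j] += 1
        errors[i][j] += (not correct)
    return [[errors[i][j] / totals[i][j] if totals[i][j] else None
             for j in range(n_dist_bins)] for i in range(max_hops)]
```

Scanning such a matrix makes the claimed interaction visible: error rates should grow both down the rows (more hops) and across the columns (larger query-evidence distance).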
Updated: 2025-12-15 01:19:21
Categories: cs.CL,cs.AI
PAC-Bayes Bounds for Multivariate Linear Regression and Linear Autoencoders
Linear Autoencoders (LAEs) have shown strong performance in state-of-the-art recommender systems. However, this success remains largely empirical, with limited theoretical understanding. In this paper, we investigate the generalizability -- a theoretical measure of model performance in statistical learning -- of multivariate linear regression and LAEs. We first propose a PAC-Bayes bound for multivariate linear regression, extending the earlier bound for single-output linear regression by Shalaeva et al., and establish sufficient conditions for its convergence. We then show that LAEs, when evaluated under a relaxed mean squared error, can be interpreted as constrained multivariate linear regression models on bounded data, to which our bound adapts. Furthermore, we develop theoretical methods to improve the computational efficiency of optimizing the LAE bound, enabling its practical evaluation on large models and real-world datasets. Experimental results demonstrate that our bound is tight and correlates well with practical ranking metrics such as Recall@K and NDCG@K.
Updated: 2025-12-15 01:12:11
Categories: stat.ML,cs.LG
OptHQC: Optimize HQC for High-Performance Post-Quantum Cryptography
As post-quantum cryptography (PQC) becomes increasingly critical for securing future communication systems, the performance overhead introduced by quantum-resistant algorithms presents a major computing challenge. HQC (Hamming Quasi-Cyclic) is a newly standardized code-based PQC scheme designed to replace classical key exchange methods. In this paper, we propose OptHQC, an optimized implementation of the HQC scheme that delivers high-performance cryptographic operations. Our approach provides a comprehensive analysis of each computational block in HQC and introduces optimizations across all three stages: key generation, encryption, and decryption. We first exploit data-level sparsity in vector multiplication to accelerate polynomial operations during vector generation. We then leverage instruction-level acceleration (e.g., AVX2) in hash computation to further improve performance. Finally, we transform multiplication into lookup-table indexing and optimize memory access patterns in syndrome computation and error vector recovery, the most computationally intensive operations in HQC. Overall, OptHQC achieves an average 55% speedup over the reference HQC implementation on CPU.
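The data-level sparsity exploited in the vector multiplication can be illustrated in plain Python: HQC multiplies polynomials over GF(2) modulo x^n - 1, and when one operand is sparse the product reduces to XORing a few cyclic shifts of the dense operand. This is a didactic sketch of that idea, not the AVX2/lookup-table implementation described above:

```python
def sparse_cyclic_mul_gf2(dense_bits, sparse_support, n):
    """Multiply a dense polynomial by a sparse one in GF(2)[x]/(x^n - 1).

    dense_bits: length-n list of 0/1 coefficients.
    sparse_support: indices of the nonzero coefficients of the sparse operand.
    Exploits sparsity: O(n * |support|) XORs instead of O(n^2) work."""
    out = [0] * n
    for shift in sparse_support:
        for i, bit in enumerate(dense_bits):
            if bit:
                out[(i + shift) % n] ^= 1   # addition in GF(2) is XOR
    return out
```

Each nonzero index of the sparse operand contributes one cyclic shift of the dense polynomial, which is why low-weight vectors make this step cheap.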
Updated: 2025-12-15 01:07:57
Categories: cs.CR
Advancing Financial Engineering with Foundation Models: Progress, Applications, and Challenges
The advent of foundation models (FMs), large-scale pre-trained models with strong generalization capabilities, has opened new frontiers for financial engineering. While general-purpose FMs such as GPT-4 and Gemini have demonstrated promising performance in tasks ranging from financial report summarization to sentiment-aware forecasting, many financial applications remain constrained by unique domain requirements such as multimodal reasoning, regulatory compliance, and data privacy. These challenges have spurred the emergence of financial foundation models (FFMs): a new class of models explicitly designed for finance. This survey presents a comprehensive overview of FFMs, with a taxonomy spanning three key modalities: financial language foundation models (FinLFMs), financial time-series foundation models (FinTSFMs), and financial visual-language foundation models (FinVLFMs). We review their architectures, training methodologies, datasets, and real-world applications. Furthermore, we identify critical challenges in data availability, algorithmic scalability, and infrastructure constraints and offer insights into future research opportunities. We hope this survey can serve as both a comprehensive reference for understanding FFMs and a practical roadmap for future innovation.
Updated: 2025-12-15 01:07:02
Categories: q-fin.CP,cs.AI,cs.LG
Next-generation reservoir computing validated by classification task
An emerging computing paradigm, so-called next-generation reservoir computing (NG-RC), is investigated. True to its name, NG-RC requires no actual reservoir for input data mixing but instead computes the polynomial terms directly from the time-series inputs. However, the benchmark tests reported so far have been one-sided, limited to prediction tasks on temporal waveforms such as the Lorenz 63 attractor and the Mackey-Glass chaotic signal. We demonstrate for the first time that NG-RC can perform classification tasks as well as conventional RC. This validates the versatile computational capability of NG-RC in both prediction and classification tasks.
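The reservoir-free feature construction at the heart of NG-RC can be sketched directly: the feature vector at each time step is a constant, a few delayed inputs, and their unique polynomial (here quadratic) products, after which an ordinary linear readout is trained. The delay and polynomial-order choices below are illustrative assumptions:

```python
from itertools import combinations_with_replacement

def ngrc_features(series, t, k=2, delay=1):
    """NG-RC feature vector at time t: a constant term, k delayed inputs,
    and all their unique quadratic products -- no physical reservoir needed.
    Requires t >= (k - 1) * delay so all delayed samples exist."""
    lin = [series[t - i * delay] for i in range(k)]
    quad = [a * b for a, b in combinations_with_replacement(lin, 2)]
    return [1.0] + lin + quad
```

A linear readout (e.g., ridge regression) fitted on these features then serves either a prediction target or, as the abstract argues, a classification target.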
Updated: 2025-12-15 01:06:23
Categories: cs.LG
Predicted-occupancy grids for vehicle safety applications based on autoencoders and the Random Forest algorithm
In this paper, a probabilistic space-time representation of complex traffic scenarios is predicted using machine learning algorithms. Such a representation is significant for all active vehicle safety applications especially when performing dynamic maneuvers in a complex traffic scenario. As a first step, a hierarchical situation classifier is used to distinguish the different types of traffic scenarios. This classifier is responsible for identifying the type of the road infrastructure and the safety-relevant traffic participants of the driving environment. With each class representing similar traffic scenarios, a set of Random Forests (RFs) is individually trained to predict the probabilistic space-time representation, which depicts the future behavior of traffic participants. This representation is termed as a Predicted-Occupancy Grid (POG). The input to the RFs is an Augmented Occupancy Grid (AOG). In order to increase the learning accuracy of the RFs and to perform better predictions, the AOG is reduced to low-dimensional features using a Stacked Denoising Autoencoder (SDA). The excellent performance of the proposed machine learning approach consisting of SDAs and RFs is demonstrated in simulations and in experiments with real vehicles. An application of POGs to estimate the criticality of traffic scenarios and to determine safe trajectories is also presented.
Updated: 2025-12-15 00:59:44
Categories: cs.LG
ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning
Behavior-cloning based visuomotor policies enable precise manipulation but often inherit the slow, cautious tempo of human demonstrations, limiting practical deployment. However, prior studies on acceleration methods mainly rely on statistical or heuristic cues that ignore task semantics and can fail across diverse manipulation settings. We present ESPADA, a semantic and spatially aware framework that segments demonstrations using a VLM-LLM pipeline with 3D gripper-object relations, enabling aggressive downsampling only in non-critical segments while preserving precision-critical phases, without requiring extra data or architectural modifications, or any form of retraining. To scale from a single annotated episode to the full dataset, ESPADA propagates segment labels via Dynamic Time Warping (DTW) on dynamics-only features. Across both simulation and real-world experiments with ACT and DP baselines, ESPADA achieves approximately a 2x speed-up while maintaining success rates, narrowing the gap between human demonstrations and efficient robot control.
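The DTW-based label propagation can be sketched with a textbook Dynamic Time Warping alignment on scalar dynamics features; the feature choice and tie-breaking below are simplifying assumptions, not ESPADA's exact pipeline:

```python
def dtw_path(a, b):
    """Dynamic Time Warping alignment between two feature sequences
    (absolute-difference cost on scalars); returns aligned index pairs."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i-1] - b[j-1]) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    # Backtrack from the corner to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((D[i-1][j-1], i-1, j-1), (D[i-1][j], i-1, j), (D[i][j-1], i, j-1))
        i, j = step[1], step[2]
    return path[::-1]

def propagate_labels(labels_a, a, b):
    """Transfer per-frame segment labels from one annotated episode (a) to a
    new episode (b) via their DTW alignment."""
    out = [None] * len(b)
    for i, j in dtw_path(a, b):
        out[j] = labels_a[i]
    return out
```

Frames of the unannotated episode inherit the label of whichever annotated frame they align with, which is how one labeled episode can seed downsampling decisions for a whole dataset.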
Updated: 2025-12-15 00:51:44
Categories: cs.RO,cs.AI
Qonvolution: Towards Learning High-Frequency Signals with Queried Convolution
Accurately learning high-frequency signals is a challenge in computer vision and graphics, as neural networks often struggle with these signals due to spectral bias or optimization difficulties. While current techniques like Fourier encodings have made great strides in improving performance, there remains scope for improvement when presented with high-frequency information. This paper introduces Queried-Convolutions (Qonvolutions), a simple yet powerful modification using the neighborhood properties of convolution. Qonvolution convolves a low-frequency signal with queries (such as coordinates) to enhance the learning of intricate high-frequency signals. We empirically demonstrate that Qonvolutions enhance performance across a variety of high-frequency learning tasks crucial to both the computer vision and graphics communities, including 1D regression, 2D super-resolution, 2D image regression, and novel view synthesis (NVS). In particular, by combining Gaussian splatting with Qonvolutions for NVS, we showcase state-of-the-art performance on real-world complex scenes, even outperforming powerful radiance field models on image quality.
Updated: 2025-12-15 00:46:09
Categories: cs.CV,cs.GR,cs.LG
Probability Estimation for Predicted-Occupancy Grids in Vehicle Safety Applications Based on Machine Learning
This paper presents a method to predict the evolution of a complex traffic scenario with multiple objects. The current state of the scenario is assumed to be known from sensors and the prediction is taking into account various hypotheses about the behavior of traffic participants. This way, the uncertainties regarding the behavior of traffic participants can be modelled in detail. In the first part of this paper a model-based approach is presented to compute Predicted-Occupancy Grids (POG), which are introduced as a grid-based probabilistic representation of the future scenario hypotheses. However, due to the large number of possible trajectories for each traffic participant, the model-based approach comes with a very high computational load. Thus, a machine-learning approach is adopted for the computation of POGs. This work uses a novel grid-based representation of the current state of the traffic scenario and performs the mapping to POGs. This representation consists of augmented cells in an occupancy grid. The adopted machine-learning approach is based on the Random Forest algorithm. Simulations of traffic scenarios are performed to compare the machine-learning with the model-based approach. The results are promising and could enable the real-time computation of POGs for vehicle safety applications. With this detailed modelling of uncertainties, crucial components in vehicle safety systems like criticality estimation and trajectory planning can be improved.
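The model-based computation of a POG can be sketched as accumulating the probabilities of behavior hypotheses into grid cells; the hypothesis format below is an illustrative assumption. This is the expensive enumeration-over-trajectories baseline that the Random Forest mapping learns to approximate:

```python
def predicted_occupancy_grid(hypotheses, width, height):
    """Model-based POG sketch. Each hypothesis is (probability, cells), where
    cells are the (x, y) grid cells an object would occupy at the prediction
    horizon under that behavior hypothesis. A cell's occupancy probability is
    the accumulated probability of the hypotheses covering it, capped at 1."""
    grid = [[0.0] * width for _ in range(height)]
    for prob, cells in hypotheses:
        for x, y in cells:
            grid[y][x] = min(1.0, grid[y][x] + prob)
    return grid
```

The computational load the abstract mentions comes from the number of hypotheses: every plausible trajectory of every traffic participant contributes cells, which motivates replacing this enumeration with a learned AOG-to-POG mapping.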
Updated: 2025-12-15 00:45:26
Categories: cs.LG
Wait, Wait, Wait... Why Do Reasoning Models Loop?
Reasoning models (e.g., DeepSeek-R1) generate long chains of thought to solve harder problems, but they often loop, repeating the same text at low temperatures or with greedy decoding. We study why this happens and what role temperature plays. With open reasoning models, we find that looping is common at low temperature. Larger models tend to loop less, and distilled students loop significantly even when their teachers rarely do. This points to mismatches between the training distribution and the learned model, which we refer to as errors in learning, as a key cause. To understand how such errors cause loops, we introduce a synthetic graph reasoning task and demonstrate two mechanisms. First, risk aversion caused by hardness of learning: when the correct progress-making action is hard to learn but an easy cyclic action is available, the model puts relatively more probability on the cyclic action and gets stuck. Second, even when there is no hardness, Transformers show an inductive bias toward temporally correlated errors, so the same few actions keep being chosen and loops appear. Higher temperature reduces looping by promoting exploration, but it does not fix the errors in learning, so generations remain much longer than necessary at high temperature; in this sense, temperature is a stopgap rather than a holistic solution. We end with a discussion of training-time interventions aimed at directly reducing errors in learning.
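The role of temperature described above comes down to standard softmax sampling over logits; the minimal sketch below (unrelated to any specific model) shows greedy decoding as the zero-temperature limit and how higher temperature flattens the distribution to promote exploration:

```python
import math, random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from raw logits at the given temperature.
    temperature <= 0 is treated as greedy decoding (argmax); higher values
    flatten the distribution, making repeated low-probability escapes from a
    loop more likely."""
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]   # subtract max for stability
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r < acc:
            return i
    return len(logits) - 1
```

With greedy decoding the same (possibly wrong) argmax action is chosen every time the model revisits a state, which is exactly the looping failure mode; sampling at higher temperature breaks such cycles without fixing the underlying errors in learning.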
Updated: 2025-12-15 00:44:54
Categories: cs.LG
Distillation of Discrete Diffusion by Exact Conditional Distribution Matching
Discrete diffusion models (DDMs) are a powerful class of generative models for categorical data, but they typically require many function evaluations for a single sample, making inference expensive. Existing acceleration methods either rely on approximate simulators, such as $\tau$-leaping, or on distillation schemes that train new student models and auxiliary networks with proxy objectives. We propose a simple and principled distillation alternative based on \emph{conditional distribution matching}. Our key observation is that the reverse conditional distribution of clean data given a noisy state, $p_{0\mid t}(x_0 \mid x_t)$, admits a Markov decomposition through intermediate times and can be recovered from marginal density ratios and the known forward CTMC kernel. We exploit this structure to define distillation objectives that directly match conditional distributions between a pre-trained teacher and a low-NFE student, both for one-step and few-step samplers.
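The Markov decomposition and the density-ratio recovery referred to above can be written explicitly for an intermediate time $s$ with $0 < s < t$; these are the standard identities (stated here in the abstract's notation, with $q_{t\mid 0}$ denoting the known forward kernel) on which such objectives are built:

```latex
p_{0\mid t}(x_0 \mid x_t)
  = \sum_{x_s} p_{0\mid s}(x_0 \mid x_s)\, p_{s\mid t}(x_s \mid x_t),
\qquad
p_{0\mid t}(x_0 \mid x_t)
  = \frac{p_0(x_0)}{p_t(x_t)}\, q_{t\mid 0}(x_t \mid x_0).
```

The first identity holds because the forward chain is Markov, so $x_0$ and $x_t$ are conditionally independent given $x_s$; the second is Bayes' rule, exposing $p_{0\mid t}$ through a marginal density ratio and the forward CTMC kernel.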
Updated: 2025-12-15 00:16:10
Categories: cs.LG
Meta-GPT: Decoding the Metasurface Genome with Generative Artificial Intelligence
Advancing artificial intelligence for physical sciences requires representations that are both interpretable and compatible with the underlying laws of nature. We introduce METASTRINGS, a symbolic language for photonics that expresses nanostructures as textual sequences encoding materials, geometries, and lattice configurations. Analogous to molecular textual representations in chemistry, METASTRINGS provides a framework connecting human interpretability with computational design by capturing the structural hierarchy of photonic metasurfaces. Building on this representation, we develop Meta-GPT, a foundation transformer model trained on METASTRINGS and finetuned with physics-informed supervised, reinforcement, and chain-of-thought learning. Across various design tasks, the model achieves <3% mean-squared spectral error and maintains >98% syntactic validity, generating diverse metasurface prototypes whose experimentally measured optical responses match their target spectra. These results demonstrate that Meta-GPT can learn the compositional rules of light-matter interactions through METASTRINGS, laying a rigorous foundation for AI-driven photonics and representing an important step toward a metasurface genome project.
Updated: 2025-12-15 00:09:14
Categories: physics.optics,cs.AI,cs.CL,cs.LG