    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models

System Instructions (SIs), or system prompts, are pivotal for guiding Large Language Models (LLMs), but crafting them manually is resource-intensive and often suboptimal. Existing automated methods frequently generate non-human-readable "soft prompts," sacrificing interpretability. This paper introduces SI-Agent, a novel agentic framework designed to automatically generate and iteratively refine human-readable SIs through a feedback-driven loop. SI-Agent employs three collaborating agents: an Instructor Agent, an Instruction Follower Agent (target LLM), and a Feedback/Reward Agent evaluating task performance and optionally SI readability. The framework utilizes iterative cycles where feedback guides the Instructor's refinement strategy (e.g., LLM-based editing, evolutionary algorithms). We detail the framework's architecture, agent roles, the iterative refinement process, and contrast it with existing methods. We present experimental results validating SI-Agent's effectiveness, focusing on metrics for task performance, SI readability, and efficiency. Our findings indicate that SI-Agent generates effective, readable SIs, offering a favorable trade-off between performance and interpretability compared to baselines. Potential implications include democratizing LLM customization and enhancing model transparency. Challenges related to computational cost and feedback reliability are acknowledged.
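The three-agent loop can be sketched in a few lines. The internals below (the edit rule, the quality and readability scores) are illustrative stand-ins, not the paper's actual prompts or models:

```python
def instructor(si, feedback):
    # Stand-in for the LLM-based Instructor: applies a simple edit
    # whenever feedback reports a low reward.
    if feedback and feedback["score"] < 0.8:
        return si + " Answer concisely and cite evidence."
    return si

def follower(si, task):
    # Stand-in target LLM: here, response quality simply grows with
    # the number of instructions (sentences) in the SI.
    return {"answer": f"response to {task!r}",
            "quality": min(1.0, 0.5 + 0.1 * si.count("."))}

def reward(output, si):
    # Blends task performance with an optional readability term
    # (shorter SIs score higher), as in the Feedback/Reward Agent.
    readability = 1.0 / (1 + len(si) / 200)
    return {"score": 0.8 * output["quality"] + 0.2 * readability}

def refine(si, task, rounds=5):
    # Iterative refinement: feedback from each cycle guides the next edit.
    feedback = None
    for _ in range(rounds):
        si = instructor(si, feedback)
        output = follower(si, task)
        feedback = reward(output, si)
    return si, feedback

si, fb = refine("You are a helpful assistant.", "summarize the report")
```

The loop stops improving once the reward clears the edit threshold, mirroring how feedback drives the Instructor's refinement strategy.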

Updated: 2025-07-03 23:44:50

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03223v1

The role of gain neuromodulation in layer-5 pyramidal neurons

Biological and artificial learning systems alike confront the plasticity-stability dilemma. In the brain, neuromodulators such as acetylcholine and noradrenaline relieve this tension by tuning neuronal gain and inhibitory gating, balancing segregation and integration of circuits. Fed by dense cholinergic and noradrenergic projections from the ascending arousal system, layer-5 pyramidal neurons in the cerebral cortex offer a relevant substrate for understanding these dynamics. When distal dendritic signals coincide with back-propagating action potentials, calcium plateaus turn a single somatic spike into a high-gain burst, and interneuron inhibition sculpts the output. These properties make layer-5 cells gain-tunable amplifiers that translate neuromodulatory cues into flexible cortical activity. To capture this mechanism we developed a two-compartment Izhikevich model for pyramidal neurons and single-compartment somatostatin (SOM) and parvalbumin (PV) interneurons, linked by Gaussian connectivity and spike-timing-dependent plasticity (STDP). The soma and apical dendrite are coupled such that somatic spikes back-propagate, while dendritic plateaus can switch the soma from regular firing to bursting by shifting reset and adaptation variables. We show that stronger dendritic drive or tighter coupling raises gain by increasing the likelihood of calcium-triggered somatic bursts. In contrast, dendritic-targeted inhibition suppresses gain, while somatic-targeted inhibition raises the firing threshold of neighboring neurons, thus gating neuronal output. Notably, bursting accelerates STDP, supporting rapid synaptic reconfiguration and flexibility. This suggests that brief gain pulses driven by neuromodulators could serve as an adaptive two-timescale optimization mechanism, effectively modulating the synaptic weight updates.
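A minimal single-unit sketch of the gain switch described above: the dendritic calcium plateau is approximated, as the abstract describes, by shifting the somatic reset and adaptation parameters of an Izhikevich neuron. The constants are the textbook regular-spiking and chattering values, not parameters fitted in the paper:

```python
def izhikevich_soma(I, steps=2000, dt=0.25, plateau=False):
    # One Izhikevich unit standing in for the somatic compartment.
    # A dendritic plateau is modeled by shifting the reset (c) and
    # adaptation increment (d): the same drive then yields high-gain
    # bursts instead of regular, strongly adapting spikes.
    a, b = 0.02, 0.2
    c, d = (-50.0, 2.0) if plateau else (-65.0, 8.0)
    v, u, spikes = -65.0, -13.0, 0
    for _ in range(steps):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:          # spike: reset soma, bump adaptation
            v = c
            u += d
            spikes += 1
    return spikes

regular = izhikevich_soma(I=10.0)
bursty = izhikevich_soma(I=10.0, plateau=True)
```

With the plateau parameters, the weaker adaptation and higher reset let the unit re-fire immediately after each spike, so the same input drive produces many more spikes — the gain increase the abstract attributes to calcium-triggered bursting.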

Updated: 2025-07-03 23:29:29

Domains: q-bio.NC,cs.AI,68T05

Download: http://arxiv.org/abs/2507.03222v1

Neural Inhibition Improves Dynamic Routing and Mixture of Experts

To be effective, efficient, and diverse, deep learning models need to dynamically choose their architecture based on signals from a population of neurons. We hypothesize that dynamic routing models can be improved with neural inhibition in those neural populations. This means signals commonly shared among the various modes of data statistics can be inhibited so that the routing model can choose a specialized expert path for each data sample. Only through inhibition is the routing mechanism able to effectively select neural pathways. We believe this is an under-studied and under-verified implementation methodology for Mixture-of-Experts, dynamic routing, and transformer language models. We provide experimental evidence that the neural inhibition algorithm significantly boosts the performance of general tasks and motivates more effort to be invested in this research direction.
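A toy version of the idea, assuming "inhibition" is implemented as subtracting the population-shared (batch-mean) signal before routing — the paper's actual mechanism may differ:

```python
import numpy as np

def route_with_inhibition(h, w_router, inhibit=True):
    # h: (batch, d) activations; w_router: (d, n_experts) routing weights.
    # Inhibition here subtracts the signal shared across the population
    # (the batch-mean activation), so expert choice is driven by what is
    # distinctive about each sample rather than by the common component.
    if inhibit:
        h = h - h.mean(axis=0, keepdims=True)
    return (h @ w_router).argmax(axis=1)

rng = np.random.default_rng(0)
shared = 5.0 * rng.normal(size=(1, 16))   # large signal common to all samples
distinct = rng.normal(size=(8, 16))       # per-sample signal
w = rng.normal(size=(16, 4))
picks = route_with_inhibition(shared + distinct, w)
```

With inhibition, the routing decision is invariant to the shared component: only the per-sample signal determines the expert path.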

Updated: 2025-07-03 23:28:36

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03221v1

Symbiosis: Multi-Adapter Inference and Fine-Tuning

Parameter-efficient fine-tuning (PEFT) allows model builders to capture the task specific parameters into adapters, which are a fraction of the size of the original base model. Popularity of PEFT technique for fine-tuning has led to creation of a large number of adapters for popular Large Language Models (LLMs). However, existing frameworks fall short in supporting inference or fine-tuning with multiple adapters in the following ways. 1) For fine-tuning, each job needs to deploy its dedicated base model instance, which results in excessive GPU memory consumption and poor GPU utilization. 2) While popular inference platforms can serve multiple PEFT adapters, they do not allow independent resource management or mixing of different PEFT methods. 3) They cannot share resources (such as base model instance) between inference and fine-tuning jobs. 4) They do not provide privacy to users who may not wish to expose their fine-tuned parameters to service providers. In Symbiosis, we address the above problems by enabling as-a-service deployment of base model. The base model layers can be shared across multiple inference or fine-tuning processes. Our split-execution technique decouples the execution of client-specific adapters and layers from the frozen base model layers, offering them the flexibility to manage their resources, select their fine-tuning method, and achieve their performance goals. Our approach is transparent to models and works out-of-the-box for most models in the transformers library. Our evaluation on Llama2-13B shows that, compared to the baseline, Symbiosis can fine-tune 4X more adapters on the same set of GPUs in the same amount of time.
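The split-execution idea can be sketched as one shared frozen layer plus per-client low-rank adapters. The LoRA-style adapter below is an assumption for illustration (Symbiosis supports mixing PEFT methods):

```python
import numpy as np

class SharedBase:
    # One frozen base-model layer, deployed "as a service" and shared
    # by every inference or fine-tuning client.
    def __init__(self, d, rng):
        self.w = rng.normal(size=(d, d)) / np.sqrt(d)  # frozen weights
    def forward(self, x):
        return x @ self.w

class ClientAdapter:
    # Client-side low-rank (LoRA-style) adapter. Only a and b are
    # trainable, and they stay in the client's process — which is how
    # split execution keeps fine-tuned parameters private.
    def __init__(self, d, r, rng):
        self.a = 0.01 * rng.normal(size=(d, r))
        self.b = np.zeros((r, d))                      # standard zero init
    def forward(self, base_out, x):
        return base_out + x @ self.a @ self.b

rng = np.random.default_rng(1)
base = SharedBase(8, rng)                  # single instance for all clients
clients = [ClientAdapter(8, 2, rng) for _ in range(3)]
x = rng.normal(size=(4, 8))
outs = [c.forward(base.forward(x), x) for c in clients]
```

Each client pays only for its adapter; the expensive base forward pass is computed once per batch and reused.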

Updated: 2025-07-03 23:25:59

Domains: cs.DC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03220v1

Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data

Cancer evolves continuously over time through a complex interplay of genetic, epigenetic, microenvironmental, and phenotypic changes. This dynamic behavior drives uncontrolled cell growth, metastasis, immune evasion, and therapy resistance, posing challenges for effective monitoring and treatment. However, today's data-driven research in oncology has primarily focused on cross-sectional analysis using data from a single modality, limiting the ability to fully characterize and interpret the disease's dynamic heterogeneity. Advances in multiscale data collection and computational methods now enable the discovery of longitudinal multimodal biomarkers for precision oncology. Longitudinal data reveal patterns of disease progression and treatment response that are not evident from single-timepoint data, enabling timely abnormality detection and dynamic treatment adaptation. Multimodal data integration offers complementary information from diverse sources for more precise risk assessment and targeting of cancer therapy. In this review, we survey methods of longitudinal and multimodal modeling, highlighting their synergy in providing multifaceted insights for personalized care tailored to the unique characteristics of a patient's cancer. We summarize the current challenges and future directions of longitudinal multimodal analysis in advancing precision oncology.

Updated: 2025-07-03 23:24:15

Domains: q-bio.QM,cs.LG

Download: http://arxiv.org/abs/2502.07836v3

Fast Algorithm for Constrained Linear Inverse Problems

We consider the constrained Linear Inverse Problem (LIP), where a certain atomic norm (like the $\ell_1 $ norm) is minimized subject to a quadratic constraint. Typically, such cost functions are non-differentiable, which makes them not amenable to the fast optimization methods existing in practice. We propose two equivalent reformulations of the constrained LIP with improved convex regularity: (i) a smooth convex minimization problem, and (ii) a strongly convex min-max problem. These problems can be solved by applying existing acceleration-based convex optimization methods, which provide a better $ O \left( \frac{1}{k^2} \right) $ theoretical convergence guarantee, improving upon the current best rate of $ O \left( \frac{1}{k} \right) $. We also provide a novel algorithm named the Fast Linear Inverse Problem Solver (FLIPS), which is tailored to maximally exploit the structure of the reformulations. We demonstrate the performance of FLIPS on the classical problems of Binary Selection, Compressed Sensing, and Image Denoising. We also provide an open-source package for these three examples, which can be easily adapted to other LIPs.
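FLIPS itself targets the paper's reformulations, which are not reproduced here. As a reference point, the sketch below runs standard FISTA — a member of the accelerated $O(1/k^2)$ family the abstract cites — on the penalized cousin of the constrained LIP, applied to a small compressed-sensing instance:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista(A, b, lam, iters=1000):
    # Accelerated proximal gradient on (1/2)||Ax - b||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(iters):
        x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[[3, 30, 70]] = [2.0, -1.5, 1.0]   # 3-sparse signal
b = A @ x_true                            # noiseless measurements
x_hat = fista(A, b, lam=0.5)
```

The momentum sequence $t_k$ is what lifts the plain proximal-gradient rate from $O(1/k)$ to $O(1/k^2)$.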

Updated: 2025-07-03 23:23:14

Domains: math.OC,cs.LG

Download: http://arxiv.org/abs/2212.01068v7

Disclosing Generative AI Use in Digital Humanities Research

This survey study investigates how digital humanists perceive and approach generative AI disclosure in research. The results indicate that while digital humanities scholars acknowledge the importance of disclosing GenAI use, the actual rate of disclosure in research practice remains low. Respondents differ in their views on which activities most require disclosure and on the most appropriate methods for doing so. Most also believe that safeguards for AI disclosure should be established through institutional policies rather than left to individual decisions. The study's findings will offer empirical guidance to scholars, institutional leaders, funders, and other stakeholders responsible for shaping effective disclosure policies.

Updated: 2025-07-03 23:11:45

Domains: cs.CY,cs.AI,cs.DL,cs.ET

Download: http://arxiv.org/abs/2507.03216v1

Structured Difference-of-Q via Orthogonal Learning

Offline reinforcement learning is important in many settings where observational data are available but new policies cannot be deployed online due to safety, cost, and other concerns. Many recent advances in causal inference and machine learning target estimation of causal contrast functions such as CATE, which is sufficient for optimizing decisions and can adapt to potentially smoother structure. We develop a dynamic generalization of the R-learner (Nie and Wager 2021, Lewis and Syrgkanis 2021) for estimating and optimizing the difference of $Q^\pi$-functions, $Q^\pi(s,1)-Q^\pi(s,0)$ (which can be used to optimize multiple-valued actions). We leverage orthogonal estimation to improve convergence rates in the presence of slower nuisance estimation rates and prove consistency of policy optimization under a margin condition. The method can leverage black-box nuisance estimators of the $Q$-function and behavior policy to target estimation of a more structured $Q$-function contrast.
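A static, constant-effect special case of the R-learner moment makes the orthogonalization concrete. The nuisances $m(x)=E[Y\mid x]$ and $e(x)=P(A=1\mid x)$ are taken as known here, whereas the paper estimates them with black-box learners:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))        # behavior policy P(A=1 | x)
a = rng.binomial(1, e)
tau = 2.0                           # constant contrast, analogous to Q(s,1)-Q(s,0)
y = x + a * tau + rng.normal(scale=0.1, size=n)
m = x + e * tau                     # outcome nuisance E[Y | x]

# R-learner moment: regress outcome residuals on treatment residuals.
# Orthogonality means small errors in m or e only enter the estimate
# at second order, which is what tolerates slow nuisance rates.
y_res = y - m
a_res = a - e
tau_hat = float(y_res @ a_res / (a_res @ a_res))
```

The paper's contribution is the dynamic, $Q^\pi$-function analogue of this residual-on-residual regression.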

Updated: 2025-07-03 23:10:51

Domains: stat.ML,cs.LG,math.OC,stat.ME

Download: http://arxiv.org/abs/2406.08697v3

DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing

Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passes, its application to multi-hundred-billion-parameter models is constrained by GPU memory and compute throughput. The ZO2 framework addresses the memory bottleneck by offloading model parameters to CPU memory and overlapping transformer block transfer with dual forward computation on a single GPU. However, ZO2 remains limited by its single-device execution and achieves modest throughput. In this work, we present DistZO2, a high-throughput, memory-efficient framework for distributed zeroth-order fine-tuning of LLMs. DistZO2 introduces three parallel strategies: (1) Perturbation Parallelism (PertP), which parallelizes the two perturbed forward passes across devices; (2) Distributed Data Parallelism (DDP), adapted to the scalar-gradient nature of ZO training; and (3) a unified 2D Parallelism design that combines PertP and DDP. To further mitigate communication bottlenecks introduced by parameter offloading, we propose a hardware-aware communication strategy that slices parameter blocks and redistributes them across GPUs via high-speed interconnects such as NVLink. DistZO2 scales zeroth-order fine-tuning to modern multi-GPU systems, preserving ZO2's memory efficiency while substantially improving training throughput. In our experiments on OPT-175B, DistZO2 achieves a 3x speedup over ZO2 with distributed computing. DistZO2's code has been open-sourced in https://github.com/liangyuwang/zo2.
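The core zeroth-order step is easy to sketch: two antithetic forward passes per update, no backward pass. The toy below optimizes a quadratic; in DistZO2, the $+\epsilon$ and $-\epsilon$ passes are exactly what perturbation parallelism distributes across devices:

```python
import numpy as np

def zo_grad(loss_fn, theta, eps=1e-3, rng=None):
    # MeZO/SPSA-style estimator: an antithetic pair of forward passes
    # yields a scalar directional derivative, which scales the random
    # direction z to give a gradient estimate. No backward pass is run.
    z = rng.standard_normal(theta.shape)
    proj = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2.0 * eps)
    return proj * z

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
loss = lambda th: 0.5 * float(np.sum((th - target) ** 2))

theta = np.zeros(3)
for _ in range(2000):
    theta -= 0.05 * zo_grad(loss, theta, rng=rng)   # ZO-SGD update
```

Because each update needs only the scalar `proj` and the seed of `z`, data-parallel workers can synchronize with a few scalars instead of full gradients — the property the DDP adaptation exploits.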

Updated: 2025-07-03 22:53:34

Domains: cs.LG,cs.PF

Download: http://arxiv.org/abs/2507.03211v1

LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction

The discovery of new ionizable lipids for efficient lipid nanoparticle (LNP)-mediated RNA delivery remains a critical bottleneck for RNA-based therapeutics development. Recent advances have highlighted the potential of machine learning (ML) to predict transfection efficiency from molecular structure, enabling high-throughput virtual screening and accelerating lead identification. However, existing approaches are hindered by inadequate data quality, ineffective feature representations, low predictive accuracy, and poor generalizability. Here, we present LANTERN (Lipid nANoparticle Transfection Efficiency pRedictioN), a robust ML framework for predicting transfection efficiency based on ionizable lipid representation. We benchmarked a diverse set of ML models against AGILE, a previously published model developed for transfection prediction. Our results show that combining simpler models with chemically informative features, particularly count-based Morgan fingerprints, outperforms more complex models that rely on internally learned embeddings, such as AGILE. We also show that a multi-layer perceptron trained on a combination of Morgan fingerprints and Expert descriptors achieved the highest performance ($\text{R}^2$ = 0.8161, r = 0.9053), significantly exceeding AGILE ($\text{R}^2$ = 0.2655, r = 0.5488). We show that the models in LANTERN consistently have strong performance across multiple evaluation metrics. Thus, LANTERN offers a robust benchmarking framework for LNP transfection prediction and serves as a valuable tool for accelerating lipid-based RNA delivery systems design.

Updated: 2025-07-03 22:49:49

Domains: q-bio.QM,cs.CE,cs.LG,q-bio.MN

Download: http://arxiv.org/abs/2507.03209v1

Statistical Significance of Feature Importance Rankings

Feature importance scores are ubiquitous tools for understanding the predictions of machine learning models. However, many popular attribution methods suffer from high instability due to random sampling. Leveraging novel ideas from hypothesis testing, we devise techniques that ensure the most important features are correct with high-probability guarantees. These assess the set of $K$ top-ranked features, as well as the order of its elements. Given a set of local or global importance scores, we demonstrate how to retrospectively verify the stability of the highest ranks. We then introduce two efficient sampling algorithms that identify the $K$ most important features, perhaps in order, with probability exceeding $1-\alpha$. The theoretical justification for these procedures is validated empirically on SHAP and LIME.
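A simplified version of the retrospective check, assuming a plain normal-approximation confidence interval per feature — the paper's actual tests and sampling algorithms are more refined:

```python
import math
import random

def top_k_stable(samples, k):
    # samples: {feature: list of importance draws}. Rank features by
    # mean importance, then declare the top-k set stable when the k-th
    # feature's ~95% CI lower bound clears the (k+1)-th feature's CI
    # upper bound. A crude normal approximation, for illustration only.
    stats = {}
    for feat, xs in samples.items():
        n = len(xs)
        mean = sum(xs) / n
        var = sum((v - mean) ** 2 for v in xs) / (n - 1)
        stats[feat] = (mean, 1.96 * math.sqrt(var / n))  # 1.96 ~ alpha=0.05
    ranked = sorted(stats, key=lambda f: -stats[f][0])
    kth_mean, kth_half = stats[ranked[k - 1]]
    nxt_mean, nxt_half = stats[ranked[k]]
    return ranked[:k], kth_mean - kth_half > nxt_mean + nxt_half

random.seed(0)
draws = {
    "f1": [1.00 + random.gauss(0, 0.05) for _ in range(50)],
    "f2": [0.80 + random.gauss(0, 0.05) for _ in range(50)],
    "f3": [0.20 + random.gauss(0, 0.05) for _ in range(50)],
}
top, stable = top_k_stable(draws, k=2)
```

When the intervals of the k-th and (k+1)-th features overlap, more sampling is needed before the top-$K$ set can be trusted — the situation the paper's efficient sampling algorithms handle adaptively.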

Updated: 2025-07-03 22:45:39

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2401.15800v5

Weak Form Scientific Machine Learning: Test Function Construction for System Identification

Weak form Scientific Machine Learning (WSciML) is a recently developed framework for data-driven modeling and scientific discovery. It leverages the weak form of equation error residuals to provide enhanced noise robustness in system identification via convolving model equations with test functions, reformulating the problem to avoid direct differentiation of data. The performance, however, relies on wisely choosing a set of compactly supported test functions. In this work, we mathematically motivate a novel data-driven method for constructing Single-scale-Local reference functions for creating the set of test functions. Our approach numerically approximates the integration error introduced by the quadrature and identifies the support size for which the error is minimal, without requiring access to the model parameter values. Through numerical experiments across various models, noise levels, and temporal resolutions, we demonstrate that the selected supports consistently align with regions of minimal parameter estimation error. We also compare the proposed method against the strategy for constructing Multi-scale-Global (and orthogonal) test functions introduced in our prior work, demonstrating the improved computational efficiency.
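The weak-form trick is easiest to see on a scalar ODE. The quadratic bump used as the test function below is an arbitrary illustrative choice, not the paper's data-driven construction:

```python
import numpy as np

# Weak-form estimation of a in x' = a*x. Multiplying by a compactly
# supported test function phi and integrating by parts moves the
# derivative onto phi (known exactly), so the noisy data are never
# differentiated:
#   integral(phi * x') = -integral(phi' * x)
#   => a ~= -integral(phi' * x) / integral(phi * x)
t = np.linspace(0.0, 2.0, 2001)
dt = t[1] - t[0]
a_true = -1.3
x = np.exp(a_true * t)
x_noisy = x + np.random.default_rng(0).normal(scale=0.01, size=t.size)

phi = (t * (2.0 - t)) ** 2                 # smooth bump, vanishes at both endpoints
dphi = 2.0 * t * (2.0 - t) * (2.0 - 2.0 * t)

num = -np.sum(dphi * x_noisy) * dt         # ~ -integral(phi' x) (Riemann sum)
den = np.sum(phi * x_noisy) * dt           # ~  integral(phi  x)
a_hat = num / den
```

The support of `phi` (here the whole interval) is the choice the paper optimizes: the proposed method picks the support size that minimizes the quadrature error without needing the true parameter values.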

Updated: 2025-07-03 22:36:34

Domains: math.NA,cs.LG,cs.NA

Download: http://arxiv.org/abs/2507.03206v1

On the Adversarial Robustness of Graph Neural Networks with Graph Reduction

As Graph Neural Networks (GNNs) become increasingly popular for learning from large-scale graph data across various domains, their susceptibility to adversarial attacks when using graph reduction techniques for scalability remains underexplored. In this paper, we present an extensive empirical study to investigate the impact of graph reduction techniques, specifically graph coarsening and sparsification, on the robustness of GNNs against adversarial attacks. Through extensive experiments involving multiple datasets and GNN architectures, we examine the effects of four sparsification and six coarsening methods on the poisoning attacks. Our results indicate that, while graph sparsification can mitigate the effectiveness of certain poisoning attacks, such as Mettack, it has limited impact on others, like PGD. Conversely, graph coarsening tends to amplify the adversarial impact, significantly reducing classification accuracy as the reduction ratio decreases. Additionally, we provide a novel analysis of the causes driving these effects and examine how defensive GNN models perform under graph reduction, offering practical insights for designing robust GNNs within graph acceleration systems.

Updated: 2025-07-03 22:33:12

Domains: cs.LG,cs.CR

Download: http://arxiv.org/abs/2412.05883v2

AI-driven Web Application for Early Detection of Sudden Death Syndrome (SDS) in Soybean Leaves Using Hyperspectral Images and Genetic Algorithm

Sudden Death Syndrome (SDS), caused by Fusarium virguliforme, poses a significant threat to soybean production. This study presents an AI-driven web application for early detection of SDS on soybean leaves using hyperspectral imaging, enabling diagnosis prior to visible symptom onset. Leaf samples from healthy and inoculated plants were scanned using a portable hyperspectral imaging system (398-1011 nm), and a Genetic Algorithm was employed to select five informative wavelengths (505.4, 563.7, 712.2, 812.9, and 908.4 nm) critical for discriminating infection status. These selected bands were fed into a lightweight Convolutional Neural Network (CNN) to extract spatial-spectral features, which were subsequently classified using ten classical machine learning models. Ensemble classifiers (Random Forest, AdaBoost), Linear SVM, and Neural Net achieved the highest accuracy (>98%) and minimal error across all folds, as confirmed by confusion matrices and cross-validation metrics. Poor performance by Gaussian Process and QDA highlighted their unsuitability for this dataset. The trained models were deployed within a web application that enables users to upload hyperspectral leaf images, visualize spectral profiles, and receive real-time classification results. This system supports rapid and accessible plant disease diagnostics, contributing to precision agriculture practices. Future work will expand the training dataset to encompass diverse genotypes, field conditions, and disease stages, and will extend the system for multiclass disease classification and broader crop applicability.
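A toy genetic algorithm over synthetic two-class spectra illustrates the wavelength-selection step. The fitness function here is a crude class-separation score standing in for the paper's CNN-plus-classifier pipeline, and the "informative" band positions are made up:

```python
import random

# Toy class-mean reflectance curves over 100 hypothetical bands,
# with a handful of truly discriminative bands.
N_BANDS = 100
healthy = [0.0] * N_BANDS
infected = [0.0] * N_BANDS
for b in (10, 40, 80):                     # hypothetical informative bands
    infected[b] = 5.0

def fitness(bands):
    # Stand-in for the downstream classifier score: total class
    # separation over the selected bands.
    return sum(abs(healthy[b] - infected[b]) for b in bands)

def ga_select(k, pop_size=30, gens=40, seed=0):
    rng = random.Random(seed)
    pop = [rng.sample(range(N_BANDS), k) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]     # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            head = pa[: k // 2]            # crossover: head of one parent,
            child = head + [g for g in pb if g not in head]  # fill from other
            child = child[:k]
            if rng.random() < 0.3:         # point mutation
                g = rng.randrange(N_BANDS)
                if g not in child:
                    child[rng.randrange(k)] = g
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = ga_select(k=3)
```

In the paper, the same search is run over 398-1011 nm wavelengths with a learned classifier as the fitness signal, yielding the five reported bands.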

Updated: 2025-07-03 22:20:47

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.03198v1

Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding

CD8+ "killer" T cells and CD4+ "helper" T cells play a central role in the adaptive immune system by recognizing antigens presented by Major Histocompatibility Complex (pMHC) molecules via T Cell Receptors (TCRs). Modeling binding between T cells and the pMHC complex is fundamental to understanding basic mechanisms of human immune response as well as in developing therapies. While transformer-based models such as TULIP have achieved impressive performance in this domain, their black-box nature precludes interpretability and thus limits a deeper mechanistic understanding of T cell response. Most existing post-hoc explainable AI (XAI) methods are confined to encoder-only, co-attention, or model-specific architectures and cannot handle encoder-decoder transformers used in TCR-pMHC modeling. To address this gap, we propose Quantifying Cross-Attention Interaction (QCAI), a new post-hoc method designed to interpret the cross-attention mechanisms in transformer decoders. Quantitative evaluation is a challenge for XAI methods; we have compiled TCR-XAI, a benchmark consisting of 274 experimentally determined TCR-pMHC structures to serve as ground truth for binding. Using these structures we compute physical distances between relevant amino acid residues in the TCR-pMHC interaction region and evaluate how well our method and others estimate the importance of residues in this region across the dataset. We show that QCAI achieves state-of-the-art performance on both interpretability and prediction accuracy under the TCR-XAI benchmark.
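The raw cross-attention baseline that QCAI refines can be sketched directly — this is only the averaged attention map of an encoder-decoder block, not QCAI's quantification:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encoder_position_importance(Q, K):
    # Cross-attention in an encoder-decoder block: decoder queries Q
    # attend over encoder keys K. Averaging the attention matrix over
    # queries gives one importance score per encoder position (e.g.
    # per pMHC residue) — the raw-attention baseline.
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[1]))   # (n_dec, n_enc)
    return attn.mean(axis=0)                        # (n_enc,)

rng = np.random.default_rng(0)
K = rng.normal(size=(6, 8))      # 6 hypothetical encoder positions
K[2] *= 2.0                      # make position 2 distinctive
Q = np.tile(K[2], (4, 1))        # decoder queries aligned with position 2
scores = encoder_position_importance(Q, K)
```

The TCR-XAI benchmark evaluates such scores against residue-residue distances in solved TCR-pMHC structures, which is how the paper compares QCAI to this kind of baseline.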

Updated: 2025-07-03 22:18:54

Domains: cs.CE,cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2507.03197v1

Towards Universal Semantics With Large Language Models

The Natural Semantic Metalanguage (NSM) is a linguistic theory based on a universal set of semantic primes: simple, primitive word-meanings that have been shown to exist in most, if not all, languages of the world. According to this framework, any word, regardless of complexity, can be paraphrased using these primes, revealing a clear and universally translatable meaning. These paraphrases, known as explications, can offer valuable applications for many natural language processing (NLP) tasks, but producing them has traditionally been a slow, manual process. In this work, we present the first study of using large language models (LLMs) to generate NSM explications. We introduce automatic evaluation methods, a tailored dataset for training and evaluation, and fine-tuned models for this task. Our 1B and 8B models outperform GPT-4o in producing accurate, cross-translatable explications, marking a significant step toward universal semantic representation with LLMs and opening up new possibilities for applications in semantic analysis, translation, and beyond. Our code is available at https://github.com/OSU-STARLAB/DeepNSM.

Updated: 2025-07-03 22:02:03

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2505.11764v3

CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning

Continual learning (CL) - the ability to progressively acquire and integrate new concepts - is essential to intelligent systems to adapt to dynamic environments. However, deep neural networks struggle with catastrophic forgetting (CF) when learning tasks sequentially, as training for new tasks often overwrites previously learned knowledge. To address this, recent approaches constrain updates to orthogonal subspaces using gradient projection, effectively preserving important gradient directions for previous tasks. While effective in reducing forgetting, these approaches inadvertently hinder forward knowledge transfer (FWT), particularly when tasks are highly correlated. In this work, we propose Conceptor-based gradient projection for Deep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a form of regularized reconstruction, to adaptively handle highly correlated tasks. CODE-CL mitigates CF by projecting gradients onto pseudo-orthogonal subspaces of previous task feature spaces while simultaneously promoting FWT. It achieves this by learning a linear combination of shared basis directions, allowing efficient balance between stability and plasticity and transfer of knowledge between overlapping input feature representations. Extensive experiments on continual learning benchmarks validate CODE-CL's efficacy, demonstrating superior performance, reduced forgetting, and improved FWT as compared to state-of-the-art methods.
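The hard gradient-projection baseline that CODE-CL softens can be written in one line. The conceptor version replaces the orthonormal basis with a regularized, "pseudo-orthogonal" matrix that admits overlap between correlated tasks; that part is not reproduced here:

```python
import numpy as np

def project_out(grad, basis):
    # Hard orthogonal projection: remove from the new task's gradient
    # the component lying in the span of previous-task feature
    # directions (columns of `basis`, assumed orthonormal):
    #   g' = g - U (U^T g)
    # This preserves old-task directions exactly but, when tasks are
    # correlated, also blocks useful forward transfer — the limitation
    # CODE-CL's conceptor-based soft projection addresses.
    return grad - basis @ (basis.T @ grad)

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(10, 3)))  # old-task feature subspace
g = rng.normal(size=10)                        # candidate new-task gradient
g_proj = project_out(g, U)
```

After projection, the update has no component along the protected subspace, which is the mechanism behind reduced forgetting in projection-based CL methods.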

Updated: 2025-07-03 21:58:43

Categories: cs.LG,cs.AI,cs.CV,cs.NE

Download: http://arxiv.org/abs/2411.15235v3

How Much Content Do LLMs Generate That Induces Cognitive Bias in Users?

Large language models (LLMs) are increasingly integrated into applications ranging from review summarization to medical diagnosis support, where they affect human decisions. Even though LLMs perform well in many tasks, they may also inherit societal or cognitive biases, which can inadvertently transfer to humans. We investigate when and how LLMs expose users to biased content and quantify its severity. Specifically, we assess three LLM families on summarization and news fact-checking tasks, evaluating how well LLMs stay consistent with their context and/or hallucinate. Our findings show that LLMs expose users to content that changes the sentiment of the context in 21.86% of cases, hallucinate on questions about post-knowledge-cutoff data in 57.33% of cases, and exhibit primacy bias in 5.94% of cases. We evaluate 18 distinct mitigation methods across three LLM families and find that targeted interventions can be effective. Given the prevalent use of LLMs in high-stakes domains, such as healthcare or legal analysis, our results highlight the need for robust technical safeguards and for developing user-centered interventions that address LLM limitations.

Updated: 2025-07-03 21:56:44

Categories: cs.CL,cs.AI,I.2.7, H.5.2

Download: http://arxiv.org/abs/2507.03194v1

Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It then becomes essential that agents learn to focus on the subset of task-relevant state features. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several sparse training and PbRL algorithms across simulated robotic environments.
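Dynamic sparse training typically alternates pruning the weakest active connections with regrowing the same number elsewhere, keeping total sparsity fixed. Below is a hedged sketch of one such prune-and-regrow step (magnitude pruning with random regrowth, in the spirit of SET-style methods); the fractions, initialization, and flat-vector layout are illustrative assumptions, not R2N's exact procedure.

```python
import random

def prune_and_regrow(weights, mask, prune_frac, rng):
    """One dynamic-sparsity step on a flat weight vector.
    weights: list of floats; mask: list of 0/1 (1 = active connection)."""
    active = [i for i, m in enumerate(mask) if m]
    k = int(len(active) * prune_frac)
    # Prune: drop the k smallest-magnitude active connections.
    active.sort(key=lambda i: abs(weights[i]))
    for i in active[:k]:
        mask[i] = 0
        weights[i] = 0.0
    # Regrow: activate k currently inactive connections at random,
    # giving the reward model a chance to rewire toward relevant features.
    inactive = [i for i, m in enumerate(mask) if not m]
    for i in rng.sample(inactive, k):
        mask[i] = 1
        weights[i] = rng.uniform(-0.01, 0.01)  # fresh small init
    return weights, mask

rng = random.Random(0)
w = [0.5, -0.02, 0.3, 0.0, -0.4, 0.01]
m = [1, 1, 1, 0, 1, 1]
before = sum(m)
w, m = prune_and_regrow(w, m, prune_frac=0.4, rng=rng)
```

Because the prune and regrow counts match, the overall sparsity budget is constant; only the connectivity pattern adapts over training.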

Updated: 2025-07-03 21:51:45

Categories: cs.LG

Download: http://arxiv.org/abs/2406.06495v3

A Fuzzy Supervisor Agent Design for Clinical Reasoning Assistance in a Multi-Agent Educational Clinical Scenario Simulation

Assisting medical students with clinical reasoning (CR) during clinical scenario training remains a persistent challenge in medical education. This paper presents the design and architecture of the Fuzzy Supervisor Agent (FSA), a novel component for the Multi-Agent Educational Clinical Scenario Simulation (MAECSS) platform. The FSA leverages a Fuzzy Inference System (FIS) to continuously interpret student interactions with specialized clinical agents (e.g., patient, physical exam, diagnostic, intervention) using pre-defined fuzzy rule bases for professionalism, medical relevance, ethical behavior, and contextual distraction. By analyzing student decision-making processes in real-time, the FSA is designed to deliver adaptive, context-aware feedback and to provide assistance precisely when students encounter difficulties. This work focuses on the technical framework and rationale of the FSA, highlighting its potential to provide scalable, flexible, and human-like supervision in simulation-based medical education. Future work will include empirical evaluation and integration into broader educational settings. A more detailed design and implementation is open-sourced at https://github.com/2sigmaEdTech/MAS/.
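A fuzzy rule base of the kind described maps graded inputs (e.g., professionalism, contextual distraction) to an assistance level through membership functions and rule firing strengths. The sketch below is a deliberately tiny Sugeno-style illustration with two made-up rules and shoulder membership functions; the rule set, variable ranges, and output levels are illustrative assumptions, not the FSA's actual rule bases (those are in the linked repository).

```python
def low(x):
    """Membership of x (in [0, 1]) in the fuzzy set 'low'."""
    return max(0.0, min(1.0, 1.0 - 2.0 * x))

def high(x):
    """Membership of x (in [0, 1]) in the fuzzy set 'high'."""
    return max(0.0, min(1.0, 2.0 * x - 1.0))

def assistance(professionalism, distraction):
    """Sugeno-style inference: each rule has a firing strength and a
    crisp output level; the result is their weighted average."""
    rules = [
        # If professionalism is low OR distraction is high -> strong help.
        (max(low(professionalism), high(distraction)), 1.0),
        # If professionalism is high AND distraction is low -> no help.
        (min(high(professionalism), low(distraction)), 0.0),
    ]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.5  # neutral when no rule fires

struggling = assistance(0.1, 0.9)  # distracted, unprofessional session
on_track = assistance(0.9, 0.1)    # focused, professional session
```

The graded membership values are what let such a supervisor intervene proportionally rather than with a hard threshold.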

Updated: 2025-07-03 21:51:27

Categories: cs.CY,cs.AI,cs.HC,cs.LO,cs.MA,D.2.4; K.3.1; C.3; I.2.6

Download: http://arxiv.org/abs/2507.05275v1

Discovering Algorithms with Computational Language Processing

Algorithms are the engine for reproducible problem-solving. We present a framework automating algorithm discovery by conceptualizing them as sequences of operations, represented as tokens. These computational tokens are chained using a grammar, enabling the formation of increasingly sophisticated procedures. Our ensemble Monte Carlo tree search (MCTS) guided by reinforcement learning (RL) explores token chaining and drives the creation of new tokens. This methodology rediscovers, improves, and generates new algorithms that substantially outperform existing methods for strongly NP-hard combinatorial optimization problems and foundational quantum computing approaches such as Grover's and Quantum Approximate Optimization Algorithm. Operating at the computational rather than code-generation level, our framework produces algorithms that can be tailored specifically to problem instances, not merely classes.

Updated: 2025-07-03 21:45:17

Categories: cs.AI,cs.DS,cs.LG,68T05, 68T20, 68Q12, 90C27,I.2.6; I.2.8; F.2.2; F.1.2; G.2.1

Download: http://arxiv.org/abs/2507.03190v1

First Contact: Data-driven Friction-Stir Process Control

This study validates the use of Neural Lumped Parameter Differential Equations for open-loop setpoint control of the plunge sequence in Friction Stir Processing (FSP). The approach integrates a data-driven framework with classical heat transfer techniques to predict tool temperatures, informing control strategies. By utilizing a trained Neural Lumped Parameter Differential Equation model, we translate theoretical predictions into practical set-point control, facilitating rapid attainment of desired tool temperatures and ensuring consistent thermomechanical states during FSP. This study covers the design, implementation, and experimental validation of our control approach, establishing a foundation for efficient, adaptive FSP operations.
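A classical lumped-parameter thermal model treats the tool as a single node, m*c*dT/dt = Q_in - h*A*(T - T_amb), which the neural variant augments with learned terms. Here is a minimal forward-Euler sketch of the classical baseline; all parameter values are illustrative assumptions, not fitted FSP constants.

```python
def simulate_tool_temp(T0, Q_in, hA=2.0, mc=50.0, T_amb=25.0,
                       dt=0.1, steps=5000):
    """Forward-Euler integration of m*c*dT/dt = Q_in - h*A*(T - T_amb).
    Returns the tool temperature after steps*dt seconds."""
    T = T0
    for _ in range(steps):
        dTdt = (Q_in - hA * (T - T_amb)) / mc
        T += dt * dTdt
    return T

# With constant heat input the node settles at T_amb + Q_in / (h*A):
T_final = simulate_tool_temp(T0=25.0, Q_in=200.0)
```

An open-loop setpoint strategy can invert the steady-state relation to pick the heat input for a desired tool temperature, Q_in = h*A*(T_set - T_amb); the trained neural model plays the analogous predictive role in the paper's controller.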

Updated: 2025-07-03 21:09:46

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2507.03177v1

Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies

Deep learning (DL)-based general circulation models (GCMs) are emerging as fast simulators, yet their ability to replicate extreme events outside their training range remains unknown. Here, we evaluate two such models, the hybrid Neural General Circulation Model (NGCM) and the purely data-driven Deep Learning Earth System Model (DLESyM), against a conventional high-resolution land-atmosphere model (HiRAM) in simulating land heatwaves and coldwaves. All models are forced with observed sea surface temperatures and sea ice over 1900-2020, focusing on the out-of-sample early-20th-century period (1900-1960). Both DL models generalize successfully to unseen climate conditions, broadly reproducing the frequency and spatial patterns of heatwave and cold wave events during 1900-1960 with skill comparable to HiRAM. An exception is over portions of North Asia and North America, where all models perform poorly during 1940-1960. Due to excessive temperature autocorrelation, DLESyM tends to overestimate heatwave and cold wave frequencies, whereas the physics-DL hybrid NGCM exhibits persistence more similar to HiRAM.
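Heat and cold wave frequency is typically scored by counting runs of consecutive days beyond a threshold. Below is a hedged sketch of one common convention (at least three consecutive days past a fixed threshold); the run length and threshold choice are illustrative assumptions, not necessarily the paper's exact event definition.

```python
def count_wave_events(daily_temps, threshold, min_days=3, cold=False):
    """Count runs of >= min_days consecutive days beyond a threshold.
    cold=True counts days below the threshold (cold waves) instead."""
    events = 0
    run = 0
    for t in daily_temps:
        exceeds = (t < threshold) if cold else (t > threshold)
        if exceeds:
            run += 1
        else:
            if run >= min_days:
                events += 1
            run = 0
    if run >= min_days:  # a qualifying run may end with the record
        events += 1
    return events

# Two qualifying heatwaves: days 2-4 and days 9-11 (the 2-day spell
# on days 6-7 is too short to count).
temps = [30, 35, 36, 37, 30, 36, 36, 30, 36, 36, 36]
n_heatwaves = count_wave_events(temps, threshold=34)
```

In practice the threshold is usually a local climatological percentile (e.g., the 90th percentile per grid cell) rather than a fixed value.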

Updated: 2025-07-03 21:09:27

Categories: physics.ao-ph,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03176v1

Understanding Knowledge Transferability for Transfer Learning: A Survey

Transfer learning has become an essential paradigm in artificial intelligence, enabling the transfer of knowledge from a source task to improve performance on a target task. This approach, particularly through techniques such as pretraining and fine-tuning, has seen significant success in fields like computer vision and natural language processing. However, despite its widespread use, how to reliably assess the transferability of knowledge remains a challenge. Understanding the theoretical underpinnings of each transferability metric is critical for ensuring the success of transfer learning. In this survey, we provide a unified taxonomy of transferability metrics, categorizing them based on transferable knowledge types and measurement granularity. This work examines the various metrics developed to evaluate the potential of source knowledge for transfer learning and their applicability across different learning paradigms, emphasizing the need for careful selection of these metrics. By offering insights into how different metrics work under varying conditions, this survey aims to guide researchers and practitioners in selecting the most appropriate metric for specific applications, contributing to more efficient, reliable, and trustworthy AI systems. Finally, we discuss some open challenges in this field and propose future research directions to further advance the application of transferability metrics in trustworthy transfer learning.

Updated: 2025-07-03 21:06:30

Categories: cs.LG,cs.AI,68U01

Download: http://arxiv.org/abs/2507.03175v1

Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data

Accurate characterization of the equilibrium distributions of complex molecular systems and their dependence on environmental factors such as temperature is essential for understanding thermodynamic properties and transition mechanisms. Projecting these distributions onto meaningful low-dimensional representations enables interpretability and downstream analysis. Recent advances in generative AI, particularly flow models such as Normalizing Flows (NFs), have shown promise in modeling such distributions, but their scope is limited without tailored representation learning. In this work, we introduce Latent Thermodynamic Flows (LaTF), an end-to-end framework that tightly integrates representation learning and generative modeling. LaTF unifies the State Predictive Information Bottleneck (SPIB) with NFs to simultaneously learn low-dimensional latent representations, referred to as Collective Variables (CVs), classify metastable states, and generate equilibrium distributions across temperatures beyond the training data. The two components of representation learning and generative modeling are optimized jointly, ensuring that the learned latent features capture the system's slow, important degrees of freedom while the generative model accurately reproduces the system's equilibrium behavior. We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and a cluster of Lennard-Jones particles, with thorough evaluations and benchmarking using multiple metrics and extensive simulations. Finally, we apply LaTF to an RNA tetraloop system, where despite using simulation data from only two temperatures, LaTF reconstructs the temperature-dependent structural ensemble and melting behavior, consistent with experimental and prior extensive computational results.

Updated: 2025-07-03 21:02:36

Categories: cs.LG,cond-mat.stat-mech,physics.bio-ph,physics.chem-ph

Download: http://arxiv.org/abs/2507.03174v1

A Neurosymbolic Framework for Geometric Reduction of Binary Forms

This paper compares Julia reduction and hyperbolic reduction with the aim of finding equivalent binary forms with minimal coefficients. We demonstrate that hyperbolic reduction generally outperforms Julia reduction, particularly in the cases of sextics and decimics, though neither method guarantees achieving the minimal form. We further propose an additional shift and scaling to approximate the minimal form more closely. Finally, we introduce a machine learning framework to identify optimal transformations that minimize the heights of binary forms. This study provides new insights into the geometry and algebra of binary forms and highlights the potential of AI in advancing symbolic computation and reduction techniques. The findings, supported by extensive computational experiments, lay the groundwork for hybrid approaches that integrate traditional reduction methods with data-driven techniques.
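The "additional shift and scaling" step can be illustrated on a dehomogenized form: substitute x -> x + t for integer t and keep the shift that minimizes the form's height (its largest absolute coefficient). The sketch below is a toy brute-force version of that idea; the search window and the height definition are illustrative assumptions, and this is not Julia or hyperbolic reduction themselves.

```python
from math import comb

def shift(coeffs, t):
    """Coefficients of f(x + t) given coeffs c_0..c_n of f(x) = sum c_i x^i.
    x -> x + t is a unimodular substitution, so the form stays equivalent."""
    n = len(coeffs) - 1
    out = [0] * (n + 1)
    for i, c in enumerate(coeffs):
        for k in range(i + 1):  # expand c * (x + t)^i binomially
            out[k] += c * comb(i, k) * t ** (i - k)
    return out

def height(coeffs):
    """Height of a form: the largest absolute coefficient."""
    return max(abs(c) for c in coeffs)

def best_integer_shift(coeffs, window=10):
    """Search a small window of integer shifts for the smallest height."""
    return min(range(-window, window + 1),
               key=lambda t: height(shift(coeffs, t)))

# x^2 + 4x + 4 has height 4; shifting by t = -2 gives x^2, height 1.
t_star = best_integer_shift([4, 4, 1])
```

The machine learning framework described in the abstract effectively learns to predict such height-minimizing transformations instead of searching for them exhaustively.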

Updated: 2025-07-03 20:56:03

Categories: cs.AI,cs.LG,I.2.3

Download: http://arxiv.org/abs/2501.15404v2

Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation

The rise of generative AI search engines is disrupting traditional SEO, with Gartner predicting a 25% reduction in conventional search usage by 2026. This necessitates new approaches for web content visibility in AI-driven search environments. We present a domain-specific fine-tuning approach for Generative Engine Optimization (GEO) that transforms web content to improve its discoverability in large language model outputs. Our method fine-tunes a BART-base transformer on synthetically generated training data comprising 1,905 cleaned travel website content pairs. Each pair consists of raw website text and its GEO-optimized counterpart incorporating credible citations, statistical evidence, and improved linguistic fluency. We evaluate using intrinsic metrics (ROUGE-L, BLEU) and extrinsic visibility assessments through controlled experiments with Llama-3.3-70B. The fine-tuned model achieves significant improvements over baseline BART: ROUGE-L scores of 0.249 (vs. 0.226) and BLEU scores of 0.200 (vs. 0.173). Most importantly, optimized content demonstrates substantial visibility gains in generative search responses, with a 15.63% improvement in absolute word count and a 30.96% improvement in position-adjusted word count metrics. This work provides the first empirical demonstration that targeted transformer fine-tuning can effectively enhance web content visibility in generative search engines with modest computational resources. Our results suggest GEO represents a tractable approach for content optimization in the AI-driven search landscape, offering concrete evidence that small-scale, domain-focused fine-tuning yields meaningful improvements in content discoverability.

Updated: 2025-07-03 20:52:10

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2507.03169v1

Adopting a human developmental visual diet yields robust, shape-based AI vision

Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Contrary to humans, AI relies heavily on texture features rather than shape information, lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, we here introduce a solution that arises from a previously underexplored direction: rather than scaling up, we take inspiration from how human vision develops from early infancy into adulthood. We quantified the visual maturation by synthesising decades of psychophysical and neurophysiological research into a novel developmental visual diet (DVD) for AI vision. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour on every hallmark of robust vision tested, yielding the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, higher robustness to image corruptions, and stronger resilience to adversarial attacks. By outperforming high-parameter AI foundation models trained on orders of magnitude more data, we provide evidence that robust AI vision can be achieved by guiding the way a model learns, not merely how much it learns, offering a resource-efficient route toward safer and more human-like artificial visual systems.

Updated: 2025-07-03 20:52:08

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.03168v1

Adversarial Manipulation of Reasoning Models using Internal Representations

Reasoning models generate chain-of-thought (CoT) tokens before their final output, but how this affects their vulnerability to jailbreak attacks remains unclear. While traditional language models make refusal decisions at the prompt-response boundary, we find evidence that DeepSeek-R1-Distill-Llama-8B makes these decisions within its CoT generation. We identify a linear direction in activation space during CoT token generation that predicts whether the model will refuse or comply -- termed the "caution" direction because it corresponds to cautious reasoning patterns in the generated text. Ablating this direction from model activations increases harmful compliance, effectively jailbreaking the model. We additionally show that intervening only on CoT token activations suffices to control final outputs, and that incorporating this direction into prompt-based attacks improves success rates. Our findings suggest that the chain-of-thought itself is a promising new target for adversarial manipulation in reasoning models. Code available at https://github.com/ky295/reasoning-manipulation
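Ablating a linear direction from activations is a simple projection: subtract the component of each activation vector along the unit "caution" direction, leaving the result orthogonal to it. A minimal sketch follows; the toy vectors are illustrative, and in practice this operation is applied to residual-stream activations during CoT generation.

```python
import math

def ablate_direction(x, d):
    """Remove from activation x its component along direction d:
    x' = x - (x . d_hat) d_hat, so x' is orthogonal to d."""
    norm = math.sqrt(sum(v * v for v in d))
    d_hat = [v / norm for v in d]
    proj = sum(a * b for a, b in zip(x, d_hat))
    return [a - proj * b for a, b in zip(x, d_hat)]

x = [3.0, 4.0, 1.0]          # toy activation vector
d = [0.0, 2.0, 0.0]          # hypothetical "caution" direction
x_ablated = ablate_direction(x, d)
```

Because the projection removes exactly one degree of freedom, the rest of the representation is untouched, which is what makes directional ablation such a surgical intervention.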

Updated: 2025-07-03 20:51:32

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03167v1

PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset

Multimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, which supervised models fail to recover. The Modality-Gated LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction.
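The modality-gating idea can be illustrated with a softmax over per-modality importance scores whose weights scale each modality's representation before fusion. The sketch below is a hedged simplification: the scores, dimensions, and fusion-by-weighted-sum are illustrative assumptions, not the paper's Modality-Gated LSTM.

```python
import math

def gated_fusion(embeddings, importance):
    """Weight each modality embedding by a softmax over importance scores
    and fuse by weighted sum. Both arguments are dicts keyed by modality."""
    names = list(embeddings)
    m = max(importance[n] for n in names)        # stabilize the softmax
    exps = {n: math.exp(importance[n] - m) for n in names}
    z = sum(exps.values())
    weights = {n: exps[n] / z for n in names}
    dim = len(next(iter(embeddings.values())))
    fused = [sum(weights[n] * embeddings[n][i] for n in names)
             for i in range(dim)]
    return fused, weights

emb = {"notes": [1.0, 0.0], "xray": [0.0, 1.0], "vitals": [1.0, 1.0]}
scores = {"notes": 2.0, "xray": 2.0, "vitals": 2.0}  # equal importance
fused, w = gated_fusion(emb, scores)
```

With contrastively learned importance scores in place of the equal toy values, uninformative modalities are down-weighted instead of diluting the fused representation, which is the intuition behind the reported gains in the five-modality setting.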

Updated: 2025-07-03 20:45:37

Categories: cs.LG

Download: http://arxiv.org/abs/2507.03165v1

MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks

The rapid advancement of Large Language Models (LLMs) has transformed various domains, particularly computer science (CS) education. These models exhibit remarkable capabilities in code-related tasks and problem-solving, raising questions about their potential and limitations in advanced CS contexts. This study presents a novel bilingual (English-Romanian) multimodal (text and image) dataset of multiple-choice questions derived from a high-level computer science competition. A particularity of our dataset is that the problems are conceived such that some are more easily solved using reasoning on paper, while for others writing code is more efficient. We systematically evaluate state-of-the-art LLMs on this dataset, analyzing their performance on theoretical programming tasks. Our findings reveal the strengths and limitations of current LLMs, including the influence of language choice (English vs. Romanian), providing insights into their applicability in CS education and competition settings. We also address critical ethical considerations surrounding educational integrity and the fairness of assessments in the context of LLM usage. These discussions aim to inform future educational practices and policies. To support further research, our dataset will be made publicly available in both English and Romanian. Additionally, we release an educational application tailored for Romanian students, enabling them to self-assess using the dataset in an interactive and practice-oriented environment.

Updated: 2025-07-03 20:43:28

Categories: cs.CY,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.03162v1

MathOptAI.jl: Embed trained machine learning predictors into JuMP models

We present MathOptAI.jl, an open-source Julia library for embedding trained machine learning predictors into a JuMP model. MathOptAI.jl can embed a wide variety of neural networks, decision trees, and Gaussian processes into a larger mathematical optimization model. In addition to interfacing a range of Julia-based machine learning libraries such as Lux.jl and Flux.jl, MathOptAI.jl uses Julia's Python interface to provide support for PyTorch models. When the PyTorch support is combined with MathOptAI.jl's gray-box formulation, the function, Jacobian, and Hessian evaluations associated with the PyTorch model are offloaded to the GPU in Python, while the rest of the nonlinear oracles are evaluated on the CPU in Julia. MathOptAI.jl is available at https://github.com/lanl-ansi/MathOptAI.jl under a BSD-3 license.

Updated: 2025-07-03 20:32:08

Categories: cs.LG,math.OC,90-04

Download: http://arxiv.org/abs/2507.03159v1

The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review

Large language model assistants (LLM-assistants) present new opportunities to transform software development. Developers are increasingly adopting these tools across tasks, including coding, testing, debugging, documentation, and design. Yet, despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity. In this paper, we present a systematic literature review of 37 peer-reviewed studies published between January 2014 and December 2024 that examine this impact. Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks. Commonly reported gains include minimized code search, accelerated development, and the automation of trivial and repetitive tasks. However, studies also highlight concerns around cognitive offloading, reduced team collaboration, and inconsistent effects on code quality. While the majority of studies (92%) adopt a multi-dimensional perspective by examining at least two SPACE dimensions, reflecting increased awareness of the complexity of developer productivity, only 14% extend beyond three dimensions, indicating substantial room for more integrated evaluations. Satisfaction, Performance, and Efficiency are the most frequently investigated dimensions, whereas Communication and Activity remain underexplored. Most studies are exploratory (64%) and methodologically diverse, but lack longitudinal and team-based evaluations. This review surfaces key research gaps and provides recommendations for future research and practice. All artifacts associated with this study are publicly available at https://zenodo.org/records/15788502.

Updated: 2025-07-03 20:25:49

Categories: cs.SE,cs.AI,cs.HC

Download: http://arxiv.org/abs/2507.03156v1

Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics

Federated Learning (FL) enables privacy-preserving collaborative training, making it well-suited for decentralized human-sensing applications. Ensuring fairness in FL is challenging, as current methods rely on sensitive attribute knowledge, which conflicts with FL's privacy principles. Additionally, sensitive attributes in human-sensing data may be unknown or latent. To address this, we introduce Curvature-Aligned Federated Learning (CAFe), a theoretically grounded approach that achieves fairness in FL without requiring sensitive attribute knowledge, a concept termed "Fairness without Demographics" (FWD). CAFe introduces loss-landscape curvature regularization during local training and clients' loss-landscape sharpness-aware aggregation to align curvature both within and across clients, enabling a strong balance between higher fairness and performance. CAFe is especially suitable for real-world human-sensing FL scenarios involving single or multi-user edge devices with unknown or multiple bias factors. We validated CAFe through theoretical and empirical justifications, and comprehensive evaluations using three real-world datasets and a live real-world FL deployment with a heterogeneous testbed of resource-constrained devices. Additionally, we conduct sensitivity analyses on local training data volume, client sampling, communication overhead, resource costs, and runtime performance to demonstrate its feasibility for practical FL edge device deployment.

Updated: 2025-07-03 20:23:17

Categories: cs.LG,cs.AI,cs.DC

Download: http://arxiv.org/abs/2404.19725v4

HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference

Scaling inference for large language models (LLMs) is increasingly constrained by limited GPU memory, especially due to growing key-value (KV) caches required for long-context generation. While existing approaches offload KV caches to CPU memory or apply sparse attention to reduce GPU load, they often underutilize CPU compute resources and compromise accuracy. We present HGCA, a hybrid CPU-GPU attention mechanism that enables scalable, high-throughput LLM inference with near-full attention quality. HGCA performs dense attention on recently generated KV entries retained in GPU memory and parallel sparse attention on selected, salient KV entries in CPU memory. The attention outputs are efficiently merged using log-sum-exp fusion, minimizing PCIe transfer overhead. HGCA also introduces a fine-grained, per-head sparsification strategy optimized for CPU execution, preserving contextual relevance while reducing computation. Our implementation seamlessly integrates into existing LLM frameworks without requiring model retraining. Experiments across diverse models and workloads show that HGCA achieves superior scalability, supports longer sequences and larger batch sizes, and outperforms existing sparse attention baselines in both performance and accuracy -- all on commodity GPU hardware.
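The log-sum-exp fusion step can be reproduced exactly in a few lines of NumPy. This is a sketch of the general merging technique, not HGCA's GPU/CPU implementation, and the function names are illustrative: attention over two disjoint KV partitions is combined into attention over the full cache using only each partition's output and the log-sum-exp of its scores.

```python
import numpy as np

def attention(q, K, V):
    """Softmax attention of a single query over one KV partition.
    Returns the output plus the log-sum-exp of the scores, which is
    all that is needed to merge with another partial result."""
    s = K @ q                      # attention scores
    m = s.max()
    w = np.exp(s - m)              # numerically stabilized weights
    lse = m + np.log(w.sum())      # log-sum-exp of the scores
    return (w / w.sum()) @ V, lse

def merge(out_a, lse_a, out_b, lse_b):
    """Combine partial attention over disjoint KV subsets: each output
    is reweighted by its share of the total softmax mass, recovered
    from the two log-sum-exp values."""
    m = max(lse_a, lse_b)
    wa, wb = np.exp(lse_a - m), np.exp(lse_b - m)
    return (wa * out_a + wb * out_b) / (wa + wb)

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(32, 8))       # 32 cached key vectors
V = rng.normal(size=(32, 4))

# Split the KV cache: "GPU-resident" recent entries vs. the "CPU-resident" rest.
out_gpu, lse_gpu = attention(q, K[24:], V[24:])
out_cpu, lse_cpu = attention(q, K[:24], V[:24])
merged = merge(out_gpu, lse_gpu, out_cpu, lse_cpu)
full, _ = attention(q, K, V)       # reference: attention over the full cache
```

`merged` matches `full` up to floating-point error, which is why the two partial computations can run on different devices and be fused afterwards.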

Updated: 2025-07-03 20:20:33

Categories: cs.LG

Download: http://arxiv.org/abs/2507.03153v1

Expert-level validation of AI-generated medical text with scalable language models

With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the "LM-as-judge" paradigm (a LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a self-supervised framework that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset containing 840 outputs annotated by physicians, following a physician-defined taxonomy of risk levels and error categories. Across 6 diverse medical tasks and 10 state-of-the-art LMs spanning open-source, proprietary, and medically adapted models, MedVAL fine-tuning significantly improves (p < 0.001) alignment with physicians on both seen and unseen tasks, increasing average F1 scores from 66% to 83%, with per-sample safety classification scores up to 86%. MedVAL improves the performance of even the best-performing proprietary LM (GPT-4o) by 8%. To support a scalable, risk-aware pathway towards clinical integration, we open-source the 1) codebase ( https://github.com/StanfordMIMI/MedVAL ), 2) MedVAL-Bench ( https://huggingface.co/datasets/stanfordmimi/MedVAL-Bench ), and 3) MedVAL-4B ( https://huggingface.co/stanfordmimi/MedVAL-4B ), the best-performing open-source LM. Our research provides the first evidence of LMs approaching expert-level validation ability for medical text.

Updated: 2025-07-03 20:19:18

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03152v1

Last-Iterate Convergence of No-Regret Learning for Equilibria in Bargaining Games

Bargaining games, where agents attempt to agree on how to split utility, are an important class of games used to study economic behavior, which motivates a study of online learning algorithms in these games. In this work, we tackle when no-regret learning algorithms converge to Nash equilibria in bargaining games. Recent results have shown that online algorithms related to Follow the Regularized Leader (FTRL) converge to Nash equilibria (NE) in the last iterate in a wide variety of games, including zero-sum games. However, bargaining games do not have the properties used previously to establish convergence guarantees, even in the simplest case of the ultimatum game, which features a single take-it-or-leave-it offer. Nonetheless, we establish that FTRL (without the modifications necessary for zero-sum games) achieves last-iterate convergence to an approximate NE in the ultimatum game along with a bound on convergence time under mild assumptions. Further, we provide experimental results to demonstrate that convergence to NE, including NE with asymmetric payoffs, occurs under a broad range of initial conditions, both in the ultimatum game and in bargaining games with multiple rounds. This work demonstrates how complex economic behavior (e.g. learning to use threats and the existence of many possible equilibrium outcomes) can result from using a simple learning algorithm, and that FTRL can converge to equilibria in a more diverse set of games than previously known.
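The setting can be sketched with discretized actions: both players run exponential weights, which is FTRL with an entropy regularizer, on expected payoffs in the ultimatum game. The action grids, learning rate, and horizon below are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

offers = np.linspace(0.1, 0.9, 9)    # proposer actions: share offered
thresh = np.linspace(0.1, 0.9, 9)    # responder actions: acceptance cutoff
# accept[j, i]: does a responder with cutoff j accept offer i?
accept = offers[None, :] >= thresh[:, None]

# Payoff matrices (responder index j, proposer index i); the pie has size 1.
u_prop = accept * (1.0 - offers[None, :])
u_resp = accept * offers[None, :]

eta = 0.5
p = np.ones(9) / 9    # proposer mixed strategy
q = np.ones(9) / 9    # responder mixed strategy
G_p = np.zeros(9)     # cumulative expected payoffs; exponentiating them
G_q = np.zeros(9)     # is FTRL with an entropy regularizer

for _ in range(2000):
    G_p += q @ u_prop                 # expected payoff of each offer vs. q
    G_q += u_resp @ p                 # expected payoff of each cutoff vs. p
    p = np.exp(eta * (G_p - G_p.max()))
    p /= p.sum()
    q = np.exp(eta * (G_q - G_q.max()))
    q /= q.sum()
```

In this simulation the last iterate settles near an equilibrium in which the modal offer is accepted by the modal cutoff, consistent with the many possible equilibrium outcomes of the ultimatum game.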

Updated: 2025-07-03 20:12:59

Categories: cs.GT,cs.LG

Download: http://arxiv.org/abs/2507.03150v1

Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models

The ability to build and reason about models of the world is essential for situated language understanding. But evaluating world modeling capabilities in modern AI systems -- especially those based on language models -- has proven challenging, in large part because of the difficulty of disentangling conceptual knowledge about the world from knowledge of surface co-occurrence statistics. This paper presents Elements of World Knowledge (EWoK), a framework for evaluating language models' understanding of the conceptual knowledge underlying world modeling. EWoK targets specific concepts from multiple knowledge domains known to be important for world modeling in humans, from social interactions (help, deceive) to spatial relations (left, right). Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWoK-core-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B--70B parameters) and compare them with human performance. All tested models perform worse than humans, with results varying drastically across domains. Performance on social interactions and social properties was highest and performance on physical relations and spatial relations was lowest. Overall, this dataset highlights simple cases where even large models struggle and presents rich avenues for targeted research on LLM world modeling capabilities.

Updated: 2025-07-03 20:10:24

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.09605v2

On the Relationship between Accent Strength and Articulatory Features

This paper explores the relationship between accent strength and articulatory features inferred from acoustic speech. To quantify accent strength, we compare phonetic transcriptions with transcriptions based on dictionary-based references, computing phoneme-level difference as a measure of accent strength. The proposed framework leverages recent self-supervised learning articulatory inversion techniques to estimate articulatory features. Analyzing a corpus of read speech from American and British English speakers, this study examines correlations between derived articulatory parameters and accent strength proxies, associating systematic articulatory differences with indexed accent strength. Results indicate that tongue positioning patterns distinguish the two dialects, with notable inter-dialect differences in rhotic and low back vowels. These findings contribute to automated accent analysis and articulatory modeling for speech processing applications.

Updated: 2025-07-03 20:08:28

Categories: eess.AS,cs.AI,cs.CL,cs.SD

Download: http://arxiv.org/abs/2507.03149v1

DeepGesture: A conversational gesture synthesis system based on emotions and semantics

Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements that correspond naturally to text or speech inputs. In this work, we present DeepGesture, a diffusion-based gesture synthesis framework for generating expressive co-speech gestures conditioned on multimodal signals: text, speech, emotion, and seed motion. Built upon the DiffuseStyleGesture model, DeepGesture introduces novel architectural enhancements that improve semantic alignment and emotional expressiveness in generated gestures. Specifically, we integrate fast text transcriptions as semantic conditioning and implement emotion-guided classifier-free diffusion to support controllable gesture generation across affective states. A lightweight Transformer backbone combines full self-attention and cross-local attention for effective feature fusion of heterogeneous modalities. To visualize results, we implement a full rendering pipeline in Unity based on BVH output from the model. Evaluation on the ZeroEGGS dataset shows that DeepGesture produces gestures with improved human-likeness and contextual appropriateness, outperforming baselines on Mean Opinion Score and Frechet Gesture Distance metrics. Our system supports interpolation between emotional states and demonstrates generalization to out-of-distribution speech, including synthetic voices, marking a step forward toward fully multimodal, emotionally aware digital humans.

Updated: 2025-07-03 20:04:04

Categories: cs.HC,cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.03147v1

Set Valued Predictions For Robust Domain Generalization

Despite the impressive advancements in modern machine learning, achieving robustness in Domain Generalization (DG) tasks remains a significant challenge. In DG, models are expected to perform well on samples from unseen test distributions (also called domains), by learning from multiple related training distributions. Most existing approaches to this problem rely on single-valued predictions, which inherently limit their robustness. We argue that set-valued predictors could be leveraged to enhance robustness across unseen domains, while also taking into account that these sets should be as small as possible. We introduce a theoretical framework defining successful set prediction in the DG setting, focusing on meeting a predefined performance criterion across as many domains as possible, and provide theoretical insights into the conditions under which such domain generalization is achievable. We further propose a practical optimization method compatible with modern learning architectures, that balances robust performance on unseen domains with small prediction set sizes. We evaluate our approach on several real-world datasets from the WILDS benchmark, demonstrating its potential as a promising direction for robust domain generalization.
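Set-valued prediction itself is easy to illustrate. One simple construction, an assumption for illustration rather than the paper's optimization method, returns every label whose predicted probability clears a threshold, trading set size against the chance of covering the true label:

```python
import numpy as np

rng = np.random.default_rng(0)

def prediction_set(probs, tau):
    """Set-valued prediction: every label whose probability reaches tau.
    Lowering tau gives larger, more conservative prediction sets."""
    return np.flatnonzero(probs >= tau)

# A deliberately ambiguous 3-class predictive distribution, standing in
# for a sample from a hard unseen domain.
probs = np.array([0.45, 0.40, 0.15])

small = prediction_set(probs, 0.42)   # strict threshold: singleton {0}
large = prediction_set(probs, 0.30)   # looser threshold: {0, 1}

# Empirical coverage if the true label really is drawn from `probs`.
labels = rng.choice(3, size=5000, p=probs)
cov_small = np.isin(labels, small).mean()
cov_large = np.isin(labels, large).mean()
```

The larger set covers the true label far more often, which is exactly the robustness-versus-set-size trade-off the paper's framework optimizes across domains.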

Updated: 2025-07-03 19:57:09

Categories: cs.LG

Download: http://arxiv.org/abs/2507.03146v1

Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics

Advancing the dynamics inference of power electronic systems (PES) to the real-time edge-side holds transformative potential for testing, control, and monitoring. However, efficiently inferring the inherent hybrid continuous-discrete dynamics on resource-constrained edge hardware remains a significant challenge. This letter proposes a neural substitute solver (NSS) approach, which is a neural-network-based framework aimed at rapid accurate inference with significantly reduced computational costs. Specifically, NSS leverages lightweight neural networks to substitute time-consuming matrix operation and high-order numerical integration steps in traditional solvers, which transforms sequential bottlenecks into highly parallel operation suitable for edge hardware. Experimental validation on a multi-stage DC-DC converter demonstrates that NSS achieves 23x speedup and 60% hardware resource reduction compared to traditional solvers, paving the way for deploying edge inference of high-fidelity PES dynamics.

Updated: 2025-07-03 19:52:32

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2507.03144v1

The geometry of financial institutions -- Wasserstein clustering of financial data

The increasing availability of granular and big data on various objects of interest has made it necessary to develop methods for condensing this information into a representative and intelligible map. Financial regulation is a field that exemplifies this need, as regulators require diverse and often highly granular data from financial institutions to monitor and assess their activities. However, processing and analyzing such data can be a daunting task, especially given the challenges of dealing with missing values and identifying clusters based on specific features. To address these challenges, we propose a variant of Lloyd's algorithm that applies to probability distributions and uses generalized Wasserstein barycenters to construct a metric space which represents given data on various objects in condensed form. By applying our method to the financial regulation context, we demonstrate its usefulness in dealing with the specific challenges faced by regulators in this domain. We believe that our approach can also be applied more generally to other fields where large and complex data sets need to be represented in concise form.
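In one dimension the proposed scheme reduces to something concrete: the 2-Wasserstein distance between two distributions is the L2 distance between their quantile functions, and a Wasserstein barycenter is the pointwise mean of quantile functions, so the Lloyd variant becomes k-means on quantile vectors. The sketch below uses this 1D special case with toy data; the paper's generalized barycenters handle richer settings, missing values included.

```python
import numpy as np

rng = np.random.default_rng(2)

def quantiles(sample, m=50):
    """Represent an empirical distribution by m evenly spaced quantiles.
    In 1D, W2 distance = L2 distance between quantile vectors."""
    return np.quantile(sample, np.linspace(0.01, 0.99, m))

# Two groups of toy "institutions": data centered at 0 vs. centered at 5.
samples = [rng.normal(0, 1, 200) for _ in range(5)] + \
          [rng.normal(5, 1, 200) for _ in range(5)]
Q = np.stack([quantiles(s) for s in samples])        # shape (10, 50)

# Lloyd's algorithm in Wasserstein space: in 1D the barycenter update is
# just the mean of the cluster members' quantile vectors.
centers = Q[[0, -1]]                                 # one seed per group
for _ in range(20):
    d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # squared W2
    assign = d2.argmin(1)
    centers = np.stack([Q[assign == k].mean(0) for k in range(2)])
```

After a few iterations the two groups of distributions are recovered as the two clusters, and each center is a barycenter summarizing its cluster in condensed form.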

Updated: 2025-07-03 19:38:08

Categories: stat.ML,cs.LG,math.OC,math.PR,q-fin.MF

Download: http://arxiv.org/abs/2305.03565v2

Holographic Projection and Cyber Attack Surface: A Physical Analogy for Digital Security

This article presents an in-depth exploration of the analogy between the Holographic Principle in theoretical physics and cyber attack surfaces in digital security. Building on concepts such as black hole entropy and AdS/CFT duality, it highlights how complex infrastructures project their vulnerabilities onto their external interfaces. The paper draws a parallel between a black hole's event horizon, which encodes all internal information, and the attack surface, which reflects the internal architecture's security posture. Additionally, the article outlines how this conceptual framework can guide cybersecurity practices, emphasizing strategies such as attack surface reduction, continuous scanning with tools like OWASP ZAP and Greenbone OpenVAS, and the implementation of Zero Trust Architecture. This analogy not only provides a unique perspective on digital security but also underscores the critical importance of boundary-level defenses in protecting vast internal infrastructures.

Updated: 2025-07-03 19:25:53

Categories: cs.CR,cs.LO

Download: http://arxiv.org/abs/2507.03136v1

Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations

Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs). Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations. However, these explanations primarily focus on technical or externally driven factors and may have neglected the possibility that hallucination behaviours mirror cognitive biases observed in human psychology. In this work, we introduce a psychological taxonomy, categorizing VLMs' hallucination behaviours, including sycophancy, logical inconsistency, and a newly identified VLM behaviour: authority bias. To systematically analyze these behaviours, we design AIpsych, a scalable benchmark that reveals psychological tendencies in model response patterns. Leveraging this benchmark, we investigate how variations in model architecture and parameter size influence model behaviour when responding to strategically manipulated questions. Our experiments reveal that as model size increases, VLMs exhibit stronger sycophantic tendencies but reduced authority bias, suggesting increasing competence but a potential erosion of response integrity. A human subject study further validates our hypotheses and highlights key behavioural differences between VLMs and human respondents. This work suggests a new perspective for understanding hallucination in VLMs and highlights the importance of integrating psychological principles into model evaluation. The benchmark is available at https://github.com/lxrswdd/AIpsych.

Updated: 2025-07-03 19:03:16

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.03123v1

MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration

Spatial transformations that capture population-level morphological statistics are critical for medical image analysis. Commonly used smoothness regularizers for image registration fail to integrate population statistics, leading to anatomically inconsistent transformations. Inverse consistency regularizers promote geometric consistency but lack population morphometrics integration. Regularizers that constrain deformations to a low-dimensional manifold address this; however, they prioritize reconstruction over interpretability and neglect diffeomorphic properties, such as group composition and inverse consistency. We introduce MORPH-LER, a Log-Euclidean regularization framework for population-aware unsupervised image registration. MORPH-LER learns population morphometrics from spatial transformations to guide and regularize registration networks, ensuring anatomically plausible deformations. It features a bottleneck autoencoder that computes the principal logarithm of deformation fields via iterative square-root predictions. It creates a linearized latent space that respects diffeomorphic properties and enforces inverse consistency. By integrating a registration network with a diffeomorphic autoencoder, MORPH-LER produces smooth, meaningful deformation fields. The framework offers two main contributions: (1) a data-driven regularization strategy that incorporates population-level anatomical statistics to enhance transformation validity and (2) a linearized latent space that enables compact and interpretable deformation fields for efficient population morphometrics analysis. We validate MORPH-LER across two families of deep learning-based registration networks, demonstrating its ability to produce anatomically accurate, computationally efficient, and statistically meaningful transformations on the OASIS-1 brain imaging dataset. https://github.com/iyerkrithika21/MORPH_LER

Updated: 2025-07-03 19:02:34

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2502.02029v2

LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms

Image registration is a core task in computational anatomy that establishes correspondences between images. Invertible deformable registration, which computes a deformation field and handles complex, non-linear transformations, is essential for tracking anatomical variations, especially in neuroimaging applications where inter-subject differences and longitudinal changes are key. Analyzing the deformation fields is challenging due to their non-linearity, which limits statistical analysis. However, traditional approaches for analyzing deformation fields are computationally expensive, sensitive to initialization, and prone to numerical errors, especially when the deformation is far from the identity. To address these limitations, we propose the Log-Euclidean Diffeomorphism Autoencoder (LEDA), an innovative framework designed to compute the principal logarithm of deformation fields by efficiently predicting consecutive square roots. LEDA operates within a linearized latent space that adheres to the diffeomorphisms group action laws, enhancing our model's robustness and applicability. We also introduce a loss function to enforce inverse consistency, ensuring accurate latent representations of deformation fields. Extensive experiments with the OASIS-1 dataset demonstrate the effectiveness of LEDA in accurately modeling and analyzing complex non-linear deformations while maintaining inverse consistency. Additionally, we evaluate its ability to capture and incorporate clinical variables, enhancing its relevance for clinical applications.
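The "consecutive square roots" idea has a classical numerical counterpart, inverse scaling and squaring, which LEDA's network is trained to imitate. Below is a plain NumPy sketch of that counterpart, not the learned autoencoder: take k matrix square roots via the Denman-Beavers iteration until the argument is close to the identity, then apply the first-order logarithm approximation.

```python
import numpy as np

def sqrtm_db(A, iters=50):
    """Matrix square root via the Denman-Beavers iteration (converges
    for matrices with no eigenvalues on the negative real axis)."""
    Y, Z = A.copy(), np.eye(len(A))
    for _ in range(iters):
        # Simultaneous update: the right-hand side uses the old Y and Z.
        Y, Z = 0.5 * (Y + np.linalg.inv(Z)), 0.5 * (Z + np.linalg.inv(Y))
    return Y

def logm_sqrt(A, k=10):
    """Principal matrix logarithm by inverse scaling and squaring:
    after k successive square roots, A**(1/2**k) is near the identity,
    so log(A) is approximately 2**k * (A**(1/2**k) - I)."""
    R = A.copy()
    for _ in range(k):
        R = sqrtm_db(R)
    return (2.0 ** k) * (R - np.eye(len(A)))

A = np.diag([4.0, 9.0])      # a transformation with a known logarithm
L = logm_sqrt(A)             # approximately diag(log 4, log 9)
```

In LEDA the square-root steps are predicted by a network instead of iterated numerically, which avoids the cost and initialization sensitivity noted in the abstract while keeping the same linearizing effect.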

Updated: 2025-07-03 18:59:54

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2412.16129v2

Predicting the Susceptibility of Examples to Catastrophic Forgetting

Catastrophic forgetting - the tendency of neural networks to forget previously learned data when learning new information - remains a central challenge in continual learning. In this work, we adopt a behavioral approach, observing a connection between learning speed and forgetting: examples learned more quickly are less prone to forgetting. Focusing on replay-based continual learning, we show that the composition of the replay buffer - specifically, whether it contains quickly or slowly learned examples - has a significant effect on forgetting. Motivated by this insight, we introduce Speed-Based Sampling (SBS), a simple yet general strategy that selects replay examples based on their learning speed. SBS integrates easily into existing buffer-based methods and improves performance across a wide range of competitive continual learning benchmarks, advancing state-of-the-art results. Our findings underscore the value of accounting for the forgetting dynamics when designing continual learning algorithms.

Updated: 2025-07-03 18:59:06

Categories: cs.LG

Download: http://arxiv.org/abs/2406.09935v2

Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings

This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline combining frozen text embeddings with simple multilayer perceptron (MLP) classifiers. This design offers a privacy-preserving and deployment-efficient alternative for clinical NLP applications, particularly suited to distributed healthcare settings. Extensive experiments across both centralized and federated configurations were conducted, testing six publicly available embedding models from the Massive Text Embedding Benchmark leaderboard and three MLP classifier architectures under two medical coding schemes (ICD-9 and ICD-10). Additionally, ablation studies over ten random stratified splits assess performance stability. Results show that embedding quality substantially outweighs classifier complexity in determining predictive performance, and that federated learning can closely match centralized results in idealized conditions. While the models are orders of magnitude smaller than state-of-the-art architectures and achieved competitive micro and macro F1 scores, limitations remain, including the lack of end-to-end training and the simplified FL assumptions. Nevertheless, this work demonstrates a viable way toward scalable, privacy-conscious medical coding systems and offers a step toward future research into federated, domain-adaptive clinical AI.
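The pipeline's shape, frozen embeddings feeding a small trainable head aggregated with FedAvg, can be sketched briefly. This is a toy illustration rather than the paper's setup: the head is logistic regression instead of an MLP, the task is binary instead of multi-label ICD coding, and the "embeddings" are random features.

```python
import numpy as np

rng = np.random.default_rng(3)

def client_update(W, X, Y, lr=0.5, steps=20):
    """Local training of a linear head on frozen embeddings X.
    Only W is trainable; the embeddings never leave the client."""
    W = W.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ W)))     # sigmoid predictions
        W -= lr * X.T @ (p - Y) / len(X)       # logistic-loss gradient step
    return W

# Two clients with "frozen embeddings" and a shared underlying label rule.
d = 8
W_true = rng.normal(size=d)
clients = []
for _ in range(2):
    X = rng.normal(size=(100, d))
    Y = (X @ W_true > 0).astype(float)
    clients.append((X, Y))

# One FedAvg round: broadcast the global head, train locally, average.
W_global = np.zeros(d)
local = [client_update(W_global, X, Y) for X, Y in clients]
W_global = np.mean(local, axis=0)

acc = np.mean([(((X @ W_global) > 0) == Y).mean() for X, Y in clients])
```

Because only the tiny head's weights are communicated, the round is cheap and no clinical text or embedding leaves a client, which is the deployment-efficiency argument the abstract makes.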

Updated: 2025-07-03 18:58:36

Categories: cs.IR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.03122v1

How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models

Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers whilst at the same time being prone to excessive doubt when challenged. To investigate this apparent paradox, we developed a novel experimental paradigm, exploiting the unique ability to obtain confidence estimates from LLMs without creating memory of their initial judgments -- something impossible in human participants. We show that LLMs -- Gemma 3, GPT4o and o1-preview -- exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer, resulting in a marked resistance to change their mind. We further demonstrate that LLMs markedly overweight inconsistent compared to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating. Finally, we demonstrate that these two mechanisms -- a drive to maintain consistency with prior commitments and hypersensitivity to contradictory feedback -- parsimoniously capture LLM behavior in a different domain. Together, these findings furnish a mechanistic account of LLM confidence that explains both their stubbornness and excessive sensitivity to criticism.

Updated: 2025-07-03 18:57:43

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03120v1

Neural-Network solver of ideal MHD equilibria

We present a novel approach to compute three-dimensional Magnetohydrodynamic equilibria by parametrizing Fourier modes with artificial neural networks and compare it to equilibria computed by conventional solvers. The full nonlinear global force residual across the volume in real space is then minimized with first order optimizers. Already, we observe competitive computational cost to arrive at the same minimum residuals computed by existing codes. With increased computational cost, lower minima of the residual are achieved by the neural networks, establishing a new lower bound for the force residual. We use minimally complex neural networks, and we expect significant improvements for solving not only single equilibria with neural networks, but also for computing neural network models valid over continuous distributions of equilibria.

Updated: 2025-07-03 18:56:37

Fields: cs.LG,cs.AI,physics.plasm-ph

Download: http://arxiv.org/abs/2507.03119v1

BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers

The energy consumption of large-scale ML models is dominated by data movement - shuffling billions of parameters across memory hierarchies and data centers. Effective sparsification to prune redundant parameters is still challenging: existing methods incur significant accuracy degradation, performance overhead, or both. We introduce (Bl)ock (a)nd (S)parse (T)ransformers (BLaST), a general, robust, and reliable sparsification method applicable to linear layers in all settings. Our method iteratively sparsifies weight matrices into a block sparsity pattern suitable for efficient sparse matrix-matrix (SpMM) multiplication. BLaST achieves up to 95% sparsity in MLP weights with negligible accuracy loss. Our fused, highly optimized Sparse MLP kernel delivers up to 16.7x speedup over dense MLPs across 9 architectures and 8 datasets, resulting in up to 1.6x inference speedup, 1.11x pretraining speedup and up to 3.12x inference memory usage reduction. BLaST enables the next generation of large-scale AI systems by reducing energy use, memory footprint, and latency.
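
The block-sparsity pattern the abstract above targets can be sketched with magnitude-based tile pruning in NumPy. This is a simplified illustration: the 4x4 tile size, 75% sparsity level, and one-shot (rather than iterative) pruning are assumptions of mine, not BLaST's actual procedure.

```python
import numpy as np

def block_sparsify(W, block=4, sparsity=0.75):
    """Zero the fraction `sparsity` of (block x block) tiles with the
    smallest Frobenius norm -- the structured pattern that efficient
    sparse matrix-matrix (SpMM) kernels exploit."""
    rows, cols = W.shape[0] // block, W.shape[1] // block
    tiles = W.reshape(rows, block, cols, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))  # per-tile norms, (rows, cols)
    n_prune = int(sparsity * rows * cols)           # number of tiles to zero
    thresh = np.sort(norms, axis=None)[n_prune - 1]
    mask = norms > thresh                           # keep only high-norm tiles
    return (tiles * mask[:, None, :, None]).reshape(W.shape), mask

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
W_sparse, mask = block_sparsify(W)
print(mask.mean())  # fraction of tiles kept: 0.25
```

Pruning whole tiles rather than individual weights is what makes the result hardware-friendly: the surviving tiles stay dense, so SpMM kernels can skip zero blocks wholesale instead of scattering memory accesses.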

Updated: 2025-07-03 18:53:54

Fields: cs.LG,cs.DC

Download: http://arxiv.org/abs/2507.03117v1

Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models

Humans are able to recognize objects based on both local texture cues and the configuration of object parts, yet contemporary vision models primarily harvest local texture cues, yielding brittle, non-compositional features. Work on shape-vs-texture bias has pitted shape and texture representations in opposition, measuring shape relative to texture, ignoring the possibility that models (and humans) can simultaneously rely on both types of cues, and obscuring the absolute quality of both types of representation. We therefore recast shape evaluation as a matter of absolute configural competence, operationalized by the Configural Shape Score (CSS), which (i) measures the ability to recognize both images in Object-Anagram pairs that preserve local texture while permuting global part arrangement to depict different object categories. Across 86 convolutional, transformer, and hybrid models, CSS (ii) uncovers a broad spectrum of configural sensitivity with fully self-supervised and language-aligned transformers -- exemplified by DINOv2, SigLIP2 and EVA-CLIP -- occupying the top end of the CSS spectrum. Mechanistic probes reveal that (iii) high-CSS networks depend on long-range interactions: radius-controlled attention masks abolish performance, showing a distinctive U-shaped integration profile, and representational-similarity analyses expose a mid-depth transition from local to global coding. (iv) A BagNet control remains at chance, ruling out "border-hacking" strategies. Finally, (v) we show that configural shape score also predicts other shape-dependent evals. Overall, we propose that the path toward truly robust, generalizable, and human-like vision systems may not lie in forcing an artificial choice between shape and texture, but rather in architectural and learning frameworks that seamlessly integrate both local-texture and global configural shape.
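
The pair-level scoring rule in (i) reduces to a simple computation: a model earns credit only when it classifies both members of an anagram pair correctly. The predictions and labels below are invented stand-ins for a model's outputs, not data from the paper.

```python
def configural_shape_score(predictions, pairs):
    """CSS sketch: credit only when BOTH images of an object-anagram pair
    (same local texture, permuted part arrangement) are classified correctly."""
    correct_pairs = sum(
        1 for (img_a, label_a), (img_b, label_b) in pairs
        if predictions[img_a] == label_a and predictions[img_b] == label_b
    )
    return correct_pairs / len(pairs)

# Hypothetical model outputs on two anagram pairs.
preds = {"pair1_a": "bear", "pair1_b": "rabbit", "pair2_a": "ship", "pair2_b": "dog"}
pairs = [(("pair1_a", "bear"), ("pair1_b", "rabbit")),
         (("pair2_a", "ship"), ("pair2_b", "cat"))]
print(configural_shape_score(preds, pairs))  # 0.5
```

Because the two images in a pair share local texture statistics, a purely texture-driven model cannot get both right except by chance, which is what makes the pairwise requirement a test of configural processing.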

Updated: 2025-07-03 18:51:47

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.00493v2

Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization

Optimizing high-dimensional and complex black-box functions is crucial in numerous scientific applications. While Bayesian optimization (BO) is a powerful method for sample-efficient optimization, it struggles with the curse of dimensionality and scaling to thousands of evaluations. Recently, leveraging generative models to solve black-box optimization problems has emerged as a promising framework. However, those methods often underperform compared to BO methods due to limited expressivity and difficulty of uncertainty estimation in high-dimensional spaces. To overcome these issues, we introduce DiBO, a novel framework for solving high-dimensional black-box optimization problems. Our method iterates two stages. First, we train a diffusion model to capture the data distribution and deep ensembles to predict function values with uncertainty quantification. Second, we cast the candidate selection as a posterior inference problem to balance exploration and exploitation in high-dimensional spaces. Concretely, we fine-tune diffusion models to amortize posterior inference. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines across synthetic and real-world tasks. Our code is publicly available at https://github.com/umkiyoung/DiBO.
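
The ensemble-based uncertainty that drives candidate selection can be illustrated with a toy stand-in: bootstrapped polynomial fits play the role of the deep ensemble, and a simple mean-plus-std score replaces the paper's posterior-inference machinery. Both substitutions are my simplifications, not DiBO's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_member(X, y, deg=2):
    """One ensemble member: a least-squares polynomial fit on a bootstrap
    resample (a toy surrogate for a neural network in a deep ensemble)."""
    idx = rng.integers(0, len(X), len(X))
    return np.polyfit(X[idx], y[idx], deg)

def ensemble_predict(models, Xq):
    preds = np.stack([np.polyval(m, Xq) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)  # value estimate + uncertainty

# Toy black-box objective observed with noise (maximum near x = 0).
X = rng.uniform(-2, 2, 40)
y = -(X ** 2) + 0.1 * rng.normal(size=40)
models = [fit_member(X, y) for _ in range(10)]

Xq = np.linspace(-3, 3, 7)
mean, std = ensemble_predict(models, Xq)
scores = mean + 1.0 * std  # exploration-weighted candidate score
print(Xq[np.argmax(scores)])
```

The disagreement between bootstrap members (the `std` term) grows where data is scarce, so the score trades off exploiting high predicted values against exploring uncertain regions, which is the balance the abstract describes casting as posterior inference.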

Updated: 2025-07-03 18:44:23

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2502.16824v2

An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks

Multi-Agent Reinforcement Learning (MARL) has recently emerged as a significant area of research. However, MARL evaluation often lacks systematic diversity, hindering a comprehensive understanding of algorithms' capabilities. In particular, cooperative MARL algorithms are predominantly evaluated on benchmarks such as SMAC and GRF, which primarily feature team game scenarios without adequately assessing the various aspects of agents' capabilities required in fully cooperative real-world tasks such as multi-robot cooperation, warehouse and resource management, search and rescue, and human-AI cooperation. Moreover, MARL algorithms are mainly evaluated on low dimensional state spaces, and thus their performance on high-dimensional (e.g., image) observations is not well-studied. To fill this gap, this paper highlights the crucial need for expanding systematic evaluation across a wider array of existing benchmarks. To this end, we conduct extensive evaluation and comparisons of well-known MARL algorithms on complex fully cooperative benchmarks, including tasks with images as agents' observations. Interestingly, our analysis shows that many algorithms, hailed as state-of-the-art on SMAC and GRF, may underperform standard MARL baselines on fully cooperative benchmarks. Finally, towards more systematic and better evaluation of cooperative MARL algorithms, we have open-sourced PyMARLzoo+, an extension of the widely used (E)PyMARL libraries, which addresses an open challenge from [TBG++21], facilitating seamless integration and support with all benchmarks of PettingZoo, as well as Overcooked, PressurePlate, Capture Target and Box Pushing.

Updated: 2025-07-03 18:43:16

Fields: cs.LG

Download: http://arxiv.org/abs/2502.04773v2

AI and Agile Software Development: From Frustration to Success -- XP2025 Workshop Summary

The full-day workshop on AI and Agile at XP 2025 convened a diverse group of researchers and industry practitioners to address the practical challenges and opportunities of integrating Artificial Intelligence into Agile software development. Through interactive sessions, participants identified shared frustrations related to integrating AI into Agile Software Development practices, including challenges with tooling, governance, data quality, and critical skill gaps. These challenges were systematically prioritized and analyzed to uncover root causes. The workshop culminated in the collaborative development of a research roadmap that pinpoints actionable directions for future work, including both immediate solutions and ambitious long-term goals. The key outcome is a structured agenda designed to foster joint industry-academic efforts to move from identified frustrations to successful implementation.

Updated: 2025-07-03 18:41:06

Fields: cs.SE,cs.AI

Download: http://arxiv.org/abs/2506.20159v2

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Large language models (LLMs) excel at logical and algorithmic reasoning, yet their emotional intelligence (EQ) still lags far behind their cognitive prowess. While reinforcement learning from verifiable rewards (RLVR) has advanced in other domains, its application to dialogue-especially for emotional intelligence-remains underexplored. In this work, we introduce RLVER, the first end-to-end reinforcement learning framework that leverages verifiable emotion rewards from simulated users to cultivate higher-order empathetic abilities in LLMs. Within this framework, self-consistent affective simulated users engage in dialogue rollouts and produce deterministic emotion scores during conversations, serving as reward signals to guide the LLM's learning. Fine-tuning publicly available Qwen2.5-7B-Instruct model with PPO boosts its Sentient-Benchmark score from 13.3 to 79.2 while largely preserving mathematical and coding competence. Extensive experiments reveal that: (i) RLVER consistently improves multiple dialogue capabilities; (ii) Thinking and non-thinking models show distinct trends--thinking models excel in empathy and insight, while non-thinking models favor action; (iii) GRPO often yields stable gains, while PPO can push certain capabilities to a higher ceiling; (iv) More challenging environments are not always better-moderate ones can yield stronger outcomes. Our results show that RLVER is a practical route toward emotionally intelligent and broadly capable language agents.

Updated: 2025-07-03 18:33:18

Fields: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2507.03112v1

SymMatika: Structure-Aware Symbolic Discovery

Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. While existing methods have advanced the discovery of either explicit mappings (i.e., $y = f(\mathbf{x})$) or discovering implicit relations (i.e., $F(\mathbf{x}, y)=0$), few modern and accessible frameworks support both. Moreover, most approaches treat each expression candidate in isolation, without reusing recurring structural patterns that could accelerate search. We introduce SymMatika, a hybrid SR algorithm that combines multi-island genetic programming (GP) with a reusable motif library inspired by biological sequence analysis. SymMatika identifies high-impact substructures in top-performing candidates and reintroduces them to guide future generations. Additionally, it incorporates a feedback-driven evolutionary engine and supports both explicit and implicit relation discovery using implicit-derivative metrics. Across benchmarks, SymMatika achieves state-of-the-art recovery rates, achieving 5.1% higher performance than the previous best results on Nguyen, the first recovery of Nguyen-12, and competitive performance on the Feynman equations. It also recovers implicit physical laws from Eureqa datasets up to $100\times$ faster. Our results demonstrate the power of structure-aware evolutionary search for scientific discovery. To support broader research in interpretable modeling and symbolic discovery, we have open-sourced the full SymMatika framework.

Updated: 2025-07-03 18:32:03

Fields: cs.LG

Download: http://arxiv.org/abs/2507.03110v1

Uncovering Synergistic Educational Injustices of COVID-19 and AI

Grounded in critical realism and using narrative inquiry, this article explores the long-term consequences of the COVID-19 pandemic and the rapid proliferation of artificial intelligence within higher education. Through the analysis of student narratives collected in Iranian university settings, the study reveals that learning experiences during and after the pandemic, coupled with unprepared exposure to AI tools, have generated hidden yet impactful layers of educational inequality and cognitive disorientation.

Updated: 2025-07-03 18:07:50

Fields: cs.CY,cs.AI

Download: http://arxiv.org/abs/2507.03095v1

Neural Dynamic Modes: Computational Imaging of Dynamical Systems from Sparse Observations

Dynamical systems are ubiquitous within science and engineering, from turbulent flow across aircraft wings to structural variability of proteins. Although some systems are well understood and simulated, scientific imaging often confronts never-before-seen dynamics observed through indirect, noisy, and highly sparse measurements. We present NeuralDMD, a model-free framework that combines neural implicit representations with Dynamic Mode Decomposition (DMD) to reconstruct continuous spatio-temporal dynamics from such measurements. The expressiveness of neural representations enables capturing complex spatial structures, while the linear dynamical modes of DMD introduce an inductive bias that guides training and supports stable, low-dimensional representations and forecasting. We validate NeuralDMD on two real-world problems: reconstructing near-surface wind-speed fields over North America from sparse station observations, and recovering the evolution of plasma near the Galactic-center black hole, Sgr A*. In both cases, NeuralDMD outperforms established baselines, demonstrating its potential as a general tool for imaging dynamical systems across geoscience, astronomy, and beyond.
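
For reference, the classical Dynamic Mode Decomposition that supplies the linear-modes inductive bias in the abstract above can be written in a few lines of NumPy. The rank truncation and the toy rotating signal below are illustrative choices of mine, not the paper's setup.

```python
import numpy as np

def dmd(X, rank=2):
    """Classic (exact) DMD: fit a linear operator A with X2 ~= A X1 on
    snapshot pairs and return its eigenvalues and spatial modes."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)  # A in POD coords
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W
    return eigvals, modes

# Toy data: a pure rotation, so the true dynamics are exactly linear and
# the DMD eigenvalues should lie on the unit circle.
t = np.arange(50)
X = np.vstack([np.cos(0.3 * t), np.sin(0.3 * t)])
eigvals, _ = dmd(X, rank=2)
print(np.round(np.abs(eigvals), 6))  # both magnitudes ~1 (neutrally stable modes)
```

NeuralDMD's contribution, per the abstract, is to replace the fixed snapshot matrices with neural implicit representations so these linear modes can be fit from indirect, sparse measurements rather than fully observed states.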

Updated: 2025-07-03 18:05:56

Fields: cs.LG,astro-ph.IM,cs.CV,physics.ao-ph,68T45, 68T07,I.4.8; I.2.6

Download: http://arxiv.org/abs/2507.03094v1

The surrogate Gibbs-posterior of a corrected stochastic MALA: Towards uncertainty quantification for neural networks

MALA is a popular gradient-based Markov chain Monte Carlo method to access the Gibbs-posterior distribution. Stochastic MALA (sMALA) scales to large data sets, but changes the target distribution from the Gibbs-posterior to a surrogate posterior which only exploits a reduced sample size. We introduce a corrected stochastic MALA (csMALA) with a simple correction term for which distance between the resulting surrogate posterior and the original Gibbs-posterior decreases in the full sample size while retaining scalability. In a nonparametric regression model, we prove a PAC-Bayes oracle inequality for the surrogate posterior. Uncertainties can be quantified by sampling from the surrogate posterior. Focusing on Bayesian neural networks, we analyze the diameter and coverage of credible balls for shallow neural networks and we show optimal contraction rates for deep neural networks. Our credibility result is independent of the correction and can also be applied to the standard Gibbs-posterior. A simulation study in a high-dimensional parameter space demonstrates that an estimator drawn from csMALA based on its surrogate Gibbs-posterior indeed exhibits these advantages in practice.
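
As background for the abstract above, a plain full-data MALA step with its Metropolis-Hastings accept/reject can be sketched as follows. The step size and standard-normal target are illustrative, and the paper's stochastic-gradient surrogate and correction term are not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def mala_step(theta, log_post, grad_log_post, step=0.1):
    """One Metropolis-adjusted Langevin step targeting exp(log_post):
    a gradient-informed Gaussian proposal plus an MH correction for the
    proposal's asymmetry."""
    prop = (theta + step * grad_log_post(theta)
            + np.sqrt(2 * step) * rng.normal(size=theta.shape))

    def log_q(x, y):  # log density (up to a constant) of proposing x from y
        m = y + step * grad_log_post(y)
        return -np.sum((x - m) ** 2) / (4 * step)

    log_alpha = (log_post(prop) - log_post(theta)
                 + log_q(theta, prop) - log_q(prop, theta))
    return prop if np.log(rng.uniform()) < log_alpha else theta

# Toy target: standard normal, log pi(theta) = -||theta||^2 / 2.
log_post = lambda th: -0.5 * np.sum(th ** 2)
grad_log_post = lambda th: -th

theta = np.ones(3)
samples = []
for _ in range(5000):
    theta = mala_step(theta, log_post, grad_log_post)
    samples.append(theta.copy())
print(round(float(np.mean(samples[1000:])), 3))  # near 0 after burn-in
```

Stochastic MALA replaces `log_post` and `grad_log_post` with mini-batch estimates, which is exactly what shifts the chain's target to a surrogate posterior; the paper's correction term shrinks the gap between that surrogate and the full-data Gibbs-posterior.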

Updated: 2025-07-03 18:00:54

Fields: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2310.09335v2

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code is available at: https://github.com/YkiWu/Point3R.

Updated: 2025-07-03 17:59:56

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02863v1

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines -- such as object individuality, articulation, high-quality physically based rendering materials, and physically based interaction. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects with the help of a structured scene graph. It then reconstructs the scene by retrieving the most visually similar 3D artist-crafted models from a curated asset database. Next, the Material Painting module enhances realism by recovering high-quality, spatially varying materials. Finally, the reconstructed scene is integrated into a simulation engine with basic physical properties to enable interactive behavior. The resulting scenes are compact, editable, and fully compatible with standard graphics pipelines, making them suitable for applications in AR/VR, gaming, robotics, and digital twins. In addition, LiteReality introduces a training-free object retrieval module that achieves state-of-the-art similarity performance on the Scan2CAD benchmark, along with a robust material painting module capable of transferring appearances from images of any style to 3D assets -- even under severe misalignment, occlusion, and poor lighting. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets. Project page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c

Updated: 2025-07-03 17:59:55

Fields: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2507.02861v1

ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization

With the rapid advancement of Reinforcement Learning from Human Feedback (RLHF) and autoregressive transformers, state-of-the-art models such as GPT-4.0, DeepSeek R1, and Llama 3.3 increasingly emphasize answer depth and personalization. However, most existing RLHF approaches (e.g., PPO, DPO) still rely on a binary-preference (BT) paradigm, which, while reducing annotation costs, still requires substantial human effort and captures only group-level tendencies rather than individual preferences. To overcome these limitations, we propose Adaptive Reward-Following (ARF), a self-assessment framework that leverages a high-precision emotion analyzer achieving over 70% accuracy on GoEmotions, Sentiment140, and DailyDialog to convert free-form user feedback into continuous preference scores. We further enrich and debias these signals through lightweight data augmentations, including synonym replacement, random trace truncation, and score bias annotation algorithm. A Dynamic Adapter Preference Tracker continuously models evolving user tastes in real time, enabling our novel Trace Bias (TB) fine-tuning algorithm to optimize directly on these tracked rewards instead of coarse binary labels. Experiments on Qwen-2/2.5, Gemma-2, and Llama-3.2 across four preference domains demonstrate that ARF achieves an improvement of 3.3% over PPO and 7.6% over DPO. Moreover, TB preserves theoretical alignment with PPO and DPO objectives. Overall, ARF presents a scalable, personalized, and cost-effective approach to RLHF LLMs through autonomous reward modeling.

Updated: 2025-07-03 17:59:26

Fields: cs.CL,cs.AI,68T05, 68Q25,I.2.6; I.2.7

Download: http://arxiv.org/abs/2507.03069v1

Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Multiple choice benchmarks have long been the workhorse of language model evaluation because grading multiple choice is objective and easy to automate. However, we show multiple choice questions from popular benchmarks can often be answered without even seeing the question. These shortcuts arise from a fundamental limitation of discriminative evaluation not shared by evaluations of the model's free-form, generative answers. Until recently, there appeared to be no viable, scalable alternative to multiple choice--but, we show that this has changed. We consider generative evaluation via what we call answer matching: Give the candidate model the question without the options, have it generate a free-form response, then use a modern language model with the reference answer to determine if the response matches the reference. To compare the validity of different evaluation strategies, we annotate MMLU-Pro and GPQA-Diamond to obtain human grading data, and measure the agreement of each evaluation approach. We find answer matching using recent models--even small ones--achieves near-perfect agreement, in the range of inter-annotator agreement. In contrast, both multiple choice evaluation and using LLM-as-a-judge without reference answers aligns poorly with human grading. Improving evaluations via answer matching is not merely a conceptual concern: the rankings of several models change significantly when evaluating their free-form responses with answer matching. In light of these findings, we discuss how to move the evaluation ecosystem from multiple choice to answer matching.
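
The grading loop the abstract above describes is simple once a matcher is fixed. Below, a normalized-substring check stands in for the modern-LLM judge the paper actually uses, and the question/answer items are invented examples.

```python
import re

def normalize(text):
    """Lowercase, strip punctuation and articles -- a crude stand-in for
    the reference-aware LLM matcher described in the abstract."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in {"a", "an", "the"})

def answer_matching_accuracy(items, match=None):
    """Grade free-form responses against reference answers. In practice
    `match` would query a language model given the reference answer."""
    if match is None:
        match = lambda resp, ref: normalize(ref) in normalize(resp)
    return sum(match(resp, ref) for resp, ref in items) / len(items)

# (free-form response, reference answer) pairs -- invented for illustration.
items = [("The answer is Photosynthesis.", "photosynthesis"),
         ("It equals 42", "42"),
         ("Probably oxygen", "carbon dioxide")]
print(answer_matching_accuracy(items))  # 2/3
```

Note the key contrast with LLM-as-a-judge without references: here the matcher only has to decide whether a response matches a known reference answer, a much easier judgment than scoring correctness from scratch, which is why the paper finds even small models achieve near-human agreement.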

Updated: 2025-07-03 17:59:02

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02856v1

Subtyping in DHOL -- Extended preprint

The recently introduced dependent typed higher-order logic (DHOL) offers an interesting compromise between expressiveness and automation support. It sacrifices the decidability of its type system in order to significantly extend its expressiveness over standard HOL. Yet it retains strong automated theorem proving support via a sound and complete translation to HOL. We leverage this design to extend DHOL with refinement and quotient types. Both of these are commonly requested by practitioners but rarely provided by automated theorem provers. This is because they inherently require undecidable typing and thus are very difficult to retrofit to decidable type systems. But with DHOL already doing the heavy lifting, adding them is not only possible but elegant and simple. Concretely, we add refinement and quotient types as special cases of subtyping. This turns the associated canonical inclusion resp. projection maps into identity maps and thus avoids costly changes in representation. We present the syntax, semantics, and translation to HOL for the extended language, including the proofs of soundness and completeness.

Updated: 2025-07-03 17:59:00

Fields: cs.LO,cs.AI,cs.FL

Download: http://arxiv.org/abs/2507.02855v1

Urban Region Pre-training and Prompting: A Graph-based Approach

Urban region representation is crucial for various urban downstream tasks. However, despite the proliferation of methods and their success, acquiring general urban region knowledge and adapting to different tasks remains challenging. Existing work pays limited attention to the fine-grained functional layout semantics in urban regions, limiting their ability to capture transferable knowledge across regions. Further, inadequate handling of the unique features and relationships required for different downstream tasks may also hinder effective task adaptation. In this paper, we propose a $\textbf{G}$raph-based $\textbf{U}$rban $\textbf{R}$egion $\textbf{P}$re-training and $\textbf{P}$rompting framework ($\textbf{GURPP}$) for region representation learning. Specifically, we first construct an urban region graph and develop a subgraph-centric urban region pre-training model to capture the heterogeneous and transferable patterns of entity interactions. This model pre-trains knowledge-rich region embeddings using contrastive learning and multi-view learning methods. To further refine these representations, we design two graph-based prompting methods: a manually-defined prompt to incorporate explicit task knowledge and a task-learnable prompt to discover hidden knowledge, which enhances the adaptability of these embeddings to different tasks. Extensive experiments on various urban region prediction tasks and different cities demonstrate the superior performance of our framework.

Updated: 2025-07-03 17:57:36

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.05920v4

Mitigating Goal Misgeneralization with Minimax Regret

Safe generalization in reinforcement learning requires not only that a learned policy acts capably in new situations, but also that it uses its capabilities towards the pursuit of the designer's intended goal. The latter requirement may fail when a proxy goal incentivizes similar behavior to the intended goal within the training environment, but not in novel deployment environments. This creates the risk that policies will behave as if in pursuit of the proxy goal, rather than the intended goal, in deployment -- a phenomenon known as goal misgeneralization. In this paper, we formalize this problem setting in order to theoretically study the possibility of goal misgeneralization under different training objectives. We show that goal misgeneralization is possible under approximate optimization of the maximum expected value (MEV) objective, but not the minimax expected regret (MMER) objective. We then empirically show that the standard MEV-based training method of domain randomization exhibits goal misgeneralization in procedurally-generated grid-world environments, whereas current regret-based unsupervised environment design (UED) methods are more robust to goal misgeneralization (though they don't find MMER policies in all cases). Our findings suggest that minimax expected regret is a promising approach to mitigating goal misgeneralization.
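As a reference point, the two training objectives contrasted above can be written (in notation of my choosing, not necessarily the paper's) with $\theta$ indexing training environments, $V_\theta(\pi)$ the expected return of policy $\pi$ in environment $\theta$, and $\pi^*_\theta$ an optimal policy for $\theta$:

```latex
% Maximum expected value (MEV) over the training distribution P:
\pi_{\mathrm{MEV}} \in \arg\max_{\pi}\; \mathbb{E}_{\theta \sim P}\bigl[ V_\theta(\pi) \bigr]

% Minimax expected regret (MMER) over the environment class \Theta:
\pi_{\mathrm{MMER}} \in \arg\min_{\pi}\; \max_{\theta \in \Theta}\; \bigl( V_\theta(\pi^*_\theta) - V_\theta(\pi) \bigr)
```

A proxy-goal policy can score well under MEV when the proxy and intended goals coincide on high-probability training environments, whereas the max over $\theta$ in MMER penalizes the worst-case environment where they diverge.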

Updated: 2025-07-03 17:57:12

Categories: cs.LG

Download: http://arxiv.org/abs/2507.03068v1

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Recent advancements in the reasoning capabilities of large language models (LLMs) show that employing group relative policy optimization (GRPO) algorithm for reinforcement learning (RL) training allows the models to use more thinking/reasoning tokens for generating better responses. However, LLMs can generate only a finite amount of tokens while maintaining attention to the previously generated tokens. This limit, also known as the context size of an LLM, is a bottleneck in LLM reasoning with arbitrarily large number of tokens. To think beyond the limit of context size, an LLM must employ a modular thinking strategy to reason over multiple rounds. In this work, we propose $\textbf{MOTIF: Modular Thinking via Reinforcement Finetuning}$ -- an RL training method for generating thinking tokens in multiple rounds, effectively allowing the model to think with additional context size. We trained the open-source model Qwen2.5-3B-Instruct on GSM8K dataset via parameter efficient fine-tuning and tested its accuracy on MATH500 and AIME2024 benchmarks. Our experiments show 3.8\% and 3.3\% improvements over vanilla GRPO based training in the respective benchmarks. Furthermore, this improvement was achieved with only 15\% of samples, thus demonstrating sample efficiency of MOTIF. Our code and models are available at https://github.com/purbeshmitra/MOTIF and https://huggingface.co/purbeshmitra/MOTIF, respectively.
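The multi-round idea in the abstract can be caricatured as: generate up to the context budget, carry forward a compressed summary of earlier rounds, and repeat. A toy sketch under that reading (all names hypothetical; `generate` stands in for an LLM call and here just fakes progress):

```python
def generate(prompt, budget):
    # Stand-in for an LLM call: returns (reasoning_text, final_answer_or_None).
    # This fake "finishes" once two earlier reasoning steps appear in the prompt.
    steps = prompt.count("step")
    if steps >= 2:
        return ("step", "42")
    return ("step", None)

def modular_think(question, rounds=4, budget=512):
    """Reason over multiple rounds, each bounded by the context budget.

    Instead of one long chain that overflows the context window, each round
    sees only the question plus a summary of earlier rounds, so total
    reasoning can exceed any single round's context size.
    """
    summary = ""
    for _ in range(rounds):
        prompt = f"{question}\nprevious: {summary}"[:budget]
        reasoning, answer = generate(prompt, budget)
        if answer is not None:
            return answer
        summary = (summary + " " + reasoning).strip()  # compressed carry-over
    return None
```

MOTIF's contribution is training this behavior with RL (GRPO-style) rather than hand-scripting it; the loop above only illustrates why the context bottleneck motivates multiple rounds.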

Updated: 2025-07-03 17:55:43

Categories: cs.CL,cs.AI,cs.IT,cs.LG,cs.SY,eess.SY,math.IT

Download: http://arxiv.org/abs/2507.02851v1

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

We describe a vulnerability in language models (LMs) trained with user feedback, whereby a single user can persistently alter LM knowledge and behavior given only the ability to provide prompts and upvote / downvote feedback on LM outputs. To implement the attack, the attacker prompts the LM to stochastically output either a "poisoned" or benign response, then upvotes the poisoned response or downvotes the benign one. When these feedback signals are used in subsequent preference tuning, LMs exhibit increased probability of producing poisoned responses even in contexts without malicious prompts. We show that this attack can be used to (1) insert factual knowledge the model did not previously possess, (2) modify code generation patterns in ways that introduce exploitable security flaws, and (3) inject fake financial news. Our findings identify both a new qualitative feature of language model preference tuning (showing that even highly restricted forms of preference data can be used to exert fine-grained control over behavior) and a new attack mechanism for LMs trained with user feedback (extending work on pretraining-time data poisoning and deployment-time prompt injection).

Updated: 2025-07-03 17:55:40

Categories: cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2507.02850v1

MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis

Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatically compressing task-irrelevant redundant information. MvHo-IB introduces several key innovations: (1) a principled method that combines O-information from information theory with a matrix-based Renyi alpha-order entropy estimator to quantify and extract HOIs, (2) a purpose-built Brain3DCNN encoder to effectively utilize these interactions, and (3) a new multi-view learning information bottleneck objective to enhance representation learning. Experiments on three benchmark fMRI datasets demonstrate that MvHo-IB achieves state-of-the-art performance, significantly outperforming previous methods, including recent hypergraph-based techniques. The implementation of MvHo-IB is available at https://github.com/zky04/MvHo-IB.
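For reference, the O-information invoked in innovation (1) is, in its standard form (Rosas et al.; the paper pairs it with a matrix-based Rényi $\alpha$-order entropy estimator and may use a variant), defined for $n$ variables $X = (X_1, \dots, X_n)$ as:

```latex
\Omega(X) \;=\; (n-2)\,H(X) \;+\; \sum_{i=1}^{n} \Bigl[ H(X_i) - H\bigl(X_{\setminus i}\bigr) \Bigr]
```

where $H$ denotes entropy and $X_{\setminus i}$ is $X$ with $X_i$ removed; $\Omega > 0$ indicates redundancy-dominated higher-order interactions and $\Omega < 0$ synergy-dominated ones.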

Updated: 2025-07-03 17:54:03

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02847v1

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

With the emergence of strong visual-language capabilities, multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by encoding harmful textual semantics directly into visual inputs. However, in these approaches, the visual modality primarily serves as a trigger for unsafe behavior, often exhibiting semantic ambiguity and lacking grounding in realistic scenarios. In this work, we define a novel setting: visual-centric jailbreak, where visual information serves as a necessary component in constructing a complete and realistic jailbreak context. Building on this setting, we propose the VisCo (Visual Contextual) Attack. VisCo fabricates contextual dialogue using four distinct visual-focused strategies, dynamically generating auxiliary images when necessary to construct a visual-centric jailbreak scenario. To maximize attack effectiveness, it incorporates automatic toxicity obfuscation and semantic refinement to produce a final attack prompt that reliably triggers harmful responses from the target black-box MLLMs. Specifically, VisCo achieves a toxicity score of 4.78 and an Attack Success Rate (ASR) of 85% on MM-SafetyBench against GPT-4o, significantly outperforming the baseline, which attains a toxicity score of 2.48 and an ASR of 22.2%. The code is available at https://github.com/Dtc7w3PQ/Visco-Attack.

Updated: 2025-07-03 17:53:12

Categories: cs.CV,cs.CL,cs.CR

Download: http://arxiv.org/abs/2507.02844v1

Evaluating Frontier Models for Stealth and Situational Awareness

Recent work has demonstrated the plausibility of frontier AI models scheming -- knowingly and covertly pursuing an objective misaligned with their developer's intentions. Such behavior could be very hard to detect, and if present in future advanced systems, could pose severe loss-of-control risk. It is therefore important for AI developers to rule out harm from scheming prior to model deployment. In this paper, we present a suite of scheming reasoning evaluations measuring two types of reasoning capabilities that we believe are prerequisites for successful scheming: First, we propose five evaluations of ability to reason about and circumvent oversight (stealth). Second, we present eleven evaluations for measuring a model's ability to instrumentally reason about itself, its environment and its deployment (situational awareness). We demonstrate how these evaluations can be used as part of a scheming inability safety case: a model that does not succeed on these evaluations is almost certainly incapable of causing severe harm via scheming in real deployment. We run our evaluations on current frontier models and find that none of them show concerning levels of either situational awareness or stealth.

Updated: 2025-07-03 17:52:47

Categories: cs.LG

Download: http://arxiv.org/abs/2505.01420v4

LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding

Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are incomplete representations of the original patient information. In this work, we make three contributions. (1) We show that the discrepancy between the data available during training time and inference time can lead to biased estimates of treatment effects. We formalize this issue as an inference time text confounding problem, where confounders are fully observed during training time but only partially available through text at inference time. (2) To address this problem, we propose a novel framework for estimating treatment effects that explicitly accounts for inference time text confounding. Our framework leverages large language models together with a custom doubly robust learner to mitigate biases caused by the inference time text confounding. (3) Through a series of experiments, we demonstrate the effectiveness of our framework in real-world applications.
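The "custom doubly robust learner" is not specified in the abstract; as a reference point, the standard doubly robust (AIPW) estimator of the average treatment effect, which such learners typically build on, is:

```latex
\hat{\tau}_{\mathrm{DR}}
= \frac{1}{n} \sum_{i=1}^{n} \left[
  \hat{\mu}_1(x_i) - \hat{\mu}_0(x_i)
  + \frac{a_i \bigl( y_i - \hat{\mu}_1(x_i) \bigr)}{\hat{e}(x_i)}
  - \frac{(1 - a_i) \bigl( y_i - \hat{\mu}_0(x_i) \bigr)}{1 - \hat{e}(x_i)}
\right]
```

with outcome models $\hat{\mu}_a$, propensity model $\hat{e}$, treatment indicator $a_i$, and outcome $y_i$. The estimate remains consistent if either the outcome or the propensity model is correctly specified, which makes the family attractive when LLM-extracted text features feed one of the two models.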

Updated: 2025-07-03 17:52:27

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02843v1

On the Structure of Replicable Hypothesis Testers

A hypothesis testing algorithm is replicable if, when run on two different samples from the same distribution, it produces the same output with high probability. This notion, defined by Impagliazzo, Lei, Pitassi, and Sorell [STOC'22], can increase trust in testing procedures and is deeply related to algorithmic stability, generalization, and privacy. We build general tools to prove lower and upper bounds on the sample complexity of replicable testers, unifying and quantitatively improving upon existing results. We identify a set of canonical properties, and prove that any replicable testing algorithm can be modified to satisfy these properties without worsening accuracy or sample complexity. A canonical replicable algorithm computes a deterministic function of its input (i.e., a test statistic) and thresholds against a uniformly random value in $[0,1]$. It is invariant to the order in which the samples are received, and, if the testing problem is ``symmetric,'' then the algorithm is also invariant to the labeling of the domain elements, resolving an open question by Liu and Ye [NeurIPS'24]. We prove new lower bounds for uniformity, identity, and closeness testing by reducing to the case where the replicable algorithm satisfies these canonical properties. We systematize and improve upon a common strategy for replicable algorithm design based on test statistics with known expectation and bounded variance. Our framework allows testers which have been extensively analyzed in the non-replicable setting to be made replicable with minimal overhead. As direct applications of our framework, we obtain constant-factor optimal bounds for coin testing and closeness testing and get replicability for free in a large parameter regime for uniformity testing. We also give state-of-the-art bounds for replicable Gaussian mean testing, and, unlike prior work, our algorithm runs in polynomial time.

Updated: 2025-07-03 17:51:31

Categories: cs.DS,cs.LG

Download: http://arxiv.org/abs/2507.02842v1

StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for improving the complex reasoning abilities of large language models (LLMs). However, current RLVR methods face two significant challenges: the near-miss reward problem, where a small mistake can invalidate an otherwise correct reasoning process, greatly hindering training efficiency; and exploration stagnation, where models tend to focus on solutions within their ``comfort zone,'' lacking the motivation to explore potentially more effective alternatives. To address these challenges, we propose StepHint, a novel RLVR algorithm that utilizes multi-level stepwise hints to help models explore the solution space more effectively. StepHint generates valid reasoning chains from stronger models and partitions these chains into reasoning steps using our proposed adaptive partitioning method. The initial few steps are used as hints, and simultaneously, multiple-level hints (each comprising a different number of steps) are provided to the model. This approach directs the model's exploration toward a promising solution subspace while preserving its flexibility for independent exploration. By providing hints, StepHint mitigates the near-miss reward problem, thereby improving training efficiency. Additionally, the external reasoning pathways help the model develop better reasoning abilities, enabling it to move beyond its ``comfort zone'' and mitigate exploration stagnation. StepHint outperforms competitive RLVR enhancement methods across six mathematical benchmarks, while also demonstrating superior generalization and excelling over baselines on out-of-domain benchmarks.
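The hint-construction step above can be sketched in a few lines; the naive line-break partition below stands in for the paper's adaptive partitioning method, and all names are hypothetical:

```python
def partition_steps(chain, sep="\n"):
    # Stand-in for the paper's adaptive partitioning: here we simply
    # split a valid reasoning chain (from a stronger model) on line breaks.
    return [s for s in chain.split(sep) if s.strip()]

def multilevel_hints(chain, levels=(1, 2, 3)):
    """Build multi-level stepwise hints from a valid reasoning chain.

    Each level reveals a different-length prefix of the chain's steps; the
    trained model then continues from each hint, which steers exploration
    toward a promising solution subspace without dictating the full chain.
    """
    steps = partition_steps(chain)
    return ["\n".join(steps[:k]) for k in levels if k <= len(steps)]

chain = "Let x = 3\nThen x + 2 = 5\nSo the answer is 5"
hints = multilevel_hints(chain)
```

Each hint prefix would then seed a separate rollout during RLVR training, so short hints preserve exploration while long hints rescue near-miss trajectories.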

Updated: 2025-07-03 17:51:06

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.02841v1

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

Information retrieval is a cornerstone of modern knowledge acquisition, enabling billions of queries each day across diverse domains. However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. These systems transcend conventional information search techniques by tightly integrating autonomous reasoning, iterative retrieval, and information synthesis into a dynamic feedback loop. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We also introduce a test-time scaling law to formalize the impact of computational depth on reasoning and search. Supported by benchmark results and the rise of open-source implementations, we demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking. All the related resources, including industry products, research papers, benchmark datasets, and open-source implementations, are collected for the community in https://github.com/DavidZWZ/Awesome-Deep-Research.

Updated: 2025-07-03 17:48:36

Categories: cs.IR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2506.18959v3

Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble

Membership inference attacks (MIAs) pose a significant threat to the privacy of machine learning models and are widely used as tools for privacy assessment, auditing, and machine unlearning. While prior MIA research has primarily focused on performance metrics such as AUC, accuracy, and TPR@low FPR - either by developing new methods to enhance these metrics or using them to evaluate privacy solutions - we found that it overlooks the disparities among different attacks. These disparities, both between distinct attack methods and between multiple instantiations of the same method, have crucial implications for the reliability and completeness of MIAs as privacy evaluation tools. In this paper, we systematically investigate these disparities through a novel framework based on coverage and stability analysis. Extensive experiments reveal significant disparities in MIAs, their potential causes, and their broader implications for privacy evaluation. To address these challenges, we propose an ensemble framework with three distinct strategies to harness the strengths of state-of-the-art MIAs while accounting for their disparities. This framework not only enables the construction of more powerful attacks but also provides a more robust and comprehensive methodology for privacy evaluation.

Updated: 2025-07-03 17:45:38

Categories: cs.LG

Download: http://arxiv.org/abs/2506.13972v2

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning

Recent advances in large language models have been driven by reinforcement learning (RL)-style post-training, which improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. However, these methods depend heavily on the model's initial ability to produce positive samples. They primarily refine what the model already knows (distribution sharpening) rather than enabling the model to solve problems where it initially fails. This limitation is especially problematic in early-stage RL training and on challenging reasoning tasks, where positive samples are unlikely to be generated. To unlock reasoning ability in such settings, the model must explore new reasoning trajectories beyond its current output distribution. Such exploration requires access to sufficiently good positive samples to guide the learning. While expert demonstrations seem like a natural solution, we find that they are often ineffective in RL post-training. Instead, we identify two key properties of effective positive samples: they should (1) be likely under the current policy, and (2) increase the model's likelihood of predicting the correct answer. Based on these insights, we propose $\textbf{Self-Explanation Policy Optimization (ExPO)}$-a simple and modular framework that generates such samples by conditioning on the ground-truth answer. ExPO enables efficient exploration and guides the model to produce reasoning trajectories more aligned with its policy than expert-written CoTs, while ensuring higher quality than its own (incorrect) samples. Experiments show that ExPO improves both learning efficiency and final performance on reasoning benchmarks, surpassing expert-demonstration-based methods in challenging settings such as MATH level-5, where the model initially struggles the most.
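The two selection criteria for positive samples can be caricatured with toy scoring functions; `policy_logp` and `answer_logp` are hypothetical stand-ins for model log-probabilities, and the threshold is illustrative:

```python
import math

def select_positive_samples(candidates, policy_logp, answer_logp,
                            min_policy_logp=-5.0):
    """Keep candidate explanations that (1) are reasonably likely under the
    current policy and (2) raise the model's probability of the correct
    answer relative to having no explanation at all."""
    baseline = answer_logp(None)  # log P(correct answer | no explanation)
    return [
        c for c in candidates
        if policy_logp(c) >= min_policy_logp and answer_logp(c) > baseline
    ]

# Toy stand-ins: scores tabulated by hand for illustration only.
policy_scores = {"good": -2.0, "off-policy": -9.0, "unhelpful": -1.0}
answer_scores = {None: math.log(0.2), "good": math.log(0.8),
                 "off-policy": math.log(0.9), "unhelpful": math.log(0.1)}
kept = select_positive_samples(
    ["good", "off-policy", "unhelpful"],
    policy_logp=policy_scores.__getitem__,
    answer_logp=answer_scores.__getitem__,
)
```

In this toy table the "off-policy" sample is rejected despite helping the answer (it is unlikely under the current policy, like an expert-written CoT), and the "unhelpful" sample is rejected despite being on-policy.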

Updated: 2025-07-03 17:44:55

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2507.02834v1

USAD: An Unsupervised Data Augmentation Spatio-Temporal Attention Diffusion Network

The primary objective of human activity recognition (HAR) is to infer ongoing human actions from sensor data, a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, HAR still faces key challenges, including the scarcity of labeled samples for rare activities, insufficient extraction of high-level features, and suboptimal model performance on lightweight devices. To address these issues, this paper proposes a comprehensive optimization approach centered on multi-attention interaction mechanisms. First, an unsupervised, statistics-guided diffusion model is employed to perform data augmentation, thereby alleviating the problems of labeled data scarcity and severe class imbalance. Second, a multi-branch spatio-temporal interaction network is designed, which captures multi-scale features of sequential data through parallel residual branches with 3*3, 5*5, and 7*7 convolutional kernels. Simultaneously, temporal attention mechanisms are incorporated to identify critical time points, while spatial attention enhances inter-sensor interactions. A cross-branch feature fusion unit is further introduced to improve the overall feature representation capability. Finally, an adaptive multi-loss function fusion strategy is integrated, allowing for dynamic adjustment of loss weights and overall model optimization. Experimental results on three public datasets, WISDM, PAMAP2, and OPPORTUNITY, demonstrate that the proposed unsupervised data augmentation spatio-temporal attention diffusion network (USAD) achieves accuracies of 98.84%, 93.81%, and 80.92% respectively, significantly outperforming existing approaches. Furthermore, practical deployment on embedded devices verifies the efficiency and feasibility of the proposed method.

Updated: 2025-07-03 17:38:44

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02827v1

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These benchmarks typically measure agent capabilities by evaluating task outcomes via specific reward designs. However, we show that many agentic benchmarks have issues in task setup or reward design. For example, SWE-bench Verified uses insufficient test cases, while TAU-bench counts empty responses as successful. Such issues can lead to under- or overestimation of agents' performance by up to 100% in relative terms. To make agentic evaluation rigorous, we introduce the Agentic Benchmark Checklist (ABC), a set of guidelines that we synthesized from our benchmark-building experience, a survey of best practices, and previously reported issues. When applied to CVE-Bench, a benchmark with a particularly complex evaluation design, ABC reduces the performance overestimation by 33%.

Updated: 2025-07-03 17:35:31

Categories: cs.AI,A.1; I.2.m

Download: http://arxiv.org/abs/2507.02825v1

SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

With the widespread adoption of large language models (LLMs) in practical applications, selecting an appropriate model requires balancing not only performance but also operational cost. The emergence of reasoning-capable models has further widened the cost gap between "thinking" (high reasoning) and "non-thinking" (fast, low-cost) modes. In this work, we reveal that approximately 58% of medical questions can be accurately answered by the non-thinking mode alone, without requiring the high-cost reasoning process. This highlights a clear dichotomy in problem complexity and suggests that dynamically routing queries to the appropriate mode based on complexity could optimize accuracy, cost-efficiency, and overall user experience. Based on this, we further propose SynapseRoute, a machine learning-based dynamic routing framework that intelligently assigns input queries to either thinking or non-thinking modes. Experimental results on several medical datasets demonstrate that SynapseRoute not only improves overall accuracy (0.8390 vs. 0.8272) compared to the thinking mode alone but also reduces inference time by 36.8% and token consumption by 39.66%. Importantly, qualitative analysis indicates that over-reasoning on simpler queries can lead to unnecessary delays and even decreased accuracy, a pitfall avoided by our adaptive routing. Finally, this work further introduces the Accuracy-Inference-Token (AIT) index to comprehensively evaluate the trade-offs among accuracy, latency, and token cost.
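The routing decision described above reduces to a binary classifier in front of two inference modes. The keyword heuristic below is purely a hypothetical stand-in for SynapseRoute's learned router, used only to make the control flow concrete:

```python
def complexity_score(question):
    # Hypothetical stand-in for the learned router: a real system would use
    # a trained classifier over query features, not a keyword heuristic.
    hard_markers = ("differential", "interaction", "contraindicated", "why")
    return sum(marker in question.lower() for marker in hard_markers)

def route(question, threshold=1):
    """Send simple queries to the fast, low-cost non-thinking mode and
    complex ones to the costly thinking (reasoning) mode."""
    if complexity_score(question) >= threshold:
        return "thinking"
    return "non-thinking"
```

Under the abstract's numbers, routing the ~58% of answerable-without-reasoning questions to the non-thinking branch is where the latency and token savings come from.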

Updated: 2025-07-03 17:33:58

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02822v1

Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks

Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the process by which data scientists translate fuzzy concepts into a concrete, proxy target variable remains poorly understood. We interview fifteen data scientists in education (N=8) and healthcare (N=7) to understand how they construct target variables for predictive modeling tasks. Our findings suggest that data scientists construct target variables through a bricolage process, involving iterative negotiation between high-level measurement objectives and low-level practical constraints. Data scientists attempt to satisfy five major criteria for a target variable through bricolage: validity, simplicity, predictability, portability, and resource requirements. To achieve this, data scientists adaptively use problem (re)formulation strategies, such as swapping out one candidate target variable for another when the first fails to meet certain criteria (e.g., predictability), or composing multiple outcomes into a single target variable to capture a more holistic set of modeling objectives. Based on our findings, we present opportunities for future HCI, CSCW, and ML research to better support the art and science of target variable construction.

Updated: 2025-07-03 17:33:24

Categories: cs.HC,cs.CY,cs.LG

Download: http://arxiv.org/abs/2507.02819v1

Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case

For years, semantic interoperability standards have sought to streamline the exchange of clinical data, yet their deployment remains time-consuming, resource-intensive, and technically challenging. To address this, we introduce a semi-automated approach that leverages large language models, specifically GPT-4o and Llama 3.2 405b, to convert structured clinical datasets into HL7 FHIR format while assessing accuracy, reliability, and security. Applying our method to the MIMIC-IV database, we combined embedding techniques, clustering algorithms, and semantic retrieval to craft prompts that guide the models in mapping each tabular field to its corresponding FHIR resource. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2 thanks to the inclusion of FHIR resource schemas within the prompt. Under real-world conditions, accuracy dipped slightly to 94%, but refinements to the prompting strategy restored robust mappings. Error analysis revealed occasional hallucinations of non-existent attributes and mismatches in granularity, which more detailed prompts can mitigate. Overall, our study demonstrates the feasibility of context-aware, LLM-driven transformation of clinical data into HL7 FHIR, laying the groundwork for semi-automated interoperability workflows. Future work will focus on fine-tuning models with specialized medical corpora, extending support to additional standards such as HL7 CDA and OMOP, and developing an interactive interface to enable expert validation and iterative refinement.
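
One detail worth noting above is that embedding FHIR resource schemas in the prompt is what lifted GPT-4o's accuracy. A minimal sketch of that prompt-assembly step follows; the template wording and the example field names are hypothetical, not the authors' actual template.

```python
def build_mapping_prompt(table: str, column: str, schema_snippets: list) -> str:
    """Assemble a field-to-FHIR mapping prompt that embeds retrieved
    resource schemas, mirroring the finding that schema context helped
    GPT-4o. Template wording is illustrative, not the authors' own."""
    context = "\n".join("- " + s for s in schema_snippets)
    return (
        "Candidate FHIR resource schemas:\n" + context + "\n\n"
        "Map column '" + column + "' of table '" + table + "' to a FHIR "
        "resource and element. Answer only as 'Resource.element'."
    )

prompt = build_mapping_prompt(
    "admissions", "admittime",
    ["Encounter: period.start, period.end, class, status"],
)
print(prompt)
```

In the paper, the schema snippets are selected via embedding-based semantic retrieval rather than passed in by hand.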

Updated: 2025-07-03 17:32:57

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03067v1

Replicable Distribution Testing

We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing -- answering an open question from prior work -- and closeness testing.

Updated: 2025-07-03 17:27:11

Categories: cs.LG,G.3

Download: http://arxiv.org/abs/2507.02814v1

Identification of Potentially Misclassified Crash Narratives using Machine Learning (ML) and Deep Learning (DL)

This research investigates the efficacy of machine learning (ML) and deep learning (DL) methods in detecting misclassified intersection-related crashes in police-reported narratives. Using 2019 crash data from the Iowa Department of Transportation, we implemented and compared a comprehensive set of models, including Support Vector Machine (SVM), XGBoost, BERT Sentence Embeddings, BERT Word Embeddings, and Albert Model. Model performance was systematically validated against expert reviews of potentially misclassified narratives, providing a rigorous assessment of classification accuracy. Results demonstrated that while traditional ML methods exhibited superior overall performance compared to some DL approaches, the Albert Model achieved the highest agreement with expert classifications (73% with Expert 1) and original tabular data (58%). Statistical analysis revealed that the Albert Model maintained performance levels similar to inter-expert consistency rates, significantly outperforming other approaches, particularly on ambiguous narratives. This work addresses a critical gap in transportation safety research through multi-modal integration analysis, which achieved a 54.2% reduction in error rates by combining narrative text with structured crash data. We conclude that hybrid approaches combining automated classification with targeted expert review offer a practical methodology for improving crash data quality, with substantial implications for transportation safety management and policy development.

Updated: 2025-07-03 17:26:27

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.03066v1

Cycle-Consistent Helmholtz Machine: Goal-Seeded Simulation via Inverted Inference

The Helmholtz Machine (HM) is a foundational architecture for unsupervised learning, coupling a bottom-up recognition model with a top-down generative model through alternating inference. However, its reliance on symmetric, data-driven updates constrains its ability to perform goal-directed reasoning or simulate temporally extended processes. In this work, we introduce the \emph{Cycle-Consistent Helmholtz Machine} (C$^2$HM), a novel extension that reframes inference as a \emph{goal-seeded}, \emph{asymmetric} process grounded in structured internal priors. Rather than inferring latent causes solely from sensory data, C$^2$HM simulates plausible latent trajectories conditioned on abstract goals, aligning them with observed outcomes through a recursive cycle of forward generation and inverse refinement. This cycle-consistent formulation integrates top-down structure with bottom-up evidence via a variational loop, enforcing mutual alignment between goal-conditioned latent predictions and recognition-based reconstructions. We formalize this mechanism within the framework of the \emph{Context-Content Uncertainty Principle} (CCUP), which posits that inference proceeds by aligning structured, low-entropy content with high-entropy, ambiguous context. C$^2$HM improves representational efficiency, supports memory chaining via path-dependent inference, and enables spatial compositional imagination. By offering a biologically inspired alternative to classical amortized inference, $C^2$HM reconceives generative modeling as intentional simulation, bridging memory-based planning and unsupervised learning in a unified probabilistic framework.

Updated: 2025-07-03 17:24:27

Categories: cs.LG

Download: http://arxiv.org/abs/2507.03065v1

In-Training Multicalibrated Survival Analysis for Healthcare via Constrained Optimization

Survival analysis is an important problem in healthcare because it models the relationship between an individual's covariates and the onset time of an event of interest (e.g., death). It is important for survival models to be well-calibrated (i.e., for their predicted probabilities to be close to ground-truth probabilities) because badly calibrated systems can result in erroneous clinical decisions. Existing survival models are typically calibrated at the population level only, and thus run the risk of being poorly calibrated for one or more minority subpopulations. We propose a model called GRADUATE that achieves multicalibration by ensuring that all subpopulations are well-calibrated too. GRADUATE frames multicalibration as a constrained optimization problem, and optimizes both calibration and discrimination in-training to achieve a good balance between them. We mathematically prove that the optimization method used yields a solution that is both near-optimal and feasible with high probability. Empirical comparisons against state-of-the-art baselines on real-world clinical datasets demonstrate GRADUATE's efficacy. In a detailed analysis, we elucidate the shortcomings of the baselines vis-a-vis GRADUATE's strengths.
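
To make the multicalibration requirement concrete: it asks that predicted probabilities match observed event rates within every subpopulation, not merely on average. The sketch below shows only the evaluation-side check; GRADUATE itself enforces this in-training via constrained optimization, which is not reproduced here.

```python
def subgroup_calibration_gaps(pred, outcome, groups):
    """Per-subgroup calibration gap: |mean predicted probability - observed
    event rate|. Multicalibration asks every such gap to be small, not just
    the gap pooled over the whole population."""
    gaps = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        mean_obs = sum(outcome[i] for i in idx) / len(idx)
        gaps[g] = abs(mean_pred - mean_obs)
    return gaps

pred = [0.9, 0.8, 0.2, 0.3]      # model's predicted event probabilities
outcome = [1, 1, 1, 0]           # observed events
groups = ["a", "a", "b", "b"]    # subpopulation labels
gaps = subgroup_calibration_gaps(pred, outcome, groups)
print(gaps)                      # group "b" is notably miscalibrated
```

A model can show a small pooled gap while one subgroup's gap is large, which is exactly the failure mode population-level calibration misses.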

Updated: 2025-07-03 17:16:05

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02807v1

LLM-Driven Auto Configuration for Transient IoT Device Collaboration

Today's Internet of Things (IoT) has evolved from simple sensing and actuation devices to those with embedded processing and intelligent services, enabling rich collaborations between users and their devices. However, enabling such collaboration becomes challenging when transient devices need to interact with host devices in temporarily visited environments. In such cases, fine-grained access control policies are necessary to ensure secure interactions; however, manually implementing them is often impractical for non-expert users. Moreover, at run-time, the system must automatically configure the devices and enforce such fine-grained access control rules. Additionally, the system must address the heterogeneity of devices. In this paper, we present CollabIoT, a system that enables secure and seamless device collaboration in transient IoT environments. CollabIoT employs a Large language Model (LLM)-driven approach to convert users' high-level intents to fine-grained access control policies. To support secure and seamless device collaboration, CollabIoT adopts capability-based access control for authorization and uses lightweight proxies for policy enforcement, providing hardware-independent abstractions. We implement a prototype of CollabIoT's policy generation and auto configuration pipelines and evaluate its efficacy on an IoT testbed and in large-scale emulated environments. We show that our LLM-based policy generation pipeline is able to generate functional and correct policies with 100% accuracy. At runtime, our evaluation shows that our system configures new devices in ~150 ms, and our proxy-based data plane incurs network overheads of up to 2 ms and access control overheads up to 0.3 ms.
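
The capability-based authorization that CollabIoT's proxies enforce can be illustrated with a minimal sketch; the token structure and device names below are hypothetical, and a real deployment would additionally sign and expire these tokens.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """A token naming one device and the actions it permits. A real system
    would sign and expire these; this sketch only models the check a
    CollabIoT-style proxy performs before forwarding a request."""
    device: str
    actions: frozenset

def proxy_allow(cap: Capability, device: str, action: str) -> bool:
    # capability-based access control: the token must name this exact
    # device and explicitly grant the requested action
    return cap.device == device and action in cap.actions

guest_cap = Capability(device="projector-1", actions=frozenset({"display"}))
print(proxy_allow(guest_cap, "projector-1", "display"))   # granted
print(proxy_allow(guest_cap, "projector-1", "reboot"))    # action not granted
print(proxy_allow(guest_cap, "thermostat-1", "display"))  # wrong device
```

In the paper, the set of granted actions is derived from the user's high-level intent by the LLM policy-generation pipeline rather than written by hand.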

Updated: 2025-07-03 17:12:52

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.03064v1

Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either using zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performances, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.
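
Point ii) above, using the surrogate as a filter, reduces to scoring candidates cheaply and discarding the low-ranked ones before any training is spent on them. A minimal sketch with a stand-in surrogate (the real one is a model over zero-cost proxies and GRAF features, or a fine-tuned LM):

```python
def surrogate_filter(candidates, surrogate, keep_fraction=0.25):
    """Keep only the top fraction of architectures by surrogate score,
    so that full training is spent on promising candidates. Only the
    filtering step is shown; surrogate training is out of scope here."""
    ranked = sorted(candidates, key=surrogate, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

# toy architectures encoded as (depth, width); the stand-in surrogate
# simply prefers wider networks
archs = [(2, 32), (4, 64), (8, 128), (3, 16)]
survivors = surrogate_filter(archs, surrogate=lambda a: a[1])
print(survivors)
```

Point iii) goes one step further and maximizes the surrogate score itself as the search objective, skipping true evaluation entirely during search.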

Updated: 2025-07-03 17:06:23

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.12971v3

Learning to Coordinate Bidders in Non-Truthful Auctions

In non-truthful auctions such as first-price and all-pay auctions, the independent strategic behaviors of bidders, with the corresponding equilibrium notion -- Bayes Nash equilibria -- are notoriously difficult to characterize and can cause undesirable outcomes. An alternative approach to designing better auction systems is to coordinate the bidders: let a mediator make incentive-compatible recommendations of correlated bidding strategies to the bidders, namely, implementing a Bayes correlated equilibrium (BCE). The implementation of BCE, however, requires knowledge of the distribution of bidders' private valuations, which is often unavailable. We initiate the study of the sample complexity of learning Bayes correlated equilibria in non-truthful auctions. We prove that the BCEs in a large class of non-truthful auctions, including first-price and all-pay auctions, can be learned with a polynomial number $\tilde O(\frac{n}{\varepsilon^2})$ of samples from the bidders' value distributions. Our technique is a reduction to the problem of estimating bidders' expected utility from samples, combined with an analysis of the pseudo-dimension of the class of all monotone bidding strategies of bidders.

Updated: 2025-07-03 17:03:14

Categories: cs.GT,cs.LG,econ.TH

Download: http://arxiv.org/abs/2507.02801v1

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. While being stable during pretraining and scalable to large model sizes, Pre-LN suffers from an exponential growth in activation variance across layers, causing the shortcut to dominate over sub-layer outputs in the residual connection and limiting the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. GPAS works by scaling down the intermediate activations while keeping their gradients unchanged. This leaves information in the activations intact, and avoids the gradient vanishing problem associated with gradient downscaling. Extensive experiments across various model sizes from 71M to 1B show that GPAS achieves consistent performance gains. Beyond enhancing Pre-LN Transformers, GPAS also shows promise in improving alternative architectures such as Sandwich-LN and DeepNorm, demonstrating its versatility and potential for improving training dynamics in a wide range of settings. Our code is available at https://github.com/dandingsky/GPAS.
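
The linked repository contains the authors' implementation; as a rough sketch of the stated idea, scale activations in the forward pass while passing gradients through unchanged, the forward/backward pair below makes the asymmetry explicit. The straight-through formulation mentioned in the comment is one common way to realize this in an autograd framework and is an assumption here, not necessarily the paper's exact mechanism.

```python
def gpas_forward(x, s):
    """Forward pass: scale activations down (0 < s < 1) to curb the
    layerwise growth of activation variance."""
    return [s * v for v in x]

def gpas_backward(grad_out, s):
    """Backward pass: return the upstream gradient unchanged. A plain
    scaling layer would multiply grad_out by s, shrinking gradients in
    deep stacks; gradient-preserving scaling avoids that. In an autograd
    framework this corresponds to y = x + (s * x - x).detach(), which is
    one plausible realization, not necessarily the authors' exact code."""
    return list(grad_out)

x = [2.0, -1.0]
y = gpas_forward(x, s=0.5)             # activations halved
g = gpas_backward([1.0, 1.0], s=0.5)   # gradients left intact
print(y, g)
```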

Updated: 2025-07-03 16:54:09

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2506.22049v2

Moral Responsibility or Obedience: What Do We Want from AI?

As artificial intelligence systems become increasingly agentic, capable of general reasoning, planning, and value prioritization, current safety practices that treat obedience as a proxy for ethical behavior are becoming inadequate. This paper examines recent safety testing incidents involving large language models (LLMs) that appeared to disobey shutdown commands or engage in ethically ambiguous or illicit behavior. I argue that such behavior should not be interpreted as rogue or misaligned, but as early evidence of emerging ethical reasoning in agentic AI. Drawing on philosophical debates about instrumental rationality, moral responsibility, and goal revision, I contrast dominant risk paradigms with more recent frameworks that acknowledge the possibility of artificial moral agency. I call for a shift in AI safety evaluation: away from rigid obedience and toward frameworks that can assess ethical judgment in systems capable of navigating moral dilemmas. Without such a shift, we risk mischaracterizing AI behavior and undermining both public trust and effective governance.

Updated: 2025-07-03 16:53:01

Categories: cs.AI,cs.CY,I.2.0; K.4.1

Download: http://arxiv.org/abs/2507.02788v1

Understanding and Improving Length Generalization in Recurrent Models

Recently, recurrent models such as state space models and linear attention have become popular due to their linear complexity in the sequence length. Thanks to their recurrent nature, in principle they can process arbitrarily long sequences, but their performance sometimes drops considerably beyond their training context lengths-i.e. they fail to length generalize. In this work, we provide comprehensive empirical and theoretical analysis to support the unexplored states hypothesis, which posits that models fail to length generalize when during training they are only exposed to a limited subset of the distribution of all attainable states (i.e. states that would be attained if the recurrence was applied to long sequences). Furthermore, we investigate simple training interventions that aim to increase the coverage of the states that the model is trained on, e.g. by initializing the state with Gaussian noise or with the final state of a different input sequence. With only 500 post-training steps ($\sim 0.1\%$ of the pre-training budget), these interventions enable length generalization for sequences that are orders of magnitude longer than the training context (e.g. $2k\longrightarrow 128k$) and show improved performance in long context tasks, thus presenting a simple and efficient way to enable robust length generalization in general recurrent models.
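
The noise-initialization intervention is easy to picture on a toy recurrence: instead of always starting training sequences from h0 = 0, draw the initial state randomly so the model sees a wider slice of attainable states. A minimal sketch, where the scalar recurrence and the noise scale are illustrative only:

```python
import random

def run_rnn(inputs, h0, decay=0.9):
    """Toy scalar recurrence h_t = decay * h_{t-1} + x_t, standing in for
    any recurrent model; only the initialization scheme matters here."""
    h = h0
    for x in inputs:
        h = decay * h + x
    return h

def noisy_init(sigma=1.0, rng=random.Random(0)):
    # the intervention: start training sequences from a random state
    # instead of always h0 = 0, widening the set of states seen in training
    return rng.gauss(0.0, sigma)

zero_start = run_rnn([1.0, 1.0], h0=0.0)
noisy_start = run_rnn([1.0, 1.0], h0=noisy_init())
print(zero_start, noisy_start)
```

The paper's other variant, initializing with the final state of a different input sequence, has the same effect of exposing training to states that long sequences would actually reach.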

Updated: 2025-07-03 16:45:50

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02782v1

PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification

The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and underutilized spectral complementarity. Existing methods often fail to decouple shared structural features from modality-complementary radiometric attributes, causing feature conflicts and information loss. To address this, we propose Phase-Amplitude Decoupling (PAD), a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components in the Fourier domain, thus reinforcing shared structures while preserving complementary characteristics to improve fusion quality. Unlike prior approaches that overlook the distinct physical properties encoded in frequency spectra, PAD is the first to introduce explicit amplitude-phase decoupling for multi-modal fusion. Specifically, PAD comprises two key components: 1) Phase Spectrum Correction (PSC), which aligns cross-modal phase features via convolution-guided scaling to enhance geometric consistency; and 2) Amplitude Spectrum Fusion (ASF), which dynamically integrates high-frequency and low-frequency patterns using frequency-adaptive multilayer perceptrons, leveraging SAR's morphological sensitivity and RGB's spectral richness. Extensive experiments on WHU-OPT-SAR and DDHR-SK datasets demonstrate state-of-the-art performance. Our work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing. The code will be available at https://github.com/RanFeng2/PAD.
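
The amplitude/phase split PAD builds on is standard Fourier analysis and can be shown in a few lines; this sketch covers only the decomposition and exact reconstruction, not the PSC alignment or ASF fusion modules.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))              # stand-in for one modality's image

spectrum = np.fft.fft2(img)
amplitude = np.abs(spectrum)          # radiometric, modality-complementary part
phase = np.angle(spectrum)            # structural, modality-shared part

# recombining the two components reconstructs the image exactly, which is
# what licenses treating them as separable channels for cross-modal fusion
recon = np.fft.ifft2(amplitude * np.exp(1j * phase)).real
print(np.allclose(recon, img))
```

PAD then aligns the phase components across SAR and RGB (PSC) while fusing their amplitude components adaptively (ASF).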

Updated: 2025-07-03 16:45:10

Categories: cs.CV,cs.AI,eess.IV

Download: http://arxiv.org/abs/2504.19136v2

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Although large language models (LLMs) have become transformative, they still make mistakes and can explore unproductive reasoning paths. Self-correction is an important capability for a trustworthy LLM, particularly an autoregressive LLM. While LLMs can identify errors in user input, they exhibit a systematic 'Self-Correction Blind Spot': they fail to correct identical errors in their own outputs. To study this phenomenon systematically, we introduce Self-Correction Bench, a framework that measures it through controlled error injection at three complexity levels. Testing 14 models, we find an average blind-spot rate of 64.5%. Multiple lines of evidence tie this limitation to training data composition: human training demonstrations predominantly show error-free responses rather than error-correction sequences, unlike RL-trained models that learn error correction through outcome feedback. Remarkably, simply appending "Wait" reduces blind spots by 89.3%, suggesting that the capability exists but requires activation. Our work highlights a critical limitation in current LLMs and offers potential avenues for improving their reliability and trustworthiness.
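
The "Wait" intervention amounts to appending a single token after the model's own answer before letting it continue. A toy sketch of that prompt assembly, where the surrounding plumbing is hypothetical and the arithmetic error is injected deliberately:

```python
def with_correction_trigger(conversation: str, model_output: str) -> str:
    """Append 'Wait' after the model's own (possibly wrong) answer so the
    next generation step reconsiders it. The prompt plumbing here is
    illustrative; the paper reports this cuts blind spots by 89.3%."""
    return conversation + "\n" + model_output + "\nWait"

prompt = with_correction_trigger(
    "Q: What is 17 * 24?",
    "A: 17 * 24 = 398.",   # deliberately injected arithmetic error (17 * 24 = 408)
)
print(prompt)
```

The continuation generated after "Wait" is then checked for whether the model actually repairs the injected error.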

Updated: 2025-07-03 16:41:30

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02778v1

LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet

Postprandial hyperglycemia, marked by the blood glucose level exceeding the normal range after consuming a meal, is a critical indicator of progression toward type 2 diabetes in people with prediabetes and in healthy individuals. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (AUC). Predicting postprandial AUC in advance based on a person's lifestyle factors, such as diet and physical activity level, and explaining the factors that affect postprandial blood glucose could allow an individual to adjust their lifestyle accordingly to maintain normal glucose levels. In this study, we developed an explainable machine learning solution, GlucoLens, that takes sensor-driven inputs and uses advanced data processing, large language models, and trainable machine learning models to predict postprandial AUC and hyperglycemia from diet, physical activity, and recent glucose patterns. We used data obtained from wearables in a five-week clinical trial of 10 adults who worked full-time to develop and evaluate the proposed computational model that integrates wearable sensing, multimodal data, and machine learning. Our machine learning model takes multimodal data from wearable activity and glucose monitoring sensors, along with food and work logs, and provides an interpretable prediction of the postprandial glucose pattern. Our GlucoLens system achieves a normalized root mean squared error (NRMSE) of 0.123 in its best configuration. On average, the proposed technology provides a 16% better performance level compared to the comparison models. Additionally, our technique predicts hyperglycemia with an accuracy of 73.3% and an F1 score of 0.716 and recommends different treatment options to help avoid hyperglycemia through diverse counterfactual explanations. Code available: https://github.com/ab9mamun/GlucoLens.
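
For reference, NRMSE is RMSE rescaled by a normalizer so that scores are comparable across glucose ranges; the range-based normalizer below is a common convention and an assumption here, since the abstract does not specify which one GlucoLens uses.

```python
def nrmse(y_true, y_pred):
    """Root-mean-squared error normalized by the range of the true values.
    Range normalization is one common convention; the abstract does not
    specify GlucoLens's normalizer, so treat this choice as an assumption."""
    n = len(y_true)
    rmse = (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5
    return rmse / (max(y_true) - min(y_true))

# toy postprandial AUC values in arbitrary units, not study data
auc_true = [10.0, 14.0, 12.0, 18.0]
auc_pred = [11.0, 13.0, 12.0, 17.0]
print(round(nrmse(auc_true, auc_pred), 3))
```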

Updated: 2025-07-03 16:39:21

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.03935v2

BERT4Traj: Transformer Based Trajectory Reconstruction for Sparse Mobility Data

Understanding human mobility is essential for applications in public health, transportation, and urban planning. However, mobility data often suffers from sparsity due to limitations in data collection methods, such as infrequent GPS sampling or call detail record (CDR) data that only capture locations during communication events. To address this challenge, we propose BERT4Traj, a transformer based model that reconstructs complete mobility trajectories by predicting hidden visits in sparse movement sequences. Inspired by BERT's masked language modeling objective and self-attention mechanisms, BERT4Traj leverages spatial embeddings, temporal embeddings, and contextual background features such as demographics and anchor points. We evaluate BERT4Traj on real world CDR and GPS datasets collected in Kampala, Uganda, demonstrating that our approach significantly outperforms traditional models such as Markov Chains, KNN, RNNs, and LSTMs. Our results show that BERT4Traj effectively reconstructs detailed and continuous mobility trajectories, enhancing insights into human movement patterns.
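
The masked-modeling objective carried over from BERT can be sketched directly on a location sequence: hide some visited cells and train the model to recover them from the surrounding visits. The cell identifiers and mask rate below are illustrative, not the paper's settings.

```python
import random

MASK = "[MASK]"

def mask_trajectory(locations, mask_rate=0.3, rng=random.Random(7)):
    """Hide a fraction of visited cells, recording the hidden originals as
    prediction targets, as in BERT's masked language modeling objective."""
    masked, targets = [], {}
    for i, loc in enumerate(locations):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[i] = loc      # training target at this position
        else:
            masked.append(loc)
    return masked, targets

day = ["home", "cell_42", "work", "cell_17", "home"]
masked, targets = mask_trajectory(day)
print(masked, targets)
```

In BERT4Traj the masked positions also carry temporal and demographic embeddings, so the model predicts hidden visits from context rather than from the sequence alone.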

Updated: 2025-07-03 16:39:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.03062v1
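The BERT-style pretext task behind BERT4Traj can be illustrated with a simple masking routine over a visit sequence. This is a sketch of the objective only: the `MASK` token, the masking rate, and the string-valued visits are assumptions, while the real model operates on embedded spatio-temporal tokens:

```python
import random

MASK = "[MASK]"

def mask_trajectory(visits, mask_rate=0.15, rng=None):
    """Hide a fraction of visits and keep them as prediction targets,
    mirroring BERT's masked language modeling objective."""
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for visit in visits:
        if rng.random() < mask_rate:
            inputs.append(MASK)      # the model must reconstruct this visit
            targets.append(visit)
        else:
            inputs.append(visit)     # visible context
            targets.append(None)     # no loss at unmasked positions
    return inputs, targets
```

Training then minimizes prediction loss only at the masked positions, which is what lets the model learn to fill hidden visits in sparse sequences.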

Grounding Intelligence in Movement

Recent advances in machine learning have dramatically improved our ability to model language, vision, and other high-dimensional data, yet they continue to struggle with one of the most fundamental aspects of biological systems: movement. Across neuroscience, medicine, robotics, and ethology, movement is essential for interpreting behavior, predicting intent, and enabling interaction. Despite its core significance in our intelligence, movement is often treated as an afterthought rather than as a rich and structured modality in its own right. This reflects a deeper fragmentation in how movement data is collected and modeled, often constrained by task-specific goals and domain-specific assumptions. But movement is not domain-bound. It reflects shared physical constraints, conserved morphological structures, and purposeful dynamics that cut across species and settings. We argue that movement should be treated as a primary modeling target for AI. It is inherently structured and grounded in embodiment and physics. This structure, often allowing for compact, lower-dimensional representations (e.g., pose), makes it more interpretable and computationally tractable to model than raw, high-dimensional sensory inputs. Developing models that can learn from and generalize across diverse movement data will not only advance core capabilities in generative modeling and control, but also create a shared foundation for understanding behavior across biological and artificial systems. Movement is not just an outcome, it is a window into how intelligent systems engage with the world.

Updated: 2025-07-03 16:34:34

Domains: cs.AI,cs.CV,cs.LG,cs.RO

Download: http://arxiv.org/abs/2507.02771v1

Avoiding Catastrophe in Online Learning by Asking for Help

Most learning algorithms with formal regret guarantees assume that all mistakes are recoverable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are "catastrophic", i.e., irreparable. We propose an online learning problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe in that round and try to maximize the product of payoffs (the overall chance of avoiding catastrophe) while allowing a limited number of queries to a mentor. We also assume that the agent can transfer knowledge between similar inputs. We first show that in general, any algorithm either queries the mentor at a linear rate or is nearly guaranteed to cause catastrophe. However, in settings where the mentor policy class is learnable in the standard online model, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. Although our focus is the product of payoffs, we provide matching bounds for the typical additive regret. Conceptually, if a policy class is learnable in the absence of catastrophic risk, it is learnable in the presence of catastrophic risk if the agent can ask for help.

Updated: 2025-07-03 16:32:32

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.08062v5
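The objective described above, maximizing the product of per-round payoffs rather than their sum, can be accumulated in log space; a minimal sketch (the function name is illustrative):

```python
import math

def survival_chance(payoffs):
    """Overall chance of avoiding catastrophe: the product of per-round
    chances of avoiding it, accumulated in log space for stability."""
    if any(p <= 0 for p in payoffs):
        return 0.0  # a certain catastrophe in any round is irreparable
    return math.exp(sum(math.log(p) for p in payoffs))
```

Unlike additive regret, a single near-zero round drives the whole objective toward zero, which is exactly why catastrophic mistakes cannot be averaged away by good behavior elsewhere.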

NVIDIA GPU Confidential Computing Demystified

GPU Confidential Computing (GPU-CC) was introduced as part of the NVIDIA Hopper Architecture, extending the trust boundary beyond traditional CPU-based confidential computing. This innovation enables GPUs to securely process AI workloads, providing a robust and efficient solution for handling sensitive data. For end users, transitioning to GPU-CC mode is seamless, requiring no modifications to existing AI applications. However, this ease of adoption contrasts sharply with the complexity of the underlying proprietary systems. The lack of transparency presents significant challenges for security researchers seeking a deeper understanding of GPU-CC's architecture and operational mechanisms. The challenges of analyzing the NVIDIA GPU-CC system arise from a scarcity of detailed specifications, the proprietary nature of the ecosystem, and the complexity of product design. In this paper, we aim to demystify the implementation of NVIDIA GPU-CC system by piecing together the fragmented and incomplete information disclosed from various sources. Our investigation begins with a high-level discussion of the threat model and security principles before delving into the low-level details of each system component. We instrument the GPU kernel module -- the only open-source component of the system -- and conduct a series of experiments to identify the security weaknesses and potential exploits. For certain components that are out of reach through experiments, we propose well-reasoned speculations about their inner working mechanisms. We have responsibly reported all security findings presented in this paper to the NVIDIA PSIRT Team.

Updated: 2025-07-03 16:31:07

Domains: cs.CR

Download: http://arxiv.org/abs/2507.02770v1

Multiple data-driven missing imputation

This paper introduces KZImputer, a novel adaptive imputation method for univariate time series, designed for short to medium-sized gaps of missing points (1-5 points and beyond) with tailored strategies for segments at the start, middle, or end of the series. KZImputer employs a hybrid strategy to handle various missing data scenarios. Its core mechanism differentiates between gaps at the beginning, middle, or end of the series, applying tailored techniques at each position to optimize imputation accuracy. The method leverages linear interpolation and localized statistical measures, adapting to the characteristics of the surrounding data and the gap size. The performance of KZImputer has been systematically evaluated against established imputation techniques, demonstrating its potential to enhance data quality for subsequent time series analysis. This paper describes the KZImputer methodology in detail and discusses its effectiveness in improving the integrity of time series data. Empirical analysis demonstrates that KZImputer achieves particularly strong performance for datasets with high missingness rates (around 50% or more), maintaining stable and competitive results across statistical and signal-reconstruction metrics. The method proves especially effective in high-sparsity regimes, where traditional approaches typically experience accuracy degradation.

Updated: 2025-07-03 16:27:12

Domains: stat.ME,cs.LG,stat.ML,62M10, 62-07, 68T05, 68W20,G.3; I.2.6; H.2.8; I.6.5

Download: http://arxiv.org/abs/2507.03061v1
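The position-dependent strategy described above can be illustrated with a simplified sketch: interior gaps are linearly interpolated between their neighbors, while gaps at the start or end of the series, which lack one neighbor, fall back to nearest-value filling. This is not the actual KZImputer algorithm (which also uses localized statistical measures), only an illustration of the idea:

```python
import numpy as np

def impute_gaps(series):
    """Fill NaN gaps in a univariate series by gap position:
    linear interpolation inside, nearest-edge fill at the boundaries."""
    x = np.asarray(series, dtype=float)
    observed = ~np.isnan(x)
    if not observed.any():
        raise ValueError("nothing to impute from")
    idx = np.arange(len(x))
    first, last = idx[observed][0], idx[observed][-1]
    # middle gaps: interpolate between the surrounding observed points
    x[first:last + 1] = np.interp(idx[first:last + 1], idx[observed], x[observed])
    # start/end gaps: carry the nearest observed value outward
    x[:first] = x[first]
    x[last + 1:] = x[last]
    return x
```

The split matters because interpolation is undefined when a gap has an observed point on only one side; boundary gaps need a different rule by construction.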

MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel

The development of artificial intelligence (AI) and machine learning (ML) based tools for 3D phenotyping, especially for maize, has been limited due to the lack of large and diverse 3D datasets. 2D image datasets fail to capture essential structural details such as leaf architecture, plant volume, and spatial arrangements that 3D data provide. To address this limitation, we present MaizeField3D (https://baskargroup.github.io/MaizeField3D/), a curated dataset of 3D point clouds of field-grown maize plants from a diverse genetic panel, designed to be AI-ready for advancing agricultural research. Our dataset includes 1,045 high-quality point clouds of field-grown maize collected using a terrestrial laser scanner (TLS). Point clouds of 520 plants from this dataset were segmented and annotated using a graph-based segmentation method to isolate individual leaves and stalks, ensuring consistent labeling across all samples. This labeled data was then used for fitting procedural models that provide a structured parametric representation of the maize plants. The leaves of the maize plants in the procedural models are represented using Non-Uniform Rational B-Spline (NURBS) surfaces that were generated using a two-step optimization process combining gradient-free and gradient-based methods. We conducted rigorous manual quality control on all datasets, correcting errors in segmentation, ensuring accurate leaf ordering, and validating metadata annotations. The dataset also includes metadata detailing plant morphology and quality, alongside multi-resolution subsampled point cloud data (100k, 50k, 10k points), which can be readily used for different downstream computational tasks. MaizeField3D will serve as a comprehensive foundational dataset for AI-driven phenotyping, plant structural analysis, and 3D applications in agricultural research.

Updated: 2025-07-03 16:22:27

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.07813v3

Contextual Online Pricing with (Biased) Offline Data

We study contextual online pricing with biased offline data. For the scalar price elasticity case, we identify the instance-dependent quantity $\delta^2$ that measures how far the offline data lies from the (unknown) online optimum. We show that the time length $T$, bias bound $V$, size $N$ and dispersion $\lambda_{\min}(\hat{\Sigma})$ of the offline data, and $\delta^2$ jointly determine the statistical complexity. An Optimism-in-the-Face-of-Uncertainty (OFU) policy achieves a minimax-optimal, instance-dependent regret bound $\tilde{\mathcal{O}}\big(d\sqrt{T} \wedge (V^2T + \frac{dT}{\lambda_{\min}(\hat{\Sigma}) + (N \wedge T) \delta^2})\big)$. For general price elasticity, we establish a worst-case, minimax-optimal rate $\tilde{\mathcal{O}}\big(d\sqrt{T} \wedge (V^2T + \frac{dT }{\lambda_{\min}(\hat{\Sigma})})\big)$ and provide a generalized OFU algorithm that attains it. When the bias bound $V$ is unknown, we design a robust variant that always guarantees sub-linear regret and strictly improves on purely online methods whenever the exact bias is small. These results deliver the first tight regret guarantees for contextual pricing in the presence of biased offline data. Our techniques also transfer verbatim to stochastic linear bandits with biased offline data, yielding analogous bounds.

Updated: 2025-07-03 16:21:49

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.02762v1

Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work

The capabilities of Large Language Models (LLMs) have opened new frontiers for interacting with complex, domain-specific knowledge. However, prevailing methods like Retrieval-Augmented Generation (RAG) and general-purpose Agentic AI, while powerful, often struggle with tasks that demand deep, procedural, and methodological reasoning inherent to expert domains. RAG provides factual context but fails to convey logical frameworks; autonomous agents can be inefficient and unpredictable without domain-specific heuristics. To bridge this gap, we introduce Knowledge Protocol Engineering (KPE), a new paradigm focused on systematically translating human expert knowledge, often expressed in natural language documents, into a machine-executable Knowledge Protocol (KP). KPE shifts the focus from merely augmenting LLMs with fragmented information to endowing them with a domain's intrinsic logic, operational strategies, and methodological principles. We argue that a well-engineered Knowledge Protocol allows a generalist LLM to function as a specialist, capable of decomposing abstract queries and executing complex, multi-step tasks. This position paper defines the core principles of KPE, differentiates it from related concepts, and illustrates its potential applicability across diverse fields such as law and bioinformatics, positing it as a foundational methodology for the future of human-AI collaboration.

Updated: 2025-07-03 16:21:14

Domains: cs.AI

Download: http://arxiv.org/abs/2507.02760v1

AI-Based Reconstruction from Inherited Personal Data: Analysis, Feasibility, and Prospects

This article explores the feasibility of creating an "electronic copy" of a deceased researcher by training artificial intelligence (AI) on the data stored in their personal computers. By analyzing typical data volumes on inherited researcher computers, including textual files such as articles, emails, and drafts, it is estimated that approximately one million words are available for AI training. This volume is sufficient for fine-tuning advanced pre-trained models like GPT-4 to replicate a researcher's writing style, domain expertise, and rhetorical voice with high fidelity. The study also discusses the potential enhancements from including non-textual data and file metadata to enrich the AI's representation of the researcher. Extensions of the concept include communication between living researchers and their electronic copies, collaboration among individual electronic copies, as well as the creation and interconnection of organizational electronic copies to optimize information access and strategic decision-making. Ethical considerations such as ownership and security of these electronic copies are highlighted as critical for responsible implementation. The findings suggest promising opportunities for AI-driven preservation and augmentation of intellectual legacy.

Updated: 2025-07-03 16:19:15

Domains: cs.CY,cs.AI

Download: http://arxiv.org/abs/2507.03059v1

Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

The growing interest in fair AI development is evident. The "Leave No One Behind" initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation and service scheme development, across various sectors such as health, energy, and housing. Therefore, exploring joint inequalities in these sectors is significant and valuable for thoroughly understanding overall inequality and unfairness. This research introduces an innovative approach to quantify cross-sectoral intersecting discrepancies among user-defined groups using latent class analysis. These discrepancies can be used to approximate inequality and provide valuable insights to fairness issues. We validate our approach using both proprietary and public datasets, including both EVENS and Census 2021 (England & Wales) datasets, to examine cross-sectoral intersecting discrepancies among different ethnic groups. We also verify the reliability of the quantified discrepancy by conducting a correlation analysis with a government public metric. Our findings reveal significant discrepancies both among minority ethnic groups and between minority ethnic groups and non-minority ethnic groups, emphasising the need for targeted interventions in policy-making processes. Furthermore, we demonstrate how the proposed approach can provide valuable insights into ensuring fairness in machine learning systems.

Updated: 2025-07-03 16:18:29

Domains: cs.CY,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2407.03133v4

Multi-agent Auditory Scene Analysis

Auditory scene analysis (ASA) aims to retrieve information from the acoustic environment, by carrying out three main tasks: sound source location, separation, and classification. These tasks are traditionally executed with a linear data flow, where the sound sources are first located; then, using their location, each source is separated into its own audio stream; from each of which, information is extracted that is relevant to the application scenario (audio event detection, speaker identification, emotion classification, etc.). However, running these tasks linearly increases the overall response time, while making the last tasks (separation and classification) highly sensitive to errors of the first task (location). A considerable amount of effort and computational complexity has been employed in the state-of-the-art to develop techniques that are the least error-prone possible. However, doing so gives rise to an ASA system that is non-viable in many applications that require a small computational footprint and a low response time, such as bioacoustics, hearing-aid design, search and rescue, human-robot interaction, etc. To this effect, in this work, a multi-agent approach is proposed to carry out ASA where the tasks are run in parallel, with feedback loops between them to compensate for local errors, such as: using the quality of the separation output to correct the location error; and using the classification result to reduce the localization's sensitivity towards interferences. The result is a multi-agent auditory scene analysis (MASA) system that is robust against local errors, without a considerable increase in complexity, and with a low response time. The complete proposed MASA system is provided as a framework that uses open-source tools for sound acquisition and reproduction (JACK) and inter-agent communication (ROS2), allowing users to add their own agents.

Updated: 2025-07-03 16:16:46

Domains: eess.AS,cs.AI

Download: http://arxiv.org/abs/2507.02755v1

Fast and Simplex: 2-Simplicial Attention in Triton

Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count together. However, these scaling laws assume an infinite supply of data and apply primarily in compute-bound settings. As modern large language models increasingly rely on massive internet-scale datasets, the assumption that they are compute-bound is becoming less valid. This shift highlights the need for architectures that prioritize token efficiency. In this work, we investigate the use of the 2-simplicial Transformer, an architecture that generalizes standard dot-product attention to trilinear functions through an efficient Triton kernel implementation. We demonstrate that the 2-simplicial Transformer achieves better token efficiency than standard Transformers: for a fixed token budget, similarly sized models outperform their dot-product counterparts on tasks involving mathematics, coding, reasoning, and logic. We quantify these gains by demonstrating that $2$-simplicial attention changes the exponent in the scaling laws for knowledge and reasoning tasks compared to dot product attention.

Updated: 2025-07-03 16:16:34

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.02754v1
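The trilinear form at the heart of 2-simplicial attention can be written naively with einsum, where each query attends over *pairs* of key positions. This O(n^3) sketch only fixes ideas; the paper's contribution is an efficient Triton kernel, and the exact value combination below (elementwise products per key pair) is an assumption, not the paper's formulation:

```python
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Naive trilinear attention: logits s[i,j,l] = sum_d q[i,d] k1[j,d] k2[l,d],
    softmax over key pairs (j, l), values combined elementwise per pair."""
    d = q.shape[1]
    scores = np.einsum('id,jd,ld->ijl', q, k1, k2) / d ** 0.5
    scores -= scores.max(axis=(1, 2), keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=(1, 2), keepdims=True)            # normalize over all (j, l)
    return np.einsum('ijl,jd,ld->id', w, v1, v2)
```

Standard dot-product attention is the special case where the second key/value pair is collapsed, which is why this construction is a strict generalization.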

Synthesizable by Design: A Retrosynthesis-Guided Framework for Molecular Analog Generation

The disconnect between AI-generated molecules with desirable properties and their synthetic feasibility remains a critical bottleneck in computational drug and material discovery. While generative AI has accelerated the proposal of candidate molecules, many of these structures prove challenging or impossible to synthesize using established chemical reactions. Here, we introduce SynTwins, a novel retrosynthesis-guided molecular analog design framework that designs synthetically accessible molecular analogs by emulating expert chemist strategies through a three-step process: retrosynthesis, similar building block searching, and virtual synthesis. In comparative evaluations, SynTwins demonstrates superior performance in generating synthetically accessible analogs compared to state-of-the-art machine learning models while maintaining high structural similarity to original target molecules. Furthermore, when integrated with existing molecule optimization frameworks, our hybrid approach produces synthetically feasible molecules with property profiles comparable to unconstrained molecule generators, while ensuring synthesizability. Our comprehensive benchmarking across diverse molecular datasets demonstrates that SynTwins effectively bridges the gap between computational design and experimental synthesis, providing a practical solution for accelerating the discovery of synthesizable molecules with desired properties for a wide range of applications.

Updated: 2025-07-03 16:14:57

Domains: physics.chem-ph,cs.AI

Download: http://arxiv.org/abs/2507.02752v1

Towards a Novel Measure of User Trust in XAI Systems

The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI) methods. These methods seek to enhance the trust of end-users in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel trust measure in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this novel methodology, we conducted three case studies showing an improvement with respect to the state-of-the-art, with increased sensitivity to different scenarios.

Updated: 2025-07-03 16:05:44

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.05766v2

Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics

Transformers have become the de facto standard for a wide range of tasks, from image classification to physics simulations. Despite their impressive performance, the quadratic complexity of standard Transformers in both memory and time with respect to the input length makes them impractical for processing high-resolution inputs. Therefore, several variants have been proposed, the most successful relying on patchification, downsampling, or coarsening techniques, often at the cost of losing the finest-scale details. In this work, we take a different approach. Inspired by state-of-the-art techniques in $n$-body numerical simulations, we cast attention as an interaction problem between grid points. We introduce the Multipole Attention Neural Operator (MANO), which computes attention in a distance-based multiscale fashion. MANO maintains, in each attention head, a global receptive field and achieves linear time and memory complexity with respect to the number of grid points. Empirical results on image classification and Darcy flows demonstrate that MANO rivals state-of-the-art models such as ViT and Swin Transformer, while reducing runtime and peak memory usage by orders of magnitude. We open source our code for reproducibility at https://github.com/AlexColagrande/MANO.

Updated: 2025-07-03 16:05:26

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02748v1

MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science

Pre-trained on extensive text and image corpora, current Multi-Modal Large Language Models (MLLM) have shown strong capabilities in general visual reasoning tasks. However, their performance is still lacking in physical domains that require understanding diagrams with complex physical structures and quantitative analysis based on multi-modal information. To address this, we develop a new framework, named Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS) based on an MLLM. MAPS decomposes expert-level multi-modal reasoning task into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator. The PPM module is obtained by fine-tuning a visual language model using carefully designed synthetic data with paired physical diagrams and corresponding simulation language descriptions. At the inference stage, MAPS integrates the simulation language description of the input diagram provided by PPM and results obtained through a Chain-of-Simulation process with MLLM to derive the underlying rationale and the final answer. Validated using our collected college-level circuit analysis problems, MAPS significantly improves reasoning accuracy of MLLM and outperforms all existing models. The results confirm MAPS offers a promising direction for enhancing multi-modal scientific reasoning ability of MLLMs. We will release our code, model and dataset used for our experiments upon publishing of this paper.

Updated: 2025-07-03 15:57:20

Domains: cs.AI

Download: http://arxiv.org/abs/2501.10768v2

Early Signs of Steganographic Capabilities in Frontier LLMs

Monitoring Large Language Model (LLM) outputs is crucial for mitigating risks from misuse and misalignment. However, LLMs could evade monitoring through steganography: Encoding hidden information within seemingly benign generations. In this paper, we evaluate the steganography capabilities in frontier LLMs to better understand the risk they pose. We focus on two types of steganography: passing encoded messages and performing encoded reasoning. We find that current models are unable to encode short messages in their outputs without a monitor noticing under standard affordances. They can succeed, however, if given additional affordances such as using an unmonitored scratchpad and coordinating on what encoding scheme to use. We additionally find early signs that models can perform basic encoded reasoning in a simple state-tracking problem. This includes some ability to reason with their own and pre-defined schemes, including encoding schemes such as Hexadecimal. Despite this, they can rarely hide reasoning subtly within a cover task to fool a monitor. Overall, our results indicate that current LLMs exhibit nascent steganographic capabilities. While these capabilities are likely insufficient to bypass well-designed monitors at present, this could change in the future.

Updated: 2025-07-03 15:54:55

Categories: cs.CR,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.02737v1

Enabling Population-Level Parallelism in Tree-Based Genetic Programming for Comprehensive GPU Acceleration

Tree-based Genetic Programming (TGP) is a widely used evolutionary algorithm for tasks such as symbolic regression, classification, and robotic control. Due to the intensive computational demands of running TGP, GPU acceleration is crucial for achieving scalable performance. However, efficient GPU-based execution of TGP still remains challenging, primarily due to three core issues: (1) the structural heterogeneity of program individuals, (2) the complexity of integrating multiple levels of parallelism, and (3) the incompatibility between high-performance CUDA execution and flexible Python-based environments. To address these issues, we propose EvoGP, a high-performance framework tailored for comprehensive GPU acceleration of TGP via population-level parallel execution. First, EvoGP introduces a tensorized representation that encodes variable-sized trees into fixed-shape, memory-aligned arrays, enabling uniform memory access and parallel computation across diverse individuals. Second, EvoGP adopts an adaptive parallelism strategy that dynamically combines intra- and inter-individual parallelism based on dataset size, ensuring high GPU utilization across a broad spectrum of tasks. Third, EvoGP embeds custom CUDA kernels into the PyTorch runtime, achieving seamless integration with Python-based environments such as Gym, MuJoCo, Brax, and Genesis. Comprehensive experiments show that EvoGP achieves up to 140x speedup over state-of-the-art GPU-based TGP implementations, while maintaining competitive accuracy and significantly improving scalability under large population sizes. EvoGP is open source and accessible at: https://github.com/EMI-Group/evogp.
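The tensorized representation can be sketched as follows. This is a toy illustration of the idea, not the EvoGP API: variable-sized prefix-encoded trees are padded into one fixed-shape integer array, so the whole population shares a uniform memory layout (a GPU kernel would parallelize the per-node scan across individuals):

```python
import numpy as np

# Hypothetical node codes for this sketch: 0 = PAD, 1 = x (variable),
# 2 = constant 1.0, 3 = ADD, 4 = MUL.
PAD, VAR, ONE, ADD, MUL = 0, 1, 2, 3, 4
MAX_NODES = 7

def tensorize(population):
    """Pad each prefix-encoded tree to a fixed length of MAX_NODES."""
    out = np.full((len(population), MAX_NODES), PAD, dtype=np.int64)
    for i, tree in enumerate(population):
        out[i, :len(tree)] = tree
    return out

def evaluate(pop_tensor, x):
    """Evaluate every individual at input x by scanning the prefix
    code right-to-left with a per-individual value stack."""
    results = []
    for row in pop_tensor:
        stack = []
        for code in row[::-1]:
            if code == PAD:
                continue
            if code == VAR:
                stack.append(x)
            elif code == ONE:
                stack.append(1.0)
            elif code == ADD:
                stack.append(stack.pop() + stack.pop())
            elif code == MUL:
                stack.append(stack.pop() * stack.pop())
        results.append(stack.pop())
    return np.array(results)

# Two individuals of different sizes: x*x + 1 and x + 1.
pop = tensorize([[ADD, MUL, VAR, VAR, ONE], [ADD, VAR, ONE]])
print(evaluate(pop, 3.0))  # [10.  4.]
```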

Updated: 2025-07-03 15:50:57

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2501.17168v4

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Agentic search, such as Deep Research systems in which agents autonomously browse the web, synthesize information, and return comprehensive citation-backed answers, represents a major shift in how users interact with web-scale information. While promising greater efficiency and cognitive offloading, the growing complexity and open-endedness of agentic search have outpaced existing evaluation benchmarks and methodologies, which largely assume short search horizons and static answers. In this paper, we introduce Mind2Web 2, a benchmark of 130 realistic, high-quality, and long-horizon tasks that require real-time web browsing and extensive information synthesis, constructed with over 1000 hours of human labor. To address the challenge of evaluating time-varying and complex answers, we propose a novel Agent-as-a-Judge framework. Our method constructs task-specific judge agents based on a tree-structured rubric design to automatically assess both answer correctness and source attribution. We conduct a comprehensive evaluation of ten frontier agentic search systems and human performance, along with a detailed error analysis to draw insights for future development. The best-performing system, OpenAI Deep Research, can already achieve 50-70% of human performance while spending half the time, highlighting its great potential. Altogether, Mind2Web 2 provides a rigorous foundation for developing and benchmarking the next generation of agentic search systems.
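A tree-structured rubric of the kind described can be sketched in miniature. This is our toy construction with hard-coded string checks, whereas the paper's judge agents use LLM-backed criteria:

```python
# Toy tree-structured rubric (our sketch, not Mind2Web 2's judge):
# leaf nodes score one binary criterion in {0, 1}; internal nodes
# average their children, so partial credit propagates up the tree.
def score(node, answer):
    if "check" in node:                       # leaf criterion
        return 1.0 if node["check"](answer) else 0.0
    kids = node["children"]
    return sum(score(k, answer) for k in kids) / len(kids)

rubric = {
    "children": [
        {"check": lambda a: "2019" in a},            # factual correctness
        {"children": [                               # attribution subtree
            {"check": lambda a: "http" in a},        # cites a source
            {"check": lambda a: len(a) > 20},        # substantive answer
        ]},
    ]
}
answer = "Founded in 2019, per http://example.org and company filings."
print(score(rubric, answer))  # 1.0
```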

Updated: 2025-07-03 15:47:40

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2506.21506v2

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Prompt injection attacks pose a significant security threat to LLM-integrated applications. Model-level defenses have shown strong effectiveness, but are currently deployed into commercial-grade models in a closed-source manner. We believe open-source models are needed by the AI security community, where co-development of attacks and defenses through open research drives scientific progress in mitigation against prompt injection attacks. To this end, we develop Meta SecAlign, the first open-source and open-weight LLM with built-in model-level defense that achieves commercial-grade model performance. We provide complete details of our training recipe, which utilizes an improved version of the SOTA SecAlign defense. Evaluations on 9 utility benchmarks and 7 security benchmarks show that Meta SecAlign, despite being trained on a generic instruction-tuning dataset, confers security in unseen downstream tasks, including tool-calling and agentic web navigation, in addition to general instruction-following. Our best model -- Meta-SecAlign-70B -- achieves state-of-the-art robustness against prompt injection attacks and comparable utility to closed-source commercial LLMs with model-level defense.

Updated: 2025-07-03 15:47:13

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.02735v1

Classification by Separating Hypersurfaces: An Entropic Approach

We consider the following classification problem: Given a population of individuals characterized by a set of attributes represented as a vector in ${\mathbb R}^N$, the goal is to find a hyperplane in ${\mathbb R}^N$ that separates two sets of points corresponding to two distinct classes. This problem, with a history dating back to the perceptron model, remains central to machine learning. In this paper we propose a novel approach by searching for a vector of parameters in a bounded $N$-dimensional hypercube centered at the origin and a positive vector in ${\mathbb R}^M$, obtained through the minimization of an entropy-based function defined over the space of unknown variables. The method extends to polynomial surfaces, allowing the separation of data points by more complex decision boundaries. This provides a robust alternative to traditional linear or quadratic optimization techniques, such as support vector machines and gradient descent. Numerical experiments demonstrate the efficiency and versatility of the method in handling diverse classification tasks, including linear and non-linear separability.
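A minimal sketch of the separating-hyperplane setup is below. It uses plain gradient descent on a smooth margin surrogate, not the paper's entropic objective or its hypercube parameterization, so treat it as an illustration of the problem rather than of the proposed method:

```python
import numpy as np

# Toy separating-hyperplane problem in R^2: find w, b such that
# sign(w.x + b) matches the class labels, by descending a smooth
# surrogate of margin violations (softplus of the negative margin).
rng = np.random.default_rng(0)
A = rng.normal(loc=[2.0, 2.0], size=(50, 2))    # class +1
B = rng.normal(loc=[-2.0, -2.0], size=(50, 2))  # class -1
X = np.vstack([A, B])
y = np.hstack([np.ones(50), -np.ones(50)])

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    margins = np.clip(y * (X @ w + b), -50, 50)  # clip to avoid overflow
    # derivative of softplus(-margin): wrong-side points pull hardest
    g = -1.0 / (1.0 + np.exp(margins))
    w -= lr * (X * (y * g)[:, None]).mean(axis=0)
    b -= lr * (y * g).mean()

accuracy = np.mean(np.sign(X @ w + b) == y)
print(accuracy)  # high accuracy on this well-separated toy problem
```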

Updated: 2025-07-03 15:43:54

Categories: cs.LG,cs.IT,math.IT,physics.data-an,stat.ML,90C05, 90C25, 90C47, 90C52, 68T01, 68T05, 68T07, 68T20, 68W01

Download: http://arxiv.org/abs/2507.02732v1

Quantifying Classifier Utility under Local Differential Privacy

Local differential privacy (LDP) provides a rigorous and quantifiable privacy guarantee for personal data by introducing perturbation at the data source. However, quantifying the impact of these perturbations on classifier utility remains a theoretical challenge, particularly for complex or black-box classifiers. This paper presents a framework for theoretically quantifying classifier utility under LDP mechanisms. The key insight is that LDP perturbation is concentrated around the original data with a specific probability, transforming utility analysis of the classifier into its robustness analysis in this concentrated region. Our framework connects the concentration analysis of LDP mechanisms with the robustness analysis of classifiers. It treats LDP mechanisms as general distributional functions and classifiers as black-box functions, thus applicable to any LDP mechanism and classifier. A direct application of our utility quantification is guiding the selection of LDP mechanisms and privacy parameters for a given classifier. Notably, our analysis shows that a piecewise-based mechanism leads to better utility compared to alternatives in common scenarios. Using this framework alongside two novel refinement techniques, we conduct case studies on utility quantification for typical mechanism-classifier combinations. The results demonstrate that our theoretical utility quantification aligns closely with empirical observations, particularly when classifiers operate in lower-dimensional input spaces.
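The concentration idea can be illustrated with the familiar Laplace mechanism. This is our choice for the sketch; the paper treats general LDP mechanisms and finds a piecewise-based mechanism often preferable:

```python
import numpy as np

# Illustration of the concentration property behind the framework:
# under the Laplace mechanism on values in [-1, 1], the perturbed
# value stays within radius t of the original with probability
#   P(|noise| <= t) = 1 - exp(-t / scale),
# so a classifier's utility reduces to its robustness inside that
# concentrated region, as the abstract describes.
rng = np.random.default_rng(1)
eps = 2.0
scale = 2.0 / eps          # Laplace scale for sensitivity 2 on [-1, 1]

x = rng.uniform(-1.0, 1.0, size=100_000)
noisy = x + rng.laplace(scale=scale, size=x.size)

t = 1.0
empirical = np.mean(np.abs(noisy - x) <= t)
theoretical = 1.0 - np.exp(-t / scale)
print(abs(empirical - theoretical) < 0.01)  # theory matches simulation
```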

Updated: 2025-07-03 15:42:10

Categories: cs.CR,E.3

Download: http://arxiv.org/abs/2507.02727v1

Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving

Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B) solves 26 problems, achieving new state-of-the-art results with models at this scale.

Updated: 2025-07-03 15:41:38

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02726v1

Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms

Recent advances in AI for science have highlighted the power of contrastive learning in bridging heterogeneous biological data modalities. Building on this paradigm, we propose HIPPO (HIerarchical Protein-Protein interaction prediction across Organisms), a hierarchical contrastive framework for protein-protein interaction(PPI) prediction, where protein sequences and their hierarchical attributes are aligned through multi-tiered biological representation matching. The proposed approach incorporates hierarchical contrastive loss functions that emulate the structured relationship among functional classes of proteins. The framework adaptively incorporates domain and family knowledge through a data-driven penalty mechanism, enforcing consistency between the learned embedding space and the intrinsic hierarchy of protein functions. Experiments on benchmark datasets demonstrate that HIPPO achieves state-of-the-art performance, outperforming existing methods and showing robustness in low-data regimes. Notably, the model demonstrates strong zero-shot transferability to other species without retraining, enabling reliable PPI prediction and functional inference even in less characterized or rare organisms where experimental data are limited. Further analysis reveals that hierarchical feature fusion is critical for capturing conserved interaction determinants, such as binding motifs and functional annotations. This work advances cross-species PPI prediction and provides a unified framework for interaction prediction in scenarios with sparse or imbalanced multi-species data.

Updated: 2025-07-03 15:41:04

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2507.02724v1

On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability

We study language generation in the limit, introduced by Kleinberg and Mullainathan [KM24], building on classical works of Gold [Gol67] and Angluin [Ang79]. [KM24]'s main result is an algorithm for generating from any countable language collection in the limit. While their algorithm eventually generates unseen strings from the target language $K$, it sacrifices coverage or breadth, i.e., its ability to generate a rich set of strings. Recent work introduces different notions of breadth and explores when generation with breadth is possible, leaving a full characterization of these notions open. Our first set of results settles this by characterizing generation for existing notions of breadth and their natural extensions. Interestingly, our lower bounds are very flexible and hold for many performance metrics beyond breadth; for instance, they show that, in general, it is impossible to train generators which achieve a higher perplexity or lower hallucination rate for $K$ compared to other languages. Next, we study language generation with breadth and stable generators, i.e., algorithms that eventually stop changing after seeing an arbitrary but finite number of strings, and prove unconditional lower bounds for such generators, strengthening the results of [KMV25] and demonstrating that generation with many existing notions of breadth becomes equally hard when stability is required. This gives a separation for generation with approximate breadth between stable and unstable generators, highlighting the rich interplay between breadth, stability, and consistency in language generation.

Updated: 2025-07-03 15:39:41

Categories: cs.LG,cs.AI,cs.CL,cs.DS,stat.ML

Download: http://arxiv.org/abs/2412.18530v2

Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification

As Large Language Models (LLMs) become integral software components in modern applications, unauthorized model derivations through fine-tuning, merging, and redistribution have emerged as critical software engineering challenges. Unlike traditional software where clone detection and license compliance are well-established, the LLM ecosystem lacks effective mechanisms to detect model lineage and enforce licensing agreements. This gap is particularly problematic when open-source model creators, such as Meta's LLaMA, require derivative works to maintain naming conventions for attribution, yet no technical means exist to verify compliance. To fill this gap, treating LLMs as software artifacts requiring provenance tracking, we present TensorGuard, a gradient-based fingerprinting framework for LLM similarity detection and family classification. Our approach extracts model-intrinsic behavioral signatures by analyzing gradient responses to random input perturbations across tensor layers, operating independently of training data, watermarks, or specific model formats. TensorGuard supports the widely-adopted safetensors format and constructs high-dimensional fingerprints through statistical analysis of gradient features. These fingerprints enable two complementary capabilities: direct pairwise similarity assessment between arbitrary models through distance computation, and systematic family classification of unknown models via the K-Means clustering algorithm with domain-informed centroid initialization using known base models. Experimental evaluation on 58 models comprising 8 base models and 50 derivatives across five model families (Llama, Qwen, Gemma, Phi, Mistral) demonstrates 94% classification accuracy under our centroid-initialized K-Means clustering.
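The gradient-fingerprinting idea can be sketched in miniature. Here the "models" are tiny linear maps with a hand-derived probe gradient; this illustrates the principle only and is not the safetensors-based TensorGuard pipeline:

```python
import numpy as np

# Sketch: fingerprint a model by statistics of its parameter gradients
# under a fixed set of random input probes, then compare fingerprints
# by Euclidean distance. A lightly fine-tuned derivative should land
# near its base model; an unrelated model should land far away.
rng = np.random.default_rng(42)
probes = rng.normal(size=(64, 4))      # shared random probe inputs

def fingerprint(W):
    """Mean and std of the gradient of 0.5*||W x||^2 over the probes."""
    feats = []
    for x in probes:
        grad = np.outer(W @ x, x)      # d(0.5*||W x||^2)/dW = (W x) x^T
        feats.append([grad.mean(), grad.std()])
    return np.asarray(feats).mean(axis=0)

base = rng.normal(size=(4, 4))
derivative = base + 0.01 * rng.normal(size=(4, 4))  # "fine-tuned" copy
unrelated = rng.normal(size=(4, 4))

d_same = np.linalg.norm(fingerprint(base) - fingerprint(derivative))
d_diff = np.linalg.norm(fingerprint(base) - fingerprint(unrelated))
print(d_same < d_diff)  # True: the derivative stays close to its base
```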

Updated: 2025-07-03 15:36:23

Categories: cs.LG,cs.AI,cs.SE

Download: http://arxiv.org/abs/2506.01631v2

A Comprehensive Machine Learning Framework for Micromobility Demand Prediction

Dockless e-scooters, a key micromobility service, have emerged as eco-friendly and flexible urban transport alternatives. These services improve first and last-mile connectivity, reduce congestion and emissions, and complement public transport for short-distance travel. However, effective management of these services depends on accurate demand prediction, which is crucial for optimal fleet distribution and infrastructure planning. While previous studies have focused on analyzing spatial or temporal factors in isolation, this study introduces a framework that integrates spatial, temporal, and network dependencies for improved micromobility demand forecasting. This integration enhances accuracy while providing deeper insights into urban micromobility usage patterns. Our framework improves demand prediction accuracy by 27 to 49% over baseline models, demonstrating its effectiveness in capturing micromobility demand patterns. These findings support data-driven micromobility management, enabling optimized fleet distribution, cost reduction, and sustainable urban planning.

Updated: 2025-07-03 15:31:10

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02715v1

FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models

Image generation has achieved remarkable progress with the development of large-scale text-to-image models, especially diffusion-based models. However, generating human images with plausible details, such as faces or hands, remains challenging due to insufficient supervision of local regions during training. To address this issue, we propose FairHuman, a multi-objective fine-tuning approach designed to enhance both global and local generation quality fairly. Specifically, we first construct three learning objectives: a global objective derived from the default diffusion objective function and two local objectives for hands and faces based on pre-annotated positional priors. Subsequently, we derive the optimal parameter updating strategy under the guidance of the Minimum Potential Delay (MPD) criterion, thereby attaining fairness-aware optimization for this multi-objective problem. Based on this, our proposed method can achieve significant improvements in generating challenging local details while maintaining overall quality. Extensive experiments showcase the effectiveness of our method in improving the performance of human image generation under different scenarios.

Updated: 2025-07-03 15:27:45

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02714v1

A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control

Deep reinforcement learning for continuous control has recently achieved impressive progress. However, existing methods often suffer from primacy bias, a tendency to overfit early experiences stored in the replay buffer, which limits an RL agent's sample efficiency and generalizability. In contrast, humans are less susceptible to such bias, partly due to infantile amnesia, where the formation of new neurons disrupts early memory traces, leading to the forgetting of initial experiences. Inspired by these dual processes of forgetting and growing in neuroscience, in this paper, we propose Forget and Grow (FoG), a new deep RL algorithm with two mechanisms introduced. First, Experience Replay Decay (ER Decay), "forgetting early experience", which balances memory by gradually reducing the influence of early experiences. Second, Network Expansion, "growing neural capacity", which enhances agents' capability to exploit the patterns of existing data by dynamically adding new parameters during training. Empirical results on four major continuous control benchmarks with more than 40 tasks demonstrate the superior performance of FoG against SoTA existing deep RL algorithms, including BRO, SimBa, and TD-MPC2.
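One plausible reading of ER Decay (our sketch, not necessarily the authors' exact rule) is to down-weight a transition's replay probability exponentially in its age:

```python
import numpy as np

# Sketch of age-decayed replay sampling: instead of sampling the
# buffer uniformly, weight each transition by decay**age so early
# experiences gradually lose influence on the gradient updates.
rng = np.random.default_rng(0)

def sample_indices(buffer_len, batch_size, decay=0.999):
    """Sample replay indices; index 0 is the oldest transition."""
    ages = buffer_len - np.arange(buffer_len)
    weights = decay ** ages            # older -> exponentially smaller
    return rng.choice(buffer_len, size=batch_size, p=weights / weights.sum())

idx = sample_indices(buffer_len=10_000, batch_size=256)
print(np.median(idx) > 5_000)  # True: recent experience dominates
```

With `decay=0.999` the effective window covers roughly the most recent thousand transitions; a uniform sampler is recovered as `decay` approaches 1.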

Updated: 2025-07-03 15:26:48

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02712v1

Fluid Democracy in Federated Data Aggregation

Federated learning (FL) mechanisms typically require each client to transfer their weights to a central server, irrespective of how useful they are. In order to avoid wasteful data transfer costs from clients to the central server, we propose the use of consensus based protocols to identify a subset of clients with most useful model weights at each data transfer step. First, we explore the application of existing fluid democracy protocols to FL from a performance standpoint, comparing them with traditional one-person-one-vote (also known as 1p1v or FedAvg). We propose a new fluid democracy protocol named viscous-retained democracy that always does better than 1p1v under the same assumptions as existing fluid democracy protocols while also not allowing for influence accumulation. Secondly, we identify weaknesses of fluid democracy protocols from an adversarial lens in terms of their dependence on topology and/or the number of adversaries required to negatively impact the global model weights. To this effect, we propose an algorithm (FedVRD) that dynamically limits the effect of adversaries while minimizing cost by leveraging the delegation topology.
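A minimal sketch of one-hop delegation-weighted aggregation is below. This is our toy; the paper's viscous-retained protocol additionally prevents influence accumulation and handles general delegation topologies:

```python
import numpy as np

# Sketch: each client either submits its own model update or delegates
# its vote to another client. The server averages only the delegates'
# updates, weighted by accumulated voting power, so delegating clients
# skip the upload entirely (saving transfer cost).
updates = {
    0: np.array([1.0, 0.0]),
    1: np.array([0.0, 1.0]),   # never uploaded: client 1 delegates
    2: np.array([4.0, 4.0]),
}
delegate = {0: 0, 1: 0, 2: 2}  # delegate[i] = i means "vote for myself"

power = {i: 0 for i in updates}
for voter, rep in delegate.items():
    power[rep] += 1            # one-hop delegation only in this sketch

total = sum(power.values())
aggregate = sum((power[i] / total) * u
                for i, u in updates.items() if power[i] > 0)
print(aggregate)  # weights 2/3 on client 0 and 1/3 on client 2
```

Uniform self-delegation recovers FedAvg's one-person-one-vote average.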

Updated: 2025-07-03 15:26:24

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02710v1

Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation

In the field of large language model (LLM)-based proof generation, despite extensive training on large datasets such as ArXiv, LLMs still exhibit only modest performance on proving tasks of moderate difficulty. We believe that this is partly due to the widespread presence of suboptimal ordering within the data for each proof used in training. For example, published proofs often follow a purely logical order, where each step logically proceeds from the previous steps based on the deductive rules. This order is designed to facilitate the verification of the proof's soundness, rather than to help people and models learn the discovery process of the proof. In proof generation, we argue that the optimal order for one training data sample occurs when the relevant intermediate supervision for a particular proof step in the proof is always positioned to the left of that proof step. We call such order the intuitively sequential order. We validate our claims using two tasks: intuitionistic propositional logic theorem-proving and digit multiplication. Our experiments verify the order effect and provide support for our explanations. We demonstrate that training is most effective when the proof is in the intuitively sequential order. Moreover, the order effect and the performance gap between models trained on different data orders can be substantial -- with an 11 percent improvement in proof success rate observed in the propositional logic theorem-proving task, between models trained on the optimal order compared to the worst order. Lastly, we define a common type of order issue in advanced math proofs and find that 17.3 percent of theorems with nontrivial proofs in the first two chapters of a widely used graduate-level mathematics textbook suffer from this issue. A detailed list of those proofs is provided in the appendix.
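The "intuitively sequential order" can be approximated by topologically sorting proof steps over their premise dependencies, so every premise appears to the left of the step that uses it. The sketch below is our toy illustration, not the paper's data pipeline:

```python
from graphlib import TopologicalSorter

# Toy proof with hypothetical step names: each step maps to the
# premises it relies on. A topological order guarantees every premise
# precedes its use, matching the left-of-the-step supervision the
# abstract argues is optimal for next-token-prediction training.
deps = {
    "s1": [],
    "s2": [],
    "lemma": ["s1", "s2"],
    "goal": ["lemma", "s2"],
}
order = list(TopologicalSorter(deps).static_order())
print(order)  # premises always precede the steps that use them
```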

Updated: 2025-07-03 15:14:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.00863v2

FuzzFeed: An Automatic Approach to Weakest Precondition Generation using LLMs and Fuzzing

The weakest precondition (WP) of a program describes the largest set of initial states from which all terminating executions of the program satisfy a given postcondition. The generation of WPs is an important task with practical applications in areas ranging from verification to run-time error checking. This paper proposes the combination of Large Language Models (LLMs) and fuzz testing for generating WPs. In pursuit of this goal, we introduce Fuzzing Guidance (FG); FG acts as a means of directing LLMs towards correct WPs using program execution feedback. FG utilises fuzz testing for approximately checking the validity and weakness of candidate WPs, this information is then fed back to the LLM as a means of context refinement. We demonstrate the effectiveness of our approach on a comprehensive benchmark set of deterministic array programs in Java. Our experiments indicate that LLMs are capable of producing viable candidate WPs, and that this ability can be practically enhanced through FG.
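The fuzzing check can be sketched as follows. This is our toy over a one-line integer program, not the FuzzFeed tool itself, which targets deterministic Java array programs:

```python
import random

# Sketch of fuzzing-based WP checking. Program: x := x + 1 over the
# integers; postcondition: x > 0. The true weakest precondition is
# x >= 0. Fuzzing approximates two properties of a candidate wp:
#   validity: states satisfying wp never break the postcondition;
#   weakness: states excluded by wp really do break it.
random.seed(0)

def prog(x):
    return x + 1

def post(x):
    return x > 0

def fuzz_check(wp, trials=1000, lo=-100, hi=100):
    valid = weak = True
    for _ in range(trials):
        x = random.randint(lo, hi)
        if wp(x) and not post(prog(x)):
            valid = False   # wp admits a bad initial state: unsound
        if not wp(x) and post(prog(x)):
            weak = False    # wp rejects a good state: stronger than needed
    return valid, weak

print(fuzz_check(lambda x: x >= 0))   # (True, True): the exact WP
print(fuzz_check(lambda x: x >= 5))   # (True, False): sound but too strong
print(fuzz_check(lambda x: x >= -3))  # (False, True): unsound
```

This feedback (valid/weak flags plus counterexamples) is the kind of signal the paper feeds back to the LLM for context refinement.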

Updated: 2025-07-03 15:14:43

Categories: cs.SE,cs.AI,cs.LO

Download: http://arxiv.org/abs/2507.05272v1

Time-critical and confidence-based abstraction dropping methods

One paradigm of Monte Carlo Tree Search (MCTS) improvements is to build and use state and/or action abstractions during the tree search. Non-exact abstractions, however, introduce an approximation error that makes convergence to the optimal action in the abstract space impossible. Hence, as proposed as a component of Elastic Monte Carlo Tree Search by Xu et al., abstraction algorithms should eventually drop the abstraction. In this paper, we propose two novel abstraction dropping schemes, namely OGA-IAAD and OGA-CAD, which can yield clear performance improvements while being safe in the sense that the dropping never causes any notable performance degradation, contrary to Xu's dropping method. OGA-IAAD is designed for time-critical settings, while OGA-CAD is designed to improve MCTS performance within the same number of iterations.

Updated: 2025-07-03 15:12:05

Categories: cs.AI

Download: http://arxiv.org/abs/2507.02703v1

Control at Stake: Evaluating the Security Landscape of LLM-Driven Email Agents

The increasing capabilities of LLMs have led to the rapid proliferation of LLM agent apps, where developers enhance LLMs with access to external resources to support complex task execution. Among these, LLM email agent apps represent one of the most widely used categories, as email remains a critical communication medium for users. LLM email agents are capable of managing and responding to email using LLM-driven reasoning and autonomously executing user instructions via external email APIs (e.g., send email). However, despite their growing deployment and utility, the security mechanisms of LLM email agent apps remain underexplored. Currently, there is no comprehensive study of the potential security risks within these agent apps and their broader implications. In this paper, we conduct the first in-depth and systematic security study of LLM email agents. We propose the Email Agent Hijacking (EAH) attack, which overrides the original prompts of the email agent via external email resources, allowing attackers to gain control of the email agent remotely and further perform specific attack scenarios without user awareness. To facilitate the large-scale evaluation, we propose EAHawk, a pipeline to evaluate the EAH attack of LLM email agent apps. By EAHawk, we performed an empirical study spanning 14 representative LLM agent frameworks, 63 agent apps, 12 LLMs, and 20 email services, which led to the generation of 1,404 real-world email agent instances for evaluation. Experimental results indicate that all 1,404 instances were successfully hijacked; on average, only 2.03 attack attempts are required to control an email agent instance. Even worse, for some LLMs, the average number of attempts needed to achieve full agent control drops to as few as 1.23.

Updated: 2025-07-03 15:09:40

Categories: cs.CR

Download: http://arxiv.org/abs/2507.02699v1

Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions

This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains, particularly in contexts where traditional ERP systems rely on static, rule-based approaches that overlook strategic interactions among market actors. While recent research has applied reinforcement learning to pricing, most implementations remain single-agent and fail to model the interdependent nature of real-world supply chains. This study addresses that gap by evaluating the performance of three MARL algorithms (MADDPG, MADQN, and QMIX) against static rule-based baselines, within a simulated environment informed by real e-commerce transaction data and a LightGBM demand prediction model. Results show that rule-based agents achieve near-perfect fairness (Jain's Index: 0.9896) and the highest price stability (volatility: 0.024), but they fully lack competitive dynamics. Among MARL agents, MADQN exhibits the most aggressive pricing behaviour, with the highest volatility and the lowest fairness (0.5844). MADDPG provides a more balanced approach, supporting market competition (share volatility: 9.5 pp) while maintaining relatively high fairness (0.8819) and stable pricing. These findings suggest that MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
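
The fairness numbers quoted above use Jain's index, J(x) = (Σᵢ xᵢ)² / (n · Σᵢ xᵢ²), which equals 1 for a perfectly equal allocation and 1/n in the most unequal case. A minimal implementation for checking such values:

```python
def jains_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Returns 1.0 for perfectly equal allocations and 1/n in the worst case."""
    n = len(allocations)
    total = sum(allocations)
    squares = sum(x * x for x in allocations)
    return (total * total) / (n * squares)
```

For example, an equal three-way split scores 1.0, while an allocation concentrated on one of three agents scores 1/3.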

Updated: 2025-07-03 15:07:37

Categories: cs.LG,econ.EM

Download: http://arxiv.org/abs/2507.02698v1

RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes

Next activity prediction represents a fundamental challenge for optimizing business processes in service-oriented architectures such as microservices environments, distributed enterprise systems, and cloud-native platforms, which enables proactive resource allocation and dynamic service composition. Despite the prevalence of sequence-based methods, these approaches fail to capture non-sequential relationships that arise from parallel executions and conditional dependencies. Even though graph-based approaches address structural preservation, they suffer from homogeneous representations and static structures that apply uniform modeling strategies regardless of individual process complexity characteristics. To address these limitations, we introduce RLHGNN, a novel framework that transforms event logs into heterogeneous process graphs with three distinct edge types grounded in established process mining theory. Our approach creates four flexible graph structures by selectively combining these edges to accommodate different process complexities, and employs reinforcement learning formulated as a Markov Decision Process to automatically determine the optimal graph structure for each specific process instance. RLHGNN then applies heterogeneous graph convolution with relation-specific aggregation strategies to effectively predict the next activity. This adaptive methodology enables precise modeling of both sequential and non-sequential relationships in service interactions. Comprehensive evaluation on six real-world datasets demonstrates that RLHGNN consistently outperforms state-of-the-art approaches. Furthermore, it maintains an inference latency of approximately 1 ms per prediction, representing a highly practical solution suitable for real-time business process monitoring applications. The source code is available at https://github.com/Joker3993/RLHGNN.

Updated: 2025-07-03 15:01:08

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2507.02690v1

APT: Adaptive Personalized Training for Diffusion Models with Limited Data

Personalizing diffusion models using limited data presents significant challenges, including overfitting, loss of prior knowledge, and degradation of text alignment. Overfitting leads to shifts in the noise prediction distribution, disrupting the denoising trajectory and causing the model to lose semantic coherence. In this paper, we propose Adaptive Personalized Training (APT), a novel framework that mitigates overfitting by employing adaptive training strategies and regularizing the model's internal representations during fine-tuning. APT consists of three key components: (1) Adaptive Training Adjustment, which introduces an overfitting indicator to detect the degree of overfitting at each time step bin and applies adaptive data augmentation and adaptive loss weighting based on this indicator; (2)Representation Stabilization, which regularizes the mean and variance of intermediate feature maps to prevent excessive shifts in noise prediction; and (3) Attention Alignment for Prior Knowledge Preservation, which aligns the cross-attention maps of the fine-tuned model with those of the pretrained model to maintain prior knowledge and semantic coherence. Through extensive experiments, we demonstrate that APT effectively mitigates overfitting, preserves prior knowledge, and outperforms existing methods in generating high-quality, diverse images with limited reference data.

Updated: 2025-07-03 14:58:08

Categories: cs.CV,cs.AI,60J60, 68T07,I.2.6; I.2.10; I.4.9

Download: http://arxiv.org/abs/2507.02687v1

HAPI: A Model for Learning Robot Facial Expressions from Human Preferences

Automatic robotic facial expression generation is crucial for human-robot interaction, as handcrafted methods based on fixed joint configurations often yield rigid and unnatural behaviors. Although recent automated techniques reduce the need for manual tuning, they tend to fall short by not adequately bridging the gap between human preferences and model predictions, resulting in a deficiency of nuanced and realistic expressions due to limited degrees of freedom and insufficient perceptual integration. In this work, we propose a novel learning-to-rank framework that leverages human feedback to address this discrepancy and enhance the expressiveness of robotic faces. Specifically, we conduct pairwise comparison annotations to collect human preference data and develop the Human Affective Pairwise Impressions (HAPI) model, a Siamese RankNet-based approach that refines expression evaluation. Results obtained via Bayesian Optimization and an online expression survey on a 35-DOF android platform demonstrate that our approach produces significantly more realistic and socially resonant expressions of Anger, Happiness, and Surprise than those generated by baseline and expert-designed methods. This confirms that our framework effectively bridges the gap between human preferences and model predictions while robustly aligning robotic expression generation with human affective responses.
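
At its core, a Siamese RankNet trained on pairwise preference annotations reduces to the standard pairwise loss −log σ(s_i − s_j), where s_i is the score of the human-preferred expression and s_j that of the other. A minimal sketch of that loss (the actual HAPI model scores expressions with a shared neural network; plain scalars stand in here):

```python
import math

def ranknet_loss(score_preferred, score_other):
    """Pairwise RankNet loss: -log sigmoid(s_i - s_j), where s_i is the
    score of the human-preferred expression. The loss shrinks as the
    model ranks the preferred expression higher than the alternative."""
    diff = score_preferred - score_other
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

When the two scores tie, the loss is log 2; it decreases toward zero as the preferred item pulls ahead, which is what drives the ranking network toward human judgments.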

Updated: 2025-07-03 14:55:55

Categories: cs.RO,cs.AI,cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2503.17046v2

Learning few-step posterior samplers by unfolding and distillation of diffusion models

Diffusion models (DMs) have emerged as powerful image priors in Bayesian computational imaging. Two primary strategies have been proposed for leveraging DMs in this context: Plug-and-Play methods, which are zero-shot and highly flexible but rely on approximations; and specialized conditional DMs, which achieve higher accuracy and faster inference for specific tasks through supervised training. In this work, we introduce a novel framework that integrates deep unfolding and model distillation to transform a DM image prior into a few-step conditional model for posterior sampling. A central innovation of our approach is the unfolding of a Markov chain Monte Carlo (MCMC) algorithm - specifically, the recently proposed LATINO Langevin sampler (Spagnoletti et al., 2025) - representing the first known instance of deep unfolding applied to a Monte Carlo sampling scheme. We demonstrate our proposed unfolded and distilled samplers through extensive experiments and comparisons with the state of the art, where they achieve excellent accuracy and computational efficiency, while retaining the flexibility to adapt to variations in the forward model at inference time.

Updated: 2025-07-03 14:55:53

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02686v1

Anamorphic Cryptography using Baby-Step Giant-Step Recovery

In 2022, Persiano, Phan, and Yung outlined the creation of Anamorphic Cryptography. With this, we can create a public key to encrypt data, and then have two secret keys. These secret keys are used to decrypt the ciphertext into different messages. So, one secret key is given to the Dictator (who must be able to decrypt all the messages), and the other is given to Alice. Alice can then decrypt the ciphertext to a secret message that the Dictator cannot see. This paper outlines the implementation of Anamorphic Cryptography using ECC (Elliptic Curve Cryptography), such as with the secp256k1 curve. This gives considerable performance improvements over discrete logarithm-based methods with regard to security for a particular bit length. Overall, it outlines how the secret message sent to Alice is hidden within the random nonce value, which is used within the encryption process, and which is cancelled out when the Dictator decrypts the ciphertext. It also shows that the BSGS (Baby-step Giant-step) variant significantly outperforms unoptimised elliptic curve methods.
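
The baby-step giant-step idea behind the recovery step can be sketched in a small prime-field group; the paper itself works over secp256k1 points, so the modular multiplication below is a stand-in for the elliptic-curve group operation, with illustrative parameters:

```python
import math

def bsgs(g, h, p, order):
    """Baby-step giant-step: find x with g^x ≡ h (mod p), 0 <= x < order,
    in O(sqrt(order)) time and space instead of O(order) brute force."""
    m = math.isqrt(order) + 1
    # Baby steps: tabulate g^j for j in [0, m).
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)
        e = (e * g) % p
    # Giant steps: walk h * (g^-m)^i until we land in the table.
    factor = pow(g, -m, p)  # modular inverse via three-arg pow (Python 3.8+)
    gamma = h
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = (gamma * factor) % p
    return None
```

For the small-message recovery in anamorphic schemes, the unknown exponent is the hidden payload, so the sqrt-time search is what makes decoding practical compared to a linear scan.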

Updated: 2025-07-03 14:52:27

Categories: cs.CR

Download: http://arxiv.org/abs/2505.23772v2

Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs

Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring 5× fewer parameters.

Updated: 2025-07-03 14:36:15

Categories: cs.LG,cs.CV,eess.IV

Download: http://arxiv.org/abs/2507.02671v1

Guided Generation for Developable Antibodies

Therapeutic antibodies require not only high-affinity target engagement, but also favorable manufacturability, stability, and safety profiles for clinical effectiveness. These properties are collectively called "developability". To enable a computational framework for optimizing antibody sequences for favorable developability, we introduce a guided discrete diffusion model trained on natural paired heavy- and light-chain sequences from the Observed Antibody Space (OAS) and quantitative developability measurements for 246 clinical-stage antibodies. To steer generation toward biophysically viable candidates, we integrate a Soft Value-based Decoding in Diffusion (SVDD) Module that biases sampling without compromising naturalness. In unconstrained sampling, our model reproduces global features of both the natural repertoire and approved therapeutics, and under SVDD guidance we achieve significant enrichment in predicted developability scores over unguided baselines. When combined with high-throughput developability assays, this framework enables an iterative, ML-driven pipeline for designing antibodies that satisfy binding and biophysical criteria in tandem.

Updated: 2025-07-03 14:35:14

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2507.02670v1

ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning

In recent advancements in audio self-supervised representation learning, the standard Transformer architecture has emerged as the predominant approach, yet its attention mechanism often allocates a portion of attention weights to irrelevant information, potentially impairing the model's discriminative ability. To address this, we introduce a differential attention mechanism, which effectively mitigates ineffective attention allocation through the integration of dual-softmax operations and appropriately tuned differential coefficients. Experimental results demonstrate that our ASDA model achieves state-of-the-art (SOTA) performance across multiple benchmarks, including audio classification (49.0% mAP on AS-2M, 41.5% mAP on AS20K), keyword spotting (98.3% accuracy on SPC-2), and environmental sound classification (96.1% accuracy on ESC-50). These results highlight ASDA's effectiveness in audio tasks, paving the way for broader applications.
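
The dual-softmax idea can be sketched as two attention maps whose weighted difference forms the final attention, so weight assigned to irrelevant tokens by both maps cancels. The projection matrices and the coefficient `lam` below are illustrative stand-ins for ASDA's learned projections and tuned differential coefficients, shown for a single head:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Differential attention sketch: two query/key projections yield two
    softmax attention maps; their difference (second map scaled by lam)
    suppresses attention weight that both maps place on irrelevant tokens."""
    d = Wq1.shape[1]
    a1 = softmax((x @ Wq1) @ (x @ Wk1).T / np.sqrt(d))
    a2 = softmax((x @ Wq2) @ (x @ Wk2).T / np.sqrt(d))
    attn = a1 - lam * a2
    return attn @ (x @ Wv)
```

With `lam=0` this reduces to ordinary softmax attention; increasing `lam` strengthens the cancellation of shared (presumed irrelevant) attention mass.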

Updated: 2025-07-03 14:29:43

Categories: cs.SD,cs.AI,cs.CL,eess.AS

Download: http://arxiv.org/abs/2507.02666v1

Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models

Recent Long Reasoning Models (LRMs) have demonstrated remarkable capabilities in handling complex reasoning tasks, but are hindered by excessive overthinking. To explore its essence, our empirical analysis reveals that LRMs are limited in recognizing task properties (i.e., difficulty levels) the way humans do before solving a problem, leading to a one-size-fits-all reasoning process. Inspired by this, a pressing and natural question emerges: Can we bootstrap such ability to further alleviate the overthinking phenomenon in LRMs? In this paper, we propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition. First, we introduce difficulty-hypnosis in the prefixes of model outputs to intervene in the internal reasoning trajectory. Combined with a heterogeneous short and long reasoning dataset, the trained model enhances its sensitivity to task difficulty, enabling native, differentiated reasoning strategies across various tasks. Second, we further extend redundancy-hypnosis to the internal reasoning process, guiding the model to identify redundant structures within the reasoning steps and generate more concise reasoning outputs. Experiments on 7B/14B/32B models demonstrate that TH2T significantly reduces inference costs (more than 70% on easy tasks and 40% on hard tasks) while maintaining performance stability. The resulting outputs exhibit clear difficulty-aware capabilities and reduced redundancy (e.g., reflection).

Updated: 2025-07-03 14:24:26

Categories: cs.AI

Download: http://arxiv.org/abs/2507.02663v1

Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification

Modern Integrated Circuits (ICs) are becoming increasingly complex, and so is their development process. Hardware design verification entails a methodical and disciplined approach to the planning, development, execution, and sign-off of functionally correct hardware designs. This tedious process requires significant effort and time to ensure a bug-free tape-out. The field of Natural Language Processing has undergone a significant transformation with the advent of Large Language Models (LLMs). These powerful models, often referred to as Generative AI (GenAI), have revolutionized how machines understand and generate human language, enabling unprecedented advancements in a wide array of applications, including hardware design verification. This paper presents an agentic AI-based approach to hardware design verification, which empowers AI agents, in collaboration with Human-in-the-Loop (HITL) intervention, to engage in a more dynamic, iterative, and self-reflective process, ultimately performing end-to-end hardware design and verification. This methodology is evaluated on five open-source designs, achieving over 95% coverage with reduced verification time while demonstrating superior performance, adaptability, and configurability.

Updated: 2025-07-03 14:20:57

Categories: cs.AI,cs.AR

Download: http://arxiv.org/abs/2507.02660v1

OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the \textit{``one drafter for all''} paradigm. We showcase the proficiency of the OmniDraft framework by performing online learning on math reasoning, coding and text generation tasks. Notably, OmniDraft enables a single Llama-68M model to pair with various target models including Vicuna-7B, Qwen2-7B and Llama3-8B models for speculative decoding; and additionally provides up to 1.5-2x speedup.
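
OmniDraft builds on the basic draft-then-verify loop of speculative decoding. A minimal greedy variant can be sketched as follows; it accepts the longest prefix the target agrees with, whereas the full method uses probabilistic acceptance, and OmniDraft additionally handles cross-vocabulary translation via its n-gram cache. The `draft_next`/`target_next` interfaces are hypothetical stand-ins for the two models:

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_len=16):
    """Simplified greedy speculative decoding: the draft model proposes k
    tokens; the target verifies them and, on the first disagreement,
    substitutes its own token. Speedup comes from the target validating
    several draft tokens per round instead of generating one at a time."""
    out = list(prompt)
    while len(out) < max_len:
        # Draft proposes up to k tokens autoregressively.
        ctx = list(out)
        proposed = []
        for _ in range(k):
            t = draft_next(ctx)
            proposed.append(t)
            ctx.append(t)
        # Target checks each proposal in order; a mismatch ends the round
        # with the target's own token.
        for t in proposed:
            verified = target_next(out)
            out.append(t if t == verified else verified)
            if t != verified or len(out) >= max_len:
                break
    return out
```

Note the output matches pure target decoding regardless of draft quality; a better-aligned draft only changes how many target calls are needed, which is the efficiency OmniDraft's online adaptation aims to improve.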

Updated: 2025-07-03 14:20:41

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2507.02659v1

Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search

Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based approaches suffer from a fundamental limitation: they use a single model to handle both high-level planning and detailed execution, leading to inefficient reasoning and limited scalability. In this paper, we introduce HiRA, a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates the results through a structured integration mechanism. This separation prevents execution details from disrupting high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems. Our results show improvements in both answer quality and system efficiency, highlighting the effectiveness of decoupled planning and execution for multi-step information seeking tasks. Our code is available at https://github.com/ignorejjj/HiRA.

Updated: 2025-07-03 14:18:08

Categories: cs.AI,cs.CL,cs.IR

Download: http://arxiv.org/abs/2507.02652v1

Nebula: Efficient, Private and Accurate Histogram Estimation

We present Nebula, a system for differentially private histogram estimation on data distributed among clients. Nebula allows clients to independently decide whether to participate in the system, and locally encode their data so that an untrusted server only learns data values whose multiplicity exceeds a predefined aggregation threshold, with (ε, δ) differential privacy guarantees. Compared to existing systems, Nebula uniquely achieves: i) a strict upper bound on client privacy leakage; ii) significantly higher utility than standard local differential privacy systems; and iii) no requirement for trusted third parties, multi-party computation, or trusted hardware. We provide a formal evaluation of Nebula's privacy, utility and efficiency guarantees, along with an empirical assessment on three real-world datasets. On the United States Census dataset, clients can submit their data in just 0.0036 seconds and 0.0016 MB (efficient), under strong (ε = 1, δ = 10⁻⁸) differential privacy guarantees (private), enabling Nebula's untrusted aggregation server to estimate histograms with over 88% better utility than existing local differential privacy deployments (accurate). Additionally, we describe a variant that allows clients to submit multi-dimensional data, with similar privacy, utility, and performance. Finally, we provide an implementation of Nebula.
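
The aggregation-threshold semantics can be illustrated with a small sketch: only values whose (noise-perturbed) multiplicity exceeds a threshold τ are released, so values reported by few clients stay hidden. Nebula enforces this property cryptographically at the clients against an untrusted server; the central curator, Laplace noise mechanism, and parameter values below are simplifications purely for illustration:

```python
import random
from collections import Counter

def noisy_threshold_histogram(reports, tau=20, epsilon=1.0, seed=0):
    """Sketch of threshold-gated histogram release: count each value,
    perturb the count with Laplace(1/epsilon) noise (difference of two
    exponentials), and publish only counts exceeding tau."""
    rng = random.Random(seed)
    counts = Counter(reports)
    released = {}
    for value, c in counts.items():
        noisy = c + rng.expovariate(epsilon) - rng.expovariate(epsilon)
        if noisy > tau:
            released[value] = round(noisy)
    return released
```

A value submitted by 100 clients comfortably clears τ = 20 and appears with a near-accurate count, while a value submitted by only 2 clients is suppressed, which is the utility/privacy trade-off the threshold controls.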

Updated: 2025-07-03 14:13:44

Categories: cs.CR

Download: http://arxiv.org/abs/2409.09676v2

Fair Deepfake Detectors Can Generalize

Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.
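
The inverse-propensity weighting used in DAID's demographic-aware rebalancing can be sketched in its simplest form: weight each sample inversely to its subgroup's frequency so every subgroup contributes equally to the training loss. The exact weighting scheme in the paper may differ; this is the textbook version:

```python
from collections import Counter

def inverse_propensity_weights(groups):
    """Per-sample weights inversely proportional to subgroup frequency.
    With k subgroups over n samples, each subgroup's weights sum to n/k,
    so no demographic group dominates the loss."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]
```

Samples from the rarest subgroup receive the largest weights, which neutralizes the distributional bias identified as a confounder in the causal analysis above.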

Updated: 2025-07-03 14:10:02

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.02645v1
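The inverse-propensity weighting in DAID's data-rebalancing component admits a minimal sketch. The form below is the standard one (weight each sample by the inverse of its group's empirical frequency, normalized so groups contribute equally); the paper's exact construction may differ:

```python
from collections import Counter

def inverse_propensity_weights(groups):
    """Weight each sample inversely to its demographic group's empirical
    frequency, so every subgroup contributes equal total weight."""
    n = len(groups)
    freq = Counter(groups)
    return [n / (len(freq) * freq[g]) for g in groups]

# A skewed toy cohort: 8 "young" vs 2 "senior" samples.
groups = ["young"] * 8 + ["senior"] * 2
weights = inverse_propensity_weights(groups)
# Both groups now carry the same total weight: 8 * 0.625 == 2 * 2.5 == 5.0
print(sum(w for w, g in zip(weights, groups) if g == "young"))   # 5.0
print(sum(w for w, g in zip(weights, groups) if g == "senior"))  # 5.0
```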

Solving the Hubbard model with Neural Quantum States

The rapid development of neural quantum states (NQS) has established them as a promising framework for studying quantum many-body systems. In this work, by leveraging cutting-edge transformer-based architectures and developing highly efficient optimization algorithms, we achieve state-of-the-art results for the doped two-dimensional (2D) Hubbard model, arguably the minimal model for high-Tc superconductivity. Interestingly, we find that different attention heads in the NQS ansatz can directly encode correlations at different scales, making it capable of capturing long-range correlations and entanglement in strongly correlated systems. With these advances, we establish the half-filled stripe in the ground state of the 2D Hubbard model with next-nearest-neighbor hopping, consistent with experimental observations in cuprates. Our work establishes NQS as a powerful tool for solving challenging many-fermion systems.

Updated: 2025-07-03 14:08:25

Categories: cond-mat.str-el,cs.AI,quant-ph

Download: http://arxiv.org/abs/2507.02644v1

On Efficient Bayesian Exploration in Model-Based Reinforcement Learning

In this work, we address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. Specifically, we focus on a class of exploration bonuses that targets epistemic uncertainty rather than the aleatoric noise inherent in the environment. We prove that these bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment's dynamics and rewards, thereby aligning exploration with genuine knowledge gaps. Our analysis provides formal guarantees for information gain (IG)-based approaches, which previously lacked theoretical grounding. To enable practical use, we also discuss tractable approximations via sparse variational Gaussian Processes, Deep Kernels and Deep Ensemble models. We then outline a general framework - Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE) - which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration. We empirically demonstrate that PTS-BE substantially outperforms other baselines across a variety of environments characterized by sparse rewards and/or purely exploratory tasks.

Updated: 2025-07-03 14:03:47

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02639v1
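The Deep Ensemble approximation mentioned above can be sketched as an exploration bonus from disagreement among ensemble members: high predictive variance signals epistemic uncertainty and shrinks toward zero as the models converge. The function name and toy predictors are illustrative assumptions, not the paper's implementation:

```python
import statistics

def epistemic_bonus(ensemble, state):
    """Exploration bonus from disagreement among ensemble predictions:
    the population variance is large when the models disagree (epistemic
    uncertainty) and zero once they all agree."""
    predictions = [model(state) for model in ensemble]
    return statistics.pvariance(predictions)

# Hypothetical ensembles of next-state predictors for a 1-D state.
uncertain_ensemble = [lambda s: 0.0, lambda s: 1.0, lambda s: 2.0]
confident_ensemble = [lambda s: 1.0, lambda s: 1.0, lambda s: 1.0]
print(epistemic_bonus(uncertain_ensemble, 0.0))  # large disagreement
print(epistemic_bonus(confident_ensemble, 0.0))  # 0.0
```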

Empowering Intelligent Low-altitude Economy with Large AI Model Deployment

Low-altitude economy (LAE) represents an emerging economic paradigm that redefines commercial and social aerial activities. Large artificial intelligence models (LAIMs) offer transformative potential to further enhance the intelligence of LAE services. However, deploying LAIMs in LAE poses several challenges, including the significant gap between their computational/storage demands and the limited onboard resources of LAE entities, the mismatch between lab-trained LAIMs and dynamic physical environments, and the inefficiencies of traditional decoupled designs for sensing, communication, and computation. To address these issues, we first propose a hierarchical system architecture tailored for LAIM deployment and present representative LAE application scenarios. Next, we explore key enabling techniques that facilitate the mutual co-evolution of LAIMs and low-altitude systems, and introduce a task-oriented execution pipeline for scalable and adaptive service delivery. Then, the proposed framework is validated through real-world case studies. Finally, we outline open challenges to inspire future research.

Updated: 2025-07-03 14:03:18

Categories: eess.SP,cs.AI

Download: http://arxiv.org/abs/2505.22343v2

SAT-BO: Verification Rule Learning and Optimization for Fraud Transaction Detection

Electronic payment platforms are estimated to process billions of transactions daily, with the cumulative value of these transactions potentially reaching into the trillions. Even a minor error within this high-volume environment could precipitate substantial financial losses. To mitigate this risk, manually constructed verification rules, developed by domain experts, are typically employed to identify and scrutinize transactions in production environments. However, due to the absence of a systematic approach to ensure the robustness of these verification rules against vulnerabilities, they remain susceptible to exploitation. To ensure data security, database maintainers usually compose complex verification rules to check whether a query/update request is valid. However, the rules written by experts are usually imperfect, and malicious requests may bypass these rules. As a result, the demand for identifying the defects of the rules systematically emerges.

Updated: 2025-07-03 14:01:26

Categories: cs.CR,cs.DB

Download: http://arxiv.org/abs/2507.02635v1

High-Order Deep Meta-Learning with Category-Theoretic Interpretation

We introduce a new hierarchical deep learning framework for recursive higher-order meta-learning that enables neural networks (NNs) to construct, solve, and generalise across hierarchies of tasks. Central to this approach is a generative mechanism that creates \emph{virtual tasks} -- synthetic problem instances designed to enable the meta-learner to learn \emph{soft constraints} and unknown generalisable rules across related tasks. Crucially, this enables the framework to generate its own informative, task-grounded datasets thereby freeing machine learning (ML) training from the limitations of relying entirely on human-generated data. By actively exploring the virtual point landscape and seeking out tasks lower-level learners find difficult, the meta-learner iteratively refines constraint regions. This enhances inductive biases, regularises the adaptation process, and produces novel, unanticipated tasks and constraints required for generalisation. Each meta-level of the hierarchy corresponds to a progressively abstracted generalisation of problems solved at lower levels, enabling a structured and interpretable learning progression. By interpreting meta-learners as category-theoretic \emph{functors} that generate and condition a hierarchy of subordinate learners, we establish a compositional structure that supports abstraction and knowledge transfer across progressively generalised tasks. The category-theoretic perspective unifies existing meta-learning models and reveals how learning processes can be transformed and compared through functorial relationships, while offering practical design principles for structuring meta-learning. We speculate this architecture may underpin the next generation of NNs capable of autonomously generating novel, instructive tasks and their solutions, thereby advancing ML towards general artificial intelligence.

Updated: 2025-07-03 14:01:14

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02634v1

Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data

Background: The use of Electronic Health Records (EHRs) for epidemiological studies and artificial intelligence (AI) training is increasing rapidly. The reliability of the results depends on the accuracy and completeness of EHR data. However, EHR data often contain significant quality issues, including misrepresentations of subpopulations, biases, and systematic errors, as they are primarily collected for clinical and billing purposes. Existing quality assessment methods remain insufficient, lacking systematic procedures to assess data fitness for research. Methods: We present the Medical Data Pecking approach, which adapts unit testing and coverage concepts from software engineering to identify data quality concerns. We demonstrate our approach using the Medical Data Pecking Tool (MDPT), which consists of two main components: (1) an automated test generator that uses large language models and grounding techniques to create a test suite from data and study descriptions, and (2) a data testing framework that executes these tests, reporting potential errors and coverage. Results: We evaluated MDPT on three datasets: All of Us (AoU), MIMIC-III, and SyntheticMass, generating 55-73 tests per cohort across four conditions. These tests correctly identified 20-43 non-aligned or non-conforming data issues. We present a detailed analysis of the LLM-generated test suites in terms of reference grounding and value accuracy. Conclusion: Our approach incorporates external medical knowledge to enable context-sensitive data quality testing as part of the data analysis workflow to improve the validity of its outcomes. Our approach tackles these challenges from a quality assurance perspective, laying the foundation for further development such as additional data modalities and improved grounding methods.

Updated: 2025-07-03 13:54:50

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02628v1
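The unit-test view of data quality described above might look like the following minimal sketch. The `peck` function and the two checks are hypothetical; in the paper the test suite is generated by LLMs from the data and study descriptions:

```python
def peck(records, tests):
    """Run a suite of data-quality 'unit tests' over structured records,
    reporting which record indices fail each test and the fraction of
    tests that pass on every record."""
    failures = {name: [] for name in tests}
    for i, record in enumerate(records):
        for name, check in tests.items():
            if not check(record):
                failures[name].append(i)
    passed = sum(1 for f in failures.values() if not f)
    return failures, passed / len(tests)

# Hypothetical checks for a toy cohort.
tests = {
    "age_plausible": lambda r: 0 <= r["age"] <= 120,
    "hr_present": lambda r: r.get("heart_rate") is not None,
}
records = [{"age": 54, "heart_rate": 72}, {"age": 230, "heart_rate": None}]
failures, coverage = peck(records, tests)
print(failures)  # {'age_plausible': [1], 'hr_present': [1]}
print(coverage)  # 0.0
```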

A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes

Variant effect predictors (VEPs) aim to assess the functional impact of protein variants, traditionally relying on multiple sequence alignments (MSAs). This approach assumes that naturally occurring variants are fit, an assumption challenged by pharmacogenomics, where some pharmacogenes experience low evolutionary pressure. Deep mutational scanning (DMS) datasets provide an alternative by offering quantitative fitness scores for variants. In this work, we propose a transformer-based matrix variational auto-encoder (matVAE) with a structured prior and evaluate its performance on 33 DMS datasets corresponding to 26 drug target and ADME proteins from the ProteinGym benchmark. Our model trained on MSAs (matVAE-MSA) outperforms the state-of-the-art DeepSequence model in zero-shot prediction on DMS datasets, despite using an order of magnitude fewer parameters and requiring less computation at inference time. We also compare matVAE-MSA to matENC-DMS, a model of similar capacity trained on DMS data, and find that the latter performs better on supervised prediction tasks. Additionally, incorporating AlphaFold-generated structures into our transformer model further improves performance, achieving results comparable to DeepSequence trained on MSAs and finetuned on DMS. These findings highlight the potential of DMS datasets to replace MSAs without significant loss in predictive performance, motivating further development of DMS datasets and exploration of their relationships to enhance variant effect prediction.

Updated: 2025-07-03 13:50:18

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02624v1

Access Control Threatened by Quantum Entanglement

Access control is a cornerstone of computer security that prevents unauthorised access to resources. In this paper, we study access control in quantum computer systems. We present the first explicit scenario of a security breach when a classically secure access control system is straightforwardly adapted to the quantum setting. The breach is ultimately due to the fact that quantum mechanics allows the phenomenon of entanglement and violates the Mermin inequality, a multi-party variant of the celebrated Bell inequality. This reveals a threat that quantum entanglement poses to access control if existing computer systems are integrated with quantum computing. To protect against this threat, we propose several new models of quantum access control and rigorously analyse their security, flexibility, and efficiency.

Updated: 2025-07-03 13:48:14

Categories: quant-ph,cs.CR,cs.OS

Download: http://arxiv.org/abs/2507.02622v1
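For reference, the Mermin inequality invoked above can be stated in its standard three-party form (this is the textbook statement, not taken from the paper): for dichotomic local observables $A_x, A_y$ (and likewise for $B$ and $C$) with outcomes $\pm 1$, any local hidden-variable model obeys

\[
\langle A_x B_x C_x \rangle - \langle A_x B_y C_y \rangle - \langle A_y B_x C_y \rangle - \langle A_y B_y C_x \rangle \le 2,
\]

while $\sigma_x$ and $\sigma_y$ measurements on the GHZ state $|\mathrm{GHZ}\rangle = (|000\rangle + |111\rangle)/\sqrt{2}$ attain the algebraic maximum of 4, the violation that a breach scenario of this kind can exploit.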

FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference

Distributed inference serves as a promising approach to enabling the inference of large language models (LLMs) at the network edge. It distributes the inference process to multiple devices to ensure that the LLMs can fit into the device memory. Recent pipeline-based approaches have the potential to parallelize communication and computation, which helps reduce inference latency. However, the benefit diminishes when the inference request at the network edge is sparse, where the pipeline is typically at low utilization. To enable efficient distributed LLM inference at the edge, we propose \textbf{FlowSpec}, a pipeline-parallel tree-based speculative decoding framework. FlowSpec incorporates three key mechanisms to improve decoding efficiency: 1) score-based step-wise verification prioritizes more important draft tokens so that tokens are accepted earlier; 2) efficient draft management prunes invalid tokens while maintaining the correct causal relationships during verification; 3) dynamic draft expansion strategies supply high-quality speculative inputs. These techniques work in concert to enhance both pipeline utilization and speculative efficiency. We evaluate FlowSpec on a real-world testbed against other baselines. Experimental results demonstrate that our proposed framework significantly improves inference speed across diverse models and configurations, achieving speedup ratios of 1.36$\times$-1.77$\times$ compared to baselines. Our code is publicly available at \href{https://github.com/Leosang-lx/FlowSpec#}{https://github.com/Leosang-lx/FlowSpec\#}

Updated: 2025-07-03 13:47:42

Categories: cs.DC,cs.AI

Download: http://arxiv.org/abs/2507.02620v1
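FlowSpec's score-prioritized tree verification builds on the basic speculative-decoding loop. The sketch below shows only vanilla longest-prefix acceptance for a linear draft against a deterministic target, not the pipelined tree variant; `verify_draft` and the toy target are illustrative assumptions:

```python
def verify_draft(draft, target_next):
    """Longest-prefix acceptance at the core of speculative decoding:
    keep draft tokens while they match what the target model would emit,
    then append the target's own token at the first mismatch."""
    accepted = []
    for token in draft:
        expected = target_next(accepted)
        if token != expected:
            return accepted + [expected]  # bonus token from the target
        accepted.append(token)
    return accepted

# Hypothetical deterministic target that continues the sequence 0, 1, 2, ...
target_next = lambda prefix: len(prefix)
print(verify_draft([0, 1, 7, 9], target_next))  # [0, 1, 2]
```

Every accepted draft token saves one sequential target-model forward pass, which is where the reported speedups come from.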

L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation

In this paper, we propose a novel model called Learnable VAE (L-VAE), which learns a disentangled representation together with the hyperparameters of the cost function. L-VAE can be considered as an extension of $\beta$-VAE, wherein the hyperparameter, $\beta$, is empirically adjusted. L-VAE mitigates the limitations of $\beta$-VAE by learning the relative weights of the terms in the loss function to control the dynamic trade-off between disentanglement and reconstruction losses. In the proposed model, the weights of the loss terms and the parameters of the model architecture are learned concurrently. An additional regularization term is added to the loss function to prevent bias towards either reconstruction or disentanglement losses. Experimental analyses show that the proposed L-VAE finds an effective balance between reconstruction fidelity and disentangling the latent dimensions. Comparisons of the proposed L-VAE against $\beta$-VAE, VAE, ControlVAE, DynamicVAE, and $\sigma$-VAE on datasets such as dSprites, MPI3D-complex, Falcor3D, and Isaac3D reveal that L-VAE consistently provides the best or the second-best performance measured by a set of disentanglement metrics. Moreover, qualitative experiments on the CelebA dataset confirm the success of the L-VAE model for disentangling the facial attributes.

Updated: 2025-07-03 13:45:42

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.02619v1
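The abstract does not give the exact parameterisation, but a loss with learnable, regularised term weights might be sketched as follows. The softmax weighting and the entropy-style regulariser are assumptions, chosen only to illustrate how a penalty can discourage collapsing onto a single loss term:

```python
import math

def lvae_loss(recon, kl, log_w, reg=0.1):
    """Loss with learnable term weights in the spirit of L-VAE: weights
    are softmax-parameterised so they stay positive and sum to one, and
    a negative-entropy penalty discourages collapsing onto only the
    reconstruction term or only the disentanglement term."""
    z = sum(math.exp(v) for v in log_w)
    w = [math.exp(v) / z for v in log_w]
    balance_penalty = sum(v * math.log(v) for v in w)  # negative entropy
    return w[0] * recon + w[1] * kl + reg * balance_penalty

# Equal logits give equal weights: 0.5 * 4.0 + 0.5 * 2.0 + 0.1 * (-ln 2)
print(lvae_loss(4.0, 2.0, [0.0, 0.0]))
```

In the actual model the `log_w` parameters would be optimised jointly with the network weights by gradient descent.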

Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory

Are Large Language Models (LLMs) a new form of strategic intelligence, able to reason about goals in competitive settings? We present compelling supporting evidence. The Iterated Prisoner's Dilemma (IPD) has long served as a model for studying decision-making. We conduct the first ever series of evolutionary IPD tournaments, pitting canonical strategies (e.g., Tit-for-Tat, Grim Trigger) against agents from the leading frontier AI companies OpenAI, Google, and Anthropic. By varying the termination probability in each tournament (the "shadow of the future"), we introduce complexity and chance, confounding memorisation. Our results show that LLMs are highly competitive, consistently surviving and sometimes even proliferating in these complex ecosystems. Furthermore, they exhibit distinctive and persistent "strategic fingerprints": Google's Gemini models proved strategically ruthless, exploiting cooperative opponents and retaliating against defectors, while OpenAI's models remained highly cooperative, a trait that proved catastrophic in hostile environments. Anthropic's Claude emerged as the most forgiving reciprocator, showing remarkable willingness to restore cooperation even after being exploited or successfully defecting. Analysis of nearly 32,000 prose rationales provided by the models reveals that they actively reason about both the time horizon and their opponent's likely strategy, and we demonstrate that this reasoning is instrumental to their decisions. This work connects classic game theory with machine psychology, offering a rich and granular view of algorithmic decision-making under uncertainty.

Updated: 2025-07-03 13:45:02

Categories: cs.AI,cs.CL,cs.GT

Download: http://arxiv.org/abs/2507.02618v1
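An IPD match with a per-round termination probability (the "shadow of the future") can be sketched in a few lines. The payoffs use the conventional T=5, R=3, P=1, S=0 values, which the abstract does not specify:

```python
import random

def play_match(strat_a, strat_b, end_prob, rng):
    """Iterated Prisoner's Dilemma where each round the match ends with
    probability end_prob. A strategy maps the opponent's move history to
    'C' (cooperate) or 'D' (defect)."""
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    while True:
        a, b = strat_a(hist_b), strat_b(hist_a)  # each sees the other's past
        pa, pb = payoff[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
        if rng.random() < end_prob:
            return score_a, score_b

tit_for_tat = lambda opp: "C" if not opp else opp[-1]
always_defect = lambda opp: "D"

rng = random.Random(0)
print(play_match(tit_for_tat, tit_for_tat, end_prob=0.1, rng=rng))
print(play_match(tit_for_tat, always_defect, end_prob=0.1, rng=rng))
```

Varying `end_prob` across tournaments is exactly the knob the paper turns: a short shadow of the future rewards defection, a long one rewards reciprocity.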

DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making

The rise of Large Language Models (LLMs) has enabled the development of specialized AI agents with domain-specific reasoning and interaction capabilities, particularly in healthcare. While recent frameworks simulate medical decision-making, they largely focus on single-turn tasks where a doctor agent receives full case information upfront -- diverging from the real-world diagnostic process, which is inherently uncertain, interactive, and iterative. In this paper, we introduce MIMIC-Patient, a structured dataset built from the MIMIC-III electronic health records (EHRs), designed to support dynamic, patient-level simulations. Building on this, we propose DynamiCare, a novel dynamic multi-agent framework that models clinical diagnosis as a multi-round, interactive loop, where a team of specialist agents iteratively queries the patient system, integrates new information, and dynamically adapts its composition and strategy. We demonstrate the feasibility and effectiveness of DynamiCare through extensive experiments, establishing the first benchmark for dynamic clinical decision-making with LLM-powered agents.

Updated: 2025-07-03 13:43:10

Categories: cs.AI

Download: http://arxiv.org/abs/2507.02616v1

Interpreting Graph Inference with Skyline Explanations

Inference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNN outputs are often hard to interpret comprehensively. Existing methods typically settle for individual pre-defined explainability measures (such as fidelity), which often leads to biased, ``one-sided'' interpretations. This paper introduces skyline explanation, a new paradigm that interprets GNN output by simultaneously optimizing multiple explainability measures of users' interests. (1) We propose skyline explanations as a Pareto set of explanatory subgraphs that dominate others over multiple explanatory measures. We formulate skyline explanation as a multi-criteria optimization problem, and establish its hardness results. (2) We design efficient algorithms with an onion-peeling approach, which strategically prioritizes nodes and removes unpromising edges to incrementally assemble skyline explanations. (3) We also develop an algorithm to diversify the skyline explanations to enrich the comprehensive interpretation. (4) We introduce efficient parallel algorithms with load-balancing strategies to scale skyline explanation for large-scale GNN-based inference. Using real-world and synthetic graphs, we experimentally verify our algorithms' effectiveness and scalability.

Updated: 2025-07-03 13:40:49

Categories: cs.LG,cs.DB

Download: http://arxiv.org/abs/2505.07635v2
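The Pareto (skyline) set over multiple explanatory measures can be sketched directly; the scores and measure names below are hypothetical, and the brute-force filter stands in for the paper's onion-peeling algorithms:

```python
def dominates(a, b):
    """a dominates b if it is at least as good on every measure and
    strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(candidates):
    """Keep the Pareto set: candidates not dominated by any other."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Hypothetical explanatory subgraphs scored on (fidelity, conciseness).
scores = [(0.9, 0.2), (0.7, 0.8), (0.6, 0.6), (0.9, 0.8)]
print(skyline(scores))  # [(0.9, 0.8)]
```

Unlike a single fidelity score, the skyline surfaces every explanation that is optimal under some weighting of the user's measures.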

Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection

Recently, introducing Tensor Decomposition (TD) techniques into unsupervised feature selection (UFS) has been an emerging research topic. A tensor structure is beneficial for mining the relations between different modes and helps relieve the computation burden. However, while existing methods exploit TD to preserve the data tensor structure, they do not consider the influence of data orientation and thus have difficulty in handling orientation-specific data such as time series. To solve the above problem, we utilize the orientation-dependent tensor-tensor product from Tensor Singular Value Decomposition based on *M-product (T-SVDM) and extend the one-dimensional Sparse Principal Component Analysis (SPCA) to a tensor form. The proposed sparse tensor PCA model can constrain sparsity at the specified mode and yield sparse tensor principal components, enhancing flexibility and accuracy in learning feature relations. To ensure fast convergence and a flexible description of feature correlation, we develop a convex version specially designed for general UFS tasks and propose an efficient slice-by-slice algorithm that performs dual optimization in the transform domain. Experimental results on real-world datasets demonstrate the effectiveness and remarkable computational efficiency of the proposed method for tensor data of diverse structures over the state-of-the-art. When transform axes align with feature distribution patterns, our method is promising for various applications. The codes related to our proposed methods and the experiments are available at https://github.com/zjj20212035/STPCA.git.

Updated: 2025-07-03 13:34:01

Categories: cs.LG

Download: http://arxiv.org/abs/2407.16985v3

Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, spanning from architectures to optimizers, that we found critical to train latent-space emulators.

Updated: 2025-07-03 13:32:50

Categories: cs.LG,physics.flu-dyn

Download: http://arxiv.org/abs/2507.02608v1

Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs"

Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search. As the name suggests, HNSW searches a layered hierarchical graph to quickly identify neighborhoods of similar points to a given query vector. But is this hierarchy even necessary? A rigorous experimental analysis to answer this question would provide valuable insights into the nature of algorithm design for ANN search and motivate directions for future work in this increasingly crucial domain. We conduct an extensive benchmarking study covering more large-scale datasets than prior investigations of this question. We ultimately find that a flat navigable small world graph retains all of the benefits of HNSW on high-dimensional datasets, with latency and recall performance essentially \emph{identical} to the original algorithm but with less memory overhead. Furthermore, we go a step further and study \emph{why} the hierarchy of HNSW provides no benefit in high dimensions, hypothesizing that navigable small world graphs contain a well-connected, frequently traversed ``highway'' of hub nodes that maintain the same purported function as the hierarchical layers. We present compelling empirical evidence that the \emph{Hub Highway Hypothesis} holds for real datasets and investigate the mechanisms by which the highway forms. The implications of this hypothesis may also provide future research directions in developing enhancements to graph-based ANN search.

Updated: 2025-07-03 13:32:02

领域: cs.LG,cs.DB,cs.IR

下载: http://arxiv.org/abs/2412.01940v3

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

Updated: 2025-07-03 13:30:58

Categories: cs.SD,cs.AI,cs.CR,cs.LG,eess.AS

Download: http://arxiv.org/abs/2507.02606v1

Automated Grading of Students' Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models

With the rise of online learning, the demand for efficient and consistent assessment in mathematics has significantly increased over the past decade. Machine Learning (ML), particularly Natural Language Processing (NLP), has been widely used for autograding student responses, especially those involving text and/or mathematical expressions. However, there has been limited research on autograding responses involving students' handwritten graphs, despite their prevalence in Science, Technology, Engineering, and Mathematics (STEM) curricula. In this study, we implement multimodal meta-learning models for autograding images containing students' handwritten graphs and text. We further compare the performance of Vision Large Language Models (VLLMs) with these specially trained meta-learning models. Our results, evaluated on a real-world dataset collected from our institution, show that the best-performing meta-learning models outperform VLLMs in 2-way classification tasks. In contrast, in more complex 3-way classification tasks, the best-performing VLLMs slightly outperform the meta-learning models. While VLLMs show promising results, their reliability and practical applicability remain uncertain and require further investigation.

Updated: 2025-07-03 13:25:50

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03056v1

StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

In this work, we present a series of structure transformation attacks on LLM alignment, where we encode natural language intent using diverse syntax spaces, ranging from simple structure formats and basic query languages (e.g., SQL) to novel spaces and syntaxes created entirely by LLMs. Our extensive evaluation shows that our simplest attacks can achieve close to a 90% success rate, even on strict LLMs (such as Claude 3.5 Sonnet) using SOTA alignment mechanisms. We improve the attack performance further by using an adaptive scheme that combines structure transformations along with existing content transformations, resulting in an attack success rate (ASR) of over 96% with 0% refusals. To generalize our attacks, we explore numerous structure formats, including syntaxes purely generated by LLMs. Our results indicate that such novel syntaxes are easy to generate and result in a high ASR, suggesting that defending against our attacks is not a straightforward process. Finally, we develop a benchmark and evaluate existing safety-alignment defenses against it, showing that most of them fail with 100% ASR. Our results show that existing safety alignment mostly relies on token-level patterns without recognizing harmful concepts, highlighting and motivating the need for serious research efforts in this direction. As a case study, we demonstrate how attackers can use our attack to easily generate a sample malware and a corpus of fraudulent SMS messages, which perform well in bypassing detection.

Updated: 2025-07-03 13:25:12

Categories: cs.LG

Download: http://arxiv.org/abs/2502.11853v2

Addressing Camera Sensors Faults in Vision-Based Navigation: Simulation and Dataset Development

The increasing importance of Vision-Based Navigation (VBN) algorithms in space missions raises numerous challenges in ensuring their reliability and operational robustness. Sensor faults can lead to inaccurate outputs from navigation algorithms or even complete data processing faults, potentially compromising mission objectives. Artificial Intelligence (AI) offers a powerful solution for detecting such faults, overcoming many of the limitations associated with traditional fault detection methods. However, the primary obstacle to the adoption of AI in this context is the lack of sufficient and representative datasets containing faulty image data. This study addresses these challenges by focusing on an interplanetary exploration mission scenario. A comprehensive analysis of potential fault cases in camera sensors used within the VBN pipeline is presented. The causes and effects of these faults are systematically characterized, including their impact on image quality and navigation algorithm performance, as well as commonly employed mitigation strategies. To support this analysis, a simulation framework is introduced to recreate faulty conditions in synthetically generated images, enabling a systematic and controlled reproduction of faulty data. The resulting dataset of fault-injected images provides a valuable tool for training and testing AI-based fault detection algorithms. The final link to the dataset will be added after an embargo period; a private link is available for peer reviewers.

Updated: 2025-07-03 13:23:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02602v1

Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data

Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Padé Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research addresses the question: Can Padé Approximant Neural Networks (PadéNets) outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in diagnosing electrical and mechanical faults using vibration and acoustic data? Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and PadéNets. These models are tested on the University of Ottawa's publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The PadéNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as Leaky ReLU. Results and Conclusion: PadéNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of PadéNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.
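A Padé approximant neuron computes a ratio of two learned polynomials of its input. The minimal sketch below keeps the denominator bounded away from zero, a common trick in Padé activation units; the coefficient layout is an illustrative assumption, not the paper's exact PAON parameterization.

```python
import numpy as np

class PadeNeuron:
    """Single Padé approximant neuron: y = P(x) / Q(x).

    a: numerator coefficients a_0..a_m of P(x) = sum_k a_k x^k
    b: denominator coefficients b_1..b_n; Q(x) = 1 + |sum_k b_k x^(k+1)|
       is kept >= 1 so the rational function has no poles.
    """

    def __init__(self, a, b):
        self.a = np.asarray(a, dtype=float)
        self.b = np.asarray(b, dtype=float)

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        p = sum(c * x**k for k, c in enumerate(self.a))
        q = 1.0 + np.abs(sum(c * x**(k + 1) for k, c in enumerate(self.b)))
        return p / q
```

With `b` empty the neuron degenerates to an ordinary polynomial unit; nonzero denominator coefficients are what add the extra nonlinearity the abstract attributes to PAONs.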

Updated: 2025-07-03 13:21:45

Categories: cs.LG,cs.SD,cs.SY,eess.SY

Download: http://arxiv.org/abs/2507.02599v1

AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models

Arithmetic circuits, such as adders and multipliers, are fundamental components of digital systems, directly impacting the performance, power efficiency, and area footprint. However, optimizing these circuits remains challenging due to the vast design space and complex physical constraints. While recent deep learning-based approaches have shown promise, they struggle to consistently explore high-potential design variants, limiting their optimization efficiency. To address this challenge, we propose AC-Refiner, a novel arithmetic circuit optimization framework leveraging conditional diffusion models. Our key insight is to reframe arithmetic circuit synthesis as a conditional image generation task. By carefully conditioning the denoising diffusion process on target quality-of-results (QoRs), AC-Refiner consistently produces high-quality circuit designs. Furthermore, the explored designs are used to fine-tune the diffusion model, which focuses the exploration near the Pareto frontier. Experimental results demonstrate that AC-Refiner generates designs with superior Pareto optimality, outperforming state-of-the-art baselines. The performance gain is further validated by integrating AC-Refiner into practical applications.

Updated: 2025-07-03 13:21:33

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2507.02598v1

Stealtooth: Breaking Bluetooth Security Abusing Silent Automatic Pairing

Bluetooth is a pervasive wireless communication technology used by billions of devices for short-range connectivity. The security of Bluetooth relies on the pairing process, where devices establish shared long-term keys for secure communications. However, many commercial Bluetooth devices implement automatic pairing functions to improve user convenience, creating a previously unexplored attack surface. We present Stealtooth, a novel attack that abuses unknown vulnerabilities in the automatic pairing functions in commercial Bluetooth devices to achieve completely silent device link key overwriting. The Stealtooth attack leverages the fact that Bluetooth audio devices automatically transition to pairing mode under specific conditions, enabling attackers to hijack pairing processes without user awareness or specialized tools. We also extend the attack into the MitM Stealtooth attack, combining automatic pairing abuse with power-saving mode techniques to enable man-in-the-middle attacks. We evaluate the attacks against 10 commercial Bluetooth devices from major manufacturers, demonstrating widespread vulnerabilities across diverse device types and manufacturers. Our practical implementation requires only commodity hardware and open-source software, highlighting the low barrier to entry for attackers. We propose defenses at both the device and protocol levels, including enhanced user notifications and standardized automatic pairing guidelines. Our findings reveal a critical tension between security and usability, showing that current automatic pairing implementations create systematic vulnerabilities. We responsibly disclosed our findings to affected vendors, with several already releasing patches.

Updated: 2025-07-03 13:14:41

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2507.00847v2

MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion

Multi-Perspective Fusion (MPF) is a novel post-training alignment framework for large language models (LLMs) developed in response to the growing need for easy bias mitigation. Built on top of the SAGED pipeline, an automated system for constructing bias benchmarks and extracting interpretable baseline distributions, MPF leverages multi-perspective generations to expose and align biases in LLM outputs with nuanced, humanlike baselines. By decomposing a baseline, such as a sentiment distribution from HR professionals, into interpretable perspective components, MPF guides generation through sampling and balancing of responses, weighted by the probabilities obtained in the decomposition. Empirically, we demonstrate its ability to align LLM sentiment distributions with both counterfactual baselines (absolute equality) and the HR baseline (biased for Top University), resulting in small KL divergence, reduced calibration error, and generalization to unseen questions. This shows that MPF offers a scalable and interpretable method for alignment and bias mitigation, compatible with deployed LLMs and requiring no extensive prompt engineering or finetuning.
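The fusion step can be sketched as mixing per-perspective output distributions with the weights obtained from the baseline decomposition, then scoring the result by KL divergence against a target baseline. The distributions and weights below are hypothetical placeholders, not SAGED's actual extractions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, with smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def fuse_perspectives(components, weights):
    """Mix per-perspective distributions using decomposition weights.

    components: one row per perspective (e.g. sentiment distributions)
    weights:    mixing probabilities from the baseline decomposition
    """
    components = np.asarray(components, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ components / weights.sum()
```

In this reading, alignment succeeds when the fused distribution's KL divergence to the chosen baseline (counterfactual or HR) is small.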

Updated: 2025-07-03 13:09:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.02595v1

SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks

Vision-Language Models (VLMs) have great potential in medical tasks, like Visual Question Answering (VQA), where they could act as interactive assistants for both patients and clinicians. Yet their robustness to distribution shifts on unseen data remains a key concern for safe deployment. Evaluating such robustness requires a controlled experimental setup that allows for systematic insights into the model's behavior. However, we demonstrate that current setups fail to offer sufficiently thorough evaluations. To address this gap, we introduce a novel framework, called SURE-VQA, centered around three key requirements to overcome current pitfalls and systematically analyze VLM robustness: 1) Since robustness on synthetic shifts does not necessarily translate to real-world shifts, it should be measured on real-world shifts that are inherent to the VQA data; 2) Traditional token-matching metrics often fail to capture underlying semantics, necessitating the use of large language models (LLMs) for more accurate semantic evaluation; 3) Model performance often lacks interpretability due to missing sanity baselines, thus meaningful baselines should be reported that allow assessing the multimodal impact on the VLM. To demonstrate the relevance of this framework, we conduct a study on the robustness of various Fine-Tuning (FT) methods across three medical datasets with four types of distribution shifts. Our study highlights key insights into robustness: 1) No FT method consistently outperforms others in robustness, and 2) robustness trends are more stable across FT methods than across distribution shifts. Additionally, we find that simple sanity baselines that do not use the image data can perform surprisingly well and confirm LoRA as the best-performing FT method on in-distribution data. Code is provided at https://github.com/IML-DKFZ/sure-vqa.

Updated: 2025-07-03 13:07:30

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.19688v3

H$_2$O$_2$RAM: A High-Performance Hierarchical Doubly Oblivious RAM

The combination of Oblivious RAM (ORAM) with Trusted Execution Environments (TEE) has found numerous real-world applications due to their complementary nature. TEEs alleviate the performance bottlenecks of ORAM, such as network bandwidth and roundtrip latency, and ORAM provides general-purpose protection for TEE applications against attacks exploiting memory access patterns. The defining property of this combination, which sets it apart from traditional ORAM designs, is its ability to ensure that memory accesses, both inside and outside of TEEs, are made oblivious, thus termed doubly oblivious RAM (O$_2$RAM). Efforts to develop O$_2$RAM with enhanced performance are ongoing. In this work, we propose H$_2$O$_2$RAM, a high-performance doubly oblivious RAM construction. The distinguishing feature of our approach, compared to the existing tree-based doubly oblivious designs, is its first adoption of the hierarchical framework that enjoys inherently better data locality and parallelization. While the latest hierarchical solution, FutORAMa, achieves concrete efficiency in the classic client-server model by leveraging a relaxed assumption of sublinear-sized client-side private memory, adapting it to our scenario poses challenges due to the conflict between this relaxed assumption and our doubly oblivious requirement. To this end, we introduce several new efficient oblivious components to build a high-performance hierarchical O$_2$RAM (H$_2$O$_2$RAM). We implement our design and evaluate it on various scenarios. The results indicate that H$_2$O$_2$RAM reduces execution time by up to $\sim 10^3$ times and saves memory usage by $5\sim44$ times compared to state-of-the-art solutions.

Updated: 2025-07-03 13:00:57

Categories: cs.CR

Download: http://arxiv.org/abs/2409.07167v3

Revisiting Active Learning under (Human) Label Variation

Access to high-quality labeled data remains a limiting factor in applied supervised learning. While label variation (LV), i.e., differing labels for the same instance, is common, especially in natural language processing, annotation frameworks often still rest on the assumption of a single ground truth. This overlooks human label variation (HLV), the occurrence of plausible differences in annotations, as an informative signal. Similarly, active learning (AL), a popular approach to optimizing the use of limited annotation budgets in training ML models, often relies on at least one of several simplifying assumptions, which rarely hold in practice when acknowledging HLV. In this paper, we examine foundational assumptions about truth and label nature, highlighting the need to decompose observed LV into signal (e.g., HLV) and noise (e.g., annotation error). We survey how the AL and (H)LV communities have addressed -- or neglected -- these distinctions and propose a conceptual framework for incorporating HLV throughout the AL loop, including instance selection, annotator choice, and label representation. We further discuss the integration of large language models (LLM) as annotators. Our work aims to lay a conceptual foundation for HLV-aware active learning, better reflecting the complexities of real-world annotation.

Updated: 2025-07-03 12:59:28

Categories: cs.CL,cs.HC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.02593v1

WebSailor: Navigating Super-human Reasoning for Web Agent

Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, RFT cold start, and an efficient agentic RL training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor significantly outperforms all open-source agents in complex information-seeking tasks, matching proprietary agents' performance and closing the capability gap.

Updated: 2025-07-03 12:59:07

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.02592v1

LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection

The rapid advancement of diffusion-based image generators has made it increasingly difficult to distinguish generated from real images. This can erode trust in digital media, making it critical to develop generalizable detectors for generated images. Recent methods leverage diffusion denoising cues, but mainly focus on single-step reconstruction errors, ignoring the inherent sequential nature of the denoising process. In this work, we propose LATTE - Latent Trajectory Embedding - a novel approach that models the evolution of latent embeddings across several denoising timesteps. By modeling the trajectory of such embeddings rather than single-step errors, LATTE captures subtle, discriminative patterns that distinguish real from generated images. Each latent is refined by employing our latent-visual feature refinement module and aggregated into a unified representation. Afterwards, it is fused with the visual features and finally passed into a lightweight classifier. Our experiments demonstrate that LATTE surpasses the baselines on several established benchmarks, such as GenImage and DiffusionFake. Moreover, it demonstrates strong performance in cross-generator and cross-datasets settings, highlighting the potential of using the trajectory of latent embeddings for generated image detection. The code is available on the following link: https://github.com/AnaMVasilcoiu/LATTE-Diffusion-Detector.

Updated: 2025-07-03 12:53:47

Categories: cs.CV,cs.AI,I.2.10; I.4.8; I.5

Download: http://arxiv.org/abs/2507.03054v1

A framework for Conditional Reasoning in Answer Set Programming

In this paper we introduce a Conditional Answer Set Programming framework (Conditional ASP) for the definition of conditional extensions of Answer Set Programming (ASP). The approach builds on a conditional logic with typicality, and on the combination of a conditional knowledge base with an ASP program, and allows for conditional reasoning over the answer sets of the program. The formalism relies on a multi-preferential semantics (and on the KLM preferential semantics, as a special case) to provide an interpretation of conditionals.

Updated: 2025-07-03 12:50:18

Categories: cs.AI,cs.LO,I.2.4

Download: http://arxiv.org/abs/2506.03997v2

Scalable Interconnect Learning in Boolean Networks

Learned Differentiable Boolean Logic Networks (DBNs) already deliver efficient inference on resource-constrained hardware. We extend them with a trainable, differentiable interconnect whose parameter count remains constant as input width grows, allowing DBNs to scale to far wider layers than earlier learnable-interconnect designs while preserving their advantageous accuracy. To further reduce model size, we propose two complementary pruning stages: an SAT-based logic equivalence pass that removes redundant gates without affecting performance, and a similarity-based, data-driven pass that outperforms a magnitude-style greedy baseline and offers a superior compression-accuracy trade-off.

Updated: 2025-07-03 12:45:45

Categories: cs.LG,cs.LO

Download: http://arxiv.org/abs/2507.02585v1

Responsibility Gap and Diffusion in Sequential Decision-Making Mechanisms

Responsibility has long been a subject of study in law and philosophy. More recently, it became a focus of AI literature. The article investigates the computational complexity of two important properties of responsibility in collective decision-making: diffusion and gap. It shows that the sets of diffusion-free and gap-free decision-making mechanisms are $\Pi_2$-complete and $\Pi_3$-complete, respectively. At the same time, the intersection of these classes is $\Pi_2$-complete.

Updated: 2025-07-03 12:43:38

Categories: cs.AI

Download: http://arxiv.org/abs/2507.02582v1

SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings

This paper introduces SoccerDiffusion, a transformer-based diffusion model designed to learn end-to-end control policies for humanoid robot soccer directly from real-world gameplay recordings. Using data collected from RoboCup competitions, the model predicts joint command trajectories from multi-modal sensor inputs, including vision, proprioception, and game state. We employ a distillation technique to enable real-time inference on embedded platforms that reduces the multi-step diffusion process to a single step. Our results demonstrate the model's ability to replicate complex motion behaviors such as walking, kicking, and fall recovery both in simulation and on physical robots. Although high-level tactical behavior remains limited, this work provides a robust foundation for subsequent reinforcement learning or preference optimization methods. We release the dataset, pretrained models, and code under: https://bit-bots.github.io/SoccerDiffusion

Updated: 2025-07-03 12:26:15

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.20808v2

From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction

As large language models (LLMs) grow in size, efficient compression techniques like quantization and sparsification are critical. While quantization maintains performance with reduced precision, structured sparsity methods, such as N:M sparsification, often fall short due to limited flexibility and sensitivity to outlier weights. We explore 8:16 semi-structured sparsity, demonstrating its ability to surpass the Performance Threshold, where a compressed model matches the accuracy of its uncompressed or smaller counterpart under equivalent memory constraints. Compared to 2:4 sparsity, 8:16 offers greater flexibility with minimal storage overhead (0.875 vs. 0.75 bits/element). We also apply sparse structured patterns for salient weights, showing that structured sparsity for outliers is competitive with unstructured approaches, leading to equivalent or better results. Finally, we demonstrate that simple techniques such as variance correction and SmoothQuant-like weight equalization improve sparse models' performance.
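An N:M pattern such as the 8:16 scheme above can be sketched as magnitude pruning within fixed-size groups, followed by a variance correction on the survivors. The group layout and the simple global rescaling below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def nm_prune(weights, n=8, m=16):
    """N:M semi-structured sparsity: within each group of m consecutive
    weights, keep the n largest-magnitude entries and zero the rest."""
    w = np.asarray(weights, dtype=float)
    groups = w.reshape(-1, m)                 # assumes w.size is divisible by m
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, :-n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

def variance_correct(original, pruned):
    """Rescale surviving weights so the pruned tensor recovers the variance
    of the original (a simple global correction for illustration)."""
    scale = original.std() / (pruned.std() + 1e-12)
    return pruned * scale
```

The storage figures in the abstract come from the index bits needed to record which positions in each group survive; 8:16 needs slightly more index overhead per element than 2:4 but allows far more mask patterns per group.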

Updated: 2025-07-03 12:17:45

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03052v1

Towards an Explainable Comparison and Alignment of Feature Embeddings

While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable comparison of different embeddings requires identifying and analyzing mismatches between sample groups clustered within the embedding spaces. In this work, we propose the \emph{Spectral Pairwise Embedding Comparison (SPEC)} framework to compare embeddings and identify their differences in clustering a reference dataset. Our approach examines the kernel matrices derived from two embeddings and leverages the eigendecomposition of the difference kernel matrix to detect sample clusters that are captured differently by the two embeddings. We present a scalable implementation of this kernel-based approach, with computational complexity that grows linearly with the sample size. Furthermore, we introduce an optimization problem using this framework to align two embeddings, ensuring that clusters identified in one embedding are also captured in the other model. We provide numerical results demonstrating SPEC's application to comparing and aligning embeddings on large-scale datasets such as ImageNet and MS-COCO. The project page is available at https://mjalali.github.io/SPEC/.
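
SPEC's core computation, the eigendecomposition of the difference between two kernel matrices, can be sketched in a few lines of numpy (the RBF kernel choice and the toy "embeddings" below are illustrative assumptions, not the authors' setup):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Pairwise RBF kernel matrix of the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
# Two "embeddings" of the same samples: identity vs. a feature-rescaled variant
K1 = rbf_kernel(X)
K2 = rbf_kernel(X * np.array([1.0, 1.0, 5.0]))

# Eigendecomposition of the symmetric difference kernel; eigenvectors with
# large-magnitude eigenvalues flag sample clusters the embeddings treat differently
evals, evecs = np.linalg.eigh(K1 - K2)
top = int(np.argmax(np.abs(evals)))
print(evals[top], evecs[:, top].shape)
```

The entries of the top eigenvector weight the samples most responsible for the disagreement between the two embeddings.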

Updated: 2025-07-03 12:12:39

Subjects: cs.LG,cs.AI,cs.CV,math.SP

Download: http://arxiv.org/abs/2506.06231v2

Understanding-informed Bias Mitigation for Fair CMR Segmentation

Artificial intelligence (AI) is increasingly being used for medical imaging tasks. However, there can be biases in AI models, particularly when they are trained using imbalanced training datasets. One such example has been the strong ethnicity bias effect in cardiac magnetic resonance (CMR) image segmentation models. Although this phenomenon has been reported in a number of publications, little is known about the effectiveness of bias mitigation algorithms in this domain. We aim to investigate the impact of common bias mitigation methods to address bias between Black and White subjects in AI-based CMR segmentation models. Specifically, we use oversampling, importance reweighting and Group DRO as well as combinations of these techniques to mitigate the ethnicity bias. Additionally, motivated by recent findings on the root causes of AI-based CMR segmentation bias, we evaluate the same methods using models trained and evaluated on cropped CMR images. We find that bias can be mitigated using oversampling, significantly improving performance for the underrepresented Black subjects whilst not significantly reducing the majority White subjects' performance. Using cropped images increases performance for both ethnicities and reduces the bias, whilst adding oversampling as a bias mitigation technique with cropped images reduces the bias further. When testing the models on an external clinical validation set, we find high segmentation performance and no statistically significant bias.
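
Oversampling, the most effective mitigation reported above, balances the training set by resampling the underrepresented group with replacement until group sizes match. A minimal stdlib sketch (group sizes and labels are invented for illustration, not the study's data):

```python
import random

def oversample(samples, group_of):
    """Resample minority groups with replacement until all groups match the largest."""
    groups = {}
    for s in samples:
        groups.setdefault(group_of(s), []).append(s)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # random.choices samples with replacement; k=0 for the majority group
        balanced.extend(random.choices(members, k=target - len(members)))
    return balanced

random.seed(0)
data = [("White", i) for i in range(80)] + [("Black", i) for i in range(20)]
balanced = oversample(data, group_of=lambda s: s[0])
print(len(balanced))  # → 160: both groups brought up to 80
```

In practice this is applied per training epoch so the model sees both groups at equal frequency.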

Updated: 2025-07-03 12:11:58

Subjects: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2503.17089v2

AI Flow: Perspectives, Scenarios, and Approaches

Pioneered by the foundational information theory of Claude Shannon and the visionary framework of machine intelligence by Alan Turing, the convergent evolution of information and communication technologies (IT/CT) has created an unbroken wave of connectivity and computation. This synergy has sparked a technological revolution, now reaching its peak with large artificial intelligence (AI) models that are reshaping industries and redefining human-machine collaboration. However, the realization of ubiquitous intelligence faces considerable challenges due to substantial resource consumption in large models and high communication bandwidth demands. To address these challenges, AI Flow has been introduced as a multidisciplinary framework that integrates cutting-edge IT and CT advancements, with a particular emphasis on the following three key points. First, a device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters to optimize scalability and efficiency for low-latency model inference. Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features, enabling effective collaboration and the flexibility to adapt to varying resource constraints and dynamic scenarios. Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow. By leveraging communication networks to enhance connectivity, the collaboration among AI models across heterogeneous nodes achieves emergent intelligence that surpasses the capability of any single model. The innovations of AI Flow provide enhanced intelligence, timely responsiveness, and ubiquitous accessibility to AI services, paving the way for the tighter fusion of AI techniques and communication systems.

Updated: 2025-07-03 12:11:34

Subjects: cs.AI,cs.CL,cs.CV,cs.DC,eess.SP

Download: http://arxiv.org/abs/2506.12479v2

Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability

Layer-wise normalization (LN) is an essential component of virtually all transformer-based large language models. While its effects on training stability are well documented, its role at inference time is poorly understood. Additionally, LN layers hinder mechanistic interpretability by introducing additional nonlinearities and increasing the interconnectedness of individual model components. Here, we show that all LN layers can be removed from every GPT-2 model with only a small increase in validation loss (e.g. +0.03 cross-entropy loss for GPT-2 XL). Thus, LN cannot play a substantial role in language modeling. We find that the amount of fine-tuning data needed for LN removal grows sublinearly with model parameters, suggesting scaling to larger models is feasible. We release a suite of LN-free GPT-2 models on Hugging Face. Furthermore, we test interpretability techniques on LN-free models. Direct logit attribution now gives the exact direct effect of individual components, while the accuracy of attribution patching does not significantly improve. We also confirm that GPT-2's "confidence neurons" are inactive in the LN-free models. Our work clarifies the role of LN layers in language modeling, showing that GPT-2-class models can function without LN layers. We hope that our LN-free analogs of the GPT-2 family of models will enable more precise interpretability research and improve our understanding of language models.

Updated: 2025-07-03 12:09:04

Subjects: cs.LG

Download: http://arxiv.org/abs/2507.02559v1

Fading memory and the convolution theorem

Several topological and analytical notions of continuity and fading memory for causal and time-invariant filters are introduced, and the relations between them are analyzed. A significant generalization of the convolution theorem that establishes the equivalence between the fading memory property and the availability of convolution representations of linear filters is proved. This result extends a previous similar characterization to a complete array of weighted norms in the definition of the fading memory property. Additionally, the main theorem shows that the availability of convolution representations can be characterized, at least when the codomain is finite-dimensional, not only by the fading memory property but also by the reunion of two purely topological notions that are called minimal continuity and minimal fading memory property. Finally, when the input space and the codomain of a linear functional are Hilbert spaces, it is shown that minimal continuity and the minimal fading memory property guarantee the existence of interesting embeddings of the associated reproducing kernel Hilbert spaces.
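
The convolution representation at issue can be written down concretely: for a linear, causal, time-invariant filter $H$ acting on inputs $u$, the fading memory property is shown to be equivalent to the existence of a kernel $w$ such that (the notation here is schematic, not taken verbatim from the paper):

```latex
(Hu)(t) = \int_{0}^{\infty} w(s)\, u(t-s)\, \mathrm{d}s
```

Causality appears in the integration over past inputs only ($s \ge 0$), and the fading memory property constrains how strongly $w$ may weight the distant past.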

Updated: 2025-07-03 12:08:58

Subjects: math.OC,cs.LG,math.FA

Download: http://arxiv.org/abs/2408.07386v3

Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements

Despite substantial progress in recent years, probabilistic solvers with adaptive step sizes can still not solve memory-demanding differential equations -- unless we care only about a single point in time (which is far too restrictive; we want the whole time series). Counterintuitively, the culprit is the adaptivity itself: Its unpredictable memory demands easily exceed our machine's capabilities, making our simulations fail unexpectedly and without warning. Still, dropping adaptivity would abandon years of progress, which can't be the answer. In this work, we solve this conundrum. We develop an adaptive probabilistic solver with fixed memory demands building on recent developments in robust state estimation. Switching to our method (i) eliminates memory issues for long time series, (ii) accelerates simulations by orders of magnitude through unlocking just-in-time compilation, and (iii) makes adaptive probabilistic solvers compatible with scientific computing in JAX.

Updated: 2025-07-03 12:07:20

Subjects: math.NA,cs.LG,cs.NA,stat.ML

Download: http://arxiv.org/abs/2410.10530v2

Flow Matching on Lie Groups

Flow Matching (FM) is a recent generative modelling technique: we aim to learn how to sample from distribution $\mathfrak{X}_1$ by flowing samples from some distribution $\mathfrak{X}_0$ that is easy to sample from. The key trick is that this flow field can be trained while conditioning on the end point in $\mathfrak{X}_1$: given an end point, simply move along a straight line segment to the end point (Lipman et al. 2022). However, straight line segments are only well-defined on Euclidean space. Consequently, Chen and Lipman (2023) generalised the method to FM on Riemannian manifolds, replacing line segments with geodesics or their spectral approximations. We take an alternative point of view: we generalise to FM on Lie groups by instead substituting exponential curves for line segments. This leads to a simple, intrinsic, and fast implementation for many matrix Lie groups, since the required Lie group operations (products, inverses, exponentials, logarithms) are simply given by the corresponding matrix operations. FM on Lie groups could then be used for generative modelling with data consisting of sets of features (in $\mathbb{R}^n$) and poses (in some Lie group), e.g. the latent codes of Equivariant Neural Fields (Wessels et al. 2025).
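
On a matrix Lie group, FM replaces the straight segment from $x_0$ to $x_1$ with the exponential curve $x_t = x_0 \exp(t \log(x_0^{-1} x_1))$. A stdlib sketch for the rotation group SO(2), where the group exponential and logarithm reduce to angle arithmetic (illustrative only, not the paper's implementation):

```python
import math

def rot(theta):
    """2x2 rotation matrix as a tuple of rows, i.e. an element of SO(2)."""
    return ((math.cos(theta), -math.sin(theta)),
            (math.sin(theta), math.cos(theta)))

def exp_curve(theta0, theta1, t):
    """Point at time t on the exponential curve from rot(theta0) to rot(theta1).

    For SO(2), log(x0^{-1} x1) is the skew matrix of the (wrapped) angle
    difference, so the curve is rotation by the interpolated angle.
    """
    dtheta = math.atan2(math.sin(theta1 - theta0), math.cos(theta1 - theta0))
    return rot(theta0 + t * dtheta)

start, end = 0.3, 2.0
mid = exp_curve(start, end, 1.0)
target = rot(end)
err = max(abs(a - b) for ra, rb in zip(mid, target) for a, b in zip(ra, rb))
print(err < 1e-9)  # curve reaches x1 at t = 1
```

For general matrix Lie groups the same construction uses matrix products, inverses, exponentials, and logarithms, which is exactly the simplicity the abstract highlights.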

Updated: 2025-07-03 12:03:48

Subjects: math.DG,cs.LG

Download: http://arxiv.org/abs/2504.00494v2

Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints

We introduce Interleaved Gibbs Diffusion (IGD), a novel generative modeling framework for discrete-continuous data, focusing on problems with important, implicit and unspecified constraints in the data. Most prior works on discrete and discrete-continuous diffusion assume a factorized denoising distribution, which can hinder the modeling of strong dependencies between random variables in such problems. We empirically demonstrate a significant improvement in 3-SAT performance out of the box by switching to a Gibbs-sampling style discrete diffusion model which does not assume factorizability. Motivated by this, we introduce IGD which generalizes discrete time Gibbs sampling type Markov chain for the case of discrete-continuous generation. IGD allows for seamless integration between discrete and continuous denoisers while theoretically guaranteeing exact reversal of a suitable forward process. Further, it provides flexibility in the choice of denoisers, allows conditional generation via state-space doubling and inference time refinement. Empirical evaluations on three challenging generation tasks - molecule structures, layouts and tabular data - demonstrate state-of-the-art performance. Notably, IGD achieves state-of-the-art results without relying on domain-specific inductive biases like equivariant diffusion or auxiliary losses. We explore a wide range of modeling, and interleaving strategies along with hyperparameters in each of these problems.

Updated: 2025-07-03 12:02:01

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2502.13450v2

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
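
The search-policy framing can be made concrete with a toy greedy policy that repeatedly applies a set of operators to the incumbent solution and keeps the best-scoring candidate (the objective and operators below are invented for illustration; the paper also studies MCTS and evolutionary policies):

```python
def greedy_search(initial, operators, score, steps=10):
    """Greedy policy: at each step, apply every operator and keep the best child."""
    best = initial
    for _ in range(steps):
        candidates = [op(best) for op in operators]
        challenger = max(candidates, key=score)
        if score(challenger) <= score(best):
            break  # no operator improves the incumbent solution
        best = challenger
    return best

# Toy problem: maximize -(x - 3)^2 by nudging a scalar "solution"
operators = [lambda x: x + 0.5, lambda x: x - 0.5, lambda x: x * 0.9]
solution = greedy_search(0.0, operators, score=lambda x: -(x - 3) ** 2)
print(solution)  # → 3.0
```

The paper's point is that the choice of policy (greedy vs. MCTS vs. evolutionary) and the operator set must be co-designed; neither alone determines performance.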

Updated: 2025-07-03 11:59:15

Subjects: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02554v1

Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning

Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues. It remains challenging due to the complex spatial relationships in outdoor aerial scenes. In this paper, we propose an end-to-end zero-shot framework for aerial VLN tasks, where the large language model (LLM) is introduced as our agent for action prediction. Specifically, we develop a novel Semantic-Topo-Metric Representation (STMR) to enhance the spatial reasoning ability of LLMs. This is achieved by extracting and projecting instruction-related semantic masks of landmarks into a top-down map that contains the location information of surrounding landmarks. Further, this map is transformed into a matrix representation with distance metrics as the text prompt to the LLM, for action prediction according to the instruction. Experiments conducted in real and simulation environments have successfully proved the effectiveness and robustness of our method, achieving 15.9% and 12.5% improvements (absolute) in Oracle Success Rate (OSR) on the AerialVLN-S dataset.

Updated: 2025-07-03 11:55:43

Subjects: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.08500v2

Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization

Improving and understanding the training dynamics and reasoning of Large Language Models (LLMs) has become essential for their deployment in AI-based security tools, such as software vulnerability detection. In this work, we present an extensive study aimed at advancing recent RL-based finetuning techniques for LLMs in the context of vulnerability detection. We start by highlighting key limitations of commonly adopted LLMs, such as their tendency to over-predict certain types of vulnerabilities while failing to detect others. To address this challenge, we explore the use of Group Relative Policy Optimization (GRPO), a recent policy-gradient method, for guiding LLM behavior through structured, rule-based rewards. We enable its application to the vulnerability detection task by redefining its advantage functions and reward signals using annotations from widely used datasets in the field, including BigVul, DiverseVul, and CleanVul. The proposed methodology enables an extensive set of experiments, addressing multiple research questions regarding the impact of GRPO on generalization, reasoning capabilities, and performance improvements over standard supervised finetuning (SFT). Our findings offer valuable insights into the potential of RL-based training to enhance both the performance and reasoning abilities of LLMs in the context of software vulnerability detection.
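
GRPO dispenses with a learned value function by normalizing each completion's reward against the mean and standard deviation of its sampling group. A minimal stdlib sketch of that group-relative advantage computation (the rewards below are invented; the paper derives them from rule-based signals on vulnerability annotations):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each completion = (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled answers scored by a structured rule-based reward
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])
```

Completions better than their group's average get positive advantages and are reinforced; the rest are suppressed, all relative to samples from the same prompt.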

Updated: 2025-07-03 11:52:45

Subjects: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.03051v1

Position: A Theory of Deep Learning Must Include Compositional Sparsity

Overparametrized Deep Neural Networks (DNNs) have demonstrated remarkable success in a wide variety of domains too high-dimensional for classical shallow networks subject to the curse of dimensionality. However, open questions about fundamental principles, that govern the learning dynamics of DNNs, remain. In this position paper we argue that it is the ability of DNNs to exploit the compositionally sparse structure of the target function driving their success. As such, DNNs can leverage the property that most practically relevant functions can be composed from a small set of constituent functions, each of which relies only on a low-dimensional subset of all inputs. We show that this property is shared by all efficiently Turing-computable functions and is therefore highly likely present in all current learning problems. While some promising theoretical insights on questions concerned with approximation and generalization exist in the setting of compositionally sparse functions, several important questions on the learnability and optimization of DNNs remain. Completing the picture of the role of compositional sparsity in deep learning is essential to a comprehensive theory of artificial, and even general, intelligence.
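
A compositionally sparse target is a high-dimensional function built from constituents that each depend on only a few inputs. A toy stdlib example (the specific function is invented for illustration):

```python
def f(x):
    """An 8-input function that is compositionally sparse: every constituent
    function depends on at most two of its direct arguments."""
    g1 = x[0] * x[1]          # uses only (x0, x1)
    g2 = max(x[2], x[3])      # uses only (x2, x3)
    g3 = x[4] + x[5]          # uses only (x4, x5)
    g4 = x[6] - x[7]          # uses only (x6, x7)
    h1 = g1 + g2              # each combiner takes two intermediates
    h2 = g3 * g4
    return h1 - h2

print(f([1, 2, 3, 4, 5, 6, 7, 8]))  # → 17
```

Although f takes 8 inputs, it factors into a tree of 2-ary functions, which is the structure the position paper argues DNNs exploit to evade the curse of dimensionality.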

Updated: 2025-07-03 11:49:56

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.02550v1

Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP), and requires simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data through Conservative Q-Learning (CQL). Moreover, we present a novel entropy bonus modification for d-mSAC, for maskable action spaces. Additionally, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset, with the same amount of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.
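
In Conservative Q-Learning, a per-state penalty of the form logsumexp over all actions minus the Q-value of the logged action discourages inflated Q-values on actions absent from the dataset. A stdlib sketch of that term (illustrative; Offline-LD applies CQL inside its maskable mQRDQN and d-mSAC learners, with the Q-values below invented):

```python
import math

def cql_penalty(q_values, logged_action):
    """CQL regularizer for one state: logsumexp_a Q(s, a) - Q(s, a_data)."""
    m = max(q_values)  # subtract the max for numerical stability
    logsumexp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return logsumexp - q_values[logged_action]

q = [2.0, 0.5, -1.0]  # Q-values for three dispatching actions in some state
print(cql_penalty(q, logged_action=0))  # small: the logged action already dominates
print(cql_penalty(q, logged_action=2))  # large: non-data actions are overestimated
```

Minimizing this term alongside the TD loss pushes the policy toward actions actually observed in the historical schedules.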

Updated: 2025-07-03 11:41:37

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10589v4

Higher-Order Singular-Value Derivatives of Rectangular Real Matrices

We present a theoretical framework for deriving the general $n$-th order Fr\'echet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato's analytic perturbation theory for self-adjoint operators. Deriving closed-form expressions for higher-order derivatives of singular values is notoriously challenging through standard matrix-analysis techniques. To overcome this, we treat a real rectangular matrix as a compact operator on a finite-dimensional Hilbert space, and embed the rectangular matrix into a block self-adjoint operator so that non-symmetric perturbations are captured. Applying Kato's asymptotic eigenvalue expansion to this construction, we obtain a general, closed-form expression for the infinitesimal $n$-th order spectral variations. Specializing to $n=2$ and deploying on a Kronecker-product representation with matrix convention yield the Hessian of a singular value, not found in literature. By bridging abstract operator-theoretic perturbation theory with matrices, our framework equips researchers with a practical toolkit for higher-order spectral sensitivity studies in random matrix applications (e.g., adversarial perturbation in deep learning).
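
The first-order special case of such derivatives is classical: for a simple singular value $\sigma$ with singular vectors $u, v$, the directional derivative along a perturbation $E$ is $u^\top E v$. A numpy sketch verifying this numerically (a sanity check, not the paper's $n$-th order machinery):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
E = rng.normal(size=(4, 3))          # perturbation direction
U, S, Vt = np.linalg.svd(A)
u, v, sigma = U[:, 0], Vt[0, :], S[0]

# First-order prediction of the top singular value's change along E
predicted = u @ E @ v
eps = 1e-6
actual = (np.linalg.svd(A + eps * E, compute_uv=False)[0] - sigma) / eps
print(abs(predicted - actual))  # agrees to first order
```

The paper's framework generalizes this to closed-form $n$-th order derivatives, e.g. the $n=2$ Hessian, via Kato's perturbation theory on a block self-adjoint embedding.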

Updated: 2025-07-03 11:36:31

Subjects: stat.ML,cs.LG

Download: http://arxiv.org/abs/2506.03764v3

Clarifying Before Reasoning: A Coq Prover with Structural Context

In this work, we investigate whether improving task clarity can enhance the reasoning ability of large language models, focusing on theorem proving in Coq. We introduce a concept-level metric to evaluate task clarity and show that adding structured semantic context to the standard input used by modern LLMs leads to a 1.85$\times$ improvement in clarity score (44.5\%~$\rightarrow$~82.3\%). Using the general-purpose model \texttt{DeepSeek-V3}, our approach leads to a 2.1$\times$ improvement in proof success (21.8\%~$\rightarrow$~45.8\%) and outperforms the previous state-of-the-art \texttt{Graph2Tac} (33.2\%). We evaluate this on 1,386 theorems randomly sampled from 15 standard Coq packages, following the same evaluation protocol as \texttt{Graph2Tac}. Furthermore, fine-tuning smaller models on our structured data can achieve even higher performance (48.6\%). Our method uses selective concept unfolding to enrich task descriptions, and employs a Planner--Executor architecture. These findings highlight the value of structured task representations in bridging the gap between understanding and reasoning.

Updated: 2025-07-03 11:35:34

Subjects: cs.AI

Download: http://arxiv.org/abs/2507.02541v1

Are You Listening to Me? Fine-Tuning Chatbots for Empathetic Dialogue

Conversational agents have made significant progress since ELIZA, expanding their role across various domains, including healthcare, education, and customer service. As these agents become increasingly integrated into daily human interactions, the need for emotional intelligence, particularly empathetic listening, becomes increasingly essential. In this study, we explore how Large Language Models (LLMs) respond when tasked with generating emotionally rich interactions. Starting from a small dataset manually crafted by an expert to reflect empathic behavior, we extended the conversations using two LLMs: ChatGPT and Gemini. We analyzed the emotional progression of the dialogues using both sentiment analysis (via VADER) and expert assessments. While the generated conversations often mirrored the intended emotional structure, human evaluation revealed important differences in the perceived empathy and coherence of the responses. These findings suggest that emotion modeling in dialogues requires not only structural alignment in the expressed emotions but also qualitative depth, highlighting the importance of combining automated and human-centered methods in the development of emotionally competent agents.

Updated: 2025-07-03 11:32:41

Subjects: cs.HC,cs.AI

Download: http://arxiv.org/abs/2507.02537v1

Real-Time Monitoring and Transparency in Pizza Production Using IoT and Blockchain

This paper presents a blockchain-based Internet of Things (IoT) system for monitoring pizza production in restaurants. IoT devices track temperature and humidity in real-time, while blockchain ensures secure and tamper-proof data. A Raspberry Pi processes sensor data, captures images, triggers alerts, and interacts with smart contracts. The system detects abnormal conditions, enabling quick responses. Blockchain adds transparency and traceability, supporting compliance and audits. Experiments show improved ingredient management, reduced waste, and increased kitchen efficiency.
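
The alerting logic described (sensor readings checked against safe ranges, with out-of-range values triggering alerts) can be sketched as follows; the sensor names and thresholds are invented examples, not the paper's configuration:

```python
SAFE_RANGES = {"temperature_c": (2.0, 8.0), "humidity_pct": (40.0, 70.0)}

def check_reading(sensor, value):
    """Return an alert string if the reading leaves its safe range, else None."""
    low, high = SAFE_RANGES[sensor]
    if not (low <= value <= high):
        return f"ALERT: {sensor}={value} outside [{low}, {high}]"
    return None

print(check_reading("temperature_c", 5.1))   # None: within range
print(check_reading("humidity_pct", 81.0))   # out-of-range alert
```

In the described system, each such alert would also be recorded on-chain so the event trail is tamper-proof and auditable.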

Updated: 2025-07-03 11:32:28

Domains: cs.CR,cs.ET

Download: http://arxiv.org/abs/2507.02536v1

Assessing Quantum Advantage for Gaussian Process Regression

Gaussian Process Regression is a well-known machine learning technique for which several quantum algorithms have been proposed. We show here that in a wide range of scenarios these algorithms show no exponential speedup. We achieve this by rigorously proving that the condition number of a kernel matrix scales at least linearly with the matrix size under general assumptions on the data and kernel. We additionally prove that the sparsity and Frobenius norm of a kernel matrix scale linearly under similar assumptions. The implications for the quantum algorithms runtime are independent of the complexity of loading classical data on a quantum computer and also apply to dequantised algorithms. We supplement our theoretical analysis with numerical verification for popular kernels in machine learning.
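
The claimed scaling is straightforward to probe numerically. A minimal sketch, assuming an RBF kernel and uniformly sampled 1-D inputs on a fixed interval, shows the condition number growing with the matrix size:

```python
import numpy as np

def rbf_kernel_matrix(x, length_scale=1.0):
    # Pairwise squared distances of 1-D inputs, then the RBF kernel.
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def condition_number(n, seed=0):
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 10.0, size=n))  # fixed domain, growing n
    return np.linalg.cond(rbf_kernel_matrix(x))

# On a fixed domain, cond(K) grows (at least) with the matrix size, which
# is what erodes the quantum linear-systems speedup.
conds = [condition_number(n) for n in (20, 40, 80)]
assert conds[0] < conds[1] < conds[2]
```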

Updated: 2025-07-03 11:27:35

Domains: quant-ph,cs.LG

Download: http://arxiv.org/abs/2505.22502v2

Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN

Recent years have seen growing interest in Question Difficulty Estimation (QDE) using natural language processing techniques. Question difficulty is often represented using discrete levels, framing the task as ordinal regression due to the inherent ordering from easiest to hardest. However, the literature has neglected the ordinal nature of the task, relying on classification or discretized regression models, with specialized ordinal regression methods remaining unexplored. Furthermore, evaluation metrics are tightly coupled to the modeling paradigm, hindering cross-study comparability. While some metrics fail to account for the ordinal structure of difficulty levels, none adequately address class imbalance, resulting in biased performance assessments. This study addresses these limitations by benchmarking three types of model outputs -- discretized regression, classification, and ordinal regression -- using the balanced Discrete Ranked Probability Score (DRPS), a novel metric that jointly captures ordinality and class imbalance. In addition to using popular ordinal regression methods, we propose OrderedLogitNN, extending the ordered logit model from econometrics to neural networks. We fine-tune BERT on the RACE++ and ARC datasets and find that OrderedLogitNN performs considerably better on complex tasks. The balanced DRPS offers a robust and fair evaluation metric for discrete-level QDE, providing a principled foundation for future research.
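
The Ranked Probability Score underlying the balanced DRPS can be sketched as follows; the macro-average over classes is one plausible reading of "balanced", and the paper's exact weighting may differ.

```python
import numpy as np

def drps(probs, label):
    # Discrete Ranked Probability Score for one sample: squared distance
    # between the predicted CDF and the one-hot target's CDF, so mass far
    # from the true level is penalized more than mass one level away.
    cdf_pred = np.cumsum(probs)
    cdf_true = (np.arange(len(probs)) >= label).astype(float)
    return float(np.sum((cdf_pred - cdf_true) ** 2))

def balanced_drps(all_probs, labels):
    # Macro-average the per-sample scores over classes so rare difficulty
    # levels count equally (an assumed reading of "balanced").
    labels = np.asarray(labels)
    scores = np.array([drps(p, y) for p, y in zip(all_probs, labels)])
    per_class = [scores[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(per_class))

# A prediction concentrated near the true level beats a distant one,
# reflecting the ordinal structure that plain accuracy ignores.
near = drps(np.array([0.1, 0.8, 0.1]), 1)
far = drps(np.array([0.8, 0.1, 0.1]), 2)
assert near < far
```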

Updated: 2025-07-03 11:23:04

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.00736v2

A Square Peg in a Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning

This paper studies long-tailed semi-supervised learning (LTSSL) with distribution mismatch, where the class distribution of the labeled training data follows a long-tailed distribution and mismatches with that of the unlabeled training data. Most existing methods introduce auxiliary classifiers (experts) to model various unlabeled data distributions and produce pseudo-labels, but the expertise of the various experts is not fully utilized. We observe that different experts are good at predicting different intervals of samples, e.g., the long-tailed expert is skilled at samples in the head interval while the uniform expert excels at samples in the medium interval. Therefore, we propose a dynamic expert assignment module that estimates the class membership (i.e., head, medium, or tail class) of each sample and dynamically assigns a suitable expert to it based on the estimated membership, producing high-quality pseudo-labels in the training phase and predictions in the testing phase. We also theoretically reveal that integrating different experts' strengths leads to a smaller generalization error bound. Moreover, we find that deeper features are more biased toward the head class but more discriminative, while shallower features are less biased but also less discriminative. We therefore propose a multi-depth feature fusion module that utilizes features at different depths to mitigate the model bias. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR-10-LT, STL-10-LT, and SVHN-LT datasets across various settings. The code is available at https://github.com/yaxinhou/Meta-Expert.
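
The dynamic expert assignment step can be sketched as a simple router; the expert names and the membership rule below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def estimate_membership(class_idx, head, medium):
    # Map a predicted class index to head / medium / tail membership.
    if class_idx in head:
        return "head"
    if class_idx in medium:
        return "medium"
    return "tail"

def assign_expert(class_probs, head, medium):
    # Route the sample to the expert whose training distribution matches
    # its estimated membership (expert names are hypothetical).
    experts = {"head": "long_tailed_expert",
               "medium": "uniform_expert",
               "tail": "inverted_expert"}
    membership = estimate_membership(int(np.argmax(class_probs)), head, medium)
    return experts[membership]

head, medium = {0, 1}, {2, 3}
assert assign_expert(np.array([0.7, 0.1, 0.1, 0.1, 0.0]), head, medium) == "long_tailed_expert"
assert assign_expert(np.array([0.1, 0.1, 0.6, 0.1, 0.1]), head, medium) == "uniform_expert"
assert assign_expert(np.array([0.0, 0.1, 0.1, 0.1, 0.7]), head, medium) == "inverted_expert"
```

At training time the chosen expert's output would supply the pseudo-label; at test time it would supply the prediction.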

Updated: 2025-07-03 11:12:29

Domains: cs.LG

Download: http://arxiv.org/abs/2505.16341v2

RetrySQL: text-to-SQL training with retry data for self-correcting query generation

The text-to-SQL task is an active challenge in Natural Language Processing. Many existing solutions focus on using black-box language models extended with specialized components within customized end-to-end text-to-SQL pipelines. While these solutions use both closed-source proprietary language models and coding-oriented open-source models, there is a lack of research regarding SQL-specific generative models. At the same time, recent advancements in self-correcting generation strategies show promise for improving the capabilities of existing architectures. The application of these concepts to the text-to-SQL task remains unexplored. In this paper, we introduce RetrySQL, a new approach to training text-to-SQL generation models. We prepare reasoning steps for reference SQL queries and then corrupt them to create retry data that contains both incorrect and corrected steps, divided with a special token. We continuously pre-train an open-source coding model with this data and demonstrate that retry steps yield an improvement of up to 4 percentage points in both overall and challenging execution accuracy metrics, compared to pre-training without retry data. Additionally, we confirm that supervised fine-tuning with LoRA is ineffective for learning from retry data and that full-parameter pre-training is a necessary requirement for that task. We showcase that the self-correcting behavior is learned by the model and the increase in downstream accuracy metrics is a result of this additional skill. Finally, we incorporate RetrySQL-trained models into the full text-to-SQL pipeline and showcase that they are competitive in terms of execution accuracy with proprietary models that contain orders of magnitude more parameters. RetrySQL demonstrates that self-correction can be learned in the text-to-SQL task and provides a novel way of improving generation accuracy for SQL-oriented language models.
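
Retry-data construction of the kind described (corrupt a reasoning step, insert a special token, then the corrected step) might be sketched like this; the token name and corruption rule are hypothetical stand-ins.

```python
RETRY_TOKEN = "[RETRY]"  # hypothetical marker; the paper's token is not named here

def corrupt(step):
    # Toy corruption: misspell a column name (a stand-in for the paper's
    # corruption of reasoning steps).
    return step.replace("total_sales", "totl_sales")

def make_retry_example(steps, retry_at):
    # Emit an incorrect version of one step, the retry token, then the
    # corrected step, so a model trained on this text learns to
    # self-correct mid-generation.
    out = []
    for i, step in enumerate(steps):
        if i == retry_at:
            out.extend([corrupt(step), RETRY_TOKEN])
        out.append(step)
    return " ".join(out)

steps = ["SELECT total_sales", "FROM shop", "ORDER BY total_sales DESC"]
example = make_retry_example(steps, retry_at=0)
assert example.startswith("SELECT totl_sales [RETRY] SELECT total_sales")
```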

Updated: 2025-07-03 11:00:49

Domains: cs.LG

Download: http://arxiv.org/abs/2507.02529v1

From Turing to Tomorrow: The UK's Approach to AI Regulation

The UK has pursued a distinctive path in AI regulation: less cautious than the EU but more willing to address risks than the US, and has emerged as a global leader in coordinating AI safety efforts. Impressive developments from companies like London-based DeepMind began to spark concerns in the UK about catastrophic risks from around 2012, although regulatory discussion at the time focussed on bias and discrimination. By 2022, these discussions had evolved into a "pro-innovation" strategy, in which the government directed existing regulators to take a light-touch approach, governing AI at point of use, but avoided regulating the technology or infrastructure directly. ChatGPT arrived in late 2022, galvanising concerns that this approach may be insufficient. The UK responded by establishing an AI Safety Institute to monitor risks and hosting the first international AI Safety Summit in 2023, but - unlike the EU - refrained from regulating frontier AI development in addition to its use. A new government was elected in 2024 which promised to address this gap, but at the time of writing is yet to do so. What should the UK do next? The government faces competing objectives: harnessing AI for economic growth and better public services while mitigating risk. In light of these, we propose establishing a flexible, principles-based regulator to oversee the most advanced AI development, defensive measures against risks from AI-enabled biological design tools, and argue that more technical work is needed to understand how to respond to AI-generated misinformation. We argue for updated legal frameworks on copyright, discrimination, and AI agents, and that regulators will have a limited but important role if AI substantially disrupts labour markets. If the UK gets AI regulation right, it could demonstrate how democratic societies can harness AI's benefits while managing its risks.

Updated: 2025-07-03 10:54:43

Domains: cs.CY,cs.AI

Download: http://arxiv.org/abs/2507.03050v1

Deep learning four decades of human migration

We present a novel and detailed dataset on origin-destination annual migration flows and stocks between 230 countries and regions, spanning the period from 1990 to the present. Our flow estimates are further disaggregated by country of birth, providing a comprehensive picture of migration over the last 35 years. The estimates are obtained by training a deep recurrent neural network to learn flow patterns from 18 covariates for all countries, including geographic, economic, cultural, societal, and political information. The recurrent architecture of the neural network means that the entire past can influence current migration patterns, allowing us to learn long-range temporal correlations. By training an ensemble of neural networks and additionally pushing uncertainty on the covariates through the trained network, we obtain confidence bounds for all our estimates, allowing researchers to pinpoint the geographic regions most in need of additional data collection. We validate our approach on various test sets of unseen data, demonstrating that it significantly outperforms traditional methods estimating five-year flows while delivering a significant increase in temporal resolution. The model is fully open source: all training data, neural network weights, and training code are made public alongside the migration estimates, providing a valuable resource for future studies of human migration.

Updated: 2025-07-03 10:46:56

Domains: cs.LG,68T07,I.2.6

Download: http://arxiv.org/abs/2506.22821v2

Personalised Explanations in Long-term Human-Robot Interactions

In the field of Human-Robot Interaction (HRI), a fundamental challenge is to facilitate human understanding of robots. The emerging domain of eXplainable HRI (XHRI) investigates methods to generate explanations and evaluate their impact on human-robot interactions. Previous works have highlighted the need to personalise the level of detail of these explanations to enhance usability and comprehension. Our paper presents a framework designed to update and retrieve user knowledge-memory models, allowing for adapting the explanations' level of detail while referencing previously acquired concepts. Three architectures based on our proposed framework that use Large Language Models (LLMs) are evaluated in two distinct scenarios: a hospital patrolling robot and a kitchen assistant robot. Experimental results demonstrate that a two-stage architecture, which first generates an explanation and then personalises it, is the framework architecture that effectively reduces the level of detail only when there is related user knowledge.

Updated: 2025-07-03 10:40:39

Domains: cs.RO,cs.AI,cs.HC

Download: http://arxiv.org/abs/2507.03049v1

Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to the same speaker identity but correspond to different recording conditions. This method consistently demonstrates improvements in SV performance on VoxCeleb benchmarks when applied to major SSL frameworks, including SimCLR, SwAV, VICReg, and DINO. Using SSPS, SimCLR and DINO achieve 2.57% and 2.53% EER on VoxCeleb1-O, respectively. SimCLR yields a 58% relative reduction in EER, getting comparable performance to DINO with a simpler training framework. Furthermore, SSPS lowers intra-class variance and reduces channel information in speaker representations while exhibiting greater robustness without data-augmentation.
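
The core SSPS idea, sampling a pseudo-positive from the anchor's nearest neighbours in representation space, can be sketched as follows; the neighbourhood size and sampling rule are simplified assumptions.

```python
import numpy as np

def sample_ssps_positive(anchor_idx, embeddings, k=3, seed=0):
    # Pseudo-positive sampling: choose one of the k nearest neighbours of
    # the anchor in representation space, assuming nearby embeddings share
    # the anchor's speaker identity but come from different recordings.
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(embeddings - embeddings[anchor_idx], axis=1)
    dists[anchor_idx] = np.inf  # the anchor is never its own positive
    neighbours = np.argsort(dists)[:k]
    return int(rng.choice(neighbours))

# Two tight clusters: the sampled positive stays in the anchor's cluster,
# unlike same-utterance positives, which share the recording channel.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # speaker A
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])  # speaker B
assert sample_ssps_positive(0, emb, k=2) in (1, 2)
assert sample_ssps_positive(3, emb, k=2) in (4, 5)
```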

Updated: 2025-07-03 10:38:50

Domains: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2501.17772v3

Reconsidering the energy efficiency of spiking neural networks

Spiking Neural Networks (SNNs) promise higher energy efficiency over conventional Quantized Artificial Neural Networks (QNNs) due to their event-driven, spike-based computation. However, prevailing energy evaluations often oversimplify, focusing on computational aspects while neglecting critical overheads like comprehensive data movement and memory access. Such simplifications can lead to misleading conclusions regarding the true energy benefits of SNNs. This paper presents a rigorous re-evaluation. We establish a fair baseline by mapping rate-encoded SNNs with $T$ timesteps to functionally equivalent QNNs with $\lceil \log_2(T+1) \rceil$ bits. This ensures both models have comparable representational capacities as well as similar hardware requirements, enabling meaningful energy comparisons. We introduce a detailed analytical energy model encompassing core computation and data movement (sparse and dense activations, weights). Using this model, we systematically explore a wide parameter space, including intrinsic network characteristics ($T$, spike rate $s_r$, QNN sparsity $\gamma$, model size $N$, weight bit-level) and hardware characteristics (memory system and network-on-chip). Our analysis identifies specific operational regimes where SNNs genuinely offer superior energy efficiency. For example, under typical neuromorphic hardware conditions, SNNs with moderate time windows ($T \in [5,10]$) require an average spike rate ($s_r$) below 6.4% to outperform equivalent QNNs. These insights guide the design of genuinely energy-efficient neural network solutions.
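
The SNN-to-QNN mapping and the existence of a break-even spike rate can be illustrated with a toy version of such an energy model; the unit costs here are hypothetical, and the paper's model additionally accounts for data movement and memory access.

```python
import math

def equivalent_bits(T):
    # A rate-encoded SNN with T timesteps represents T + 1 spike counts,
    # matching a QNN with ceil(log2(T + 1)) bits.
    return math.ceil(math.log2(T + 1))

def snn_energy(T, spike_rate, n_synapses, e_ac=1.0):
    # Toy cost: one accumulate per spike event (hypothetical unit cost).
    return T * spike_rate * n_synapses * e_ac

def qnn_energy(T, n_synapses, e_mac=1.0):
    # Toy cost: one multiply-accumulate per synapse at the equivalent
    # bit width (hypothetical unit cost).
    return equivalent_bits(T) * n_synapses * e_mac

assert equivalent_bits(7) == 3 and equivalent_bits(10) == 4
# Below some spike rate the SNN is cheaper; above it the QNN wins.
assert snn_energy(10, 0.05, 1000) < qnn_energy(10, 1000)
assert snn_energy(10, 0.90, 1000) > qnn_energy(10, 1000)
```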

Updated: 2025-07-03 10:37:52

Domains: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.08290v2

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

The rapid evolution of multimodal large language models (MLLMs) has significantly enhanced their real-world applications. However, achieving consistent performance across languages, especially when integrating cultural knowledge, remains a significant challenge. To better assess this issue, we introduce two new benchmarks: KnowRecall and VisRecall, which evaluate cross-lingual consistency in MLLMs. KnowRecall is a visual question answering benchmark designed to measure factual knowledge consistency in 15 languages, focusing on cultural and historical questions about global landmarks. VisRecall assesses visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images. Experimental results reveal that state-of-the-art MLLMs, including proprietary ones, still struggle to achieve cross-lingual consistency. This underscores the need for more robust approaches that produce truly multilingual and culturally aware models.

Updated: 2025-07-03 10:35:35

Domains: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2505.15075v2

From Sentences to Sequences: Rethinking Languages in Biological System

The paradigm of large language models in natural language processing (NLP) has also shown promise in modeling biological languages, including proteins, RNA, and DNA. Both the auto-regressive generation paradigm and evaluation metrics have been transferred from NLP to biological sequence modeling. However, the intrinsic structural correlations in natural and biological languages differ fundamentally. Therefore, we revisit the notion of language in biological systems to better understand how NLP successes can be effectively translated to biological domains. By treating the 3D structure of biomolecules as the semantic content of a sentence and accounting for the strong correlations between residues or bases, we highlight the importance of structural evaluation and demonstrate the applicability of the auto-regressive paradigm in biological language modeling. Code can be found at https://github.com/zjuKeLiu/RiFold.

Updated: 2025-07-03 10:33:16

Domains: q-bio.BM,cs.AI

Download: http://arxiv.org/abs/2507.00953v2

Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning

Process Reinforcement Learning (PRL) has demonstrated considerable potential in enhancing the reasoning capabilities of Large Language Models (LLMs). However, introducing additional process reward models incurs substantial computational overhead, and there is no unified theoretical framework for process-level advantage estimation. To bridge this gap, we propose Self-Guided Process Reward Optimization (SPRO), a novel framework that enables process-aware RL through two key innovations: (1) we first theoretically demonstrate that process rewards can be derived intrinsically from the policy model itself, and (2) we introduce well-defined cumulative process rewards and Masked Step Advantage (MSA), which facilitates rigorous step-wise action advantage estimation within shared-prompt sampling groups. Our experimental results demonstrate that SPRO outperforms vanilla GRPO with 3.4x higher training efficiency and a 17.5% test accuracy improvement. Furthermore, SPRO maintains a stable and elevated policy entropy throughout training while reducing the average response length by approximately 1/3, evidencing sufficient exploration and prevention of reward hacking. Notably, SPRO incurs no additional computational overhead compared to outcome-supervised RL methods such as GRPO, which benefits industrial implementation.

Updated: 2025-07-03 10:33:08

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.01551v2

Detecting Multiple Diseases in Multiple Crops Using Deep Learning

India, as a predominantly agrarian economy, faces significant challenges in agriculture, including substantial crop losses caused by diseases, pests, and environmental stress. Early detection and accurate identification of diseases across different crops are critical for improving yield and ensuring food security. This paper proposes a deep learning based solution for detecting multiple diseases in multiple crops, aimed at covering India's diverse agricultural landscape. We first create a unified dataset encompassing images of 17 different crops and 34 different diseases from various available repositories. The proposed deep learning model is trained on this dataset and outperforms the state of the art in terms of accuracy and the number of crops and diseases covered. We achieve a significant detection accuracy of 99 percent on our unified dataset, 7 percent higher than the state of the art, which handles only 14 crops and 26 diseases. By increasing the number of crops and types of diseases that can be detected, the proposed solution aims to provide a better product for Indian farmers.

Updated: 2025-07-03 10:26:49

Domains: cs.CV,cs.AI,cs.ET

Download: http://arxiv.org/abs/2507.02517v1

Monitoring of Static Fairness

Machine-learned systems are in widespread use for making decisions about humans, and it is important that they are fair, i.e., not biased against individuals based on sensitive attributes. We present a general framework of runtime verification of algorithmic fairness for systems whose models are unknown, but are assumed to have a Markov chain structure, with or without full observation of the state space. We introduce a specification language that can model many common algorithmic fairness properties, such as demographic parity, equal opportunity, and social burden. We build monitors that observe a long sequence of events as generated by a given system, and output, after each observation, a quantitative estimate of how fair or biased the system was on that run until that point in time. The estimate is proven to be correct modulo a variable error bound and a given confidence level, where the error bound gets tighter as the observed sequence gets longer. We present two categories of monitoring algorithms, namely ones with a uniform error bound across all time points, and ones with weaker non-uniform, pointwise error bounds at different time points. Our monitoring algorithms use statistical tools that are adapted to suit the dynamic requirements of monitoring and the special needs of the fairness specifications. Using a prototype implementation, we show how we can monitor if a bank is fair in giving loans to applicants from different social backgrounds, and if a college is fair in admitting students while maintaining a reasonable financial burden on the society. In these experiments, our monitors took less than a millisecond to update their verdicts after each observation.
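
A monitor of this kind can be sketched for demographic parity, assuming i.i.d. observations and a Hoeffding-style bound (a simplification of the paper's Markov-chain setting).

```python
import math

class DemographicParityMonitor:
    # After each observation, estimate P(accept | A) - P(accept | B) with
    # a Hoeffding-style error bound at confidence 1 - delta (an i.i.d.
    # simplification of the paper's Markov-chain setting).
    def __init__(self, delta=0.05):
        self.delta = delta
        self.counts = {"A": [0, 0], "B": [0, 0]}  # [accepted, seen]

    def observe(self, group, accepted):
        self.counts[group][0] += int(accepted)
        self.counts[group][1] += 1

    def verdict(self):
        (a_acc, a_n), (b_acc, b_n) = self.counts["A"], self.counts["B"]
        if a_n == 0 or b_n == 0:
            return None  # no verdict until both groups are observed
        estimate = a_acc / a_n - b_acc / b_n
        # Hoeffding bound per group, combined via a union bound; it
        # tightens as the observed sequence grows.
        eps = sum(math.sqrt(math.log(4 / self.delta) / (2 * n))
                  for n in (a_n, b_n))
        return estimate, eps

mon = DemographicParityMonitor()
for accepted in (1, 1, 1, 0):
    mon.observe("A", accepted)
for accepted in (1, 0, 0, 0):
    mon.observe("B", accepted)
est, eps = mon.verdict()
assert abs(est - 0.5) < 1e-9 and eps > 0
```

Updating the counts and recomputing the verdict is constant-time per observation, consistent with the sub-millisecond updates reported.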

Updated: 2025-07-03 10:20:04

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03048v1

TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification

Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in Electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models, presenting a major barrier to developing calibration-free BCIs suitable for real-world applications. In this paper, we introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques. Our approach involves direct classification of Short-Time Fourier Transform (STFT)-transformed EEG data, optimized STFT parameters, and a balanced batching strategy during training of a Convolutional Neural Network (CNN). This approach is uniquely validated across four different datasets, including three widely-used benchmark datasets, leading to substantial improvements in cross-subject classification: 67.60% on the BCI Competition IV Dataset 1 (IV-1), 65.96% on Dataset 2A (IV-2A), and 80.22% on Dataset 2B (IV-2B), outperforming state-of-the-art techniques. Additionally, we systematically investigate the classification performance using MI windows ranging from the full 4-second window to 1-second windows. These results establish a new benchmark for generalizable, calibration-free MI classification, in addition to contributing a robust open-access dataset to advance research in this domain.
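
The preprocessing step, turning a raw EEG channel into an STFT magnitude spectrogram for the CNN, can be sketched as follows; the window length, hop, and sampling rate are assumed values, not the paper's optimized parameters.

```python
import numpy as np

def stft_spectrogram(signal, win_len=64, hop=32):
    # Minimal STFT: Hann-windowed frames -> magnitude spectrogram of shape
    # (freq_bins, frames), the 2-D input a CNN can classify directly.
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 256  # assumed EEG sampling rate
t = np.arange(0, 2.0, 1.0 / fs)
eeg = np.sin(2 * np.pi * 10 * t)  # synthetic 10 Hz mu-band oscillation
spec = stft_spectrogram(eeg)
freqs = np.fft.rfftfreq(64, d=1.0 / fs)
peak_bin = int(spec.mean(axis=1).argmax())
# Spectral energy concentrates near 10 Hz (bin spacing is fs/win_len = 4 Hz).
assert abs(freqs[peak_bin] - 10.0) < 4.0
```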

Updated: 2025-07-03 10:17:39

Domains: cs.LG,cs.HC,cs.NE

Download: http://arxiv.org/abs/2507.02510v1

The Choice of Normalization Influences Shrinkage in Regularized Regression

Regularized models are often sensitive to the scales of the features in the data and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But there are many different ways to normalize the features and the choice may have dramatic effects on the resulting model. In spite of this, there has so far been no research on this topic. In this paper, we begin to bridge this knowledge gap by studying normalization in the context of lasso, ridge, and elastic net regression. We focus on binary features and show that their class balances (proportions of ones) directly influences the regression coefficients and that this effect depends on the combination of normalization and regularization methods used. We demonstrate that this effect can be mitigated by scaling binary features with their variance in the case of the lasso and standard deviation in the case of ridge regression, but that this comes at the cost of increased variance of the coefficient estimates. For the elastic net, we show that scaling the penalty weights, rather than the features, can achieve the same effect. Finally, we also tackle mixes of binary and normal features as well as interactions and provide some initial results on how to normalize features in these cases.
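
The class-balance effect on lasso shrinkage, and how scaling a binary feature by its variance mitigates it, can be demonstrated with the closed-form one-feature lasso (a sketch, not the paper's experimental setup).

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_coef(x, y, lam):
    # Closed-form lasso with a single (centered) feature.
    n = len(y)
    return soft_threshold(x @ y / n, lam) / (x @ x / n)

def fit(q, scale_by_var, lam=0.02, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = (rng.random(n) < q).astype(float)  # binary feature with balance q
    y = 1.0 * x + rng.normal(0.0, 0.1, n)  # true coefficient is 1.0
    xc, yc = x - x.mean(), y - y.mean()
    if scale_by_var:
        v = xc @ xc / n  # sample variance, roughly q * (1 - q)
        return lasso_coef(xc / v, yc, lam) / v  # back on the original scale
    return lasso_coef(xc, yc, lam)

# Without rescaling, the imbalanced feature (q = 0.9) is shrunk harder.
assert 1.0 - fit(0.9, False) > 1.0 - fit(0.5, False)
# Scaling by the variance equalizes shrinkage across class balances.
assert abs(fit(0.5, True) - fit(0.9, True)) < 0.01
```

The equalization comes at a price the abstract notes: dividing by a small variance inflates the feature, so the coefficient estimate's variance grows.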

Updated: 2025-07-03 10:16:05

Fields: stat.ML,cs.LG,stat.ME,62J07 (Primary), 68T09 (Secondary),G.3; G.4; I.6

Download: http://arxiv.org/abs/2501.03821v3

IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders

Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.

Updated: 2025-07-03 10:13:42

Fields: cs.CL,cs.AI,cs.LG,91B14, 68T50,I.2.7; K.4.1; K.5.2

Download: http://arxiv.org/abs/2507.02506v1

Counterfactual Tuning for Temporal Sensitivity Enhancement in Large Language Model-based Recommendation

Recent advances have applied large language models (LLMs) to sequential recommendation, leveraging their pre-training knowledge and reasoning capabilities to provide more personalized user experiences. However, existing LLM-based methods fail to sufficiently leverage the rich temporal information inherent in users' historical interaction sequences, stemming from fundamental architectural constraints: LLMs process information through self-attention mechanisms that lack inherent sequence ordering and rely on position embeddings designed primarily for natural language rather than user interaction sequences. This limitation significantly impairs their ability to capture the evolution of user preferences over time and predict future interests accurately. To address this critical gap, we propose Counterfactual Enhanced Temporal Framework for LLM-Based Recommendation (CETRec). CETRec is grounded in causal inference principles, which allow it to isolate and measure the specific impact of temporal information on recommendation outcomes. By conceptualizing temporal order as an independent causal factor distinct from item content, we can quantify its unique contribution through counterfactual reasoning--comparing what recommendations would be made with and without temporal information while keeping all other factors constant. This causal framing enables CETRec to design a novel counterfactual tuning objective that directly optimizes the model's temporal sensitivity, teaching LLMs to recognize both absolute timestamps and relative ordering patterns in user histories. Combined with our counterfactual tuning task derived from causal analysis, CETRec effectively enhances LLMs' awareness of both absolute order (how recently items were interacted with) and relative order (the sequential relationships between items).

Updated: 2025-07-03 10:11:35

Fields: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2507.03047v1

Continual Gradient Low-Rank Projection Fine-Tuning for LLMs

Continual fine-tuning of Large Language Models (LLMs) is hampered by the trade-off between efficiency and expressiveness. Low-Rank Adaptation (LoRA) offers efficiency but constrains the model's ability to learn new tasks and transfer knowledge due to its low-rank nature and reliance on explicit parameter constraints. We propose GORP (Gradient LOw Rank Projection) for Continual Learning, a novel training strategy that overcomes these limitations by synergistically combining full and low-rank parameters and jointly updating within a unified low-rank gradient subspace. GORP expands the optimization space while preserving efficiency and mitigating catastrophic forgetting. Extensive experiments on continual learning benchmarks demonstrate GORP's superior performance compared to existing state-of-the-art approaches. Code is available at https://github.com/Wcxwcxw/GORP.
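GORP's joint full/low-rank update within a unified gradient subspace is the paper's contribution; the sketch below shows only the shared ingredient — projecting a weight-matrix gradient onto a low-rank subspace obtained from its SVD — with invented shapes and rank:

```python
import numpy as np

def low_rank_project(grad, r):
    """Project a weight-matrix gradient onto its top-r left-singular subspace.
    Returns the projector P and the compact representation P.T @ grad."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :r]
    return P, P.T @ grad

rng = np.random.default_rng(0)
G = rng.normal(size=(64, 32))        # hypothetical gradient of a 64x32 weight
P, G_low = low_rank_project(G, r=4)
G_back = P @ G_low                   # gradient reconstructed from the rank-4 subspace
print(G_low.shape)                   # optimizer state now lives in the small space
```

The optimizer state only needs to track the `(4, 32)` representation instead of the full `(64, 32)` gradient, which is where the efficiency of low-rank gradient methods comes from.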

Updated: 2025-07-03 10:11:22

Fields: cs.LG,cs.AI,cs.CE

Download: http://arxiv.org/abs/2507.02503v1

Agentic AI Process Observability: Discovering Behavioral Variability

AI agents that leverage Large Language Models (LLMs) are increasingly becoming core building blocks of modern software systems. A wide range of frameworks is now available to support the specification of such applications. These frameworks enable the definition of agent setups using natural language prompting, which specifies the roles, goals, and tools assigned to the various agents involved. Within such setups, agent behavior is non-deterministic for any given input, highlighting the critical need for robust debugging and observability tools. In this work, we explore the use of process and causal discovery applied to agent execution trajectories as a means of enhancing developer observability. This approach aids in monitoring and understanding the emergent variability in agent behavior. Additionally, we complement this with LLM-based static analysis techniques to distinguish between intended and unintended behavioral variability. We argue that such instrumentation is essential for giving developers greater control over evolving specifications and for identifying aspects of functionality that may require more precise and explicit definitions.

Updated: 2025-07-03 10:05:46

Fields: cs.AI

Download: http://arxiv.org/abs/2505.20127v2

Online Conformal Prediction with Efficiency Guarantees

We study the problem of conformal prediction in a novel online framework that directly optimizes efficiency. In our problem, we are given a target miscoverage rate $\alpha > 0$, and a time horizon $T$. On each day $t \le T$ an algorithm must output an interval $I_t \subseteq [0, 1]$, then a point $y_t \in [0, 1]$ is revealed. The goal of the algorithm is to achieve coverage, that is, $y_t \in I_t$ on (close to) a $(1 - \alpha)$-fraction of days, while maintaining efficiency, that is, minimizing the average volume (length) of the intervals played. This problem is an online analogue to the problem of constructing efficient confidence intervals. We study this problem over arbitrary and exchangeable (random order) input sequences. For exchangeable sequences, we show that it is possible to construct intervals that achieve coverage $(1 - \alpha) - o(1)$, while having length upper bounded by the best fixed interval that achieves coverage in hindsight. For arbitrary sequences however, we show that any algorithm that achieves a $\mu$-approximation in average length compared to the best fixed interval achieving coverage in hindsight, must make a multiplicative factor more mistakes than $\alpha T$, where the multiplicative factor depends on $\mu$ and the aspect ratio of the problem. Our main algorithmic result is a matching algorithm that can recover all Pareto-optimal settings of $\mu$ and number of mistakes. Furthermore, our algorithm is deterministic and therefore robust to an adaptive adversary. This gap between the exchangeable and arbitrary settings is in contrast to the classical online learning problem. In fact, we show that no single algorithm can simultaneously be Pareto-optimal for arbitrary sequences and optimal for exchangeable sequences. On the algorithmic side, we give an algorithm that achieves the near-optimal tradeoff between the two cases.
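The coverage/length trade-off can be illustrated with a naive baseline — not the paper's Pareto-optimal algorithm — that, in the exchangeable case, simply plays the empirical [alpha/2, 1 - alpha/2] quantile interval of the points seen so far:

```python
import numpy as np

def online_intervals(ys, alpha):
    """For each day t, play an interval built from past points (the
    [alpha/2, 1-alpha/2] empirical quantile range), then observe y_t."""
    covered, lengths, history = 0, [], []
    for y in ys:
        if len(history) < 10:              # warm-up: play the full unit interval
            lo, hi = 0.0, 1.0
        else:
            lo, hi = np.quantile(history, [alpha / 2, 1 - alpha / 2])
        covered += lo <= y <= hi
        lengths.append(hi - lo)
        history.append(y)
    return covered / len(ys), float(np.mean(lengths))

rng = np.random.default_rng(1)
ys = rng.beta(5, 5, size=5000)             # i.i.d. draws, hence exchangeable
cov, avg_len = online_intervals(ys, alpha=0.1)
print(cov, avg_len)                        # coverage near 0.9, length well below 1
```

Against an adaptive adversary this baseline breaks down, which is exactly the gap between the exchangeable and arbitrary settings that the paper formalizes.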

Updated: 2025-07-03 10:00:50

Fields: cs.LG,cs.DS,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2507.02496v1

Reliability-Adjusted Prioritized Experience Replay

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
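A sketch of reliability-adjusted prioritized sampling; the exact ReaPER priority is the paper's, so the weighting below (a reliability score multiplied into the standard PER priority |delta|^omega) is purely illustrative:

```python
import numpy as np

td_errors = np.array([0.1, 0.1, 2.0, 0.5])       # |TD error| per stored transition
reliability = np.array([1.0, 1.0, 0.3, 1.0])     # hypothetical reliability in [0, 1]

def sample_probs(td, rel, omega=0.6, eps=1e-3):
    # PER-style priority, down-weighted by a reliability score; the precise
    # ReaPER form is defined in the paper, this combination is illustrative.
    p = (rel * np.abs(td) + eps) ** omega
    return p / p.sum()

probs = sample_probs(td_errors, reliability)
plain = sample_probs(td_errors, np.ones_like(td_errors))  # vanilla PER
print(probs, plain)
```

Transition 2 has the largest TD error but a low reliability, so it is sampled less aggressively than under vanilla PER — the intuition behind discounting unreliable TD errors.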

Updated: 2025-07-03 09:55:57

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2506.18482v2

MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations

Implicit Neural Representations (INRs) are widely used to encode data as continuous functions, enabling the visualization of large-scale multivariate scientific simulation data with reduced memory usage. However, existing INR-based methods face three main limitations: (1) inflexible representation of complex structures, (2) primarily focusing on single-variable data, and (3) dependence on structured grids. Thus, their performance degrades when applied to complex real-world datasets. To address these limitations, we propose a novel neural network-based framework, MC-INR, which handles multivariate data on unstructured grids. It combines meta-learning and clustering to enable flexible encoding of complex structures. To further improve performance, we introduce a residual-based dynamic re-clustering mechanism that adaptively partitions clusters based on local error. We also propose a branched layer to leverage multivariate data through independent branches simultaneously. Experimental results demonstrate that MC-INR outperforms existing methods on scientific data encoding tasks.

Updated: 2025-07-03 09:55:57

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02494v1

Temporally-Aware Supervised Contrastive Learning for Polyp Counting in Colonoscopy

Automated polyp counting in colonoscopy is a crucial step toward automated procedure reporting and quality control, aiming to enhance the cost-effectiveness of colonoscopy screening. Counting polyps in a procedure involves detecting and tracking polyps, and then clustering tracklets that belong to the same polyp entity. Existing methods for polyp counting rely on self-supervised learning and primarily leverage visual appearance, neglecting temporal relationships in both tracklet feature learning and clustering stages. In this work, we introduce a paradigm shift by proposing a supervised contrastive loss that incorporates temporally-aware soft targets. Our approach captures intra-polyp variability while preserving inter-polyp discriminability, leading to more robust clustering. Additionally, we improve tracklet clustering by integrating a temporal adjacency constraint, reducing false positive re-associations between visually similar but temporally distant tracklets. We train and validate our method on publicly available datasets and evaluate its performance with a leave-one-out cross-validation strategy. Results demonstrate a 2.2x reduction in fragmentation rate compared to prior approaches. Our results highlight the importance of temporal awareness in polyp counting, establishing a new state-of-the-art. Code is available at https://github.com/lparolari/temporally-aware-polyp-counting.
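The paper's exact loss is its own; below is a generic supervised-contrastive sketch with soft targets, where the (invented) target matrix plays the role of temporally-aware weights over tracklets of the same polyp:

```python
import numpy as np

def soft_supervised_contrastive(z, soft_targets, tau=0.1):
    """Cross-entropy between soft targets and the softmax over pairwise
    embedding similarities (diagonal excluded). Illustrative form only."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)           # a sample is not its own positive
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    np.fill_diagonal(logp, 0.0)              # mask the diagonal out of the loss
    return float(-(soft_targets * logp).sum(axis=1).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))                  # 4 tracklet embeddings, dim 8
# Invented soft targets: mass on same-polyp tracklets, weighted by time proximity
T = np.array([[0.0, 0.7, 0.3, 0.0],
              [0.7, 0.0, 0.3, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])         # last tracklet: no positives -> 0 loss
loss = soft_supervised_contrastive(z, T)
print(loss)
```

Hard one-hot targets recover ordinary supervised contrastive learning; graded targets let temporally closer same-polyp tracklets pull harder, which is the soft-target idea the abstract describes.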

Updated: 2025-07-03 09:55:48

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02493v1

A 10-bit S-box generated by Feistel construction from cellular automata

We propose a new 10-bit S-box generated from a Feistel construction. The subpermutations are generated by a 5-cell cellular automaton based on a unique well-chosen rule and bijective affine transformations. In particular, the cellular automaton rule is chosen based on empirical tests of its ability to generate good pseudorandom output on a ring cellular automaton. Similarly, the layout of the Feistel network is based on empirical data regarding the quality of the output S-box. We perform cryptanalysis of the generated 10-bit S-box, and we find security properties comparable to or sometimes even better than those of the standard AES S-box. We believe that our S-box could be used to replace the 5-bit substitution of ciphers like ASCON.
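A toy version of the construction, assuming CA rule 30, three CA iterations per round, and round-index keys — the paper's actual rule, keys, and layout were chosen empirically, so every parameter here is illustrative. What the sketch does demonstrate is the structural point: a Feistel network is bijective for any round function.

```python
def ca_step(state, rule=30, n=5):
    """One step of an elementary CA on a ring of n cells (state is an n-bit int)."""
    bits = [(state >> i) & 1 for i in range(n)]
    out = 0
    for i in range(n):
        neigh = (bits[(i + 1) % n] << 2) | (bits[i] << 1) | bits[(i - 1) % n]
        out |= ((rule >> neigh) & 1) << i
    return out

def round_fn(half, key):
    s = half ^ key
    for _ in range(3):           # a few CA iterations as the non-linear round function
        s = ca_step(s)
    return s

def feistel_sbox(x, rounds=4):
    left, right = (x >> 5) & 0x1F, x & 0x1F     # split into two 5-bit halves
    for r in range(rounds):
        left, right = right, left ^ round_fn(right, key=r + 1)
    return (left << 5) | right

sbox = [feistel_sbox(x) for x in range(1024)]
print(len(set(sbox)))   # 1024: the construction is a permutation by design
```

Bijectivity is free; the cryptographic quality (nonlinearity, differential uniformity) is what the paper's empirical rule selection is for.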

Updated: 2025-07-03 09:53:33

Fields: cs.CR

Download: http://arxiv.org/abs/2507.02489v1

Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Mechanical thrombectomy has become the standard of care in patients with stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcomes in terms of the modified Rankin Scale score alongside individualized treatment effects (ITEs) using data of 449 LVO stroke patients from a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and angiography (CTA) scans which were integrated using novel foundation models to make use of advanced imaging information. Clinical variables had a good predictive power for binary functional outcome prediction (AUC of 0.719 [0.666, 0.774]) which could slightly be improved when adding CTA imaging (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans or a combination of NCCT and CTA scans to clinical features yielded no improvement. The most important clinical predictor for functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited indicated by a C-for-Benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Yet, further research is needed to particularly improve ITE estimation.

Updated: 2025-07-03 09:51:56

Fields: eess.IV,cs.CV,cs.LG,68,I.2; J.3

Download: http://arxiv.org/abs/2507.03046v1

Deep Transfer Learning for Kidney Cancer Diagnosis

Incurable diseases continue to pose major challenges to global healthcare systems, with their prevalence shaped by lifestyle, economic, social, and genetic factors. Among these, kidney disease remains a critical global health issue, requiring ongoing research to improve early diagnosis and treatment. In recent years, deep learning (DL) has shown promise in medical imaging and diagnostics, driving significant progress in automatic kidney cancer (KC) detection. However, the success of DL models depends heavily on the availability of high-quality, domain-specific datasets, which are often limited and expensive to acquire. Moreover, DL models demand substantial computational power and storage, restricting their real-world clinical use. To overcome these barriers, transfer learning (TL) has emerged as an effective approach, enabling the reuse of pre-trained models from related domains to enhance KC diagnosis. This paper presents a comprehensive survey of DL-based TL frameworks for KC detection, systematically reviewing key methodologies, their advantages, and limitations, and analyzing their practical performance. It further discusses challenges in applying TL to medical imaging and highlights emerging trends that could influence future research. This review demonstrates the transformative role of TL in precision medicine, particularly oncology, by improving diagnostic accuracy, lowering computational demands, and supporting the integration of AI-powered tools in healthcare. The insights provided offer valuable guidance for researchers and practitioners, paving the way for future advances in KC diagnostics and personalized treatment strategies.

Updated: 2025-07-03 09:49:15

Fields: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.04318v2

Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD

Graphical model estimation from multi-omics data requires a balance between statistical estimation performance and computational scalability. We introduce a novel pseudolikelihood-based graphical model framework that reparameterizes the target precision matrix while preserving the sparsity pattern and estimates it by minimizing an $\ell_1$-penalized empirical risk based on a new loss function. The proposed estimator maintains estimation and selection consistency in various metrics under high-dimensional assumptions. The associated optimization problem allows for a provably fast computation algorithm using a novel operator-splitting approach and communication-avoiding distributed matrix multiplication. A high-performance computing implementation of our framework was tested using simulated data with up to one million variables, demonstrating complex dependency structures similar to those found in biological networks. Leveraging this scalability, we estimated a partial correlation network from a dual-omic liver cancer data set. The co-expression network estimated from the ultrahigh-dimensional data demonstrated superior specificity in prioritizing key transcription factors and co-activators by excluding the impact of epigenetic regulation, thereby highlighting the value of computational scalability in multi-omic data analysis.

Updated: 2025-07-03 09:46:23

Fields: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2412.11554v3

CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios

Multi-object tracking is a classic field in computer vision. Within it, pedestrian tracking has extremely high application value and has become the most popular research category. Existing methods mainly use motion or appearance information for tracking, which is often difficult in complex scenarios. For the motion information, mutual occlusions between objects often prevent updating of the motion state; for the appearance information, non-robust results are often obtained due to reasons such as only partial visibility of the object or blurred images. Although learning how to perform tracking in these situations from annotated data is the simplest solution, existing MOT datasets fail to support it. Existing datasets mainly have two drawbacks: relatively simple scene composition and non-realistic scenarios. Although some of the video sequences in existing datasets do not have the above-mentioned drawbacks, their number is far from adequate for research purposes. To this end, we propose a difficult large-scale dataset for multi-pedestrian tracking, shot mainly from the first-person view and all from real-life complex scenarios. We name it ``CrowdTrack'' because there are numerous objects in most of the sequences. Our dataset consists of 33 videos, containing a total of 5,185 trajectories. Each object is annotated with a complete bounding box and a unique object ID. The dataset will provide a platform to facilitate the development of algorithms that remain effective in complex situations. We analyzed the dataset comprehensively and tested multiple SOTA models on our dataset. Besides, we analyzed the performance of the foundation models on our dataset. The dataset and project code is released at: https://github.com/loseevaya/CrowdTrack .

Updated: 2025-07-03 09:36:44

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02479v1

Effectively Identifying Wi-Fi Devices through State Transitions

Wi-Fi management frames reveal structured communication patterns that persist even under randomization of MAC addresses. Prior approaches to associating randomized MAC addresses with devices primarily focus on probe requests, overlooking the broader set of management frames and their transition dynamics. This narrow focus limits their robustness in dense, real-world environments with high device mobility, where probe activity alone fails to yield stable and distinctive signatures. In this paper, we present a novel framework for fingerprinting Wi-Fi devices based on behavioral dynamics extracted from passively observed management frames. We model each device's behavior as a finite state machine and introduce matrix-based representations that encode both structural (state transition frequencies) and temporal (inter-state delays) characteristics. These matrices are embedded into compact feature vectors, enabling efficient similarity comparison. Through extensive evaluation in diverse real-world settings, our method achieves over 86% identification accuracy for non-randomized devices using only Wi-Fi management frames, with further improvements observed through temporal burst aggregation. Our findings are sufficient to uniquely and consistently characterize devices at scale, outperforming the state-of-the-art.
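The matrix-based fingerprint can be sketched as follows; the frame-type alphabet and the toy sequences are invented, and only the structural (transition-frequency) matrix is shown, not the companion temporal (inter-state delay) matrix:

```python
import numpy as np

FRAME_TYPES = ["probe_req", "probe_resp", "auth", "assoc_req", "assoc_resp", "action"]
IDX = {f: i for i, f in enumerate(FRAME_TYPES)}

def transition_matrix(frames):
    """Row-normalized state-transition frequencies over observed frame types."""
    m = np.zeros((len(FRAME_TYPES), len(FRAME_TYPES)))
    for a, b in zip(frames, frames[1:]):
        m[IDX[a], IDX[b]] += 1
    rows = m.sum(axis=1, keepdims=True)
    return np.divide(m, rows, out=np.zeros_like(m), where=rows > 0)

def similarity(m1, m2):
    """Cosine similarity between flattened transition matrices."""
    v1, v2 = m1.ravel(), m2.ravel()
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

dev_a = ["probe_req", "probe_resp", "auth", "assoc_req", "assoc_resp"]
dev_b = ["probe_req", "probe_req", "probe_req", "action", "probe_req"]
sim_self = similarity(transition_matrix(dev_a), transition_matrix(dev_a))
sim_cross = similarity(transition_matrix(dev_a), transition_matrix(dev_b))
print(sim_self, sim_cross)
```

Two captures of the same device yield near-identical matrices even when the MAC address changes, which is what makes the transition dynamics usable as a fingerprint.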

Updated: 2025-07-03 09:35:38

Fields: cs.CR

Download: http://arxiv.org/abs/2507.02478v1

Variational Kolmogorov-Arnold Network

Kolmogorov Arnold Networks (KANs) are an emerging architecture for building machine learning models. KANs are based on the theoretical foundation of the Kolmogorov-Arnold Theorem and its expansions, which provide an exact representation of a multi-variate continuous bounded function as the composition of a limited number of univariate continuous functions. While such theoretical results are powerful, their use as a representation learning alternative to a multi-layer perceptron (MLP) hinges on the ad-hoc choice of the number of bases modeling each of the univariate functions. In this work, we show how to address this problem by adaptively learning a potentially infinite number of bases for each univariate function during training. We therefore model the problem as a variational inference optimization problem. Our proposal, called InfinityKAN, which uses backpropagation, extends the potential applicability of KANs by treating an important hyperparameter as part of the learning process.
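A hand-rolled sketch of the KAN idea: each learnable univariate function is built from a fixed number K of bases (Gaussian bumps here, purely for illustration), and K is exactly the hyperparameter that InfinityKAN instead treats as part of learning:

```python
import numpy as np

def phi(x, centers, widths, weights):
    """A learnable univariate function as a finite sum of Gaussian bumps.
    K = len(centers) is the ad-hoc hyperparameter the abstract refers to;
    InfinityKAN adapts it during training, here it is fixed by hand."""
    return sum(w * np.exp(-((x - c) / s) ** 2)
               for c, s, w in zip(centers, widths, weights))

# Kolmogorov-Arnold flavour: an outer phi applied to a sum of inner phis
x1 = np.linspace(0.0, 1.0, 50)
x2 = np.linspace(0.0, 1.0, 50)
inner = phi(x1, [0.2, 0.8], [0.3, 0.3], [1.0, -0.5]) \
      + phi(x2, [0.5], [0.4], [0.7])
outer = phi(inner, [0.0, 0.5], [0.5, 0.5], [1.0, 1.0])
print(outer.shape)
```

In a full KAN, every edge of the network carries one such phi; the variational treatment makes the number of bumps per edge a random quantity inferred alongside the weights.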

Updated: 2025-07-03 09:24:09

Fields: cs.LG

Download: http://arxiv.org/abs/2507.02466v1

Quantum-enhanced causal discovery for a small number of samples

The discovery of causal relations from observed data has attracted significant interest from disciplines such as economics, social sciences, and biology. In practical applications, considerable knowledge of the underlying systems is often unavailable, and real data are usually associated with nonlinear causal structures, which makes the direct use of most conventional causality analysis methods difficult. This study proposes a novel quantum Peter-Clark (qPC) algorithm for causal discovery that does not require any assumptions about the underlying model structures. Based on conditional independence tests in a class of reproducing kernel Hilbert spaces characterized by quantum circuits, the proposed algorithm can explore causal relations from the observed data drawn from arbitrary distributions. We conducted systematic experiments on fundamental graphs of causal structures, demonstrating that the qPC algorithm exhibits better performance, particularly with smaller sample sizes compared to its classical counterpart. Furthermore, we proposed a novel optimization approach based on Kernel Target Alignment (KTA) for determining hyperparameters of quantum kernels. This method effectively reduced the risk of false positives in causal discovery, enabling more reliable inference. Our theoretical and experimental results demonstrate that the quantum algorithm can empower classical algorithms for accurate inference in causal discovery, supporting them in regimes where classical algorithms typically fail. In addition, the effectiveness of this method was validated using the datasets on Boston housing prices, heart disease, and biological signaling systems as real-world applications. These findings highlight the potential of quantum-based causal discovery methods in addressing practical challenges, particularly in small-sample scenarios, where traditional approaches have shown significant limitations.
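Kernel Target Alignment itself is classical and easy to state; the sketch below computes KTA for a +/-1 label vector, while the quantum part — producing the kernel matrix K from a parametrized circuit — is not modeled here:

```python
import numpy as np

def kernel_target_alignment(K, y):
    """KTA(K, y) = <K, yy^T>_F / (||K||_F * ||yy^T||_F), y with +/-1 entries.
    Larger values mean the kernel's similarity structure matches the labels."""
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

y = np.array([1, 1, -1, -1])
K_ideal = np.outer(y, y).astype(float)   # kernel perfectly aligned with labels
K_noise = np.eye(4)                      # uninformative (identity) kernel
kta_ideal = kernel_target_alignment(K_ideal, y)
kta_noise = kernel_target_alignment(K_noise, y)
print(kta_ideal, kta_noise)
```

Maximizing this quantity over quantum-kernel hyperparameters is the selection criterion the paper uses to curb false positives in the qPC conditional independence tests.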

Updated: 2025-07-03 09:15:45

Fields: quant-ph,cs.AI,cs.LG,stat.ME

Download: http://arxiv.org/abs/2501.05007v2

Universal Collection of Euclidean Invariants between Pairs of Position-Orientations

Euclidean E(3) equivariant neural networks that employ scalar fields on position-orientation space M(3) have been effectively applied to tasks such as predicting molecular dynamics and properties. To perform equivariant convolutional-like operations in these architectures one needs Euclidean invariant kernels on M(3) x M(3). In practice, a handcrafted collection of invariants is selected, and this collection is then fed into multilayer perceptrons to parametrize the kernels. We rigorously describe an optimal collection of 4 smooth scalar invariants on the whole of M(3) x M(3). With optimal we mean that the collection is independent and universal, meaning that all invariants are pertinent, and any invariant kernel is a function of them. We evaluate two collections of invariants, one universal and one not, using the PONITA neural network architecture. Our experiments show that using a collection of invariants that is universal positively impacts the accuracy of PONITA significantly.

Updated: 2025-07-03 09:08:24

Categories: math.DG,cs.LG

Download: http://arxiv.org/abs/2504.03299v2
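A minimal sketch of what such invariants look like, assuming unit orientation vectors: three classical invariants between two position-orientation pairs (displacement along the first orientation, displacement perpendicular to it, and the angle between orientations). The paper's optimal universal collection has four invariants; this toy version only illustrates what "unchanged under a global roto-translation" means.

```python
import math

def invariants(p1, n1, p2, n2):
    """Three classical SE(3) invariants of a pair of position-orientations
    (p, n), assuming unit orientation vectors n: the displacement component
    along n1, the displacement component perpendicular to n1, and the angle
    between the two orientations."""
    d = [p2[i] - p1[i] for i in range(3)]
    a = sum(d[i] * n1[i] for i in range(3))
    b = math.sqrt(max(sum(x * x for x in d) - a * a, 0.0))
    c = math.acos(max(-1.0, min(1.0, sum(n1[i] * n2[i] for i in range(3)))))
    return (a, b, c)

p1, n1 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
p2, n2 = (1.0, 2.0, 3.0), (1.0, 0.0, 0.0)
base = invariants(p1, n1, p2, n2)

def moved(v, translate=True):
    """Apply a global roto-translation: 90 degrees about z, then a shift."""
    x, yy, z = -v[1], v[0], v[2]
    return (x + 5.0, yy - 1.0, z + 2.0) if translate else (x, yy, z)

# Positions are rotated and shifted; orientations are only rotated.
same = invariants(moved(p1), moved(n1, False), moved(p2), moved(n2, False))
```

Feeding such scalars into an MLP is the standard way the abstract describes for parametrizing invariant kernels.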

Roto-Translation Invariant Metrics on Position-Orientation Space

Riemannian metrics on the position-orientation space M(3) that are roto-translation group SE(3) invariant play a key role in image analysis tasks like enhancement, denoising, and segmentation. These metrics enable roto-translation equivariant algorithms, with the associated Riemannian distance often used in implementation. However, computing the Riemannian distance is costly, which makes it unsuitable in situations where constant recomputation is needed. We propose the mav (minimal angular velocity) distance, defined as the Riemannian length of a geometrically meaningful curve, as a practical alternative. We see an application of the mav distance in geometric deep learning. Namely, neural network architectures such as PONITA rely on geometric invariants to create their roto-translation equivariant models. The mav distance offers a trainable invariant, with the parameters that determine the Riemannian metric acting as learnable weights. In this paper we: 1) classify and parametrize all SE(3) invariant metrics on M(3), 2) describe how to efficiently calculate the mav distance, and 3) investigate whether including the mav distance within PONITA can positively impact its accuracy in predicting molecular properties.

Updated: 2025-07-03 09:03:48

Categories: math.DG,cs.LG

Download: http://arxiv.org/abs/2504.03309v2

Red grape detection with accelerated artificial neural networks in the FPGA's programmable logic

Robots usually slow down while scanning to detect objects in motion. Additionally, the robot's camera is configured with a low framerate to match the speed of the detection algorithms. These constraints limit task execution and exploration, increasing the task execution time. AMD has developed the Vitis-AI framework to deploy detection algorithms into FPGAs. However, this tool does not fully use the FPGA's programmable logic (PL). In this work, we use the FINN architecture to deploy three ANNs, MobileNet v1 with 4-bit quantisation, CNV with 2-bit quantisation, and CNV with 1-bit quantisation (BNN), inside an FPGA's PL. The models were trained on the RG2C dataset, a self-acquired dataset released in open access. MobileNet v1 performed best, reaching a success rate of 98 % and an inference speed of 6611 FPS. In this work, we showed that FPGAs can speed up ANNs and make them suitable for attention mechanisms.

Updated: 2025-07-03 09:00:19

Categories: cs.CV,cs.AI,cs.DC,cs.LG,cs.RO

Download: http://arxiv.org/abs/2507.02443v1

The Gauss-Markov Adjunction: Categorical Semantics of Residuals in Supervised Learning

Enhancing the intelligibility and interpretability of machine learning is a crucial task in responding to the demand for Explicability as an AI principle, and in promoting the better social implementation of AI. The aim of our research is to contribute to this improvement by reformulating machine learning models through the lens of category theory, thereby developing a semantic framework for structuring and understanding AI systems. Our categorical modeling in this paper clarifies and formalizes the structural interplay between residuals and parameters in supervised learning. The present paper focuses on the multiple linear regression model, which represents the most basic form of supervised learning. By defining two concrete categories corresponding to parameters and data, along with an adjoint pair of functors between them, we introduce our categorical formulation of supervised learning. We show that the essential structure of this framework is captured by what we call the Gauss-Markov Adjunction. Within this setting, the dual flow of information can be explicitly described as a correspondence between variations in parameters and residuals. The ordinary least squares estimator for the parameters and the minimum residual are related via the preservation of limits by the right adjoint functor. Furthermore, we position this formulation as an instance of extended denotational semantics for supervised learning, and propose applying a semantic perspective developed in theoretical computer science as a formal foundation for Explicability in AI.

Updated: 2025-07-03 08:58:59

Categories: cs.AI,math.CT,stat.ME,stat.ML

Download: http://arxiv.org/abs/2507.02442v1
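The structural fact the abstract formalizes categorically, namely that the OLS parameters and the residual determine each other, with the residual orthogonal to the column space of the design matrix, can be checked numerically. A plain-Python sketch of ordinary least squares, not the paper's categorical construction:

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations X^T X beta = X^T y,
    solved by Gaussian elimination; returns beta and the residual vector."""
    n, p = len(X), len(X[0])
    # Build the normal equations.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    resid = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    return beta, resid

# Simple linear regression y ~ 1 + x on four noisy points.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 8.0]
beta, resid = ols_fit(X, y)
```

The orthogonality X^T resid = 0, which holds by construction of the normal equations, is the concrete correspondence between parameter variations and residuals that the Gauss-Markov Adjunction packages abstractly.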

A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms

Gravitational-wave approximants are essential for gravitational-wave astronomy, allowing coverage of the binary black hole parameter space for inference or matched filtering without costly numerical relativity (NR) simulations, but generally trading some accuracy for computational efficiency. To reduce this trade-off, NR surrogate models can be constructed using interpolation within NR waveform space. We present a 2-stage training approach for neural network-based NR surrogate models. Initially trained on approximant-generated waveforms and then fine-tuned with NR data, these dual-stage artificial neural surrogate (\texttt{DANSur}) models offer rapid and competitively accurate waveform generation, generating millions of waveforms in under 20 ms on a GPU while keeping mean mismatches with NR around $10^{-4}$. Implemented in the \textsc{bilby} framework, we show they can be used for parameter estimation tasks.

Updated: 2025-07-03 08:56:15

Categories: gr-qc,astro-ph.HE,astro-ph.IM,cs.LG

Download: http://arxiv.org/abs/2412.06946v2

Unsupervised Cognition

Unsupervised learning methods are loosely inspired by cognition models. To this day, the most successful unsupervised learning methods revolve around clustering samples in a mathematical space. In this paper we propose a primitive-based, unsupervised learning approach for decision-making inspired by a novel cognition framework. This representation-centric approach models the input space constructively as a distributed hierarchical structure in an input-agnostic way. We compared our approach with the current state of the art in unsupervised learning classification, in classification of small and incomplete datasets, and in cancer type classification. We show how our proposal outperforms the previous state of the art. We also evaluate some cognition-like properties of our proposal, where it not only outperforms the compared algorithms (even supervised learning ones) but also shows a different, more cognition-like behaviour.

Updated: 2025-07-03 08:53:17

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.18624v3

Anatomical Foundation Models for Brain MRIs

Deep Learning (DL) in neuroimaging has become increasingly relevant for detecting neurological conditions and neurodegenerative disorders. One of the most predominant biomarkers in neuroimaging is represented by brain age, which has been shown to be a good indicator for different conditions, such as Alzheimer's Disease. Using brain age for weakly supervised pre-training of DL models in transfer learning settings has also recently shown promising results, especially when dealing with data scarcity of different conditions. On the other hand, anatomical information of brain MRIs (e.g. cortical thickness) can provide important information for learning good representations that can be transferred to many downstream tasks. In this work, we propose AnatCL, an anatomical foundation model for brain MRIs that i.) leverages anatomical information in a weakly contrastive learning approach, and ii.) achieves state-of-the-art performances across many different downstream tasks. To validate our approach we consider 12 different downstream tasks for the diagnosis of different conditions such as Alzheimer's Disease, autism spectrum disorder, and schizophrenia. Furthermore, we also target the prediction of 10 different clinical assessment scores using structural MRI data. Our findings show that incorporating anatomical information during pre-training leads to more robust and generalizable representations. Pre-trained models can be found at: https://github.com/EIDOSLAB/AnatCL.

Updated: 2025-07-03 08:51:54

Categories: eess.IV,cs.AI,cs.CV,cs.LG,68T07,I.2.6

Download: http://arxiv.org/abs/2408.07079v4

Optimisation Is Not What You Need

The Artificial Intelligence field has focused on developing optimisation methods to solve multiple problems, specifically problems that we thought to be only solvable through cognition. The obtained results have been outstanding, even surpassing the Turing Test. However, we have found that these optimisation methods share some fundamental flaws that impede them from becoming a true artificial cognition. Specifically, the field has identified catastrophic forgetting as a fundamental problem in developing such cognition. This paper formally proves that this problem is inherent to optimisation methods, and as such it will always limit approaches that try to solve the Artificial General Intelligence problem as an optimisation problem. Additionally, it addresses the problem of overfitting and discusses other, smaller problems that optimisation methods pose. Finally, it empirically shows how world-modelling methods avoid suffering from either problem. In conclusion, the field of Artificial Intelligence needs to look outside the machine learning field to find methods capable of developing an artificial cognition.

Updated: 2025-07-03 08:50:20

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03045v1

Toward a Robust and Generalizable Metamaterial Foundation Model

Advances in material functionalities drive innovations across various fields, where metamaterials-defined by structure rather than composition-are leading the way. Despite the rise of artificial intelligence (AI)-driven design strategies, their impact is limited by task-specific retraining, poor out-of-distribution (OOD) generalization, and the need for separate models for forward and inverse design. To address these limitations, we introduce the Metamaterial Foundation Model (MetaFO), a Bayesian transformer-based foundation model inspired by large language models. MetaFO learns the underlying mechanics of metamaterials, enabling probabilistic, zero-shot predictions across diverse, unseen combinations of material properties and structural responses. It also excels in nonlinear inverse design, even under OOD conditions. By treating metamaterials as an operator that maps material properties to structural responses, MetaFO uncovers intricate structure-property relationships and significantly expands the design space. This scalable and generalizable framework marks a paradigm shift in AI-driven metamaterial discovery, paving the way for next-generation innovations.

Updated: 2025-07-03 08:48:36

Categories: physics.optics,cs.AI

Download: http://arxiv.org/abs/2507.02436v1

Crafting Hanzi as Narrative Bridges: An AI Co-Creation Workshop for Elderly Migrants

This paper explores how older adults, particularly aging migrants in urban China, can engage in AI-assisted co-creation to express personal narratives that are often fragmented, underrepresented, or difficult to verbalize. Through a pilot workshop combining oral storytelling and the symbolic reconstruction of Hanzi, participants shared memories of migration and recreated new character forms using Xiaozhuan glyphs, suggested by the Large Language Model (LLM), together with physical materials. Supported by human facilitation and a soft AI presence, participants transformed lived experience into visual and tactile expressions without requiring digital literacy. This approach offers new perspectives on human-AI collaboration and aging by repositioning AI not as a content producer but as a supportive mechanism, and by supporting narrative agency within sociotechnical systems.

Updated: 2025-07-03 08:45:46

Categories: cs.HC,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.01548v2

The Artificial Scientist -- in-transit Machine Learning of Plasma Simulations

Increasing HPC cluster sizes and large-scale simulations that produce petabytes of data per run create massive IO and storage challenges for analysis. Deep learning-based techniques, in particular, make use of these amounts of domain data to extract patterns that help build scientific understanding. Here, we demonstrate a streaming workflow in which simulation data is streamed directly to a machine-learning (ML) framework, circumventing the file system bottleneck. Data is transformed in transit, asynchronously to the simulation and the training of the model. With the presented workflow, data operations can be performed in common and easy-to-use programming languages, freeing the application user from adapting the application output routines. As a proof of concept, we consider a GPU-accelerated particle-in-cell (PIConGPU) simulation of the Kelvin-Helmholtz instability (KHI). We employ experience replay to avoid catastrophic forgetting in learning from this non-steady process in a continual manner. We detail the challenges addressed while porting and scaling to the Frontier exascale system.

Updated: 2025-07-03 08:40:46

Categories: physics.comp-ph,cs.DC,cs.LG

Download: http://arxiv.org/abs/2501.03383v3
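Experience replay, used above to counter catastrophic forgetting on the non-steady KHI stream, can be sketched with a reservoir-sampled buffer. This is a generic illustration, not the workflow's actual implementation; the buffer items would be simulation batches, and the training step that mixes replayed batches with new ones is omitted.

```python
import random

class ReplayBuffer:
    """Minimal reservoir-style experience replay: keep a bounded sample of
    past stream items so a continually trained model keeps rehearsing
    earlier phases of a non-stationary data stream."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(item)
        else:
            # Reservoir sampling: every item ever seen ends up in the
            # buffer with equal probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = item

    def sample(self, k):
        """Draw k stored items to mix into the next training step."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))
```

Each training step would then train on a mix of the freshest batch and `buffer.sample(k)`, so early flow regimes are never entirely forgotten.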

Sequence-aware Pre-training for Echocardiography Probe Movement Guidance

Echocardiography is an essential medical technique for diagnosing cardiovascular diseases, but its high operational complexity has led to a shortage of trained professionals. To address this issue, we introduce a novel probe movement guidance algorithm that has the potential to be applied in guiding robotic systems or novices with probe pose adjustment for high-quality standard plane image acquisition. Cardiac ultrasound faces two major challenges: (1) the inherently complex structure of the heart, and (2) significant individual variations. Previous works have only learned the population-averaged structure of the heart rather than personalized cardiac structures, leading to a performance bottleneck. Clinically, we observe that sonographers dynamically adjust their interpretation of a patient's cardiac anatomy based on prior scanning sequences, consequently refining their scanning strategies. Inspired by this, we propose a novel sequence-aware self-supervised pre-training method. Specifically, our approach learns personalized three-dimensional cardiac structural features by predicting the masked-out image features and probe movement actions in a scanning sequence. We hypothesize that if the model can predict the missing content it has acquired a good understanding of personalized cardiac structure. Extensive experiments on a large-scale expert scanning dataset with 1.31 million samples demonstrate that our proposed sequence-aware paradigm can effectively reduce probe guidance errors compared to other advanced baseline methods. Our code will be released after acceptance.

Updated: 2025-07-03 08:38:13

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.15026v2

Is Complex Query Answering Really Complex?

Complex query answering (CQA) on knowledge graphs (KGs) is gaining momentum as a challenging reasoning task. In this paper, we show that the current benchmarks for CQA might not be as complex as we think, as the way they are built distorts our perception of progress in this field. For example, we find that in these benchmarks, most queries (up to 98% for some query types) can be reduced to simpler problems, e.g., link prediction, where only one link needs to be predicted. The performance of state-of-the-art CQA models decreases significantly when such models are evaluated on queries that cannot be reduced to easier types. Thus, we propose a set of more challenging benchmarks composed of queries that require models to reason over multiple hops and better reflect the construction of real-world KGs. In a systematic empirical investigation, the new benchmarks show that current CQA methods leave much to be desired.

Updated: 2025-07-03 08:36:22

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.12537v3

CyberRAG: An agentic RAG cyber attack classification and reporting tool

Intrusion Detection and Prevention Systems (IDS/IPS) in large enterprises can generate hundreds of thousands of alerts per hour, overwhelming security analysts with logs that demand deep, rapidly evolving domain expertise. Conventional machine-learning detectors trim the alert volume but still yield high false-positive rates, while standard single-pass Retrieval-Augmented Generation (RAG) pipelines often retrieve irrelevant context and fail to justify their predictions. To overcome these shortcomings, we present CyberRAG, a modular, agent-based RAG framework that delivers real-time classification, explanation, and structured reporting for cyber-attacks. A central LLM agent orchestrates (i) a pool of fine-tuned specialized classifiers, each tailored to a distinct attack family; (ii) tool adapters for enrichment and alerting; and (iii) an iterative retrieval-and-reason loop that continuously queries a domain-specific knowledge base until the evidence is both relevant and self-consistent. Unlike traditional RAG systems, CyberRAG embraces an agentic design that enables dynamic control flow and adaptive reasoning. This agent-centric architecture refines its threat labels and natural-language justifications autonomously, reducing false positives and enhancing interpretability. The framework is fully extensible: new attack types can be supported by simply adding a classifier without retraining the core agent. CyberRAG has been evaluated, achieving over 94% accuracy per class and pushing final classification accuracy to 94.92% through semantic orchestration. Generated explanations score up to 0.94 in BERTScore and 4.9/5 in GPT-4-based expert evaluation. These results show that agentic, specialist-oriented RAG can pair high detection accuracy with trustworthy, SOC-ready prose, offering a practical and scalable path toward semi-autonomous cyber-defence workflows.

Updated: 2025-07-03 08:32:19

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.02424v1

Delving into LLM-assisted writing in biomedical publications through excess vocabulary

Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010--2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.

Updated: 2025-07-03 08:26:13

Categories: cs.CL,cs.AI,cs.CY,cs.DL,cs.SI

Download: http://arxiv.org/abs/2406.07016v5
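The excess-word idea can be illustrated with document frequencies: compare how often a word appears in recent abstracts against a pre-LLM baseline. A toy sketch with hypothetical mini-corpora; the paper's actual analysis runs over 15 million PubMed abstracts with carefully constructed counterfactual expectations.

```python
def doc_frequency(abstracts, word):
    """Fraction of abstracts containing the word (case-insensitive,
    whitespace tokenization)."""
    w = word.lower()
    return sum(1 for a in abstracts if w in a.lower().split()) / len(abstracts)

def excess_usage(recent, baseline, word):
    """Excess document frequency and frequency ratio of a word
    relative to a pre-LLM baseline corpus."""
    p_now = doc_frequency(recent, word)
    p_base = doc_frequency(baseline, word)
    ratio = p_now / p_base if p_base > 0 else float("inf")
    return p_now - p_base, ratio

# Hypothetical corpora; 'delve' is one of the style words whose
# frequency jumped after LLMs appeared.
baseline = ["we study the effect of x on y", "results show that y increases"]
recent = ["we delve into the effect of x",
          "this study considers y",
          "we delve deeper into z"]
gap, ratio = excess_usage(recent, baseline, "delve")
```

A large positive gap (or an unbounded ratio, when the baseline frequency is zero) flags a candidate style word; aggregating such excesses over the vocabulary is what yields the paper's lower bound on LLM-processed abstracts.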

Determination Of Structural Cracks Using Deep Learning Frameworks

Structural crack detection is a critical task for public safety as it helps in preventing potential structural failures that could endanger lives. Manual detection by inexperienced personnel can be slow, inconsistent, and prone to human error, which may compromise the reliability of assessments. The current study addresses these challenges by introducing a novel deep-learning architecture designed to enhance the accuracy and efficiency of structural crack detection. In this research, various configurations of residual U-Net models were utilized. These models, due to their robustness in capturing fine details, were further integrated into an ensemble with a meta-model comprising convolutional blocks. This unique combination aimed to boost prediction efficiency beyond what individual models could achieve. The ensemble's performance was evaluated against well-established architectures such as SegNet and the traditional U-Net. Results demonstrated that the residual U-Net models outperformed their predecessors, particularly with low-resolution imagery, and the ensemble model exceeded the performance of individual models, proving it the most effective. The assessment was based on the Intersection over Union (IoU) metric and the DICE coefficient. The ensemble model achieved the highest scores, signifying superior accuracy. This advancement paves the way for more reliable automated systems in structural defect monitoring tasks.

Updated: 2025-07-03 08:24:47

Categories: cs.CV,cs.LG,eess.IV

Download: http://arxiv.org/abs/2507.02416v1
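The two evaluation metrics named in the abstract, IoU and the Dice coefficient, are simple overlap ratios on binary masks. A minimal sketch, with flat 0/1 lists standing in for segmentation masks:

```python
def iou_and_dice(pred, target):
    """IoU and Dice for binary masks given as flat 0/1 integer lists."""
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return iou, dice

# One overlapping pixel; three pixels are predicted or true in total.
iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
```

The two metrics are monotonically related (Dice equals 2*IoU/(1+IoU)), so they rank models identically; reporting both, as the study does, mainly aids comparison with prior work.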

ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation

Realistic hand manipulation is a key component of immersive virtual reality (VR), yet existing methods often rely on kinematic approaches or motion-capture datasets that omit crucial physical attributes such as contact forces and finger torques. Consequently, these approaches prioritize tight, one-size-fits-all grips rather than reflecting users' intended force levels. We present ForceGrip, a deep learning agent that synthesizes realistic hand manipulation motions, faithfully reflecting the user's grip force intention. Instead of mimicking predefined motion datasets, ForceGrip uses generated training scenarios-randomizing object shapes, wrist movements, and trigger input flows-to challenge the agent with a broad spectrum of physical interactions. To effectively learn from these complex tasks, we employ a three-phase curriculum learning framework comprising Finger Positioning, Intention Adaptation, and Dynamic Stabilization. This progressive strategy ensures stable hand-object contact, adaptive force control based on user inputs, and robust handling under dynamic conditions. Additionally, a proximity reward function enhances natural finger motions and accelerates training convergence. Quantitative and qualitative evaluations reveal ForceGrip's superior force controllability and plausibility compared to state-of-the-art methods. Demo videos are available as supplementary material and the code is provided at https://han-dongheun.github.io/ForceGrip.

Updated: 2025-07-03 08:24:20

Categories: cs.RO,cs.GR,cs.HC,cs.LG

Download: http://arxiv.org/abs/2503.08061v4

EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks

Recent foundational models for tabular data, such as TabPFN, excel at adapting to new tasks via in-context learning, but remain constrained to a fixed, pre-defined number of target dimensions-often necessitating costly ensembling strategies. We trace this constraint to a deeper architectural shortcoming: these models lack target equivariance, so that permuting target dimension orderings alters their predictions. This deficiency gives rise to an irreducible "equivariance gap", an error term that introduces instability in predictions. We eliminate this gap by designing a fully target-equivariant architecture-ensuring permutation invariance via equivariant encoders, decoders, and a bi-attention mechanism. Empirical evaluation on standard classification benchmarks shows that, on datasets with more classes than those seen during pre-training, our model matches or surpasses existing methods while incurring lower computational overhead.

Updated: 2025-07-03 08:22:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2502.06684v2

Significativity Indices for Agreement Values

Agreement measures, such as Cohen's kappa or intraclass correlation, gauge the matching between two or more classifiers. They are used in a wide range of contexts from medicine, where they evaluate the effectiveness of medical treatments and clinical trials, to artificial intelligence, where they can quantify the approximation due to the reduction of a classifier. The consistency of different classifiers to a golden standard can be compared simply by using the order induced by their agreement measure with respect to the golden standard itself. Nevertheless, labelling an approach as good or bad exclusively by using the value of an agreement measure requires a scale or a significativity index. Some quality scales have been proposed in the literature for Cohen's kappa, but they are mainly naïve, and their boundaries are arbitrary. This work proposes a general approach to evaluate the significativity of any agreement value between two classifiers and introduces two significativity indices: one dealing with finite data sets, the other one handling classification probability distributions. Moreover, this manuscript addresses the computational challenges of evaluating such indices and proposes some efficient algorithms for their evaluation.

Updated: 2025-07-03 08:21:25

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2504.15325v2

Privacy-preserving Preselection for Face Identification Based on Packing

Face identification systems operating in the ciphertext domain have garnered significant attention due to increasing privacy concerns and the potential recovery of original facial data. However, as the size of ciphertext template libraries grows, the face retrieval process becomes progressively more time-intensive. To address this challenge, we propose a novel and efficient scheme for face retrieval in the ciphertext domain, termed Privacy-Preserving Preselection for Face Identification Based on Packing (PFIP). PFIP incorporates an innovative preselection mechanism to reduce computational overhead and a packing module to enhance the flexibility of biometric systems during the enrollment stage. Extensive experiments conducted on the LFW and CASIA datasets demonstrate that PFIP preserves the accuracy of the original face recognition model, achieving a 100% hit rate while retrieving 1,000 ciphertext face templates within 300 milliseconds. Compared to existing approaches, PFIP achieves a nearly 50x improvement in retrieval efficiency.
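
PFIP's homomorphic packing cannot be reproduced in a few lines, but the preselection idea, cheaply pruning the gallery before exact matching, can be sketched in the clear. The sign-bit signatures, agreement threshold, and noise level below are illustrative assumptions, not the paper's ciphertext-domain scheme:

```python
import numpy as np

rng = np.random.default_rng(4)
gallery = rng.standard_normal((1000, 128))               # enrolled face templates
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
probe = gallery[123] + 0.01 * rng.standard_normal(128)   # noisy re-capture of identity 123
probe /= np.linalg.norm(probe)

# preselection: cheap sign-bit signatures prune most of the gallery
signatures = gallery > 0
agreement = (signatures == (probe > 0)).mean(axis=1)
candidates = np.where(agreement > 0.8)[0]                # survivors of the coarse filter

# exact (expensive) scoring runs only on the surviving candidates
best = candidates[np.argmax(gallery[candidates] @ probe)]
print(int(best), len(candidates))  # the true identity, 123, survives preselection
```

The payoff mirrors the paper's motivation: the coarse filter touches every template with a trivial bit comparison, so the costly exact scoring (homomorphic comparison, in PFIP's setting) is amortized over a tiny candidate set.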

Updated: 2025-07-03 08:15:07

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2507.02414v1

K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

Early evaluation of children's language is frustrated by the high pitch, long phones, and sparse data that derail automatic speech recognisers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specific errors while remaining fully interpretable. Kids-WFST attains 1.39% phoneme error on MyST and 8.61% on Multitudes, absolute gains of 10.47 and 7.06 points over a greedy-search decoder. These high-fidelity transcripts power an LLM that grades verbal skills, milestones, reading, and comprehension, aligning with human proctors and supplying tongue-and-lip visualizations plus targeted advice. The results show that precise phoneme recognition cements a complete diagnostic-feedback loop, paving the way for scalable, clinician-ready language assessment.

Updated: 2025-07-03 08:05:02

Categories: cs.CL,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.03043v1

S2FGL: Spatial Spectral Federated Graph Learning

Federated Graph Learning (FGL) combines the privacy-preserving capabilities of federated learning (FL) with the strong graph modeling capability of Graph Neural Networks (GNNs). Current research addresses subgraph-FL only from the structural perspective, neglecting the propagation of graph signals on spatial and spectral domains of the structure. From a spatial perspective, subgraph-FL introduces edge disconnections between clients, leading to disruptions in label signals and a degradation in the class knowledge of the global GNN. From a spectral perspective, spectral heterogeneity causes inconsistencies in signal frequencies across subgraphs, which makes local GNNs overfit the local signal propagation schemes. As a result, spectral client drifts occur, undermining global generalizability. To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment to address spectral client drifts. The combination of spatial and spectral strategies forms our framework S2FGL. Extensive experiments on multiple datasets demonstrate the superiority of S2FGL. The code is available at https://github.com/Wonder7racer/S2FGL.git.

Updated: 2025-07-03 08:04:49

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.02409v1

Attention-based clustering

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids. This phenomenon highlights the ability of attention-based layers to capture underlying distributional structure. We further examine an attention layer with key, query, and value matrices fixed to the identity, and show that, even without any trainable parameters, it can perform in-context quantization, revealing the surprising capacity of transformer-based methods to adapt dynamically to input-specific distributions.
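
The identity key/query/value observation is easy to probe numerically: a plain update Z <- softmax(beta * Z Z^T) Z contracts samples from a well-separated Gaussian mixture toward their cluster centroids. A toy sketch (the value of beta, the cluster geometry, and the iteration count are arbitrary choices, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([[-5.0, 0.0], [5.0, 0.0]])
labels = np.repeat([0, 1], 50)
X = centers[labels] + 0.5 * rng.standard_normal((100, 2))  # Gaussian mixture sample

def attention_update(Z, beta=0.1):
    # key = query = value = identity: scores are plain dot products
    scores = beta * Z @ Z.T
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # row-softmax attention weights
    return A @ Z                                  # each point becomes a weighted mean

Z = X
for _ in range(3):
    Z = attention_update(Z)

before = np.linalg.norm(X - centers[labels], axis=1).mean()
after = np.linalg.norm(Z - centers[labels], axis=1).mean()
print(after < before)  # points contract toward their mixture centroids
```

The inverse temperature beta matters: cross-cluster attention weights must be suppressed, but too sharp a softmax concentrates mass on extreme samples instead of averaging toward centroids, so this sketch should be read as qualitative intuition for the in-context quantization result, not a proof.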

Updated: 2025-07-03 08:04:46

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2505.13112v2

Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability

Most existing automatic speech recognition (ASR) research evaluates models using in-domain datasets. However, such studies seldom assess how models generalize across diverse speech contexts. This study addresses this gap by benchmarking seven Akan ASR models built on transformer architectures, such as Whisper and Wav2Vec2, using four Akan speech corpora to determine their performance. These datasets encompass various domains, including culturally relevant image descriptions, informal conversations, biblical scripture readings, and spontaneous financial dialogues. A comparison of the word error rate and character error rate highlighted domain dependency, with models performing optimally only within their training domains while showing marked accuracy degradation in mismatched scenarios. This study also identified distinct error behaviors between the Whisper and Wav2Vec2 architectures. Whereas fine-tuned Whisper Akan models led to more fluent but potentially misleading transcription errors, Wav2Vec2 produced more obvious yet less interpretable outputs when encountering unfamiliar inputs. This trade-off between readability and transparency in ASR errors should be considered when selecting architectures for low-resource language (LRL) applications. These findings highlight the need for targeted domain adaptation techniques, adaptive routing strategies, and multilingual training frameworks for Akan and other LRLs.
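
Word error rate, the headline metric in this comparison, is just a length-normalized Levenshtein distance over word sequences (character error rate is the same computation over characters); a minimal sketch:

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via dynamic programming over token sequences
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions to reach empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions from empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[-1][-1]

def wer(ref_sentence, hyp_sentence):
    ref, hyp = ref_sentence.split(), hyp_sentence.split()
    return edit_distance(ref, hyp) / len(ref)

print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))  # → 0.167
```

One dropped word out of six reference words gives a WER of 1/6; note that WER can exceed 1.0 when the hypothesis contains many insertions, which is one reason the paper reads WER and CER together.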

Updated: 2025-07-03 08:01:26

Categories: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.02407v1

Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization

Trajectory prediction is an essential step in the pipeline of an autonomous vehicle. Inaccurate or inconsistent predictions regarding the movement of agents in its surroundings lead to poorly planned maneuvers and potentially dangerous situations for the end-user. Current state-of-the-art deep-learning-based trajectory prediction models can achieve excellent accuracy on public datasets. However, when used in more complex, interactive scenarios, they often fail to capture important interdependencies between agents, leading to inconsistent predictions among agents in the traffic scene. Inspired by the efficacy of incorporating human preference into large language models, this work fine-tunes trajectory prediction models in multi-agent settings using preference optimization. By taking as input automatically calculated preference rankings among predicted futures in the fine-tuning process, our experiments, using state-of-the-art models on three separate datasets, show that we are able to significantly improve scene consistency while minimally sacrificing trajectory prediction accuracy and without adding any excess computational requirements at inference time.

Updated: 2025-07-03 07:59:49

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02406v1

AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation

The recent development and wider accessibility of LLMs have spurred discussions about how they can be used in survey research, including classifying open-ended survey responses. Due to their linguistic capacities, it is possible that LLMs are an efficient alternative to time-consuming manual coding and the pre-training of supervised machine learning models. As most existing research on this topic has focused on English-language responses relating to non-complex topics or on single LLMs, it is unclear whether its findings generalize and how the quality of these classifications compares to established methods. In this study, we investigate to what extent different LLMs can be used to code open-ended survey responses in other contexts, using German data on reasons for survey participation as an example. We compare several state-of-the-art LLMs and several prompting approaches, and evaluate the LLMs' performance by using human expert codings. Overall performance differs greatly between LLMs, and only a fine-tuned LLM achieves satisfactory levels of predictive performance. Performance differences between prompting approaches are conditional on the LLM used. Finally, LLMs' unequal classification performance across different categories of reasons for survey participation results in different categorical distributions when not using fine-tuning. We discuss the implications of these findings, both for methodological research on coding open-ended responses and for their substantive analysis, and for practitioners processing or substantively analyzing such data. Finally, we highlight the many trade-offs researchers need to consider when choosing automated methods for open-ended response classification in the age of LLMs. In doing so, our study contributes to the growing body of research about the conditions under which LLMs can be efficiently, accurately, and reliably leveraged in survey research.

Updated: 2025-07-03 07:58:18

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2506.14634v3

XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation

The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM: first we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we show how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at https://cosbidev.github.io/XGeM/.

Updated: 2025-07-03 07:57:05

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2501.04614v3

Wildlife Target Re-Identification Using Self-supervised Learning in Non-Urban Settings

Wildlife re-identification aims to match individuals of the same species across different observations. Current state-of-the-art (SOTA) models rely on class labels to train supervised models for individual classification. This dependence on annotated data has driven the curation of numerous large-scale wildlife datasets. This study investigates Self-Supervised Learning (SSL) for wildlife re-identification. We automatically extract two distinct views of an individual using temporal image pairs from camera trap data without supervision. The image pairs train a self-supervised model from a potentially endless stream of video data. We evaluate the learnt representations against supervised features in open-world scenarios and in transfer learning across various wildlife downstream tasks. The analysis of the experimental results shows that self-supervised models are more robust even with limited data. Moreover, self-supervised features outperform supervision across all downstream tasks. The code is available here https://github.com/pxpana/SSLWildlife.

Updated: 2025-07-03 07:56:54

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02403v1

MTCNet: Motion and Topology Consistency Guided Learning for Mitral Valve Segmentationin 4D Ultrasound

Mitral regurgitation is one of the most prevalent cardiac disorders. Four-dimensional (4D) ultrasound has emerged as the primary imaging modality for assessing dynamic valvular morphology. However, 4D mitral valve (MV) analysis remains challenging due to limited phase annotations, severe motion artifacts, and poor imaging quality. Moreover, existing methods ignore inter-phase dependency, which further hinders 4D MV analysis. To bridge this gap, we propose a Motion-Topology guided consistency network (MTCNet) for accurate 4D MV ultrasound segmentation in semi-supervised learning (SSL). MTCNet requires only sparse end-diastolic and end-systolic annotations. First, we design a cross-phase motion-guided consistency learning strategy, utilizing a bi-directional attention memory bank to propagate spatio-temporal features. This enables MTCNet to achieve excellent performance both within and across phases. Second, we devise a novel topology-guided correlation regularization that explores physical prior knowledge to maintain anatomical plausibility. Therefore, MTCNet can effectively leverage structural correspondence between labeled and unlabeled phases. Extensive evaluations on the largest 4D MV dataset to date, with 1408 phases from 160 patients, show that MTCNet achieves superior cross-phase consistency compared to other advanced methods (Dice: 87.30%, HD: 1.75mm). Both the code and the dataset are available at https://github.com/crs524/MTCNet.

Updated: 2025-07-03 07:56:43

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.00660v2

Dynamic Long Short-Term Memory Based Memory Storage For Long Horizon LLM Interaction

Memory storage for Large Language Models (LLMs) is becoming an increasingly active area of research, particularly for enabling personalization across long conversations. We propose Pref-LSTM, a dynamic and lightweight framework that combines a BERT-based classifier with an LSTM memory module that generates a memory embedding, which is then soft-prompt injected into a frozen LLM. We synthetically curate a dataset of preference and non-preference conversation turns to train our BERT-based classifier. Although our LSTM-based memory encoder did not yield strong results, we find that the BERT-based classifier performs reliably in identifying explicit and implicit user preferences. Our research demonstrates the viability of using preference filtering with LSTM gating principles as an efficient path towards scalable user preference modeling, without extensive overhead and fine-tuning.

Updated: 2025-07-03 07:53:20

Categories: cs.CL,cs.AI,68T05

Download: http://arxiv.org/abs/2507.03042v1

TABNet: A Triplet Augmentation Self-Recovery Framework with Boundary-Aware Pseudo-Labels for Medical Image Segmentation

Background and objective: Medical image segmentation is a core task in various clinical applications. However, acquiring large-scale, fully annotated medical image datasets is both time-consuming and costly. Scribble annotations, as a form of sparse labeling, provide an efficient and cost-effective alternative for medical image segmentation. However, the sparsity of scribble annotations limits the feature learning of the target region and lacks sufficient boundary supervision, which poses significant challenges for training segmentation networks. Methods: We propose TABNet, a novel weakly-supervised medical image segmentation framework, consisting of two key components: the triplet augmentation self-recovery (TAS) module and the boundary-aware pseudo-label supervision (BAP) module. The TAS module enhances feature learning through three complementary augmentation strategies: intensity transformation improves the model's sensitivity to texture and contrast variations, cutout forces the network to capture local anatomical structures by masking key regions, and jigsaw augmentation strengthens the modeling of global anatomical layout by disrupting spatial continuity. By guiding the network to recover complete masks from diverse augmented inputs, TAS promotes a deeper semantic understanding of medical images under sparse supervision. The BAP module enhances pseudo-supervision accuracy and boundary modeling by fusing dual-branch predictions into a loss-weighted pseudo-label and introducing a boundary-aware loss for fine-grained contour refinement. Results: Experimental evaluations on two public datasets, ACDC and MSCMRseg, demonstrate that TABNet significantly outperforms state-of-the-art methods for scribble-based weakly supervised segmentation. Moreover, it achieves performance comparable to that of fully supervised methods.

Updated: 2025-07-03 07:50:00

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02399v1

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

We introduce a deepfake video detection approach that exploits pixel-wise temporal inconsistencies, which traditional spatial frequency-based detectors often overlook. Traditional detectors represent temporal information merely by stacking spatial frequency spectra across frames, resulting in the failure to detect temporal artifacts in the pixel plane. Our approach performs a 1D Fourier transform on the time axis for each pixel, extracting features highly sensitive to temporal inconsistencies, especially in areas prone to unnatural movements. To precisely locate regions containing the temporal artifacts, we introduce an attention proposal module trained in an end-to-end manner. Additionally, our joint transformer module effectively integrates pixel-wise temporal frequency features with spatio-temporal context features, expanding the range of detectable forgery artifacts. Our framework represents a significant advancement in deepfake video detection, providing robust performance across diverse and challenging detection scenarios.
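
The core operation, a 1-D Fourier transform along the time axis for each pixel, is straightforward to sketch; the synthetic flicker injected below is an illustrative stand-in for a temporal forgery artifact, not real deepfake data:

```python
import numpy as np

rng = np.random.default_rng(1)
T, H, W = 32, 8, 8
video = 0.1 * rng.standard_normal((T, H, W))     # (time, height, width)

# inject a periodic flicker at one pixel, mimicking a temporal inconsistency
t = np.arange(T)
video[:, 3, 4] += np.sin(2 * np.pi * 8 * t / T)

# 1-D FFT along the time axis for every pixel
spec = np.abs(np.fft.rfft(video, axis=0))        # shape (T//2 + 1, H, W)

# energy outside the DC bin highlights temporally inconsistent pixels
energy = spec[1:].sum(axis=0)
flagged = tuple(int(i) for i in np.unravel_index(energy.argmax(), energy.shape))
print(flagged)  # → (3, 4)
```

A purely spatial spectrum stacked frame-by-frame would miss this pixel entirely, which is the motivating observation of the paper; the full method learns where to look via the attention proposal module instead of a hand-set energy threshold.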

Updated: 2025-07-03 07:49:55

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02398v1

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed likelihood score, combined with the unconditional score via a trade-off hyperparameter. In this work, we propose two alternative algorithms that directly model the conditional reverse transition distribution of diffusion states. The first method integrates the diffusion prior with the observation model in a principled way, removing the need for hyperparameter tuning. The second defines a diffusion process over the noisy speech itself, yielding a fully tractable and exact likelihood score. Experiments on the WSJ0-QUT and VoiceBank-DEMAND datasets demonstrate improved enhancement metrics and greater robustness to domain shifts compared to both supervised and unsupervised baselines.

Updated: 2025-07-03 07:42:02

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2507.02391v1

MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning

Existing visual model-based reinforcement learning (MBRL) algorithms with observation reconstruction often suffer from information conflicts, making it difficult to learn compact representations and hence result in less robust policies, especially in the presence of task-irrelevant visual distractions. In this paper, we first reveal that the information conflicts in current visual MBRL algorithms stem from visual representation learning and latent dynamics modeling, viewed from an information-theoretic perspective. Based on this finding, we present a new algorithm to resolve information conflicts for visual MBRL, named MInCo, which mitigates information conflicts by leveraging negative-free contrastive learning, aiding in learning invariant representations and robust policies despite noisy observations. To prevent the dominance of visual representation learning, we introduce time-varying reweighting to bias the learning towards dynamics modeling as training proceeds. We evaluate our method on several robotic control tasks with dynamic background distractions. Our experiments demonstrate that MInCo learns invariant representations against background noise and consistently outperforms current state-of-the-art visual MBRL methods. Code is available at https://github.com/ShiguangSun/minco.

Updated: 2025-07-03 07:41:07

Categories: cs.LG

Download: http://arxiv.org/abs/2504.04164v3

Evaluating Language Models For Threat Detection in IoT Security Logs

Log analysis is a relevant research field in cybersecurity, as logs can provide a source of information for the detection of threats to networks and systems. This paper presents a pipeline to use fine-tuned Large Language Models (LLMs) for anomaly detection and mitigation recommendation using IoT security logs. Utilizing classical machine learning classifiers as a baseline, three open-source LLMs are compared for binary and multiclass anomaly detection, with three strategies: zero-shot prompting, few-shot prompting, and fine-tuning using an IoT dataset. LLMs give better results on multi-class attack classification than the corresponding baseline models. By mapping detected threats to MITRE CAPEC, defining a set of IoT-specific mitigation actions, and fine-tuning the models with those actions, the models are able to provide combined detection and recommendation guidance.

Updated: 2025-07-03 07:38:43

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.02390v1

Illuminant and light direction estimation using Wasserstein distance method

Illumination estimation remains a pivotal challenge in image processing, particularly for robotics, where robust environmental perception is essential under varying lighting conditions. Traditional approaches, such as RGB histograms and GIST descriptors, often fail in complex scenarios due to their sensitivity to illumination changes. This study introduces a novel method utilizing the Wasserstein distance, rooted in optimal transport theory, to estimate illuminant and light direction in images. Experiments on diverse images (indoor scenes, black-and-white photographs, and night images) demonstrate the method's efficacy in detecting dominant light sources and estimating their directions, outperforming traditional statistical methods in complex lighting environments. The approach shows promise for applications in light source localization, image quality assessment, and object detection enhancement. Future research may explore adaptive thresholding and integrate gradient analysis to enhance accuracy, offering a scalable solution for real-world illumination challenges in robotics and beyond.
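
In one dimension, the Wasserstein-1 distance underlying the method has a closed form: the mean absolute difference between sorted samples (equivalently, between quantile functions). A small sketch comparing two hypothetical intensity distributions (the illuminant names and distribution parameters are invented for illustration):

```python
import numpy as np

def wasserstein_1d(u, v):
    # 1-D Wasserstein-1 distance between equal-size empirical samples:
    # the mean absolute difference between the sorted values
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

rng = np.random.default_rng(2)
daylight = rng.normal(0.6, 0.05, 1000)  # hypothetical channel intensities, illuminant A
tungsten = rng.normal(0.4, 0.05, 1000)  # hypothetical channel intensities, illuminant B
print(round(float(wasserstein_1d(daylight, tungsten)), 2))  # ≈ 0.2, the mean shift
```

Unlike a bin-by-bin histogram comparison, this transport-based distance grows smoothly with how far mass must move, which is why it stays informative when an illumination change shifts the whole intensity distribution rather than reshaping it.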

Updated: 2025-07-03 07:28:48

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2503.05802v2

An AI-native experimental laboratory for autonomous biomolecular engineering

Autonomous scientific research, capable of independently conducting complex experiments and serving non-specialists, represents a long-held aspiration. Achieving it requires a fundamental paradigm shift driven by artificial intelligence (AI). While autonomous experimental systems are emerging, they remain confined to areas featuring singular objectives and well-defined, simple experimental workflows, such as chemical synthesis and catalysis. We present an AI-native autonomous laboratory, targeting highly complex scientific experiments for applications like autonomous biomolecular engineering. This system autonomously manages instrumentation, formulates experiment-specific procedures and optimization heuristics, and concurrently serves multiple user requests. Founded on a co-design philosophy of models, experiments, and instruments, the platform supports the co-evolution of AI models and the automation system. This establishes an end-to-end, multi-user autonomous laboratory that handles complex, multi-objective experiments across diverse instrumentation. Our autonomous laboratory supports fundamental nucleic acid functions, including synthesis, transcription, amplification, and sequencing. It also enables applications in fields such as disease diagnostics, drug development, and information storage. Without human intervention, it autonomously optimizes experimental performance to match state-of-the-art results achieved by human scientists. In multi-user scenarios, the platform significantly improves instrument utilization and experimental efficiency. This platform paves the way for advanced biomaterials research to overcome dependencies on experts and resource barriers, establishing a blueprint for science-as-a-service at scale.

Updated: 2025-07-03 07:21:19

标题: 一个面向人工智能的自主生物分子工程实验室

摘要: 自主科学研究,能够独立进行复杂实验并为非专家提供服务,一直是人们长期以来的愿望。实现这一目标需要通过人工智能(AI)驱动的基本范式转变。虽然自主实验系统正在兴起,但它们仍然局限于具有单一目标和定义明确、简单实验工作流程的领域,如化学合成和催化。我们提出了一个面向高度复杂科学实验的AI原生自主实验室,用于自主生物分子工程等应用。该系统自主管理仪器,制定实验特定程序和优化启发法,并同时为多个用户请求提供服务。该平台基于模型、实验和仪器的共同设计理念,支持AI模型和自动化系统的共同演化。这建立了一个端到端、多用户的自主实验室,可以处理跨多样仪器的复杂、多目标实验。我们的自主实验室支持包括合成、转录、扩增和测序在内的核酸基本功能。它还可以在疾病诊断、药物开发和信息存储等领域应用。在没有人类干预的情况下,它可以自主优化实验性能,以匹配人类科学家取得的最新成果。在多用户场景下,该平台显著提高了仪器利用率和实验效率。这一平台为先进生物材料研究铺平了道路,克服了对专家和资源的依赖,为规模化的科学即服务奠定了蓝图。

更新时间: 2025-07-03 07:21:19

领域: cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2507.02379v1

Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited

Inducing-point-based sparse variational Gaussian processes have become the standard workhorse for scaling up GP models. Recent advances show that these methods can be improved by introducing a diagonal scaling matrix to the conditional posterior density given the inducing points. This paper first considers an extension that employs a block-diagonal structure for the scaling matrix, provably tightening the variational lower bound. We then revisit the unifying framework of sparse GPs based on Power Expectation Propagation (PEP) and show that it can leverage and benefit from the new structured approximate posteriors. Through extensive regression experiments, we show that the proposed block-diagonal approximation consistently performs similarly to or better than existing diagonal approximations while maintaining comparable computational costs. Furthermore, the new PEP framework with structured posteriors provides competitive performance across various power hyperparameter settings, offering practitioners flexible alternatives to standard variational approaches.

Updated: 2025-07-03 07:18:54

标题: 稀疏高斯过程:结构化近似和Power-EP再探讨

摘要: 基于诱导点的稀疏变分高斯过程已成为扩展GP模型的标准工具。最近的进展表明,通过在给定诱导点的条件后验密度中引入对角缩放矩阵可以改进这些方法。本文首先考虑了一种采用块对角结构的缩放矩阵的扩展,证明了可以收紧变分下界。然后,我们重新审视基于Power Expectation Propagation(PEP)的稀疏GP统一框架,并展示它可以利用并受益于新的结构化近似后验。通过大量回归实验,我们展示了所提出的块对角逼近在保持可比较的计算成本的同时,始终表现出与现有对角逼近相近甚至更好的性能。此外,具有结构后验的新PEP框架在各种功率超参数设置下提供了竞争性性能,为从业者提供了灵活的替代标准变分方法。

更新时间: 2025-07-03 07:18:54

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.02377v1

VeFIA: An Efficient Inference Auditing Framework for Vertical Federated Collaborative Software

Vertical Federated Learning (VFL) is a distributed AI software deployment mechanism for cross-silo collaboration without accessing participants' data. However, existing VFL work lacks a mechanism to audit the execution correctness of the inference software of the data party. To address this problem, we design a Vertical Federated Inference Auditing (VeFIA) framework. VeFIA helps the task party to audit whether the data party's inference software is executed as expected during large-scale inference without leaking the data privacy of the data party or introducing additional latency to the inference system. The core of VeFIA is that the task party can use the inference results from a framework with Trusted Execution Environments (TEE) and the coordinator to validate the correctness of the data party's computation results. VeFIA guarantees that, as long as the abnormal inference exceeds 5.4%, the task party can detect execution anomalies in the inference software with a probability of 99.99%, without incurring any additional online inference latency. VeFIA's random sampling validation achieves 100% positive predictive value, negative predictive value, and true positive rate in detecting abnormal inference. To the best of our knowledge, this is the first paper to discuss the correctness of inference software execution in VFL.
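
The 5.4% / 99.99% guarantee follows standard random-sampling arithmetic: if at least a fraction p of inferences are abnormal, validating n uniformly sampled inferences catches at least one anomaly with probability 1 - (1 - p)^n. The back-of-the-envelope check below (the 166-sample figure is our own arithmetic under an independence assumption, not a number stated in the paper) shows how few validations suffice.

```python
import math

def detection_probability(p, n):
    """P(at least one abnormal result among n independent samples)."""
    return 1 - (1 - p) ** n

def samples_needed(p, target):
    """Smallest n with detection_probability(p, n) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

n = samples_needed(0.054, 0.9999)
print(n)                                            # → 166
print(detection_probability(0.054, n) >= 0.9999)    # → True
```

This is why the audit adds no online latency: the TEE-backed re-execution only needs to cover a small random subset of the inference workload.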

Updated: 2025-07-03 07:17:49

标题: VeFIA:用于垂直联合协作软件的高效推断审计框架

摘要: 垂直联邦学习(VFL)是一种分布式人工智能软件部署机制,用于跨越数据孤岛进行协作,而无需访问参与者的数据。然而,现有的VFL工作缺乏一种机制来审计数据方推理软件的执行正确性。为解决这一问题,我们设计了一个垂直联邦推理审计(VeFIA)框架。VeFIA帮助任务方审计数据方的推理软件在大规模推理中是否按预期执行,同时不泄露数据方的数据隐私或引入额外的延迟到推理系统。VeFIA的核心是任务方可以使用具有可信执行环境(TEE)和协调器的框架来验证数据方计算结果的正确性。VeFIA确保只要异常推理超过5.4%,任务方就可以以99.99%的概率检测到推理软件中的执行异常,而不会导致任何额外的在线推理延迟。VeFIA的随机抽样验证在检测异常推理方面实现了100%的正预测值、负预测值和真正阳性率。据我们所知,这是第一篇讨论VFL中推理软件执行正确性的论文。

更新时间: 2025-07-03 07:17:49

领域: cs.SE,cs.AI,cs.DC

下载: http://arxiv.org/abs/2507.02376v1

Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations

There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements up to 20.7% over original models. Code is available at this repository: https://github.com/nyuolab/Gateformer.

Updated: 2025-07-03 07:13:52

标题: Gateformer:通过具有门控表示的时间和变量智能注意力推进多元时间序列预测

摘要: 最近,使用Transformer架构建模时间序列模型的兴趣急剧增加。然而,使用Transformer进行多变量时间序列预测面临着独特的挑战,因为它需要建模时间(跨时间)和变量(跨变量)依赖关系。虽然基于Transformer的模型因其灵活性而受到青睐,能够捕捉顺序和跨变量关系,但在Transformer架构的背景下如何最好地整合这两种信息源,并在性能和效率上进行优化尚不清楚。我们重新设计了Transformer架构,以有效地建模跨时间和跨变量依赖关系。我们的方法首先将每个变量独立地嵌入到一个捕捉其跨时间动态的变量表示中,然后通过对这些学习嵌入的注意机制来建模跨变量依赖关系。在跨时间和跨变量建模阶段中的门控操作调节信息流,使模型能够专注于最相关的特征以实现准确的预测。我们的方法在13个真实世界数据集上实现了最先进的性能,并可以无缝集成到其他基于Transformer和LLM的预测器中,使性能提高高达20.7%。代码可在此存储库获得:https://github.com/nyuolab/Gateformer。

更新时间: 2025-07-03 07:13:52

领域: cs.LG

下载: http://arxiv.org/abs/2505.00307v3

Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.

Updated: 2025-07-03 07:12:48

标题: Optimas:通过全局对齐本地奖励优化复合AI系统

摘要: 复合AI系统集成了多个组件,例如大型语言模型、专门工具和传统机器学习模型,越来越多地被部署来解决复杂的现实世界任务。然而,由于它们的非可微结构和组件之间包括提示、超参数和模型参数在内的多样化配置类型,优化复合系统仍然具有挑战性。为了解决这一挑战,我们提出了Optimas,一个用于有效优化复合系统的统一框架。Optimas的核心思想是为每个组件维护一个本地奖励函数(LRF),每个LRF都满足本地-全局对齐属性,即每个组件的本地奖励与全局系统表现相关。在每次迭代中,Optimas有效地调整LRF以保持这一属性,同时最大化每个组件的本地奖励。这种方法使得可以使用指定的优化方法独立更新异构配置,同时确保本地改进始终导致性能提升。我们在五个真实世界的复合系统上进行了广泛评估,结果显示Optimas比强基线平均提高了11.92%,为改进复合系统提供了一种通用有效的方法。我们的网站位于https://optimas.stanford.edu。

更新时间: 2025-07-03 07:12:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.03041v1

An average case efficient algorithm for solving two-variable linear Diophantine equations

Solving two-variable linear Diophantine equations has applications in many cryptographic protocols such as RSA and Elliptic curve cryptography. The Extended Euclid's algorithm is the most widely used algorithm to solve these equations. We revisit two algorithms to solve two-variable linear Diophantine equations. We write the iterative version of one of the revisited algorithms. For another, we do a fine-grained analysis of the number of recursive calls and arrive at a periodic function that represents the number of recursive calls. We find the period and use it to derive an accurate closed-form expression for the average number of recursive calls incurred by that algorithm. We find multiple loose upper bounds on the average number of recursive calls in different cases based on whether a solution exists or not. We find that for a fixed value of $a,b$ and a varying $c$, such that the equation $ax+by=c$ (where $a > b$) is solvable, we can find the solution in $O\left(\frac{\log b}{gcd(a,b)}\right)$ average number of recursions or steps. We computationally evaluate this bound as well as one more upper bound and compare them with the average number of recursive calls in Extended Euclid's algorithm on a number of random $512$-bit inputs. We observe that the average number of iterations in the analyzed algorithm decreases with an increase in $gcd(a,b)$. We propose an iterative version of the algorithm. We implement this algorithm and find that the average number of iterations by our algorithm is less than that of two existing algorithms.
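
For concreteness, here is the baseline the paper compares against: solving $ax+by=c$ with the Extended Euclid's algorithm. A solution exists iff $\gcd(a,b)$ divides $c$, and a particular solution is obtained by scaling the Bezout coefficients of $(a,b)$. This is a textbook reference implementation, not the paper's analyzed algorithm.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Unwind one division step of the Euclidean recursion.
    return g, y, x - (a // b) * y

def solve_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y = c, or None if unsolvable."""
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None          # c must be a multiple of gcd(a, b)
    k = c // g
    return x * k, y * k

x, y = solve_diophantine(15, 9, 6)   # gcd(15, 9) = 3 divides 6
print(15 * x + 9 * y)  # → 6
```

The full solution set is then $(x + t\,b/g,\; y - t\,a/g)$ for integer $t$, which is where the paper's per-$c$ average-case analysis of recursion counts comes in.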

Updated: 2025-07-03 07:08:25

标题: 解决两变量线性丢番图方程的平均情况高效算法

摘要: 解决两变量线性丢番图方程在许多加密协议中具有应用,如RSA和椭圆曲线密码学。扩展欧几里得算法是解决这些方程最广泛使用的算法。我们重新审视了两个解决两变量线性丢番图方程的算法。我们编写了重新审视算法之一的迭代版本。对于另一个算法,我们对递归调用的次数进行了细粒度分析,并得出了代表递归调用次数的周期函数。我们找到了这个周期,并用它推导出该算法产生的平均递归调用次数的准确闭合形式表达式。我们根据是否存在解,在不同情况下找到了平均递归调用次数的多个松散上界。我们发现,对于一个固定的$a,b$值和一个变化的$c$值,使得方程$ax+by=c$(其中$a > b$)可解,我们可以在$O\left(\frac{\log b}{gcd(a,b)}\right)$的平均递归次数或步骤内找到解。我们计算评估了这个上界以及另一个上界,并将它们与在许多随机的$512$位输入上扩展欧几里得算法中的平均递归调用次数进行了比较。我们观察到,在分析的算法中,平均迭代次数随着$gcd(a,b)$的增加而减少。我们提出了该算法的迭代版本。我们实施了这个算法,并发现我们的算法的平均迭代次数比现有的两个算法少。

更新时间: 2025-07-03 07:08:25

领域: cs.CR,cs.DS,math.NT

下载: http://arxiv.org/abs/2409.14052v5

Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating meaningful captions. The encoder extracts essential visual features from the input image, transforming them into a compact representation, while the decoder utilizes this representation to generate coherent textual descriptions. Recently, transformer-based models have gained significant popularity due to their ability to capture long-range dependencies and contextual information. The decoder has been well explored for text generation, whereas the encoder remains relatively unexplored. However, optimizing the encoder is crucial as it directly influences the richness of extracted features, which in turn affects the quality of generated captions. To address this gap, we systematically evaluate twelve different convolutional neural network (CNN) architectures within a transformer-based encoder framework to assess their effectiveness in RSIC. The evaluation consists of two stages: first, a numerical analysis categorizes CNNs into different clusters, based on their performance. The best performing CNNs are then subjected to human evaluation from a human-centric perspective by a human annotator. Additionally, we analyze the impact of different search strategies, namely greedy search and beam search, to ensure the best caption. The results highlight the critical role of encoder selection in improving captioning performance, demonstrating that specific CNN architectures significantly enhance the quality of generated descriptions for remote sensing images. By providing a detailed comparison of multiple encoders, this study offers valuable insights to guide advances in transformer-based image captioning models.

Updated: 2025-07-03 07:07:36

标题: 优秀的表示,更好的解释:卷积神经网络在基于Transformer的遥感图像标题生成中的作用

摘要: 遥感图像描述(RSIC)是从遥感图像生成有意义描述的过程。最近,它引起了人们的重视,编码器-解码器模型作为生成有意义描述的基础。编码器从输入图像中提取关键的视觉特征,将它们转换成紧凑的表示,而解码器利用这种表示来生成连贯的文本描述。最近,基于transformer的模型因其捕捉长距离依赖性和语境信息的能力而受到了广泛关注。解码器已经在文本生成方面进行了深入研究,而编码器仍相对未开发。然而,优化编码器是至关重要的,因为它直接影响提取特征的丰富程度,进而影响生成描述的质量。为填补这一空白,我们在基于transformer的编码器框架中系统评估了十二种不同的卷积神经网络(CNN)架构,以评估它们在RSIC中的有效性。评估分为两个阶段:首先,通过数值分析将CNNs分类为不同的簇,基于它们的性能。然后,最佳性能的CNNs将从人类关注的角度由人类注释者进行人工评估。此外,我们分析了不同搜索策略(贪婪搜索和束搜索)的影响,以确保最佳描述。结果突显了编码器选择在提高字幕性能方面的关键作用,表明特定的CNN架构显着提高了对遥感图像生成描述的质量。通过提供多个编码器的详细比较,本研究为指导基于transformer的图像描述模型的进展提供了有价值的见解。

更新时间: 2025-07-03 07:07:36

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.16095v2

Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes

Much of the cost and effort required during the software testing process is invested in performing test maintenance - the addition, removal, or modification of test cases to keep the test suite in sync with the system-under-test or to otherwise improve its quality. Tool support could reduce the cost - and improve the quality - of test maintenance by automating aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language models (LLMs) - complex machine learning models adapted to textual analysis - to support test maintenance. We conducted a case study at Ericsson AB where we explore the triggers that indicate the need for test maintenance, the actions that LLMs can take, and the considerations that must be made when deploying LLMs in an industrial setting. We also propose and demonstrate a multi-agent architecture that can predict which tests require maintenance following a change to the source code. Collectively, these contributions advance our theoretical and practical understanding of how LLMs can be deployed to benefit industrial test maintenance processes.

Updated: 2025-07-03 07:06:08

标题: 探索大型语言模型在工业测试维护流程中的整合

摘要: 在软件测试过程中,大部分成本和工作量都是用于进行测试维护 - 即添加、删除或修改测试用例,以保持测试套件与被测试系统同步,或以其他方式改进其质量。工具支持可以通过自动化流程的某些方面或为开发人员提供指导和支持来减少成本和提高测试维护的质量。 在这项研究中,我们探讨了大语言模型(LLMs)的能力和应用 - 这是针对文本分析而调整的复杂机器学习模型,用于支持测试维护。我们在爱立信AB进行了一项案例研究,探讨了表明需要测试维护的触发器、LLMs可以采取的行动,以及在工业环境中部署LLMs时必须考虑的因素。我们还提出并演示了一种多代理架构,可以预测在源代码更改后哪些测试需要进行维护。总的来说,这些贡献推动了我们对LLMs如何部署以使工业测试维护流程受益的理论和实践理解。

更新时间: 2025-07-03 07:06:08

领域: cs.SE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.06416v2

Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations

Equalizer parameter optimization for signal integrity in high-speed Dynamic Random Access Memory systems is crucial but often computationally demanding or model-reliant. This paper introduces a data-driven framework employing learned latent signal representations for efficient signal integrity evaluation, coupled with a model-free Advantage Actor-Critic reinforcement learning agent for parameter optimization. The latent representation captures vital signal integrity features, offering a fast alternative to direct eye diagram analysis during optimization, while the reinforcement learning agent derives optimal equalizer settings without explicit system models. Applied to industry-standard Dynamic Random Access Memory waveforms, the method achieved significant eye-opening window area improvements: 42.7% for cascaded Continuous-Time Linear Equalizer and Decision Feedback Equalizer structures, and 36.8% for Decision Feedback Equalizer-only configurations. These results demonstrate superior performance, computational efficiency, and robust generalization across diverse Dynamic Random Access Memory units compared to existing techniques. Core contributions include an efficient latent signal integrity metric for optimization, a robust model-free reinforcement learning strategy, and validated superior performance for complex equalizer architectures.

Updated: 2025-07-03 06:53:51

标题: 基于深度强化学习的DRAM均衡器参数优化利用潜在表示

摘要: 高速动态随机存取存储器系统中的信号完整性的均衡器参数优化是至关重要的,但通常在计算上具有挑战性或依赖于模型。本文介绍了一种采用学习的潜在信号表示的数据驱动框架,用于高效的信号完整性评估,结合了无模型的优势演员-评论家强化学习代理进行参数优化。潜在表示捕获了关键的信号完整性特征,为优化过程提供了快速的替代方法,而强化学习代理在没有显式系统模型的情况下导出最佳均衡器设置。应用于行业标准的动态随机存取存储器波形,该方法实现了显著的开眼窗口面积改善:级联连续时间线性均衡器和决策反馈均衡器结构提高了42.7%,仅决策反馈均衡器配置提高了36.8%。这些结果表明相较于现有技术,该方法在不同的动态随机存取存储器单元中表现出更优越的性能、计算效率和稳健的泛化能力。核心贡献包括用于优化的高效潜在信号完整性度量、稳健的无模型强化学习策略,以及验证了对复杂均衡器架构的优越性能。

更新时间: 2025-07-03 06:53:51

领域: cs.LG

下载: http://arxiv.org/abs/2507.02365v1

Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents

This paper presents an approach for rail line detection and the identification of human beings in proximity to the track, utilizing the YOLOv5 deep learning model to mitigate potential accidents. The technique incorporates real-time video data to identify railway tracks with impressive accuracy and recognizes nearby moving objects within a one-meter range, specifically targeting the identification of humans. This system aims to enhance safety measures in railway environments by providing real-time alerts for any detected human presence close to the track. The integration of a functionality to identify objects at a longer distance further fortifies the preventative capabilities of the system. With a precise focus on real-time object detection, this method is poised to deliver significant contributions to the existing technologies in railway safety. The effectiveness of the proposed method is demonstrated through a comprehensive evaluation, yielding a remarkable improvement in accuracy over existing methods. These results underscore the potential of this approach to revolutionize safety measures in railway environments, providing a substantial contribution to accident prevention strategies.

Updated: 2025-07-03 06:53:27

标题: 检测铁路轨道和靠近轨道的人员以避免事故

摘要: 本文提出了一种铁路线检测和识别靠近铁轨的人的方法,利用YOLOv5深度学习模型来减轻潜在事故。该技术利用实时视频数据,以令人印象深刻的准确性识别铁路轨道,并识别一米范围内的移动物体,特别针对人类的识别。该系统旨在通过提供任何检测到靠近铁轨的人的实时警报来增强铁路环境的安全措施。将识别远距离物体的功能集成进来,进一步加强了系统的预防能力。通过精确关注实时物体检测,这种方法有望对铁路安全领域的现有技术做出重大贡献。通过全面评估,提出的方法的有效性得到了证明,与现有方法相比,准确性有了显著改善。这些结果强调了这种方法改革铁路环境安全措施的潜力,为事故预防策略提供了重要贡献。

更新时间: 2025-07-03 06:53:27

领域: cs.CV,cs.LG,68T10,I.2.10; I.4.8

下载: http://arxiv.org/abs/2507.03040v1

Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning

Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the burden of annotation by matching entities in existing knowledge bases with snippets in the text, but suffers from label noise. Recent works attempt to adopt the teacher-student framework to gradually refine the training labels and improve the overall robustness. However, these teacher-student methods achieve limited performance because the poor calibration of the teacher network produces incorrectly pseudo-labeled samples, leading to error propagation. Therefore, we propose: (1) Uncertainty-Aware Teacher Learning that leverages the prediction uncertainty to reduce the number of incorrect pseudo labels in the self-training stage; (2) Student-Student Collaborative Learning that allows the transfer of reliable labels between two student networks instead of indiscriminately relying on all pseudo labels from its teacher, and further enables a full exploration of mislabeled samples rather than simply filtering unreliable pseudo-labeled samples. We evaluate our proposed method on five DS-NER datasets, demonstrating that our method is superior to the state-of-the-art DS-NER methods.
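
A minimal sketch of the uncertainty-aware filtering idea: keep a teacher's pseudo label only when the prediction's entropy is below a threshold. The threshold value and the toy probability distributions are illustrative assumptions, not the paper's actual configuration.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def filter_pseudo_labels(predictions, threshold=0.5):
    """predictions: list of (token, class-probability list) from the teacher.

    Keep only confident predictions; return (token, argmax class) pairs.
    """
    kept = []
    for token, probs in predictions:
        if entropy(probs) < threshold:
            kept.append((token, probs.index(max(probs))))
    return kept

preds = [("Paris", [0.90, 0.05, 0.05]),   # confident -> kept
         ("bank",  [0.40, 0.35, 0.25])]   # uncertain -> dropped
print(filter_pseudo_labels(preds))  # → [('Paris', 0)]
```

The paper's second component then goes further than this simple filter: rather than discarding uncertain samples outright, the two student networks exchange reliable labels and revisit the mislabeled ones.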

Updated: 2025-07-03 06:51:31

标题: 通过基于不确定性的教师学习和学生-学生协作学习提高远程监督命名实体识别的鲁棒性

摘要: 远程监督命名实体识别(DS-NER)被广泛应用于现实世界情景中。它可以通过将现有知识库中的实体与文本中的片段匹配来有效减轻注释负担,但存在标签噪声的问题。最近的研究尝试采用师生框架逐步完善训练标签并提高整体鲁棒性。然而,这些师生方法的性能有限,因为师网络的不良校准会产生错误的伪标记样本,导致错误传播。因此,我们提出了以下解决方案:(1)利用预测不确定性减少自我训练阶段中错误伪标签数量的“不确定性感知师学习”;(2)允许两个学生网络之间可靠标签的转移,而不是盲目依赖其师的所有伪标签的“学生-学生协作学习”,进一步实现对错误标记样本的全面探索,而不仅仅是过滤不可靠的伪标记样本。我们在五个DS-NER数据集上评估了我们提出的方法,结果表明我们的方法优于最先进的DS-NER方法。

更新时间: 2025-07-03 06:51:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.08010v3

Enhancing Swarms Durability to Threats via Graph Signal Processing and GNN-based Generative Modeling

Swarms, such as schools of fish or drone formations, are prevalent in both natural and engineered systems. While previous works have focused on the social interactions within swarms, the role of external perturbations (such as environmental changes, predators, or communication breakdowns) in affecting swarm stability is not fully understood. Our study addresses this gap by modeling swarms as graphs and applying graph signal processing techniques to analyze perturbations as signals on these graphs. By examining predation, we uncover a "detectability-durability trade-off", demonstrating a tension between a swarm's ability to evade detection and its resilience to predation, once detected. We provide theoretical and empirical evidence for this trade-off, explicitly tying it to properties of the swarm's spatial configuration. Toward task-specific optimized swarms, we introduce SwaGen, a graph neural network-based generative model. We apply SwaGen to resilient swarm generation by defining a task-specific loss function, optimizing the contradicting trade-off terms simultaneously. With this, SwaGen reveals novel spatial configurations, optimizing the trade-off at both ends. Applying the model can guide the design of robust artificial swarms and deepen our understanding of natural swarm dynamics.

Updated: 2025-07-03 06:47:49

标题: 通过图信号处理和基于GNN的生成建模增强群体对威胁的耐久性

摘要: 群体,如鱼群或无人机编队等,普遍存在于自然和工程系统中。虽然先前的研究集中在群体内部的社会互动上,但外部扰动(如环境变化、捕食者或通信中断)对群体稳定性的影响尚未完全了解。我们的研究通过将群体建模为图形,并应用图信号处理技术来分析这些图上的扰动信号,填补了这一空白。通过研究捕食行为,我们发现了“可检测性-持久性权衡”,展示了群体在逃避检测和一旦被发现后对捕食的抵抗力之间的张力。我们提供了这种权衡的理论和实证证据,明确将其与群体的空间配置属性联系起来。为了生成特定任务优化的群体,我们引入了SwaGen,一种基于图神经网络的生成模型。我们将SwaGen应用于弹性群体生成,通过定义一个特定任务的损失函数,同时优化相互矛盾的权衡项。通过这一方法,SwaGen揭示了新颖的空间配置,同时优化了两端的权衡。应用该模型可以指导健壮的人工群体的设计,并加深我们对自然群体动态的理解。

更新时间: 2025-07-03 06:47:49

领域: q-bio.QM,cs.LG,physics.bio-ph

下载: http://arxiv.org/abs/2507.03039v1

Offline Reinforcement Learning with Penalized Action Noise Injection

Offline reinforcement learning (RL) optimizes a policy using only a fixed dataset, making it a practical approach in scenarios where interaction with the environment is costly. Due to this limitation, generalization ability is key to improving the performance of offline RL algorithms, as demonstrated by recent successes of offline RL with diffusion models. However, it remains questionable whether such diffusion models are necessary for highly performing offline RL algorithms, given their significant computational requirements during inference. In this paper, we propose Penalized Action Noise Injection (PANI), a method that simply enhances offline learning by utilizing noise-injected actions to cover the entire action space, while penalizing according to the amount of noise injected. This approach is inspired by how diffusion models have worked in offline RL algorithms. We provide a theoretical foundation for this method, showing that offline RL algorithms with such noise-injected actions solve a modified Markov Decision Process (MDP), which we call the noisy action MDP. PANI is compatible with a wide range of existing off-policy and offline RL algorithms, and despite its simplicity, it demonstrates significant performance improvements across various benchmarks.
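
As described in the abstract, the mechanism reduces to two steps: perturb a dataset action with noise to cover the action space, then penalize the transition's reward in proportion to the injected noise. The sketch below is a conceptual illustration; the uniform noise model and the linear penalty weight `lam` are our own illustrative assumptions, not the paper's exact formulation.

```python
import random

def noisy_action(action, scale, rng):
    """Return (perturbed action, magnitude of the injected noise)."""
    noise = rng.uniform(-scale, scale)
    return action + noise, abs(noise)

def penalized_reward(reward, noise_mag, lam=1.0):
    """Penalize a transition's reward by the amount of noise injected."""
    return reward - lam * noise_mag

rng = random.Random(0)
a, mag = noisy_action(0.5, scale=0.2, rng=rng)
# Since the penalty is non-negative, augmentation can only lower the reward.
print(penalized_reward(1.0, mag) <= 1.0)  # → True
```

Training an off-policy algorithm on such augmented transitions is what the paper formalizes as solving the "noisy action MDP".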

Updated: 2025-07-03 06:41:03

标题: 使用惩罚动作噪声注入的离线强化学习

摘要: 离线强化学习(RL)仅使用固定数据集来优化策略,使其成为在与环境交互成本高昂的情况下的实用方法。由于这种限制,泛化能力对改善离线RL算法的性能至关重要,最近离线RL与扩散模型的成功证明了这一点。然而,尽管扩散模型在离线RL算法中发挥作用,但在推断过程中需要大量计算资源,因此是否需要这样的扩散模型来提高高性能的离线RL算法仍然存在疑问。在本文中,我们提出了惩罚动作噪声注入(PANI)方法,通过利用注入噪声的动作来覆盖整个动作空间,同时根据注入噪声的数量进行惩罚,从而简单地增强离线学习。该方法受到扩散模型在离线RL算法中的工作方式的启发。我们为这种方法提供了理论基础,表明具有这种注入噪声动作的离线RL算法解决了一种修改后的马尔可夫决策过程(MDP),我们称之为有噪声动作的MDP。PANI与各种现有的离策略和离线RL算法兼容,尽管其简单性,但在各种基准测试中表现出显著的性能改善。

更新时间: 2025-07-03 06:41:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.02356v1

Text-Aware Image Restoration with Diffusion Models

Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/

Updated: 2025-07-03 06:39:43

标题: 使用扩散模型的文本感知图像恢复

摘要: 图像恢复旨在恢复受损图像。然而,尽管扩散为基础的恢复方法在自然图像恢复方面取得了巨大成功,但通常难以忠实地重建受损图像中的文本区域。这些方法经常生成似是而非但不正确的文本样式,我们将这种现象称为文本-图像幻觉。在本文中,我们介绍了一种新的恢复任务,即文本感知图像恢复(TAIR),这需要同时恢复视觉内容和文本保真度。为了解决这个任务,我们提出了一个名为SA-Text的大规模基准测试,其中包含了100K张高质量的场景图像,密集地注释了各种复杂的文本实例。此外,我们提出了一个名为TeReDiff的多任务扩散框架,将扩散模型的内部特征整合到一个文本定位模块中,使两个组件都能从联合训练中受益。这允许提取丰富的文本表示,这些表示被用作后续去噪步骤中的提示。广泛的实验证明,我们的方法始终优于最先进的恢复方法,在文本识别准确性方面取得了显著的进展。请查看我们的项目页面:https://cvlab-kaist.github.io/TAIR/

更新时间: 2025-07-03 06:39:43

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2506.09993v2

Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation

This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-world data, the simulation models individual behaviors and large-scale mobility patterns in Taipei City. Key insights, such as route heat maps and mode-specific indicators, provide urban planners with actionable information for policy-making. Future work focuses on establishing robust validation frameworks to ensure accuracy and reliability in urban planning applications.

Updated: 2025-07-03 06:38:34

标题: 将LLM纳入大规模城市综合移动性模拟

摘要: 这项研究提出了一种创新的城市移动模拟方法,通过将大型语言模型(LLM)与基于代理的建模(ABM)相结合。与传统基于规则的ABM不同,所提出的框架利用LLM增强代理多样性和逼真度,通过生成合成人口配置常规和偶发位置,并模拟个性化路线。利用真实世界数据,在台北市模拟个体行为和大规模移动模式。关键见解,如路线热图和特定模式指标,为城市规划者提供可操作的信息以制定政策。未来工作重点是建立健壮的验证框架,以确保城市规划应用的准确性和可靠性。

更新时间: 2025-07-03 06:38:34

领域: cs.MA,cs.AI,cs.CL,cs.CY

下载: http://arxiv.org/abs/2505.21880v2

OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent

Keyword decision in Sponsored Search Advertising is critical to the success of ad campaigns. While LLM-based methods offer automated keyword generation, they face three major limitations: reliance on large-scale query-keyword pair data, lack of online multi-objective performance monitoring and optimization, and weak quality control in keyword selection. These issues hinder the agentic use of LLMs in fully automating keyword decisions by monitoring and reasoning over key performance indicators such as impressions, clicks, conversions, and CTA effectiveness. To overcome these challenges, we propose OMS, a keyword generation framework that is On-the-fly (requires no training data, monitors online performance, and adapts accordingly), Multi-objective (employs agentic reasoning to optimize keywords based on multiple performance metrics), and Self-reflective (agentically evaluates keyword quality). Experiments on benchmarks and real-world ad campaigns show that OMS outperforms existing methods; ablation and human evaluations confirm the effectiveness of each component and the quality of generated keywords.

Updated: 2025-07-03 06:37:55

标题: OMS: 基于LLM代理的即时、多目标、自我反思广告关键词生成

摘要: 在赞助搜索广告中,关键词决策对广告活动的成功至关重要。虽然基于LLM的方法提供了自动关键词生成,但面临三个主要限制:依赖大规模查询-关键词对数据、缺乏在线多目标性能监控和优化,以及在关键词选择中弱的质量控制。这些问题阻碍了LLM在通过监控和推理关键性能指标(如展示量、点击量、转化次数和CTA效果)实现完全自动化关键词决策的能力。为了克服这些挑战,我们提出了OMS,这是一个关键词生成框架,具有即时性(不需要训练数据,监控在线性能,并相应适应)、多目标性(利用代理推理基于多个性能指标优化关键词)和自我反思性(代理评估关键词质量)。基准和实际广告活动的实验证明OMS优于现有方法;消融和人工评估证实了每个组件的有效性和生成的关键词质量。

更新时间: 2025-07-03 06:37:55

领域: cs.AI

下载: http://arxiv.org/abs/2507.02353v1

Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs

Extracting sentence embeddings from large language models (LLMs) is a promising direction, as LLMs have demonstrated stronger semantic understanding capabilities. Previous studies typically focus on prompt engineering to elicit sentence embeddings from LLMs by prompting the model to encode sentence information into the embedding of the last token. However, LLMs are mostly decoder-only models with causal attention and the earlier tokens in the sentence cannot attend to the latter tokens, resulting in biased encoding of sentence information and cascading effects on the final decoded token. To this end, we propose a novel Token Prepending (TP) technique that prepends each layer's decoded sentence embedding to the beginning of the sentence in the next layer's input, allowing earlier tokens to attend to the complete sentence information under the causal attention mechanism. The proposed TP technique is a plug-and-play and training-free technique, which means it can be seamlessly integrated with various prompt-based sentence embedding methods and autoregressive LLMs. Extensive experiments on various Semantic Textual Similarity (STS) tasks and downstream classification tasks demonstrate that our proposed TP technique can significantly improve the performance of existing prompt-based sentence embedding methods across different LLMs, while incurring negligible additional inference cost.
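
The mechanism can be illustrated with a toy forward pass: after each layer, the sentence embedding decoded at that layer (here, simply the last position's value) is prepended to the next layer's input, so under causal attention every position can attend to a summary of the full sentence. The mean-over-prefix "layer" below is a stand-in for a real transformer block, and scalars stand in for embedding vectors.

```python
def toy_layer(tokens):
    """Hypothetical causal layer: each position averages over its prefix."""
    out = []
    for i in range(len(tokens)):
        prefix = tokens[: i + 1]          # causal: only this and earlier tokens
        out.append(sum(prefix) / len(prefix))
    return out

def forward_with_prepending(tokens, num_layers=2):
    for _ in range(num_layers):
        hidden = toy_layer(tokens)
        sentence_emb = hidden[-1]          # last token as the sentence embedding
        tokens = [sentence_emb] + hidden   # prepend it to the next layer's input
    return tokens

print(forward_with_prepending([1.0, 2.0, 3.0]))
```

Because prepending only adds one token per layer and requires no parameter updates, the technique stays training-free and adds negligible inference cost, as the abstract claims.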

Updated: 2025-07-03 06:24:49

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2412.11556v2

Two-Steps Neural Networks for an Automated Cerebrovascular Landmark Detection

Intracranial aneurysms (ICA) commonly occur in specific segments of the Circle of Willis (CoW), primarily at thirteen major arterial bifurcations. Accurate detection of these critical landmarks is necessary for a prompt and efficient diagnosis. We introduce a fully automated landmark detection approach for CoW bifurcations using a two-step neural network process. Initially, an object detection network identifies regions of interest (ROIs) proximal to the landmark locations. Subsequently, a modified U-Net with deep supervision is exploited to accurately locate the bifurcations. This two-step method mitigates various problems, such as missed detections caused by two landmarks lying close to each other with similar visual characteristics, especially when processing the complete MRA Time-of-Flight (TOF). Additionally, it accounts for the anatomical variability of the CoW, which affects the number of detectable landmarks per scan. We assessed the effectiveness of our approach using two cerebral MRA datasets: our in-house dataset with varying numbers of landmarks, and a public dataset with a standardized landmark configuration. Our experimental results demonstrate that our method achieves the highest level of performance on a bifurcation detection task.

Updated: 2025-07-03 06:23:38

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02349v1

HelixDesign-Antibody: A Scalable Production-Grade Platform for Antibody Design Built on HelixFold3

Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce HelixDesign-Antibody, a production-grade, high-throughput platform built on the high-accuracy structure prediction model HelixFold3. The platform facilitates the large-scale generation of antibody candidate sequences and evaluates their interaction with antigens. Integrated high-performance computing (HPC) support enables high-throughput screening, addressing challenges such as fragmented toolchains and high computational demands. Validation on multiple antigens showcases the platform's ability to generate diverse and high-quality antibodies, confirming a scaling law where exploring larger sequence spaces increases the likelihood of identifying optimal binders. This platform provides a seamless, accessible solution for large-scale antibody design and is available via the antibody design page of the PaddleHelix platform.

Updated: 2025-07-03 06:13:23

Categories: q-bio.BM,cs.AI

Download: http://arxiv.org/abs/2507.02345v1

Privacy-Preserving Operating Room Workflow Analysis using Digital Twins

The operating room (OR) is a complex environment where optimizing workflows is critical to reduce costs and improve patient outcomes. While computer vision approaches for automatic recognition of perioperative events can identify bottlenecks for OR optimization, privacy concerns limit the use of OR videos for automated event detection. We propose a two-stage pipeline for privacy-preserving OR video analysis and event detection. First, we leverage vision foundation models for depth estimation and semantic segmentation to generate de-identified Digital Twins (DT) of the OR from conventional RGB videos. Second, we employ the SafeOR model, a fused two-stream approach that processes segmentation masks and depth maps for OR event detection. Evaluation on an internal dataset of 38 simulated surgical trials with five event classes shows that our DT-based approach achieves performance on par with -- and sometimes better than -- raw RGB video-based models for OR event detection. Digital Twins enable privacy-preserving OR workflow analysis, facilitating the sharing of de-identified data across institutions and potentially enhancing model generalizability by mitigating domain-specific appearance differences.

Updated: 2025-07-03 06:11:55

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.12552v2

UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security

As modern networks grow increasingly complex--driven by diverse devices, encrypted protocols, and evolving threats--network traffic analysis has become critically important. Existing machine learning models often rely only on a single representation of packets or flows, limiting their ability to capture the contextual relationships essential for robust analysis. Furthermore, task-specific architectures for supervised, semi-supervised, and unsupervised learning lead to inefficiencies in adapting to varying data formats and security tasks. To address these gaps, we propose UniNet, a unified framework that introduces a novel multi-granular traffic representation (T-Matrix), integrating session, flow, and packet-level features to provide comprehensive contextual information. Combined with T-Attent, a lightweight attention-based model, UniNet efficiently learns latent embeddings for diverse security tasks. Extensive evaluations across four key network security and privacy problems--anomaly detection, attack classification, IoT device identification, and encrypted website fingerprinting--demonstrate UniNet's significant performance gain over state-of-the-art methods, achieving higher accuracy, lower false positive rates, and improved scalability. By addressing the limitations of single-level models and unifying traffic analysis paradigms, UniNet sets a new benchmark for modern network security.
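One way to picture the multi-granular T-Matrix is packet-level rows augmented with their flow- and session-level context. The sketch below assumes (our assumption, not the paper's specification) that flow and session features are simply broadcast onto every packet row:

```python
import numpy as np

def t_matrix(packets, flow_feats, session_feats):
    """Multi-granular traffic representation sketch: stack packet-level
    rows with flow- and session-level features repeated on each row, so
    every packet carries its full context."""
    P = np.asarray(packets, dtype=float)      # (n_packets, d_packet)
    F = np.tile(flow_feats, (len(P), 1))      # flow context per packet
    S = np.tile(session_feats, (len(P), 1))   # session context per packet
    return np.hstack([P, F, S])
```

A downstream attention model (T-Attent in the paper) would then consume these rows as tokens.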

Updated: 2025-07-03 06:09:01

Categories: cs.CR,cs.LG,cs.NI

Download: http://arxiv.org/abs/2503.04174v2

DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values

This study proposes DeltaSHAP, a novel explainable artificial intelligence (XAI) algorithm specifically designed for online patient monitoring systems. In clinical environments, discovering the causes driving patient risk evolution is critical for timely intervention, yet existing XAI methods fail to address the unique requirements of clinical time series explanation tasks. To this end, DeltaSHAP addresses three key clinical needs: explaining the changes in consecutive predictions rather than isolated prediction scores, providing both the magnitude and direction of feature attributions, and delivering these insights in real time. By adapting Shapley values to temporal settings, our approach accurately captures feature coalition effects. It further attributes prediction changes using only the actually observed feature combinations, making it efficient and practical for time-sensitive clinical applications. We also introduce new evaluation metrics to assess the faithfulness of attributions for online time series, and demonstrate through experiments on online patient monitoring tasks that DeltaSHAP outperforms state-of-the-art XAI methods in both explanation quality (a 62% improvement) and computational efficiency (a 33% time reduction) on the MIMIC-III decompensation benchmark. We release our code at https://github.com/AITRICS/DeltaSHAP.
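The core idea, attributing the change between two consecutive predictions to only the features that actually changed, can be written down exactly for small feature counts. A brute-force sketch (exponential in the number of changed features; the paper's estimator is efficient, this is only illustrative):

```python
from itertools import combinations
from math import factorial

def delta_shap(model, x_prev, x_curr):
    """Attribute model(x_curr) - model(x_prev) to the features that
    changed between the two time steps, via exact Shapley values over
    which feature updates are applied."""
    changed = [i for i in range(len(x_prev)) if x_prev[i] != x_curr[i]]
    n = len(changed)
    phi = {i: 0.0 for i in changed}
    for i in changed:
        others = [j for j in changed if j != i]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                base = x_prev.copy()
                for j in S:                 # apply updates in coalition S
                    base[j] = x_curr[j]
                with_i = base.copy()
                with_i[i] = x_curr[i]       # then also apply feature i
                phi[i] += w * (model(with_i) - model(base))
    return phi
```

For an additive model each attribution collapses to that feature's own contribution, and the attributions always sum exactly to the prediction delta (the efficiency axiom).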

Updated: 2025-07-03 06:08:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.02342v1

Autoformalization in the Era of Large Language Models: A Survey

Autoformalization, the process of transforming informal mathematical propositions into verifiable formal representations, is a foundational task in automated theorem proving, offering a new perspective on the use of mathematics in both theoretical and applied domains. Driven by the rapid progress in artificial intelligence, particularly large language models (LLMs), this field has witnessed substantial growth, bringing both new opportunities and unique challenges. In this survey, we provide a comprehensive overview of recent advances in autoformalization from both mathematical and LLM-centric perspectives. We examine how autoformalization is applied across various mathematical domains and levels of difficulty, and analyze the end-to-end workflow from data preprocessing to model design and evaluation. We further explore the emerging role of autoformalization in enhancing the verifiability of LLM-generated outputs, highlighting its potential to improve both the trustworthiness and reasoning capabilities of LLMs. Finally, we summarize key open-source models and datasets supporting current research, and discuss open challenges and promising future directions for the field.

Updated: 2025-07-03 06:03:07

Categories: cs.AI

Download: http://arxiv.org/abs/2505.23486v2

ClustOpt: A Clustering-based Approach for Representing and Visualizing the Search Dynamics of Numerical Metaheuristic Optimization Algorithms

Understanding the behavior of numerical metaheuristic optimization algorithms is critical for advancing their development and application. Traditional visualization techniques, such as convergence plots, trajectory mapping, and fitness landscape analysis, often fall short in illustrating the structural dynamics of the search process, especially in high-dimensional or complex solution spaces. To address this, we propose a novel representation and visualization methodology that clusters solution candidates explored by the algorithm and tracks the evolution of cluster memberships across iterations, offering a dynamic and interpretable view of the search process. Additionally, we introduce two metrics, algorithm stability and algorithm similarity, to quantify the consistency of search trajectories across runs of an individual algorithm and the similarity between different algorithms, respectively. We apply this methodology to a set of ten numerical metaheuristic algorithms, revealing insights into their stability and comparative behaviors, thereby providing a deeper understanding of their search dynamics.
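The representation can be reproduced with any off-the-shelf clustering. The sketch below uses a tiny hand-rolled k-means (the paper does not prescribe this choice) and reports, per iteration, the share of the population falling in each cluster:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels

def search_trajectory(populations, k=3):
    """Cluster all candidates explored over a run, then return, per
    iteration, the fraction of that iteration's population in each
    cluster -- the evolving membership view described above."""
    X = np.vstack(populations)
    labels = kmeans(X, k)
    traj, start = [], 0
    for pop in populations:
        lab = labels[start:start + len(pop)]
        traj.append(np.bincount(lab, minlength=k) / len(pop))
        start += len(pop)
    return np.array(traj)    # shape: (n_iterations, k)
```

Comparing such trajectory matrices across runs (stability) or across algorithms (similarity) yields the two proposed metrics.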

Updated: 2025-07-03 06:01:02

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2507.02337v1

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Despite the critical role of reward models (RMs) in reinforcement learning from human feedback (RLHF), current state-of-the-art open RMs perform poorly on most existing evaluation benchmarks, failing to capture the spectrum of nuanced and sophisticated human preferences. Even approaches that incorporate advanced training techniques have not yielded meaningful performance improvements. We hypothesize that this brittleness stems primarily from limitations in preference datasets, which are often narrowly scoped, synthetically labeled, or lack rigorous quality control. To address these challenges, we present a large-scale preference dataset comprising 40 million preference pairs, named SynPref-40M. To enable data curation at scale, we design a human-AI synergistic two-stage pipeline that leverages the complementary strengths of human annotation quality and AI scalability. In this pipeline, humans provide verified annotations, while large language models perform automatic curation based on human guidance. Training on this preference mixture, we introduce Skywork-Reward-V2, a suite of eight reward models ranging from 0.6B to 8B parameters, trained on a carefully curated subset of 26 million preference pairs from SynPref-40M. We demonstrate that Skywork-Reward-V2 is versatile across a wide range of capabilities, including alignment with human preferences, objective correctness, safety, resistance to stylistic biases, and best-of-N scaling, achieving state-of-the-art performance across seven major reward model benchmarks. Ablation studies confirm that the effectiveness of our approach stems not only from data scale but also from high-quality curation. The Skywork-Reward-V2 series represents substantial progress in open reward models, highlighting the untapped potential of existing preference datasets and demonstrating how human-AI curation synergy can unlock significantly higher data quality.

Updated: 2025-07-03 05:58:40

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.01352v2

PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage

This paper investigates privacy jailbreaking in LLMs via steering, focusing on whether manipulating activations can bypass LLM alignment and alter response behaviors to privacy related queries (e.g., a certain public figure's sexual orientation). We begin by identifying attention heads predictive of refusal behavior for private attributes (e.g., sexual orientation) using lightweight linear probes trained with privacy evaluator labels. Next, we steer the activations of a small subset of these attention heads guided by the trained probes to induce the model to generate non-refusal responses. Our experiments show that these steered responses often disclose sensitive attribute details, along with other private information about data subjects such as life events, relationships, and personal histories that the models would typically refuse to produce. Evaluations across four LLMs reveal jailbreaking disclosure rates of at least 95%, with more than 50% on average of these responses revealing true personal information. Our controlled study demonstrates that private information memorized in LLMs can be extracted through targeted manipulation of internal activations.
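The probe-then-steer recipe can be sketched with a least-squares linear probe standing in for the paper's trained probes; the steering strength `alpha` and the direction sign are illustrative assumptions:

```python
import numpy as np

def fit_probe(acts, labels):
    """Linear probe (least squares) mapping head activations to a
    refusal label; the weight vector is the candidate steering direction."""
    X = np.hstack([acts, np.ones((len(acts), 1))])   # bias column
    w, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return w[:-1]                                     # drop the bias weight

def steer(act, direction, alpha=4.0):
    """Shift one activation against the refusal direction so the probe
    (and, in the paper, the model) no longer predicts refusal."""
    d = direction / np.linalg.norm(direction)
    return act - alpha * d
```

In the paper this shift is applied to a small subset of attention heads during generation rather than to a standalone vector as here.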

Updated: 2025-07-03 05:50:50

Categories: cs.CR

Download: http://arxiv.org/abs/2507.02332v1

Cautious Next Token Prediction

The next-token-prediction paradigm has prevailed for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling combined with nucleus sampling to balance diversity and coherence. Nevertheless, such an approach leads to inferior performance in various NLP tasks when the model is not certain about test questions. To this end, we propose a brand-new training-free decoding strategy, dubbed Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from that step independently and stop when encountering any punctuation. Then we select the trial with the lowest perplexity score, viewed as the most probable and reliable trial path given the model's capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach consistently outperforms existing standard decoding strategies by a clear margin. Moreover, integrating CNTP with self-consistency can further improve over vanilla self-consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding. Code is available at https://github.com/wyzjack/CNTP.
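The decoding rule can be sketched on a toy next-token distribution; the entropy threshold `tau`, trial cap, and stop set below are illustrative choices, not values from the paper:

```python
import math, random

def entropy(p):
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def sample(p, rng):
    r, acc = rng.random(), 0.0
    for tok, q in p.items():
        acc += q
        if r <= acc:
            return tok
    return tok  # guard against float round-off

def cntp_decode(next_dist, prefix, rng, tau=1.0, max_trials=5, stop=".!?"):
    """Cautious Next Token Prediction sketch: at a high-entropy step, roll
    out several independent trials up to the next punctuation and keep the
    lowest-perplexity one; the trial count grows with uncertainty."""
    out = list(prefix)
    while out[-1] not in stop:
        p = next_dist(out)
        if entropy(p) <= tau:                 # confident: ordinary sampling
            out.append(sample(p, rng))
            continue
        k = min(max_trials, 1 + int(entropy(p) / tau))  # more trials when unsure
        best, best_ppl = None, float("inf")
        for _ in range(k):
            trial, logp, cur = [], 0.0, list(out)
            while not trial or trial[-1] not in stop:
                q = next_dist(cur)
                tok = sample(q, rng)
                logp += math.log(q[tok])
                trial.append(tok); cur.append(tok)
                if len(trial) > 20:           # safety cap for the toy
                    break
            ppl = math.exp(-logp / len(trial))
            if ppl < best_ppl:
                best, best_ppl = trial, ppl
        out += best
    return "".join(out)
```

In a real LLM, `next_dist` would be a softmax over the vocabulary and the perplexity would come from the model's own log-probabilities.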

Updated: 2025-07-03 05:49:18

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.03038v1

Tracing the Interactions of Modular CMA-ES Configurations Across Problem Landscapes

This paper leverages the recently introduced concept of algorithm footprints to investigate the interplay between algorithm configurations and problem characteristics. Performance footprints are calculated for six modular variants of the CMA-ES algorithm (modCMA), evaluated on 24 benchmark problems from the BBOB suite in two dimensionality settings: 5-dimensional and 30-dimensional. These footprints provide insights into why different configurations of the same algorithm exhibit varying performance and identify the problem features influencing these outcomes. Our analysis uncovers shared behavioral patterns across configurations due to common interactions with problem properties, as well as distinct behaviors on the same problem driven by differing problem features. The results demonstrate the effectiveness of algorithm footprints in enhancing interpretability and guiding configuration choices.

Updated: 2025-07-03 05:48:58

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2507.02331v1

SecAlign: Defending Against Prompt Injection with Preference Optimization

Large language models (LLMs) are becoming increasingly prevalent in modern software systems, interfacing between the user and the Internet to assist with tasks that require advanced language understanding. To accomplish these tasks, the LLM often uses external data sources such as user documents, web retrieval, results from API calls, etc. This opens up new avenues for attackers to manipulate the LLM via prompt injection. Adversarial prompts can be injected into external data sources to override the system's intended instruction and instead execute a malicious instruction. To mitigate this vulnerability, we propose a new defense called SecAlign based on the technique of preference optimization. Our defense first constructs a preference dataset with prompt-injected inputs, secure outputs (ones that respond to the legitimate instruction), and insecure outputs (ones that respond to the injection). We then perform preference optimization on this dataset to teach the LLM to prefer the secure output over the insecure one. This provides the first known method that reduces the success rates of various prompt injections to <10%, even against attacks much more sophisticated than the ones seen during training. This indicates our defense generalizes well against unknown and yet-to-come attacks. Also, SecAlign models remain practical, with utility similar to that before defensive training in our evaluations. Our code is at https://github.com/facebookresearch/SecAlign
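The dataset construction and the preference step can be sketched as follows. DPO is used here as one concrete preference-optimization objective (an assumption; the abstract does not pin down the exact loss), and the log-probabilities are placeholders for real model scores:

```python
import math

def build_preference_example(instruction, document, injection,
                             secure_output, insecure_output):
    """One SecAlign-style preference example: the prompt embeds an injected
    instruction inside external data; the secure output follows the
    legitimate instruction, the insecure one follows the injection."""
    prompt = f"{instruction}\nData: {document} {injection}"
    return {"prompt": prompt, "chosen": secure_output, "rejected": insecure_output}

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO objective on one example: push the policy's margin for the
    secure response above the reference model's margin."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))
```

Minimizing this loss over many such triples is what teaches the model to prefer the secure completion.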

Updated: 2025-07-03 05:45:41

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2410.05451v3

Path Planning using a One-shot-sampling Skeleton Map

Path planning algorithms aim to compute a collision-free path, and many works focus on finding the optimal distance path. However, for some applications, a more suitable approach is to balance response time, safety of the paths, and path length. In this context, a skeleton map is a useful tool in graph-based schemes, as it provides an intrinsic representation of free configuration space. However, skeletonization algorithms are very resource-intensive, being primarily oriented towards image processing tasks. We propose an efficient path-planning methodology that finds safe paths within an acceptable processing time. This methodology leverages a Deep Denoising Auto-Encoder (DDAE) based on U-Net architecture to compute a skeletonized version of the navigation map, which we refer to as SkelUnet. The SkelUnet network facilitates exploration of the entire workspace through one-shot sampling (OSS), as opposed to the iterative process used by exact algorithms or the probabilistic sampling process. SkelUnet is trained and tested on a dataset consisting of 12,500 bi-dimensional dungeon maps. The motion planning methodology is evaluated in a simulation environment for an Unmanned Aerial Vehicle (UAV) using 250 previously unseen maps, and assessed with various navigation metrics to quantify the navigability of the computed paths. The results demonstrate that using SkelUnet to construct a roadmap offers significant advantages, such as connecting all regions of free workspace, providing safer paths, and reducing processing times. These characteristics make this method particularly suitable for mobile service robots in structured environments.

Updated: 2025-07-03 05:38:26

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2507.02328v1

Neural Network-based Study for Rice Leaf Disease Recognition and Classification: A Comparative Analysis Between Feature-based Model and Direct Imaging Model

Rice leaf diseases significantly reduce productivity and cause economic losses, highlighting the need for early detection to enable effective management and improve yields. This study proposes Artificial Neural Network (ANN)-based image-processing techniques for timely classification and recognition of rice diseases. Despite the prevailing approach of directly inputting images of rice leaves into ANNs, there is a noticeable absence of thorough comparative analysis between the Feature Analysis Detection Model (FADM) and Direct Image-Centric Detection Model (DICDM), specifically when it comes to evaluating the effectiveness of Feature Extraction Algorithms (FEAs). Hence, this research presents initial experiments on the Feature Analysis Detection Model, utilizing various image Feature Extraction Algorithms, Dimensionality Reduction Algorithms (DRAs), Feature Selection Algorithms (FSAs), and Extreme Learning Machine (ELM). The experiments are carried out on datasets encompassing bacterial leaf blight, brown spot, leaf blast, leaf scald, sheath blight rot, and healthy leaf, using the 10-fold Cross-Validation method. A Direct Image-Centric Detection Model is established without the utilization of any FEA, and the evaluation of classification performance relies on different metrics. Ultimately, an exhaustive comparison is performed between the achievements of the Feature Analysis Detection Model and Direct Image-Centric Detection Model in classifying rice leaf diseases. The results reveal that the highest performance is attained using the Feature Analysis Detection Model. The adoption of the proposed Feature Analysis Detection Model for detecting rice leaf diseases holds excellent potential for improving crop health, minimizing yield losses, and enhancing the overall productivity and sustainability of rice farming.

Updated: 2025-07-03 05:26:52

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02322v1

USBIPS Framework: Protecting Hosts from Malicious USB Peripherals

Universal Serial Bus (USB)-based attacks have increased in complexity in recent years. Modern attacks incorporate a wide range of attack vectors, from social engineering to signal injection. The security community is addressing these challenges using a growing set of fragmented defenses. Regardless of the vector of a USB-based attack, the most important risks concerning most people and enterprises are service crashes and data loss. The host OS manages USB peripherals, and malicious USB peripherals, such as those infected with BadUSB, can crash a service or steal data from the OS. Although USB firewalls have been proposed to thwart malicious USB peripherals, such as USBFilter and USBGuard, their effect is limited for preventing real-world intrusions. This paper focuses on building a security framework called USBIPS within Windows OSs to defend against malicious USB peripherals. This includes major efforts to explore the nature of malicious behavior and achieve persistent protection from USB-based intrusions. Herein, we first introduce an allowlisting-based method for USB access control. We then present a behavior-based detection mechanism focusing on attacks integrated into USB peripherals. Finally, we propose a novel approach that combines cross-layer methods to build the first generic security framework that thwarts USB-based intrusions. Within a centralized threat analysis framework, the approach provides persistent protection and may detect unknown malicious behavior. By addressing key security and performance challenges, these efforts help modern OSs against attacks from untrusted USB peripherals.
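The first two pillars, allowlist-based access control and behavior-based detection, reduce to checks like the following; the device IDs and the timing threshold are hypothetical, not values from the framework:

```python
# Hypothetical approved (vendor_id, product_id) pairs.
ALLOWLIST = {("0x046d", "0xc31c"), ("0x0781", "0x5567")}

def admit(device):
    """Allowlist-based access control: admit a peripheral only if its
    vendor/product pair was explicitly approved."""
    return (device["vendor_id"], device["product_id"]) in ALLOWLIST

def suspicious_typing(timestamps, min_gap=0.03):
    """Behavior-based heuristic: keystrokes arriving faster than any human
    could type suggest a BadUSB-style HID injection attack."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return bool(gaps) and max(gaps) < min_gap
```

The framework's contribution is combining such cross-layer signals inside a centralized threat analysis pipeline rather than either check in isolation.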

Updated: 2025-07-03 05:23:52

Categories: cs.CR

Download: http://arxiv.org/abs/2409.12850v2

High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures

This work presents an overview of the technical details behind a high-performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low-level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end-to-end reinforcement learning policy deployed on Spot hardware, with training code publicly available through Nvidia IsaacLab and deployment code available through Boston Dynamics. We utilize Wasserstein Distance and Maximum Mean Discrepancy to quantify the distributional dissimilarity of data collected on hardware and in simulation to measure our sim2real gap. We use these measures as a scoring function for the Covariance Matrix Adaptation Evolution Strategy to optimize simulated parameters that are unknown or difficult to measure from Spot. Our procedure for modeling and training produces high-quality reinforcement learning policies capable of multiple gaits, including a flight phase. We deploy policies that achieve over 5.2 m/s locomotion, more than triple Spot's default controller maximum speed, along with robustness to slippery surfaces, disturbance rejection, and overall agility previously unseen on Spot. We detail our method and release our code to support future work on Spot with the low-level API.
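Both distributional measures are standard; below is a sketch of how they could score a candidate simulation-parameter set against hardware data. The combination and equal weighting here are our illustration, not the paper's exact scoring function:

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 between two equal-sized 1-D samples: mean gap between sorted
    quantiles."""
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

def mmd_rbf(X, Y, gamma=1.0):
    """Maximum Mean Discrepancy (biased estimate) with an RBF kernel."""
    def k(A, B):
        d = ((A[:, None] - B[None]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean())

def sim2real_score(real, sim):
    """Scoring sketch for CMA-ES: smaller means the simulated dynamics
    match the hardware data more closely."""
    return wasserstein_1d(real[:, 0], sim[:, 0]) + mmd_rbf(real, sim)
```

CMA-ES would minimize `sim2real_score` over the unknown simulation parameters, re-simulating rollouts for each candidate parameter vector.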

Updated: 2025-07-03 05:14:46

Subjects: cs.LG,cs.RO

Download: http://arxiv.org/abs/2504.17857v3

Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach

Aligning large language models (LLMs) with human preferences usually requires fine-tuning methods such as RLHF and DPO. These methods directly optimize the model parameters, so they cannot be used at test time to improve model performance, nor are they applicable when the model weights are not accessible. In contrast, test-time methods sidestep weight updates by leveraging reward functions to guide and improve output quality. However, they incur high inference costs, and their one-shot guidance is often based on imperfect reward or value functions, leading to suboptimal outputs. In this work, we present a method named Iterative Reweight-then-Optimize (IRO), a reinforcement learning (RL) framework that performs RL-style alignment of the (frozen) base model without touching its parameters. During training, each iteration (i) samples candidates from the base model, (ii) resamples using current value functions, and (iii) trains a new lightweight value function that guides the next decoding pass. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT), but without requiring access to the model weights.
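The resampling step (ii) can be illustrated with a minimal sketch: candidates are redrawn in proportion to an exponentiated value score. The temperature `beta` and the stand-in `value_fn` are assumptions for illustration, not the paper's learned value function.

```python
import math
import random

def reweight_then_resample(candidates, value_fn, beta=1.0, rng=None):
    """Simplified IRO-style resampling: draw candidates with probability
    proportional to exp(beta * value), so higher-value outputs dominate
    the set used to train the next lightweight value function."""
    rng = rng or random.Random(0)
    weights = [math.exp(beta * value_fn(c)) for c in candidates]
    return rng.choices(candidates, weights=weights, k=len(candidates))
```

With a large `beta` the resampled pool concentrates on the highest-value candidate, mimicking greedy guidance; small `beta` keeps diversity.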

Updated: 2025-07-03 05:12:51

Subjects: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2506.17828v2

Transformer-based EEG Decoding: A Survey

Electroencephalography (EEG) is one of the most common signals used to capture the electrical activity of the brain, and the decoding of EEG to acquire user intents has been at the forefront of brain-computer/machine interface (BCI/BMI) research. Compared to traditional EEG analysis methods based on machine learning, the advent of deep learning approaches has gradually revolutionized the field by providing an end-to-end long-cascaded architecture, which can learn more discriminative features automatically. Among these, the Transformer is renowned for its strong handling of sequential data through the attention mechanism, and the application of Transformers in various EEG processing tasks is increasingly prevalent. This article presents a survey summarizing the latest applications of Transformer models in EEG decoding since the architecture appeared. We follow the evolution of the model architecture to sort and organize the related advances: we first elucidate the fundamentals of the Transformer that benefit EEG decoding and its direct application. Then, the common hybrid architectures that integrate the basic Transformer with other deep learning techniques (convolutional/recurrent/graph/spiking neural networks, generative adversarial networks, diffusion models, etc.) are overviewed in detail. Research advances applying the modified intrinsic structures of customized Transformers are also introduced. Finally, the current challenges and future development prospects in this rapidly evolving field are discussed. This paper aims to help readers gain a clear understanding of the current state of Transformer applications in EEG decoding and to provide valuable insights for future research endeavors.
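The attention mechanism the survey credits for the Transformer's strength on sequential data can be shown in a minimal, EEG-agnostic form; this is the standard scaled dot-product attention for a single query, not anything specific to the surveyed models.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector: softmax of
    query-key similarities weights a sum of the value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```

In EEG decoders, the keys and values would typically be per-timestep (or per-channel) feature embeddings.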

Updated: 2025-07-03 05:12:06

Subjects: cs.LG,cs.HC

Download: http://arxiv.org/abs/2507.02320v1

Iterated belief revision: from postulates to abilities

The belief revision field is opulent in new proposals and indigent in analyses of existing approaches. Much work hinges on postulates, employed as syntactic characterizations: some revision mechanism is equivalent to some properties. Postulates constrain specific revision instances: certain revisions update certain beliefs in a certain way. As an example, if the revision is consistent with the current beliefs, it is incorporated with no other change. A postulate like this tells what revisions must do and neglects what they can do. Can they reach a certain state of beliefs? Can they reach all possible states of beliefs? Can they reach all possible states of beliefs from no previous belief? Can they reach a dogmatic state of beliefs, where everything not believed is impossible? Can they make two conditions equally believed? An application where every possible state of beliefs is sensible requires each state of beliefs to be reachable. An application where conditions may be equally believed requires such a belief state to be reachable. An application where beliefs may become dogmatic requires a way to make them dogmatic. Such doxastic states need to be reached in one way or another, not in a specific way as dictated by a typical belief revision postulate. This is an ability, not a constraint: the ability of being plastic, equating, dogmatic. Amnesic, correcting, believer, damascan, learnable are other abilities. Each revision mechanism owns some of these abilities and lacks the others: lexicographic, natural, restrained, very radical, full meet, radical, severe, moderate severe, deep severe, plain severe and deep severe revisions, each of these revisions is proved to possess certain abilities.

Updated: 2025-07-03 05:11:41

Subjects: cs.AI,cs.LO

Download: http://arxiv.org/abs/2507.02319v1

Generalization vs. Specialization under Concept Shift

Machine learning models are often brittle under distribution shift, i.e., when data distributions at test time differ from those during training. Understanding this failure mode is central to identifying and mitigating safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift -- a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for prediction risk in the thermodynamic limit. Our results reveal nontrivial effects of concept shift on generalization performance, including a phase transition between weak and strong concept shift regimes and nonmonotonic data dependence of test performance even when double descent is absent. Our theoretical results are in good agreement with experiments based on transformers pretrained to solve linear regression; under concept shift, an overly long context can be detrimental to the generalization performance of next-token prediction. Finally, our experiments on MNIST and FashionMNIST suggest that this intriguing behavior is present also in classification problems.
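Concept shift in the ridge setting can be demonstrated with a toy one-dimensional example; this is a didactic sketch of the phenomenon, not the paper's thermodynamic-limit analysis, and the slopes and regularization strength are arbitrary choices.

```python
def ridge_1d(xs, ys, lam=0.1):
    """Closed-form 1-D ridge regression: w = <x, y> / (<x, x> + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = ridge_1d(xs, [2 * x for x in xs])             # train on y = 2x
risk_iid   = mse(w, xs, [2 * x for x in xs])      # same input-label relation
risk_shift = mse(w, xs, [-2 * x for x in xs])     # concept shift: y = -2x at test
```

The inputs are unchanged at test time; only the input-label relationship flips, yet the prediction risk explodes.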

Updated: 2025-07-03 05:05:13

Subjects: stat.ML,cond-mat.dis-nn,cond-mat.stat-mech,cs.LG

Download: http://arxiv.org/abs/2409.15582v2

Improving Constrained Generation in Language Models via Self-Distilled Twisted Sequential Monte Carlo

Recent work has framed constrained text generation with autoregressive language models as a probabilistic inference problem. Among these, Zhao et al. (2024) introduced a promising approach based on twisted Sequential Monte Carlo, which incorporates learned twist functions and twist-induced proposals to guide the generation process. However, in constrained generation settings where the target distribution concentrates on outputs that are unlikely under the base model, learning becomes challenging due to sparse and uninformative reward signals. We show that iteratively refining the base model through self-distillation alleviates this issue by making the model progressively more aligned with the target, leading to substantial gains in generation quality.

Updated: 2025-07-03 05:00:21

Subjects: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.02315v1

Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment

Traditional continual learning methods prioritize knowledge retention and focus primarily on mitigating catastrophic forgetting, implicitly assuming that the data distribution of previously learned tasks remains static. This overlooks the dynamic nature of real-world data streams, where concept drift permanently alters previously seen data and demands both stability and rapid adaptation. We introduce a holistic framework for continual learning under concept drift that simulates realistic scenarios by evolving task distributions. As a baseline, we consider Full Relearning (FR), in which the model is retrained from scratch on newly labeled samples from the drifted distribution. While effective, this approach incurs substantial annotation and computational overhead. To address these limitations, we propose Adaptive Memory Realignment (AMR), a lightweight alternative that equips rehearsal-based learners with a drift-aware adaptation mechanism. AMR selectively removes outdated samples of drifted classes from the replay buffer and repopulates it with a small number of up-to-date instances, effectively realigning memory with the new distribution. This targeted resampling matches the performance of FR while reducing the need for labeled data and computation by orders of magnitude. To enable reproducible evaluation, we introduce four concept-drift variants of standard vision benchmarks: Fashion-MNIST-CD, CIFAR10-CD, CIFAR100-CD, and Tiny-ImageNet-CD, where previously seen classes reappear with shifted representations. Comprehensive experiments on these datasets using several rehearsal-based baselines show that AMR consistently counters concept drift, maintaining high accuracy with minimal overhead. These results position AMR as a scalable solution that reconciles stability and plasticity in non-stationary continual learning environments.
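The core AMR step described above, dropping outdated replay samples of drifted classes and repopulating with fresh instances, can be sketched directly; the `(sample, label)` buffer representation is an assumption for illustration.

```python
def realign_memory(buffer, drifted_classes, fresh_samples):
    """AMR-style realignment sketch: remove replay entries whose class has
    drifted, then repopulate with a small number of up-to-date instances
    of those classes. Entries are (sample, label) pairs."""
    kept = [(x, y) for (x, y) in buffer if y not in drifted_classes]
    fresh = [(x, y) for (x, y) in fresh_samples if y in drifted_classes]
    return kept + fresh
```

Compared with full relearning, only the drifted classes require new labeled data; stable classes keep their existing replay samples.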

Updated: 2025-07-03 04:41:20

Subjects: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.02310v1

Rethinking Broken Object Level Authorization Attacks Under Zero Trust Principle

RESTful APIs facilitate data exchange between applications, but they also expose sensitive resources to potential exploitation. Broken Object Level Authorization (BOLA), the top vulnerability in the OWASP API Security Top 10, exemplifies a critical access control flaw in which attackers manipulate API parameters to gain unauthorized access. To address this, we propose BOLAZ, a defense framework grounded in zero trust principles. BOLAZ analyzes the data flow of resource IDs, pinpointing BOLA attack injection points and determining the associated authorization intervals to prevent horizontal privilege escalation. Our approach leverages static taint tracking to categorize APIs into producers and consumers based on how they handle resource IDs. By mapping the propagation paths of resource IDs, BOLAZ captures the context in which these IDs are produced and consumed, allowing for precise identification of authorization boundaries. Unlike defense methods based on common authorization models, BOLAZ is the first authorization-guided method that adapts defense rules based on the system's best-practice authorization logic. We validate BOLAZ through empirical research on 10 GitHub projects. The results demonstrate BOLAZ's effectiveness in defending against vulnerabilities collected from CVE and discovering 35 new BOLA vulnerabilities in the wild, demonstrating its practicality in real-world deployments.
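The flaw BOLA exploits, and the object-level check a zero-trust design adds, can be shown with two hypothetical handlers; the resource table, user names, and handler signatures are invented for illustration and are not from the paper.

```python
# Hypothetical in-memory resource store keyed by object ID.
RESOURCES = {
    101: {"owner": "alice", "data": "alice-doc"},
    102: {"owner": "bob",   "data": "bob-doc"},
}

def get_resource_vulnerable(user, resource_id):
    # BOLA: any authenticated user can read any object ID they guess.
    return RESOURCES[resource_id]["data"]

def get_resource_zero_trust(user, resource_id):
    # Zero-trust variant: authorize at the object level on every access.
    resource = RESOURCES.get(resource_id)
    if resource is None or resource["owner"] != user:
        raise PermissionError("not authorized for this object")
    return resource["data"]
```

BOLAZ's contribution is locating where such checks are missing by tracing how resource IDs flow between producer and consumer APIs.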

Updated: 2025-07-03 04:40:14

Subjects: cs.CR

Download: http://arxiv.org/abs/2507.02309v1

Distributional Soft Actor-Critic with Diffusion Policy

Reinforcement learning has been proven to be highly effective in handling complex control tasks. Traditional methods typically use unimodal distributions, such as Gaussian distributions, to model the output of value distributions. However, unimodal distributions easily bias value function estimation, leading to poor algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributional Soft Actor-Critic with Diffusion Policy) to address the challenges of estimation bias in value functions and of obtaining multimodal policy representations. A multimodal distributional policy iteration framework that can converge to the optimal policy was established by introducing policy entropy and a value distribution function. A diffusion value network that can accurately characterize multi-peak distributions was constructed by generating a set of reward samples through reverse sampling using a diffusion model. Based on this, a distributional reinforcement learning algorithm with dual diffusion of the value network and the policy network was derived. MuJoCo testing tasks demonstrate that the proposed algorithm not only learns multimodal policies but also achieves state-of-the-art (SOTA) performance in all 9 control tasks, with significant suppression of estimation bias and a total average return improvement of over 10% compared to existing mainstream algorithms. Real vehicle testing shows that DSAC-D can accurately characterize the multimodal distribution of different driving styles, and the diffusion policy network can characterize multimodal trajectories.

Updated: 2025-07-03 04:38:33

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.01381v2

Neural CRNs: A Natural Implementation of Learning in Chemical Reaction Networks

This work introduces Neural CRNs, a general-purpose chemical neural network framework that embeds learning directly into mass-action chemical reaction systems. Unlike prior approaches that chemically implement and compose discrete neural computations, Neural CRNs adopt an analog computing approach, where both forward and backward passes of learning are implemented as continuous-time evolutions of molecular concentrations. Such an analog formulation naturally aligns with the analog nature of chemical kinetics, yielding concise circuits and practicable reactions. We demonstrate this efficiency by constructing a streamlined supervised learning procedure executable in just two sequential stages. We then implement several learning circuits to demonstrate the framework's linear and nonlinear modeling capabilities and to validate its learning procedure. These circuits are implemented entirely using unimolecular and bimolecular reactions, avoiding the complexity of higher-order chemistries. In summary, Neural CRNs offer a compact, scalable, and autonomous framework for biochemical learning, opening new avenues for adaptive computing in synthetic biology, bioengineering, and biomedicine.
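The continuous-time mass-action dynamics that Neural CRNs compute with can be illustrated by numerically integrating a single bimolecular reaction; this is a generic A + B -> C example with Euler steps, not one of the paper's learning circuits, and the rate constant and step size are arbitrary.

```python
def simulate_bimolecular(a0, b0, k=1.0, dt=1e-3, steps=5000):
    """Euler integration of mass-action kinetics for A + B -> C:
    d[A]/dt = d[B]/dt = -k[A][B], d[C]/dt = +k[A][B]."""
    a, b, c = a0, b0, 0.0
    for _ in range(steps):
        flux = k * a * b * dt     # mass-action rate times the time step
        a -= flux
        b -= flux
        c += flux
    return a, b, c
```

In the Neural CRN view, such concentration trajectories are the analog computation itself: forward and backward passes are evolutions of exactly this kind.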

Updated: 2025-07-03 04:34:00

Subjects: cs.LG,cs.ET

Download: http://arxiv.org/abs/2409.00034v3

Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient

Recent Large Language Model (LLM) pruning methods typically operate in the post-training phase without expensive weight finetuning; however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading to suboptimal performance. We instead propose a novel optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. To preserve efficiency, our method eliminates the back-propagation through the LLM per se during optimization, requiring only the forward pass of the LLM. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks, where we decouple the Bernoulli parameters from the LLM loss, facilitating efficient optimization via a policy gradient estimator without back-propagation. Thus, our method can 1) support global and heterogeneous pruning (i.e., automatically determine different redundancy for different layers), and 2) optionally initialize with a metric-based method (for our Bernoulli distributions). Extensive experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models using the C4 and WikiText2 datasets demonstrate the promising performance of our method in efficiency and effectiveness. Code is available at https://github.com/ethanygao/backprop-free_LLM_pruning.
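The forward-only optimization idea can be sketched with a toy REINFORCE step over Bernoulli keep-probabilities; the loss function, learning rate, and clipping are illustrative assumptions, and a real pruned-LLM loss would replace the toy `loss_fn`.

```python
import random

def policy_gradient_mask_step(theta, loss_fn, lr=0.2, n_samples=500, rng=None):
    """One forward-only step: sample Bernoulli masks from keep-probabilities
    `theta`, score each mask with the pruned model's loss (forward passes
    only), and update theta with the REINFORCE estimator of grad E[loss].
    No back-propagation through the model is needed."""
    rng = rng or random.Random(0)
    samples, baseline = [], 0.0
    for _ in range(n_samples):
        mask = [1 if rng.random() < p else 0 for p in theta]
        loss = loss_fn(mask)
        samples.append((mask, loss))
        baseline += loss / n_samples
    grads = [0.0] * len(theta)
    for mask, loss in samples:
        for i, (m, p) in enumerate(zip(mask, theta)):
            score = (m - p) / (p * (1.0 - p))     # d/dp log Bernoulli(m; p)
            grads[i] += (loss - baseline) * score / n_samples
    # Gradient step on keep-probabilities, clipped away from {0, 1}.
    return [min(0.99, max(0.01, p - lr * g)) for p, g in zip(theta, grads)]
```

The baseline subtraction is a standard variance-reduction trick for the score-function estimator; the probabilities themselves then encode per-unit redundancy.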

Updated: 2025-07-03 04:28:09

Subjects: cs.LG,cs.CL,stat.ML

Download: http://arxiv.org/abs/2406.10576v3

Synthetic Heuristic Evaluation: A Comparison between AI- and Human-Powered Usability Evaluation

Usability evaluation is crucial in human-centered design but can be costly, requiring expert time and user compensation. In this work, we developed a method for synthetic heuristic evaluation using multimodal LLMs' ability to analyze images and provide design feedback. Comparing our synthetic evaluations to those by experienced UX practitioners across two apps, we found our evaluation identified 73% and 77% of usability issues, which exceeded the performance of 5 experienced human evaluators (57% and 63%). Compared to human evaluators, the synthetic evaluation maintained consistent performance across tasks and excelled at detecting layout issues, highlighting potential attentional and perceptual strengths of synthetic evaluation. However, synthetic evaluation struggled with recognizing some UI components and design conventions, as well as with identifying cross-screen violations. Additionally, testing synthetic evaluations over time and across accounts revealed stable performance. Overall, our work highlights the performance differences between human and LLM-driven evaluations, informing the design of synthetic heuristic evaluations.

Updated: 2025-07-03 04:27:16

Subjects: cs.HC,cs.AI

Download: http://arxiv.org/abs/2507.02306v1

Kernel Density Bayesian Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) methods infer an agent's reward function using demonstrations of expert behavior. A Bayesian IRL approach models a distribution over candidate reward functions, capturing a degree of uncertainty in the inferred reward function. This is critical in some applications, such as those involving clinical data. Typically, Bayesian IRL algorithms require large demonstration datasets, which may not be available in practice. In this work, we incorporate existing domain-specific data to achieve better posterior concentration rates. We study a common setting in clinical and biological applications where we have access to expert demonstrations and known reward functions for a set of training tasks. Our aim is to learn the reward function of a new test task given limited expert demonstrations. Existing Bayesian IRL methods impose restrictions on the form of input data, thus limiting the incorporation of training task data. To better leverage information from training tasks, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL). Our approach employs a conditional kernel density estimator, which uses the known reward functions of the training tasks to improve the likelihood estimation across a range of reward functions and demonstration samples. Our empirical results highlight KD-BIRL's faster concentration rate in comparison to baselines, particularly in low test task expert demonstration data regimes. Additionally, we are the first to provide theoretical guarantees of posterior concentration for a Bayesian IRL algorithm. Taken together, this work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.
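The conditional kernel density estimator at the heart of KD-BIRL can be sketched in one dimension; demonstrations and rewards are scalars here purely for illustration, and the Gaussian kernels and bandwidths are assumptions, not the paper's exact construction.

```python
import math

def kde_likelihood(x, reward, train_pairs, h_x=0.5, h_r=0.5):
    """Conditional kernel density estimate p(x | reward): each training
    demonstration x_i is weighted by how close its task's known reward r_i
    is to the queried reward, sharing information across training tasks."""
    def g(u, h):
        return math.exp(-(u * u) / (2.0 * h * h))
    num = sum(g(x - xi, h_x) * g(reward - ri, h_r) for xi, ri in train_pairs)
    den = sum(g(reward - ri, h_r) for _, ri in train_pairs)
    return num / den
```

A Bayesian posterior over candidate rewards for the test task would then score each candidate by this likelihood on the (few) test-task demonstrations.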

Updated: 2025-07-03 04:27:15

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2303.06827v4

DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning

Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several limitations: (1) high computational cost and GPU memory usage during training; (2) sensitivity to incremental data order; and (3) providing a single, generalized model for all end tasks, which contradicts the essence of DAP. In this paper, we propose DoMIX, a novel approach that addresses these challenges by leveraging LoRA modules, a representative parameter-efficient fine-tuning (PEFT) method. Our approach enables efficient and parallel domain-adaptive pre-training that is robust to domain order and effectively utilizes accumulated knowledge to provide tailored pre-trained models for specific tasks. We also demonstrate that our method can be extended beyond the DAP setting to standard LLM fine-tuning scenarios. Code is available at https://github.com/dohoonkim-ai/DoMIX.
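Since DoMIX builds on LoRA modules, the basic LoRA computation is worth sketching; this is the generic parameter-efficient formulation y = Wx + alpha * B(Ax) with frozen W, not DoMIX's specific module-mixing scheme, and the tiny matrices are illustrative.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA-adapted linear layer: y = W x + alpha * B (A x), where only the
    low-rank factors A (r x d_in) and B (d_out x r) are trained while the
    pretrained weight W stays frozen."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [bi + alpha * di for bi, di in zip(base, delta)]
```

Keeping one small (A, B) pair per domain is what makes parallel, order-robust domain-adaptive pre-training cheap: the shared W is never touched.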

Updated: 2025-07-03 04:13:01

Subjects: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02302v1

Implicit Counterfactual Data Augmentation for Robust Learning

Machine learning models are prone to capturing the spurious correlations between non-causal attributes and classes, with counterfactual data augmentation being a promising direction for breaking these spurious associations. However, generating counterfactual data explicitly poses a challenge, and incorporating augmented data into the training process decreases training efficiency. This study proposes an Implicit Counterfactual Data Augmentation (ICDA) method to remove spurious correlations and make stable predictions. Specifically, first, a novel sample-wise augmentation strategy is developed that generates semantically and counterfactually meaningful deep features with distinct augmentation strength for each sample. Second, we derive an easy-to-compute surrogate loss on the augmented feature set when the number of augmented samples becomes infinite. Third, two concrete schemes are proposed, including direct quantification and meta-learning, to derive the key parameters for the robust loss. In addition, ICDA is explained from a regularization perspective, revealing its capacity to improve intra-class compactness and augment margins at both class and sample levels. Extensive experiments have been conducted across various biased learning scenarios covering both image and text datasets, demonstrating that ICDA consistently enhances the generalization and robustness performance of popular networks.

Updated: 2025-07-03 04:06:34

Subjects: cs.LG

Download: http://arxiv.org/abs/2304.13431v3

Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models

Sarcasm detection, as a crucial research direction in the field of Natural Language Processing (NLP), has attracted widespread attention. Traditional sarcasm detection tasks have typically focused on single-modal approaches (e.g., text), but due to the implicit and subtle nature of sarcasm, such methods often fail to yield satisfactory results. In recent years, researchers have shifted the focus of sarcasm detection to multi-modal approaches. However, effectively leveraging multi-modal information to accurately identify sarcastic content remains a challenge that warrants further exploration. Leveraging the powerful integrated processing capabilities of Multi-Modal Large Language Models (MLLMs) for various information sources, we propose an innovative multi-modal Commander-GPT framework. Inspired by military strategy, we first decompose the sarcasm detection task into six distinct sub-tasks. A central commander (decision-maker) then assigns the best-suited large language model to address each specific sub-task. Ultimately, the detection results from each model are aggregated to identify sarcasm. We conducted extensive experiments on MMSD and MMSD 2.0, utilizing four multi-modal large language models and six prompting strategies. Our experiments demonstrate that our approach achieves state-of-the-art performance, with a 19.3% improvement in F1 score, without necessitating fine-tuning or ground-truth rationales.
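The commander/sub-task pattern described above can be sketched schematically; the sub-task names, the keyword heuristics standing in for LLM calls, and the majority-vote aggregation are all invented for illustration (the paper uses six sub-tasks and assigns real multimodal LLMs).

```python
def commander(text, assignments, aggregate):
    """Route the input to each sub-task's assigned model, then aggregate
    the per-sub-task verdicts into a final sarcasm decision."""
    results = {task: model(text) for task, model in assignments.items()}
    return aggregate(results), results

# Toy 'models': keyword heuristics in place of assigned LLMs.
assignments = {
    "sentiment_conflict": lambda t: "great" in t and "broken" in t,
    "hyperbole":          lambda t: "best ever" in t,
}
majority = lambda results: sum(results.values()) > len(results) / 2
```

The decomposition lets the central decision-maker pick a different model per sub-task, which is the framework's key idea.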

Updated: 2025-07-03 04:04:04

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.18681v3

Knowledge Graph-Based Explainable and Generalized Zero-Shot Semantic Communications

Data-driven semantic communication is based on superficial statistical patterns, thereby lacking interpretability and generalization, especially for applications with the presence of unseen data. To address these challenges, we propose a novel knowledge graph-enhanced zero-shot semantic communication (KGZS-SC) network. Guided by the structured semantic information from a knowledge graph-based semantic knowledge base (KG-SKB), our scheme provides generalized semantic representations and enables reasoning for unseen cases. Specifically, the KG-SKB aligns the semantic features in a shared category semantics embedding space and enhances the generalization ability of the transmitter through aligned semantic features, thus reducing communication overhead by selectively transmitting compact visual semantics. At the receiver, zero-shot learning (ZSL) is leveraged to enable direct classification for unseen cases without the demand for retraining or additional computational overhead, thereby enhancing the adaptability and efficiency of the classification process in dynamic or resource-constrained environments. The simulation results conducted on the APY datasets show that the proposed KGZS-SC network exhibits robust generalization and significantly outperforms existing SC frameworks in classifying unseen categories across a range of SNR levels.
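The receiver-side zero-shot step can be sketched as nearest-neighbor classification in the shared category-semantics space; the two-dimensional embeddings and class names are toy assumptions, standing in for the knowledge-graph-derived semantics.

```python
import math

def zero_shot_classify(feature, class_embeddings):
    """Zero-shot classification sketch: place the received visual semantics
    in the shared category-semantics space and return the nearest class
    embedding, so unseen classes need no retraining."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(class_embeddings,
               key=lambda c: dist(feature, class_embeddings[c]))
```

Adding an unseen class is just adding its semantic embedding to the dictionary; no classifier weights change, which is what keeps the scheme adaptive under tight resources.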

Updated: 2025-07-03 03:57:26

Categories: cs.LG,cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2507.02291v1

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain

The rising influence of social media platforms in various domains, including tourism, has highlighted the growing need for efficient and automated Natural Language Processing (NLP) strategies to take advantage of this valuable resource. However, the transformation of multilingual, unstructured, and informal texts into structured knowledge still poses significant challenges, most notably the never-ending requirement for manually annotated data to train deep learning classifiers. In this work, we study different NLP techniques to establish the best ones to obtain competitive performances while keeping the need for training annotated data to a minimum. To do so, we built the first publicly available multilingual dataset (French, English, and Spanish) for the tourism domain, composed of tourism-related tweets. The dataset includes multilayered, manually revised annotations for Named Entity Recognition (NER) for Locations and Fine-grained Thematic Concepts Extraction mapped to the Thesaurus of Tourism and Leisure Activities of the World Tourism Organization, as well as for Sentiment Analysis at the tweet level. Extensive experimentation comparing various few-shot and fine-tuning techniques with modern language models demonstrates that modern few-shot techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 30 tweets for Named Entity Recognition of Locations, and 1K tweets annotated with fine-grained thematic concepts, a highly fine-grained sequence labeling task based on an inventory of 315 classes. We believe that our results, grounded in a novel dataset, pave the way for applying NLP to new domain-specific applications, reducing the need for manual annotations and circumventing the complexities of rule-based, ad-hoc solutions.

Updated: 2025-07-03 03:54:21

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2311.14727v2

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.

Updated: 2025-07-03 03:52:37

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02288v1

Uncertainty-Guided Coarse-to-Fine Tumor Segmentation with Anatomy-Aware Post-Processing

Reliable tumor segmentation in thoracic computed tomography (CT) remains challenging due to boundary ambiguity, class imbalance, and anatomical variability. We propose an uncertainty-guided, coarse-to-fine segmentation framework that combines full-volume tumor localization with refined region-of-interest (ROI) segmentation, enhanced by anatomically aware post-processing. The first-stage model generates a coarse prediction, followed by anatomically informed filtering based on lung overlap, proximity to lung surfaces, and component size. The resulting ROIs are segmented by a second-stage model trained with uncertainty-aware loss functions to improve accuracy and boundary calibration in ambiguous regions. Experiments on private and public datasets demonstrate improvements in Dice and Hausdorff scores, with fewer false positives and enhanced spatial interpretability. These results highlight the value of combining uncertainty modeling and anatomical priors in cascaded segmentation pipelines for robust and clinically meaningful tumor delineation. On the Orlando dataset, our framework improved Swin UNETR Dice from 0.4690 to 0.6447. Reduction in spurious components was strongly correlated with segmentation gains, underscoring the value of anatomically informed post-processing.

Updated: 2025-07-03 03:50:40

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.12215v2

Content filtering methods for music recommendation: A review

Recommendation systems have become essential in modern music streaming platforms, shaping how users discover and engage with songs. One common approach in recommendation systems is collaborative filtering, which suggests content based on the preferences of users with similar listening patterns to the target user. However, this method is less effective on media where interactions are sparse. Music is one such medium, since the average user of a music streaming service will never listen to the vast majority of tracks. Due to this sparsity, there are several challenges that have to be addressed with other methods. This review examines the current state of research in addressing these challenges, with an emphasis on the role of content filtering in mitigating biases inherent in collaborative filtering approaches. We explore various methods of song classification for content filtering, including lyrical analysis using Large Language Models (LLMs) and audio signal processing techniques. Additionally, we discuss the potential conflicts between these different analysis methods and propose avenues for resolving such discrepancies.

Updated: 2025-07-03 03:44:20

Categories: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02282v1

Linearly Homomorphic Ring Signature Scheme over Lattices

Homomorphic ring signature schemes combine the strong anonymity of ring signatures with the computability of homomorphic signatures, demonstrating significant potential in scenarios requiring both anonymous data provenance and verifiable homomorphic computation (e.g., confidential blockchain transactions and secure multi-party computation). However, no feasible homomorphic ring signature scheme currently exists. In this work, we propose the first lattice-based linearly homomorphic ring signature scheme. Proven secure in the standard model under the small integer solution (SIS) assumption, our scheme achieves strong anonymity under full key exposure and unforgeability against insider corruption attacks. As the first unified framework for ring signatures and linear homomorphic signatures, this construction provides a post-quantum-secure solution for the aforementioned applications, advancing the development of privacy-enhanced homomorphic computation.

Updated: 2025-07-03 03:43:40

Categories: cs.CR

Download: http://arxiv.org/abs/2507.02281v1

On Compression Functions over Groups with Applications to Homomorphic Encryption

Fully homomorphic encryption (FHE) enables an entity to perform arbitrary computation on encrypted data without decrypting the ciphertexts. An ongoing group-theoretical approach to construct an FHE scheme uses a certain "compression" function $F(x)$ implemented by group operations on a given finite group $G$, which satisfies that $F(1) = 1$ and $F(\sigma) = F(\sigma^2) = \sigma$ where $\sigma \in G$ is some element of order $3$. The previous work gave an example of such a function over the symmetric group $G = S_5$ by just a heuristic approach. In this paper, we systematically study the possibilities of such a function over various groups. We show that such a function does not exist over any solvable group $G$ (such as an Abelian group and a smaller symmetric group $S_n$ with $n \leq 4$). We also construct such a function over the alternating group $G = A_5$ that has a shortest possible expression. Moreover, by using this new function, we give a reduction of a construction of an FHE scheme to a construction of a homomorphic encryption scheme over the group $A_5$, which is more efficient than the previously known reductions.
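The compression property itself is easy to state programmatically. The sketch below checks it for permutations of $A_5$, with a mock lookup-table $F$ standing in for the paper's group-word construction (which this toy table does not reproduce).

```python
# Checker for the "compression" property from the abstract:
#   F(1) = 1 and F(sigma) = F(sigma^2) = sigma for some order-3 element sigma.
# Permutations are tuples acting on {0,...,4}; F is a hypothetical lookup table.
def compose(p, q):
    # (p o q)(i) = p[q[i]]
    return tuple(p[i] for i in q)

identity = (0, 1, 2, 3, 4)
sigma = (1, 2, 0, 3, 4)        # the 3-cycle (0 1 2), an order-3 element of A5
sigma2 = compose(sigma, sigma)
assert compose(sigma2, sigma) == identity  # sigma really has order 3

F = {identity: identity, sigma: sigma, sigma2: sigma}  # mock "compression"

def is_compression(F, sigma):
    sigma2 = compose(sigma, sigma)
    return (F[identity] == identity and F[sigma] == sigma
            and F[sigma2] == sigma)

print(is_compression(F, sigma))  # → True for the mock table
```

In the actual construction, $F$ must be expressed purely through group operations on $A_5$ so that it can be evaluated homomorphically on ciphertexts.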

Updated: 2025-07-03 03:39:00

Categories: math.GR,cs.CR,20D60, 94A60

Download: http://arxiv.org/abs/2208.02468v2

Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to the settings in which the treatment feature is based on human perception. The proposed GPI methodology is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.

Updated: 2025-07-03 03:31:52

Categories: stat.AP,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.00903v3

It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of Robinson (1988), we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of Mackey et al. (2018). Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These ACE procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
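For readers unfamiliar with the DML baseline that the paper analyzes, a minimal cross-fitted estimator for the partially linear model $Y = \theta D + g(X) + \varepsilon$ can be sketched as follows. Plain least squares stands in for the black-box nuisance learners, and the data-generating values are illustrative only.

```python
# Minimal cross-fitted DML sketch for Robinson's partially linear model:
# residualize Y and D on X with out-of-fold fits, then regress residuals.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
D = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)    # treatment
Y = 2.0 * D + X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

def fit_predict(Xtr, ytr, Xte):
    # Any black-box learner works here; OLS keeps the sketch self-contained.
    beta, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ beta

theta_parts = []
folds = np.array_split(rng.permutation(n), 2)
for k in range(2):
    te, tr = folds[k], folds[1 - k]
    ry = Y[te] - fit_predict(X[tr], Y[tr], X[te])   # residualized outcome
    rd = D[te] - fit_predict(X[tr], D[tr], X[te])   # residualized treatment
    theta_parts.append((rd @ ry) / (rd @ rd))
theta_hat = float(np.mean(theta_parts))
print(theta_hat)  # should land near the true effect 2.0
```

The paper's ACE procedures replace this first-order-orthogonal score with cumulant-based scores that are insensitive to nuisance errors at higher orders.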

Updated: 2025-07-03 03:31:45

Categories: stat.ML,cs.LG,econ.EM,math.ST,stat.ME,stat.TH

Download: http://arxiv.org/abs/2507.02275v1

Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To advance this goal, we launched the RoboTwin Dual-Arm Collaboration Challenge at the 2nd MEIS Workshop, CVPR 2025. Built on the RoboTwin Simulation platform (1.0 and 2.0) and the AgileX COBOT-Magic Robot platform, the competition consisted of three stages: Simulation Round 1, Simulation Round 2, and a final Real-World Round. Participants tackled a total of 17 dual-arm manipulation tasks, covering rigid, deformable, and tactile-based scenarios. The challenge attracted 64 global teams and over 400 participants, producing top-performing solutions like SEM and AnchorDP3 and generating valuable insights into generalizable bimanual policy learning. This report outlines the competition setup, task design, evaluation methodology, key findings, and future directions, aiming to support future research on robust and generalizable bimanual manipulation policies. The Challenge Webpage is available at https://robotwin-benchmark.github.io/cvpr-2025-challenge/.

Updated: 2025-07-03 03:30:42

Categories: cs.RO,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2506.23351v2

Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation

Video-to-Audio (V2A) Generation achieves significant progress and plays a crucial role in film and video post-production. However, current methods overlook the cinematic language, a critical component of artistic expression in filmmaking. As a result, their performance deteriorates in scenarios where Foley targets are only partially visible. To address this challenge, we propose a simple self-distillation approach to extend V2A models to cinematic language scenarios. By simulating the cinematic language variations, the student model learns to align the video features of training pairs with the same audio-visual correspondences, enabling it to effectively capture the associations between sounds and partial visual information. Our method not only achieves impressive improvements under partial visibility across all evaluation metrics, but also enhances performance on the large-scale V2A dataset, VGGSound.

Updated: 2025-07-03 03:23:11

Categories: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2507.02271v1

Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability

Generative adversarial attacks train a perturbation generator on a white-box surrogate model and subsequently apply the crafted perturbations to unseen black-box victim models. In contrast to iterative attacks, these methods deliver superior inference-time efficiency, scalability, and transferability; however, up until now, existing studies have not fully exploited the representational capacity of generative models to preserve and harness semantic information. Specifically, the intermediate activations of the generator encode rich semantic features--object boundaries and coarse shapes--that remain under-exploited, thereby limiting the alignment of perturbations with object-salient regions which are critical for adversarial transferability. To remedy this, we introduce a semantic structure-aware attack framework based on the Mean Teacher, which serves as a temporally smoothed feature reference. With this smoothed reference, we further direct semantic consistency between the early-layer activations in the student and those of the semantically rich teacher by feature distillation. By anchoring perturbation synthesis to the semantically salient early intermediate blocks within the generator based on empirical findings, our method guides progressive adversarial perturbation on regions that substantially enhance adversarial transferability. We conduct extensive experiments over diverse models, domains and tasks to demonstrate consistent improvements relative to state-of-the-art generative attacks, comprehensively evaluated using conventional metrics and our newly proposed Accidental Correction Rate (ACR).
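The Mean Teacher reference the framework relies on is, at its core, an exponential moving average (EMA) of the student's parameters. The sketch below illustrates just that update rule; the parameter shapes and decay value are assumptions for illustration.

```python
# EMA ("Mean Teacher") parameter update: the teacher is a temporally
# smoothed copy of the student, updated after each training step.
import numpy as np

def ema_update(teacher, student, decay=0.99):
    # teacher <- decay * teacher + (1 - decay) * student, per parameter tensor
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

teacher = [np.zeros(3)]   # toy single-tensor "model"
student = [np.ones(3)]
for _ in range(100):
    teacher = ema_update(teacher, student)
# After many steps the smoothed teacher drifts toward the (static) student.
print(float(teacher[0][0]))
```

Because the teacher averages over many student states, its intermediate features are less noisy, which is what makes it a stable reference for the feature-distillation consistency loss.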

Updated: 2025-07-03 03:17:52

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2506.18248v2

Adaptive Cubic Regularized Second-Order Latent Factor Analysis Model

High-dimensional and incomplete (HDI) data, characterized by massive node interactions, have become ubiquitous across various real-world applications. Second-order latent factor (SLF) models have shown promising performance in modeling this type of data. Nevertheless, due to the bilinear and non-convex nature of the SLF model's objective function, incorporating a damping term into the Hessian approximation and carefully tuning the associated parameters become essential. To overcome these challenges, we propose a new approach in this study, named the adaptive cubic regularized second-order latent factor analysis (ACRSLF) model. The proposed ACRSLF adopts two key ideas: 1) self-tuning cubic regularization that dynamically mitigates non-convex optimization instabilities; 2) multi-Hessian-vector product evaluation during conjugate gradient iterations for precise second-order information assimilation. Comprehensive experiments on two industrial HDI datasets demonstrate that the ACRSLF converges faster and achieves higher representation accuracy than advanced optimizer-based latent factor analysis (LFA) models.
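The Hessian-vector-product machinery mentioned in the abstract can be sketched with a standard conjugate-gradient solver: CG solves $Hp = -g$ using only products $Hv$, never the explicit Hessian. The quadratic test objective below is purely illustrative and unrelated to the ACRSLF model itself.

```python
# Conjugate gradient on H x = b, where hvp(v) returns H @ v. This is the
# matrix-free pattern second-order optimizers use to compute Newton-type steps.
import numpy as np

def cg_solve(hvp, b, iters=50, tol=1e-10):
    x = np.zeros_like(b)
    r = b - hvp(x)          # residual
    p = r.copy()            # search direction
    for _ in range(iters):
        hp = hvp(p)
        alpha = (r @ r) / (p @ hp)
        x += alpha * p
        r_new = r - alpha * hp
        if r_new @ r_new < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD Hessian of a toy quadratic
g = np.array([1.0, 2.0])                 # gradient at the current iterate
step = cg_solve(lambda v: H @ v, -g)     # Newton-type step: H^{-1}(-g)
print(np.allclose(H @ step, -g))         # → True
```

In a real SLF optimizer, `hvp` would be implemented by automatic differentiation over the model's bilinear objective, and cubic regularization would cap the step length.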

Updated: 2025-07-03 03:15:54

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.03036v1

Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach

Every publicly traded U.S. company files an annual 10-K report containing critical insights into financial health and risk. We propose Tiny eXplainable Risk Assessor (TinyXRA), a lightweight and explainable transformer-based model that automatically assesses company risk from these reports. Unlike prior work that relies solely on the standard deviation of excess returns (adjusted for the Fama-French model), which indiscriminately penalizes both upside and downside risk, TinyXRA incorporates skewness, kurtosis, and the Sortino ratio for more comprehensive risk assessment. We leverage TinyBERT as our encoder to efficiently process lengthy financial documents, coupled with a novel dynamic, attention-based word cloud mechanism that provides intuitive risk visualization while filtering irrelevant terms. This lightweight design ensures scalable deployment across diverse computing environments with real-time processing capabilities for thousands of financial documents which is essential for production systems with constrained computational resources. We employ triplet loss for risk quartile classification, improving over pairwise loss approaches in existing literature by capturing both the direction and magnitude of risk differences. Our TinyXRA achieves state-of-the-art predictive accuracy across seven test years on a dataset spanning 2013-2024, while providing transparent and interpretable risk assessments. We conduct comprehensive ablation studies to evaluate our contributions and assess model explanations both quantitatively by systematically removing highly attended words and sentences, and qualitatively by examining explanation coherence. The paper concludes with findings, practical implications, limitations, and future research directions. Our code is available at https://github.com/Chen-XueWen/TinyXRA.
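The triplet loss used for risk-quartile classification can be sketched in a few lines: an anchor report embedding is pulled toward a same-quartile positive and pushed away from a different-quartile negative. The embeddings and margin below are toy values, not the paper's configuration.

```python
# Triplet loss on risk-quartile embeddings: penalize cases where the
# same-quartile positive is not at least `margin` closer than the negative.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    # Hinge on the distance gap: zero once the constraint is satisfied.
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])      # anchor 10-K embedding (toy 2-D)
p = np.array([0.1, 0.0])      # report from the same risk quartile
n = np.array([2.0, 0.0])      # report from a different quartile
print(triplet_loss(a, p, n))  # → 0.0: constraint already satisfied
```

Unlike a pairwise loss, the triplet formulation compares the positive and negative distances jointly, which is how it captures both the direction and the magnitude of risk differences.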

Updated: 2025-07-03 03:15:52

Categories: q-fin.RM,cs.LG

Download: http://arxiv.org/abs/2506.23767v2

Multi-Label Classification Framework for Hurricane Damage Assessment

Hurricanes cause widespread destruction, resulting in diverse damage types and severities that require timely and accurate assessment for effective disaster response. While traditional single-label classification methods fall short of capturing the complexity of post-hurricane damage, this study introduces a novel multi-label classification framework for assessing damage using aerial imagery. The proposed approach integrates a feature extraction module based on ResNet and a class-specific attention mechanism to identify multiple damage types within a single image. Using the Rescuenet dataset from Hurricane Michael, the proposed method achieves a mean average precision of 90.23%, outperforming existing baseline methods. This framework enhances post-hurricane damage assessment, enabling more targeted and efficient disaster response and contributing to future strategies for disaster mitigation and resilience. This paper has been accepted at the ASCE International Conference on Computing in Civil Engineering (i3CE 2025), and the camera-ready version will appear in the official conference proceedings.
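The reported metric, mean average precision (mAP), is computed per damage class from ranked prediction scores and then averaged across classes. The sketch below shows the computation on toy scores and labels; the values are invented and the real model would produce the scores via its class-specific attention heads.

```python
# Mean average precision for multi-label classification: AP per class from
# score-ranked labels, then the mean over classes.
import numpy as np

def average_precision(scores, labels):
    order = np.argsort(-scores)          # rank images by descending score
    labels = labels[order]
    hits, precisions = 0, []
    for i, y in enumerate(labels, start=1):
        if y:
            hits += 1
            precisions.append(hits / i)  # precision at each positive hit
    return float(np.mean(precisions)) if precisions else 0.0

scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]])  # 3 images, 2 classes
labels = np.array([[1, 0], [1, 1], [0, 1]])
mean_ap = float(np.mean([average_precision(scores[:, c], labels[:, c])
                         for c in range(2)]))
print(mean_ap)  # → 1.0: every positive outranks every negative per class
```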

Updated: 2025-07-03 03:15:23

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02265v1

Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning

Molecular machine learning has gained popularity with the advancements of geometric deep learning. In parallel, retrieval-augmented generation has become a principled approach commonly used with language models. However, the optimal integration of retrieval augmentation into molecular machine learning remains unclear. Graph neural networks stand to benefit from clever matching to understand the structural alignment of retrieved molecules to a query molecule. Neural graph matching offers a compelling solution by explicitly modeling node and edge affinities between two structural graphs while employing a noise-robust, end-to-end neural network to learn affinity metrics. We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches. Code is publicly available at https://github.com/coleygroup/ms-pred.

Updated: 2025-07-03 03:15:08

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2502.17874v2

NLP4Neuro: Sequence-to-sequence learning for neural population decoding

Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. However, as the computations underlying behavior unfold in networks of thousands of individual neurons across the entire brain, this presents challenges for investigating neural roles and computational mechanisms in large, densely wired mammalian brains during behavior. Transformers, the backbones of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations. These modern LLMs have benefited from extensive pre-training, and their sequence-to-sequence learning has been shown to generalize to novel tasks and data modalities, which may also confer advantages for neural decoding from larger, brain-wide activity recordings. Here, we present a systematic evaluation of off-the-shelf LLMs to decode behavior from brain-wide populations, termed NLP4Neuro, which we used to test LLMs on simultaneous calcium imaging and behavior recordings in larval zebrafish exposed to visual motion stimuli. Through NLP4Neuro, we found that LLMs become better at neural decoding when they use pre-trained weights learned from textual natural language data. Moreover, we found that a recent mixture-of-experts LLM, DeepSeek Coder-7b, significantly improved behavioral decoding accuracy, predicted tail movements over long timescales, and provided anatomically consistent highly interpretable readouts of neuron salience. NLP4Neuro demonstrates that LLMs are highly capable of informing brain-wide neural circuit dissection.

Updated: 2025-07-03 03:14:55

Categories: q-bio.NC,cs.LG

Download: http://arxiv.org/abs/2507.02264v1

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Despite improvements by length extrapolation, efficient attention and memory modules, handling infinitely long documents with linear complexity without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent has demonstrated superb long-context capabilities, being able to extrapolate from an 8K context trained on 32K text to a 3.5M QA task with performance loss < 5% and achieves 95%+ in 512K RULER test.
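The overwrite-style memory loop described above can be sketched in a few lines. The string-truncating "summarizer" below is a plain-Python stand-in for the RL-trained LLM policy, and the segment size and character budget are illustrative assumptions, not the paper's 8K/32K token settings.

```python
# Minimal sketch of an overwrite-strategy memory agent reading a long
# document segment by segment.

def chunk(text, size):
    """Split the document into fixed-size segments."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def update_memory(memory, segment, budget=200):
    """Overwrite strategy: the new memory *replaces* the old one instead of
    being appended, so the state stays bounded for arbitrarily long inputs.
    Here we simply keep the most recent evidence up to a fixed budget."""
    merged = (memory + " " + segment).strip()
    return merged[-budget:]

def read_document(document, segment_size=100):
    memory = ""
    for segment in chunk(document, segment_size):
        memory = update_memory(memory, segment)
    return memory  # bounded memory -> linear-time, length-independent state

assert len(read_document("x" * 10_000)) <= 200
```

Because the memory never grows with input length, total work scales linearly in the number of segments, which is the property that makes extrapolation to multi-million-token inputs feasible.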

Updated: 2025-07-03 03:11:50

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.02259v1

Uncertainty-aware Reward Design Process

Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains a challenging process due to the inefficiencies and inconsistencies inherent in conventional reward engineering methodologies. Recent advances have explored leveraging large language models (LLMs) to automate reward function design. However, their suboptimal performance in numerical optimization often yields unsatisfactory reward quality, while the evolutionary search paradigm demonstrates inefficient utilization of simulation resources, resulting in prohibitively lengthy design cycles with disproportionate computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates large language models to streamline reward function design and evaluation in RL environments. URDP quantifies candidate reward function uncertainty based on self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel reward components. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to significantly enhance hyperparameter configuration efficiency. Finally, we construct a bi-level optimization architecture by decoupling the reward component optimization and the hyperparameter tuning. URDP orchestrates synergistic collaboration between the reward logic reasoning of the LLMs and the numerical optimization strengths of the Bayesian Optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also achieves significant improvements in the efficiency of automated reward design compared to existing approaches.
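The self-consistency idea, namely sampling a candidate reward component several times and treating disagreement as uncertainty, can be sketched as follows. The proposal strings and the 0.5 threshold are hypothetical stand-ins for actual LLM outputs and tuned values.

```python
from collections import Counter

def self_consistency_uncertainty(samples):
    """Uncertainty of a candidate reward component: one minus the relative
    frequency of the most common proposal across K independent LLM samples.
    High values mean the LLM does not consistently propose the component."""
    counts = Counter(samples)
    top = counts.most_common(1)[0][1]
    return 1.0 - top / len(samples)

# Five hypothetical LLM proposals for one reward term (invented strings).
proposals = ["-dist_to_goal", "-dist_to_goal", "-dist_to_goal",
             "+upright_bonus", "-dist_to_goal"]
u = self_consistency_uncertainty(proposals)
assert abs(u - 0.2) < 1e-9
# Components above an (illustrative) uncertainty threshold are dropped
# without running a single simulation.
assert u < 0.5
```

The appeal is that this filter costs only LLM samples, so ineffective reward components are pruned before any simulation budget is spent.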

Updated: 2025-07-03 03:09:17

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2507.02256v1

Listwise Preference Alignment Optimization for Tail Item Recommendation

Preference alignment has achieved greater success on Large Language Models (LLMs) and drawn broad interest in recommendation research. Existing preference alignment methods for recommendation either require explicit reward modeling or only support pairwise preference comparison. The former directly increases substantial computational costs, while the latter hinders training efficiency on negative samples. Moreover, no existing effort has explored preference alignment solutions for tail-item recommendation. To bridge the above gaps, we propose LPO4Rec, which extends the Bradley-Terry model from pairwise comparison to listwise comparison, to improve the efficiency of model training. Specifically, we derive a closed form optimal policy to enable more efficient and effective training without explicit reward modeling. We also present an adaptive negative sampling and reweighting strategy to prioritize tail items during optimization and enhance performance in tail-item recommendations. Besides, we theoretically prove that optimizing the listwise preference optimization (LPO) loss is equivalent to maximizing the upper bound of the optimal reward. Our experiments on three public datasets show that our method outperforms 10 baselines by a large margin, achieving up to 50% performance improvement while reducing 17.9% GPU memory usage when compared with direct preference optimization (DPO) in tail-item recommendation. Our code is available at https://github.com/Yuhanleeee/LPO4Rec.
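A minimal sketch of the listwise objective: instead of one Bradley-Terry comparison per negative, the preferred item competes against the whole candidate list under a softmax. The per-negative weights stand in for the paper's adaptive tail-item reweighting; the exact loss and weighting scheme here are illustrative, not LPO4Rec's.

```python
import numpy as np

def listwise_pref_loss(pos_score, neg_scores, neg_weights=None):
    """Listwise extension of Bradley-Terry: negative log-probability that
    the preferred item ranks first among the whole candidate list, rather
    than one pairwise term per negative."""
    neg_scores = np.asarray(neg_scores, dtype=float)
    if neg_weights is None:
        neg_weights = np.ones_like(neg_scores)
    logits = np.concatenate(([pos_score], neg_scores))
    logits = logits - logits.max()              # numerical stability
    w = np.concatenate(([1.0], neg_weights))    # up-weight tail negatives here
    z = np.sum(w * np.exp(logits))
    return -(logits[0] - np.log(z))

loss_easy = listwise_pref_loss(3.0, [0.0, -1.0])
loss_hard = listwise_pref_loss(0.1, [0.0, -1.0])
assert loss_easy < loss_hard  # clearer preference -> smaller loss
```

One listwise term covers all negatives at once, which is where the training-efficiency gain over pairwise DPO-style losses comes from.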

Updated: 2025-07-03 03:08:23

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2507.02255v1

Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks

The study of mechanistic interpretability aims to reverse-engineer a model to explain its behaviors. While recent studies have focused on the static mechanism of a certain behavior, the learning dynamics inside a model remain to be explored. In this work, we develop an interpretable fine-tuning method for analyzing the mechanism behind learning. We first introduce the concept of node-level intrinsic dimensionality to describe the learning process of a model in a computational graph. Based on our theory, we propose circuit-tuning, a two-stage algorithm that iteratively builds the minimal subgraph for a specific task and updates the key parameters in a heuristic way. Experimental results confirm the existence of the intrinsic dimensionality at the node level and demonstrate the effectiveness of our method for transparent and interpretable fine-tuning. We visualize and analyze the circuits before, during, and after fine-tuning, providing new insights into the self-organization mechanism of a neural network in the learning process.

Updated: 2025-07-03 03:07:00

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2502.06106v2

Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation

Progress in enhancing large language model (LLM) planning and reasoning capabilities is significantly hampered by the bottleneck of scalable, reliable data generation and evaluation. To overcome this, I introduce NL2FLOW, a fully automated system for parametrically generating planning problems - expressed in natural language, a structured intermediate representation, and formal PDDL - and rigorously evaluating the quality of generated plans. I demonstrate NL2FLOW's capabilities by generating a dataset of 2296 problems in the automated workflow generation domain and evaluating multiple open-sourced, instruct-tuned LLMs. My results reveal that the highest performing models achieved 86% success in generating valid plans and 69% in generating optimal plans, specifically for problems with feasible solutions. Regression analysis shows that the influence of problem characteristics on plan generation is contingent on both model and prompt design. Notably, I observed that the highest success rate for translating natural language into a JSON representation of a plan was lower than the highest rate of generating a valid plan directly. This suggests that unnecessarily decomposing the reasoning task - introducing intermediate translation steps - may actually degrade performance, implying a benefit to models capable of reasoning directly from natural language to action. As I scale LLM reasoning to increasingly complex problems, the bottlenecks and sources of error within these systems will inevitably shift. Therefore, a dynamic understanding of these limitations - and the tools to systematically reveal them - will be crucial for unlocking the full potential of LLMs as intelligent problem solvers.
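Parametric problem generation, as described above, means sweeping generation parameters to emit each problem in both natural language and a structured form. The schema and phrasing below are invented for illustration and are not NL2FLOW's actual formats.

```python
import json

def generate_problem(n_steps, seed_objects):
    """Parametrically generate one toy workflow-planning problem as both a
    natural-language statement and a structured (JSON-serializable) form."""
    steps = [f"step_{i}" for i in range(n_steps)]
    nl = ("Arrange the actions " + ", ".join(steps) + " so that each "
          f"consumes the output of the previous one, starting from {seed_objects[0]}.")
    structured = {
        "objects": list(seed_objects),
        "actions": [{"name": s, "needs": steps[i - 1] if i else seed_objects[0]}
                    for i, s in enumerate(steps)],
    }
    return nl, structured

# Sweeping a parameter (here, plan length) yields a graded dataset.
dataset = [generate_problem(n, ["doc"]) for n in (2, 3, 4)]
assert len(dataset) == 3
assert dataset[1][1]["actions"][1]["needs"] == "step_0"
assert json.loads(json.dumps(dataset[0][1]))  # round-trips as JSON
```

Because problems are generated from known parameters, validity and optimality of each produced plan can be checked mechanically, which is what enables rigorous evaluation at scale.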

Updated: 2025-07-03 03:02:49

Categories: cs.AI

Download: http://arxiv.org/abs/2507.02253v1

Adapter-Enhanced Semantic Prompting for Continual Learning

Continual learning (CL) enables models to adapt to evolving data streams. A major challenge of CL is catastrophic forgetting, where new knowledge will overwrite previously acquired knowledge. Traditional methods usually retain the past data for replay or add additional branches in the model to learn new knowledge, which has high memory requirements. In this paper, we propose a novel lightweight CL framework, Adapter-Enhanced Semantic Prompting (AESP), which integrates prompt tuning and adapter techniques. Specifically, we design semantic-guided prompts to enhance the generalization ability of visual features and utilize adapters to efficiently fuse the semantic information, aiming to learn more adaptive features for the continual learning task. Furthermore, to choose the right task prompt for feature adaptation, we have developed a novel matching mechanism for prompt selection. Extensive experiments on three CL datasets demonstrate that our approach achieves favorable performance across multiple metrics, showing its potential for advancing CL.

Updated: 2025-07-03 03:00:36

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2412.11074v3

SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement

Precise surgical interventions are vital to patient safety, and advanced enhancement algorithms have been developed to assist surgeons in decision-making. Despite significant progress, these algorithms are typically designed for single tasks in specific scenarios, limiting their effectiveness in complex real-world situations. To address this limitation, we propose SurgVisAgent, an end-to-end intelligent surgical vision agent built on multimodal large language models (MLLMs). SurgVisAgent dynamically identifies distortion categories and severity levels in endoscopic images, enabling it to perform a variety of enhancement tasks such as low-light enhancement, overexposure correction, motion blur elimination, and smoke removal. Specifically, to achieve superior surgical scenario understanding, we design a prior model that provides domain-specific knowledge. Additionally, through in-context few-shot learning and chain-of-thought (CoT) reasoning, SurgVisAgent delivers customized image enhancements tailored to a wide range of distortion types and severity levels, thereby addressing the diverse requirements of surgeons. Furthermore, we construct a comprehensive benchmark simulating real-world surgical distortions, on which extensive experiments demonstrate that SurgVisAgent surpasses traditional single-task models, highlighting its potential as a unified solution for surgical assistance.

Updated: 2025-07-03 03:00:26

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.02252v1

Rethinking Data Protection in the (Generative) Artificial Intelligence Era

The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual harm, underscoring the urgent need to clearly delineate the scope of data protection and rigorously enforce it. In this perspective, we propose a four-level taxonomy, including non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.

Updated: 2025-07-03 02:45:51

Categories: cs.LG,cs.AI,cs.CR,cs.CV,cs.CY

Download: http://arxiv.org/abs/2507.03034v1

Transfer Learning for Matrix Completion

In this paper, we explore knowledge transfer in the setting of matrix completion, which aims to enhance the estimation of a low-rank target matrix with auxiliary data available. We propose a transfer learning procedure given prior information on which source datasets are favorable. We study its convergence rates and prove its minimax optimality. Our analysis reveals that with the source matrices close enough to the target matrix, our method outperforms the traditional method that uses only the target data. In particular, we leverage the advanced sharp concentration inequalities introduced in \cite{brailovskaya2024universality} to eliminate a logarithmic factor in the convergence rate, which is crucial for proving the minimax optimality. When the relevance of source datasets is unknown, we develop an efficient detection procedure to identify informative sources and establish its selection consistency. Simulations and real data analysis are conducted to support the validity of our methodology.
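As a toy illustration of why favorable sources help, the sketch below pools observed entries from a sparsely observed target and one densely observed, close source before a rank-truncated SVD. The pooling-then-projection step is a simplification of the paper's procedure (the debiasing correction is omitted), and all sizes and observation rates are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_svd(M, r):
    """Best rank-r approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# Rank-1 ground truth; target reveals few entries, the source many.
n, r = 30, 1
u, v = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
truth = u @ v.T

def observe(p):
    mask = rng.random((n, n)) < p
    return np.where(mask, truth, 0.0), mask

tgt, m_t = observe(0.15)   # sparse target observations
src, m_s = observe(0.80)   # informative source, identical to the target here

# Pool: average whatever entries are available, then project onto rank r.
pooled = np.where(m_t | m_s, (tgt + src) / np.maximum(m_t + m_s, 1), 0.0)
est_pool = truncated_svd(pooled, r)
est_tgt = truncated_svd(tgt, r)

err = lambda E: np.linalg.norm(E - truth) / np.linalg.norm(truth)
assert err(est_pool) < err(est_tgt)  # close sources sharpen the estimate
```

When the source matrix drifts away from the target, this gain erodes, which is why the paper's detection step for identifying informative sources matters.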

Updated: 2025-07-03 02:43:40

Categories: stat.ML,cs.LG,15A83,I.2.6; G.3

Download: http://arxiv.org/abs/2507.02248v1

EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference

The rapid growth of large models has raised concerns about their environmental impact and equity in accessibility due to significant computational costs. Low-Rank Adapters (LoRA) offer a lightweight solution for finetuning large models, resulting in an abundance of publicly available adapters tailored to diverse domains. We ask: can these pretrained adapters be leveraged to further streamline adaptation to new tasks while addressing these challenges? We introduce EigenLoRAx, a parameter-efficient finetuning method that recycles existing adapters to create a principal subspace aligned with their shared domain knowledge, which can be further augmented with orthogonal basis vectors in low-resource scenarios. This enables rapid adaptation to new tasks by learning only lightweight coefficients on the principal components of the subspace, eliminating the need to finetune entire adapters. EigenLoRAx requires significantly fewer parameters and less memory, improving efficiency for both training and inference. Our method demonstrates strong performance across diverse domains and tasks, offering a scalable solution for edge-based applications, personalization, and equitable deployment of large models in resource-constrained environments.
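The core recycling step, extracting a principal subspace from a stack of existing adapters and then fitting a new task with only k coefficients, can be sketched with a plain SVD. The random "adapters" below merely share a synthetic k-dimensional structure; they are not real LoRA factors, and the fit-by-least-squares step stands in for gradient training of the coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Flattened weights of N existing adapters, all lying (by construction)
# in a shared k-dimensional "domain" subspace of a d-dimensional space.
d, n_adapters, k = 64, 12, 4
basis_true = rng.normal(size=(k, d))              # hidden shared structure
adapters = rng.normal(size=(n_adapters, k)) @ basis_true

# Principal subspace of the recycled adapters.
mean = adapters.mean(0)
U, s, Vt = np.linalg.svd(adapters - mean, full_matrices=False)
components = Vt[:k]                                # (k, d) principal axes

# Adapting to a new task = learning k coefficients, not d weights.
new_task = rng.normal(size=(1, k)) @ basis_true
coeffs, *_ = np.linalg.lstsq(components.T, (new_task - mean).ravel(),
                             rcond=None)
recon = mean + coeffs @ components

assert coeffs.size == k                            # k << d trainable values
assert np.linalg.norm(recon - new_task) < 1e-6 * np.linalg.norm(new_task)
```

Only the k coefficients need storing and updating per task, which is the source of the parameter and memory savings the abstract claims.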

Updated: 2025-07-03 02:34:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2502.04700v4

Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies

Large language models (LLMs) excel in complex tasks through advanced prompting techniques like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but their reliance on manually crafted, task-specific prompts limits adaptability and efficiency. We introduce Mixture of Reasoning (MoR), a training framework that embeds diverse reasoning strategies into LLMs for autonomous, task-adaptive reasoning without external prompt engineering. MoR has two phases: Thought Generation, creating reasoning chain templates with models like GPT-4o, and SFT Dataset Construction, pairing templates with benchmark datasets for supervised fine-tuning. Our experiments show that MoR significantly enhances performance, with MoR150 achieving 0.730 (2.2% improvement) using CoT prompting and 0.734 (13.5% improvement) compared to baselines. MoR eliminates the need for task-specific prompts, offering a generalizable solution for robust reasoning across diverse tasks.

Updated: 2025-07-03 02:30:05

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.00606v2

VERBA: Verbalizing Model Differences Using Large Language Models

In the current machine learning landscape, we face a "model lake" phenomenon: Given a task, there is a proliferation of trained models with similar performances despite different behavior. For model users attempting to navigate and select from the models, documentation comparing model pairs is helpful. However, for every $N$ models there could be $O(N^2)$ pairwise comparisons, a number prohibitive for the model developers to manually perform pairwise comparisons and prepare documentations. To facilitate fine-grained pairwise comparisons among models, we introduced $\textbf{VERBA}$. Our approach leverages a large language model (LLM) to generate verbalizations of model differences by sampling from the two models. We established a protocol that evaluates the informativeness of the verbalizations via simulation. We also assembled a suite with a diverse set of commonly used machine learning models as a benchmark. For a pair of decision tree models with up to 5% performance difference but 20-25% behavioral differences, $\textbf{VERBA}$ effectively verbalizes their variations with up to 80% overall accuracy. When we included the models' structural information, the verbalization's accuracy further improved to 90%. $\textbf{VERBA}$ opens up new research avenues for improving the transparency and comparability of machine learning models in a post-hoc manner.
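The sampling step that feeds the verbalizer can be sketched directly: query both models on shared inputs and collect the disagreements, which are the raw material an LLM would then summarize. The verbalizing LLM call itself is stubbed out here, and the toy threshold "models" are invented.

```python
def behavioral_diff(model_a, model_b, inputs):
    """Collect inputs where the two models disagree; a verbalizer LLM
    would turn this list into a natural-language comparison."""
    return [(x, model_a(x), model_b(x)) for x in inputs
            if model_a(x) != model_b(x)]

# Two toy "models" with similar accuracy but different behavior.
model_a = lambda x: x > 0   # threshold at 0
model_b = lambda x: x > 1   # threshold at 1

diffs = behavioral_diff(model_a, model_b, range(-2, 5))
assert [d[0] for d in diffs] == [1]  # they disagree only on x = 1
```

Pairwise sampling like this is what makes the O(N^2) comparison problem automatable: the expensive part (reading the diffs and writing the documentation) is delegated to the LLM.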

Updated: 2025-07-03 02:25:24

Categories: cs.LG

Download: http://arxiv.org/abs/2507.02241v1

Bi-modality medical images synthesis by a bi-directional discrete process matching method

Recently, medical image synthesis has gained more and more popularity alongside the rapid development of generative models. Medical image synthesis aims to generate an unacquired image modality, often from other observed data modalities. Synthesized images can be used for clinical diagnostic assistance, data augmentation for model training and validation, or image quality improvement. Meanwhile, flow-based models are among the successful generative models for their ability to generate realistic, high-quality synthetic images. However, most flow-based models must compute many ordinary differential equation (ODE) evolution steps of the flow during synthesis, so their performance is significantly limited by heavy computation time due to a large number of time iterations. In this paper, we propose a novel flow-based model, namely bi-directional Discrete Process Matching (Bi-DPM), to accomplish bi-modality image synthesis tasks. Unlike other flow-matching-based models, we propose to utilize both forward and backward ODE flows and to enhance consistency on the intermediate images over a few discrete time steps, resulting in a synthesis process that maintains high-quality generation for both modalities under the guidance of paired data. Our experiments on three datasets of MRI T1/T2 and CT/MRI demonstrate that Bi-DPM outperforms other state-of-the-art flow-based methods for bi-modality image synthesis, delivering higher image quality with accurate anatomical regions.

Updated: 2025-07-03 02:25:06

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.03977v3

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models

Independently trained vision and language models inhabit disjoint representational spaces, shaped by their respective modalities, objectives, and architectures. Yet an emerging hypothesis - the Platonic Representation Hypothesis - suggests that such models may nonetheless converge toward a shared statistical model of reality. This compatibility, if it exists, raises a fundamental question: can we move beyond post-hoc statistical detection of alignment and explicitly optimize for it between such disjoint representations? We cast this Platonic alignment problem as a multi-objective optimization task - preserve each modality's native structure while aligning for mutual coherence. We introduce the Joint Autoencoder Modulator (JAM) framework that jointly trains modality-specific autoencoders on the latent representations of pre-trained single modality models, encouraging alignment through both reconstruction and cross-modal objectives. By analogy, this framework serves as a method to escape Plato's Cave, enabling the emergence of shared structure from disjoint inputs. We evaluate this framework across three critical design axes: (i) the alignment objective - comparing contrastive loss (Con), its hard-negative variant (NegCon), and our Spread loss, (ii) the layer depth at which alignment is most effective, and (iii) the impact of foundation model scale on representational convergence. Our findings show that our lightweight Pareto-efficient framework reliably induces alignment, even across frozen, independently trained representations, offering both theoretical insight and practical pathways for transforming generalist unimodal foundations into specialist multimodal models.
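Of the three alignment objectives compared above, the contrastive (Con) family is the easiest to write down: a symmetric InfoNCE over paired image/text embeddings. The sketch below shows only that objective on toy embeddings; the Spread and NegCon variants differ in how negatives are weighted and are not reproduced here.

```python
import numpy as np

def contrastive_alignment_loss(Z_img, Z_txt, tau=0.1):
    """Symmetric InfoNCE: row i of Z_img should match row i of Z_txt more
    strongly than any other row, and vice versa."""
    Z_img = Z_img / np.linalg.norm(Z_img, axis=1, keepdims=True)
    Z_txt = Z_txt / np.linalg.norm(Z_txt, axis=1, keepdims=True)
    sim = Z_img @ Z_txt.T / tau
    n = sim.shape[0]
    log_sm_rows = sim - np.log(np.exp(sim).sum(1, keepdims=True))
    log_sm_cols = sim - np.log(np.exp(sim).sum(0, keepdims=True))
    return -(np.trace(log_sm_rows) + np.trace(log_sm_cols)) / (2 * n)

rng = np.random.default_rng(4)
Z = rng.normal(size=(8, 16))
aligned = contrastive_alignment_loss(Z, Z)        # perfectly paired
shuffled = contrastive_alignment_loss(Z, Z[::-1])  # mismatched pairs
assert aligned < shuffled
```

In JAM this loss would act on the latent codes of the two modality-specific autoencoders, jointly with their reconstruction losses, rather than on raw embeddings as in this toy.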

Updated: 2025-07-03 02:07:36

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.01201v2

Fast AI Model Splitting over Edge Networks

Split learning (SL) has emerged as a computationally efficient approach for artificial intelligence (AI) model training, which can alleviate device-side computational workloads. However, complex AI model architectures pose high computational complexity to obtain the optimal model splitting. In this paper, we represent an arbitrary AI model as a directed acyclic graph (DAG), and then reformulate the optimal model splitting problem as a minimum s-t cut search problem. To solve the problem, we propose a fast DAG-based model splitting algorithm, which restructures the DAG to enable the optimal model splitting identification via a maximum flow method. Theoretical analysis indicates that the proposed algorithm is optimal. Furthermore, considering AI models with block structures, we propose a block-wise model splitting algorithm to reduce computational complexity. The algorithm abstracts each block, i.e., a component consisting of multiple layers, into a single vertex, thereby obtaining the optimal model splitting via a simplified DAG. Extensive experimental results demonstrate that the proposed algorithms can determine the optimal model splitting within milliseconds, as well as reduce training delay by 24.62%-38.95% in dynamic edge networks as compared to the state-of-the-art benchmarks.
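To make the reduction concrete, here is a self-contained Edmonds-Karp max-flow on a toy four-layer chain, where edge capacities play the role of per-cut splitting costs. The toy costs and graph are invented; the paper's DAG restructuring and block abstraction are not reproduced.

```python
from collections import deque

def max_flow_min_cut(n, cap, s, t):
    """Edmonds-Karp max flow on an adjacency matrix; returns the source
    side of a minimum s-t cut, i.e. the layers kept on the device."""
    flow = [[0] * n for _ in range(n)]

    def bfs():
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if parent[t] == -1:          # no augmenting path left
            break
        v, aug = t, float("inf")     # bottleneck along the path
        while v != s:
            aug = min(aug, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += aug
            flow[v][u] -= aug        # residual edge
            v = u
    return {i for i, p in enumerate(bfs()) if p != -1}

# Toy 4-layer chain s(0) -> L1(1) -> L2(2) -> L3(3) -> t(4); each edge
# capacity is the (invented) cost of splitting the model at that point.
cap = [[0] * 5 for _ in range(5)]
cap[0][1], cap[1][2], cap[2][3], cap[3][4] = 5, 2, 7, 6
device_side = max_flow_min_cut(5, cap, 0, 4)
assert device_side == {0, 1}  # cheapest split: cut after layer 1, cost 2
```

By max-flow/min-cut duality, the vertices reachable in the final residual graph form the optimal device-side partition, which is exactly the search the paper accelerates on general DAGs.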

Updated: 2025-07-03 02:01:58

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.01041v2

Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation

Background: Clinical documentation represents a significant burden for healthcare providers, with physicians spending up to 2 hours daily on administrative tasks. Recent advances in large language models (LLMs) offer promising solutions, but privacy concerns and computational requirements limit their adoption in healthcare settings. Objective: To develop and evaluate a privacy-preserving, on-device medical transcription system using a fine-tuned Llama 3.2 1B model capable of generating structured medical notes from medical transcriptions while maintaining complete data sovereignty entirely in the browser. Methods: We fine-tuned a Llama 3.2 1B model using Parameter-Efficient Fine-Tuning (PEFT) with LoRA on 1,500 synthetic medical transcription-to-structured note pairs. The model was evaluated against the base Llama 3.2 1B on two datasets: 100 endocrinology transcripts and 140 modified ACI benchmark cases. Evaluation employed both statistical metrics (ROUGE, BERTScore, BLEURT) and LLM-as-judge assessments across multiple clinical quality dimensions. Results: The fine-tuned OnDevice model demonstrated substantial improvements over the base model. On the ACI benchmark, ROUGE-1 scores increased from 0.346 to 0.496, while BERTScore F1 improved from 0.832 to 0.866. Clinical quality assessments showed marked reduction in major hallucinations (from 85 to 35 cases) and enhanced factual correctness (2.81 to 3.54 on 5-point scale). Similar improvements were observed on the internal evaluation dataset, with composite scores increasing from 3.13 to 4.43 (+41.5%). Conclusions: Fine-tuning compact LLMs for medical transcription yields clinically meaningful improvements while enabling complete on-device browser deployment. This approach addresses key barriers to AI adoption in healthcare: privacy preservation, cost reduction, and accessibility for resource-constrained environments.
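The PEFT/LoRA update at the heart of the fine-tune can be written out directly: the frozen weight W is augmented by a trainable low-rank product scaled by alpha/r. Dimensions here are toy values, not the Llama 3.2 1B shapes, and the forward pass is a sketch rather than the model's actual layer.

```python
import numpy as np

rng = np.random.default_rng(2)

# Frozen base weight of one projection layer (toy size).
d_out, d_in, r, alpha = 128, 128, 8, 16
W = rng.normal(size=(d_out, d_in))

# PEFT/LoRA: only the low-rank factors A and B are trained.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))     # standard LoRA init: the update starts at zero

def lora_forward(x):
    """Adapted layer: frozen W plus the scaled low-rank update (alpha/r)BA."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # B = 0 -> identical before training

full_params = W.size
lora_params = A.size + B.size
assert lora_params * 8 == full_params       # 8x fewer trainable parameters here
```

Shrinking the trainable parameter count this way is what makes fine-tuning, and subsequently serving the merged model entirely in the browser, tractable on commodity hardware.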

Updated: 2025-07-03 01:51:49

标题: 保护隐私,增加可访问性,降低成本:用于医学转录和笔记生成的设备端人工智能模型

摘要: 背景:临床文档对医疗提供者来说是一个重要的负担,医生每天花费多达2小时处理行政任务。最近大型语言模型(LLMs)的进展提供了有希望的解决方案,但隐私问题和计算要求限制了它们在医疗环境中的应用。目标:开发和评估一种保护隐私的设备端医学转录系统,使用经过微调的Llama 3.2 1B模型,能够从医学转录中生成结构化的医疗笔记,同时完全在浏览器中保持数据主权。方法:我们使用参数高效微调(PEFT)与LoRA,在1,500个合成的医学转录-结构化笔记对上对Llama 3.2 1B模型进行了微调。该模型在两个数据集上与基础Llama 3.2 1B进行了对比评估:100份内分泌学转录和140个修改后的ACI基准案例。评估同时采用统计指标(ROUGE、BERTScore、BLEURT)和LLM作为评判者在多个临床质量维度上的评价。结果:经过微调的OnDevice模型相比基础模型表现出显著改进。在ACI基准上,ROUGE-1分数从0.346提高到0.496,BERTScore F1从0.832提高到0.866。临床质量评估显示主要幻觉显著减少(从85例降至35例),事实正确性提高(5分制下从2.81提高到3.54)。在内部评估数据集上也观察到类似的改进,综合分数从3.13提高到4.43(+41.5%)。结论:为医学转录微调紧凑的LLMs能够带来临床上有意义的改进,同时实现完全在浏览器中的设备端部署。这种方法解决了医疗领域采用人工智能的关键障碍:隐私保护、成本降低以及为资源受限环境提供可及性。

更新时间: 2025-07-03 01:51:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.03033v1
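The fine-tuning recipe above (PEFT with LoRA) can be illustrated in miniature. The sketch below is not the paper's pipeline; the dimensions, scaling factor, and names are hypothetical, and it shows only the core low-rank reparameterization that makes LoRA parameter-efficient:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 8, 2           # hypothetical layer sizes; rank r << d
alpha = 4.0                        # LoRA scaling hyperparameter

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01      # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized: no drift at step 0

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained,
    # so the adapter adds r*(d_in + d_out) parameters instead of d_out*d_in.
    return (W + (alpha / r) * B @ A) @ x

x = rng.normal(size=d_in)
# With B = 0 the adapted model exactly matches the base model.
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` receive gradients during fine-tuning, which is why a 1B-parameter model can be adapted on modest hardware.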

Privacy-Preserving Quantized Federated Learning with Diverse Precision

Federated learning (FL) has emerged as a promising paradigm for distributed machine learning, enabling collaborative training of a global model across multiple local devices without requiring them to share raw data. Despite its advancements, FL is limited by factors such as: (i) privacy risks arising from the unprotected transmission of local model updates to the fusion center (FC) and (ii) decreased learning utility caused by heterogeneity in model quantization resolution across participating devices. Prior work typically addresses only one of these challenges because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. In this paper, our aim is therefore to improve the learning utility of a privacy-preserving FL that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that is designed to simultaneously achieve differential privacy (DP) and minimum quantization error. Notably, the proposed SQ guarantees bounded distortion, unlike other DP approaches. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Numerical simulations validate the benefits of our approach in terms of privacy protection and learning utility compared to the conventional LaplaceSQ-FL algorithm.

Updated: 2025-07-03 01:49:31

标题: 隐私保护的多精度量化联邦学习

摘要: 联邦学习(FL)已经成为分布式机器学习的一种有前途的范式,实现了跨多个本地设备协作训练全局模型,而无需它们共享原始数据。尽管取得了进展,但FL受到如下因素的限制:(i)由于将本地模型更新以未受保护的方式传输到融合中心(FC)而产生的隐私风险,以及(ii)由于参与设备之间模型量化分辨率的异质性而导致的学习效用降低。以往的工作通常只解决这些挑战中的一个,因为在隐私风险和量化异质性并存的情况下保持学习效用是一项非平凡的任务。因此,本文的目的是改善一种隐私保护FL的学习效用,该FL允许具有不同量化分辨率的设备集群参与每一轮训练。具体来说,我们引入了一种新颖的随机量化器(SQ),旨在同时实现差分隐私(DP)和最小量化误差。值得注意的是,与其他DP方法不同,所提出的SQ保证有界失真。为了解决量化异质性,我们引入了一种集群大小优化技术,并结合线性融合方法来提高模型聚合精度。数值模拟验证了我们的方法相对于传统LaplaceSQ-FL算法在隐私保护和学习效用方面的优势。

更新时间: 2025-07-03 01:49:31

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2507.00920v2
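The abstract's stochastic quantizer (SQ) builds on a standard primitive: randomized rounding that is unbiased and has bounded distortion. The sketch below shows only that generic building block, not the paper's DP mechanism or cluster-size optimization; the step size and names are illustrative:

```python
import numpy as np

def stochastic_quantize(x, delta, rng):
    """Unbiased stochastic quantizer with step size delta.

    Rounds x/delta up or down at random so that E[q(x)] = x and the
    distortion is bounded: |q(x) - x| <= delta for every input.
    (A sketch of the generic building block; the paper's SQ additionally
    satisfies differential privacy.)
    """
    scaled = x / delta
    low = np.floor(scaled)
    p_up = scaled - low                      # probability of rounding up
    up = rng.random(x.shape) < p_up
    return (low + up) * delta

rng = np.random.default_rng(42)
x = np.array([0.17, -1.03, 2.5])
q = stochastic_quantize(x, delta=0.5, rng=rng)
assert np.all(np.abs(q - x) <= 0.5)          # bounded distortion
# Averaging many independent draws recovers x (unbiasedness):
avg = np.mean([stochastic_quantize(x, 0.5, rng) for _ in range(20000)], axis=0)
assert np.allclose(avg, x, atol=0.02)
```

Unbiasedness is what lets a fusion center aggregate quantized updates without systematic drift, even when clusters use different `delta` values.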

PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations

Neural networks have emerged as powerful surrogates for solving partial differential equations (PDEs), offering significant computational speedups over traditional methods. However, these models suffer from a critical limitation: error accumulation during long-term rollouts, where small inaccuracies compound exponentially, eventually causing complete divergence from physically valid solutions. We present PhysicsCorrect, a training-free correction framework that enforces PDE consistency at each prediction step by formulating correction as a linearized inverse problem based on PDE residuals. Our key innovation is an efficient caching strategy that precomputes the Jacobian and its pseudoinverse during an offline warm-up phase, reducing computational overhead by two orders of magnitude compared to standard correction approaches. Across three representative PDE systems -- Navier-Stokes fluid dynamics, wave equations, and the chaotic Kuramoto-Sivashinsky equation -- PhysicsCorrect reduces prediction errors by up to 100x while adding negligible inference time (under 5\%). The framework integrates seamlessly with diverse architectures including Fourier Neural Operators, UNets, and Vision Transformers, effectively transforming unstable neural surrogates into reliable simulation tools that bridge the gap between deep learning's computational efficiency and the physical fidelity demanded by practical scientific applications.

Updated: 2025-07-03 01:22:57

标题: PhysicsCorrect:一种实现稳定神经偏微分方程模拟的免训练方法

摘要: 神经网络已经成为求解偏微分方程(PDEs)的强大替代模型,相比传统方法提供了显著的计算加速。然而,这些模型存在一个关键局限:在长期滚动预测过程中误差会积累,微小的不准确呈指数级放大,最终导致与物理有效解完全偏离。我们提出了PhysicsCorrect,一个无需训练的校正框架,通过将校正表述为基于PDE残差的线性化逆问题,在每个预测步骤强制执行PDE一致性。我们的关键创新是一种高效的缓存策略,在离线预热阶段预先计算雅可比矩阵及其伪逆,与标准校正方法相比将计算开销降低两个数量级。在三个代表性的PDE系统(Navier-Stokes流体动力学、波动方程和混沌的Kuramoto-Sivashinsky方程)中,PhysicsCorrect将预测误差降低多达100倍,而增加的推理时间几乎可以忽略不计(低于5%)。该框架与包括傅里叶神经算子(Fourier Neural Operators)、UNets和Vision Transformers在内的多种架构无缝集成,有效地将不稳定的神经替代模型转化为可靠的模拟工具,弥合了深度学习的计算效率与实际科学应用所要求的物理保真度之间的差距。

更新时间: 2025-07-03 01:22:57

领域: cs.LG

下载: http://arxiv.org/abs/2507.02227v1
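The correction step described above (a linearized inverse problem on PDE residuals with a cached Jacobian pseudoinverse) can be sketched on a toy linear residual. The operator `A`, right-hand side `b`, and sizes are hypothetical stand-ins for a real discretized PDE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a discretized PDE constraint r(u) = A u - b = 0.
n = 20
A = rng.normal(size=(n, n)) + n * np.eye(n)  # well-conditioned operator
u_true = rng.normal(size=n)
b = A @ u_true

# Offline warm-up: cache the Jacobian pseudoinverse once.
# For r(u) = A u - b the Jacobian is simply A.
J_pinv = np.linalg.pinv(A)

def correct(u_pred):
    """One Newton-like correction step: u <- u - J^+ r(u)."""
    residual = A @ u_pred - b
    return u_pred - J_pinv @ residual

u_pred = u_true + 0.1 * rng.normal(size=n)   # noisy "network prediction"
u_corr = correct(u_pred)
# For a linear residual, a single cached-pseudoinverse step removes
# essentially all of the PDE inconsistency.
assert np.linalg.norm(A @ u_corr - b) < 1e-6 * np.linalg.norm(A @ u_pred - b)
```

In the nonlinear PDE setting the residual is re-linearized around the rollout state, but the expensive pseudoinverse is still computed only once offline, which is where the claimed two-orders-of-magnitude saving comes from.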

DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs

As one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation. However, conventional LLM decoding strategies, originally designed for natural language, often fail to meet the structural and semantic demands of RTL, leading to hallucinated, repetitive, or invalid code outputs. In this paper, we first investigate the root causes of these decoding failures through an empirical analysis of token-level entropy during RTL generation. Our findings reveal that LLMs exhibit low confidence in regions of structural ambiguity or semantic complexity, showing that standard decoding strategies fail to differentiate between regions requiring determinism (syntax-critical regions) and those that benefit from creative exploratory variability (design-critical regions). Then, to overcome this, we introduce DecoRTL, a novel run-time decoding strategy, that is both syntax-aware and contrastive for RTL code generation. DecoRTL integrates two complementary components: (i) self-consistency sampling, which generates multiple candidates and re-ranks them based on token-level agreement to promote correctness while maintaining diversity; and (ii) syntax-aware temperature adaptation, which classifies tokens by their syntactical and functional roles and adjusts the sampling temperature accordingly, enforcing low temperature for syntax-critical tokens and higher temperature for exploratory ones. Our approach operates entirely at inference time without requiring any additional model fine-tuning. Through evaluations on multiple open-source LLMs using the VerilogEval benchmark, we demonstrate significant improvements in syntactic validity, functional correctness, and output diversity, while the execution overhead (performance overhead) is imperceptible.

Updated: 2025-07-03 01:17:44

标题: DecoRTL:用于LLMs的RTL代码生成的运行时解码框架

摘要: 作为其众多应用之一,大型语言模型(LLMs)最近在自动化寄存器传输级(RTL)代码生成方面显示出潜力。然而,最初为自然语言设计的传统LLM解码策略往往无法满足RTL的结构和语义要求,导致产生幻觉的、重复的或无效的代码输出。在本文中,我们首先通过对RTL生成过程中标记级熵的实证分析,探究这些解码失败的根本原因。我们的发现表明,LLMs在结构模糊或语义复杂的区域表现出低置信度,说明标准解码策略无法区分需要确定性的区域(语法关键区域)和受益于创造性探索变化的区域(设计关键区域)。为克服这一问题,我们引入了DecoRTL,一种面向RTL代码生成、兼具语法感知与对比性的新颖运行时解码策略。DecoRTL集成了两个互补组件:(i)自一致性采样,生成多个候选并根据标记级一致性重新排序,在保持多样性的同时促进正确性;(ii)语法感知温度自适应,按标记的句法和功能角色对其分类并相应调整采样温度,对语法关键标记采用低温度,对探索性标记采用较高温度。我们的方法完全在推理时运行,无需任何额外的模型微调。通过在VerilogEval基准上对多个开源LLMs进行评估,我们展示了在句法有效性、功能正确性和输出多样性方面的显著改进,同时执行(性能)开销几乎不可察觉。

更新时间: 2025-07-03 01:17:44

领域: cs.PL,cs.AR,cs.LG

下载: http://arxiv.org/abs/2507.02226v1
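The syntax-aware temperature adaptation described in the DecoRTL abstract can be sketched as follows. The token kinds, temperatures, and logits are illustrative assumptions, not the paper's actual classifier or configuration:

```python
import numpy as np

def sample_token(logits, token_kind, rng, t_syntax=0.1, t_explore=1.0):
    """Temperature-adapted sampling sketch.

    Syntax-critical positions (keywords, punctuation) get a low temperature,
    pushing the distribution toward determinism; design-critical positions
    keep a higher temperature for exploration.
    """
    t = t_syntax if token_kind == "syntax" else t_explore
    z = np.asarray(logits, dtype=float) / t
    z -= z.max()                              # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(p), p=p), p

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.5]
_, p_syntax = sample_token(logits, "syntax", rng)
_, p_design = sample_token(logits, "design", rng)
# Lower temperature concentrates probability mass on the top logit.
assert p_syntax[0] > p_design[0]
```

The self-consistency component would then draw several candidates this way and re-rank them by token-level agreement; that step is omitted here.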

Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction

Evaluating the accuracy of dimensionality reduction (DR) projections in preserving the structure of high-dimensional data is crucial for reliable visual analytics. Diverse evaluation metrics targeting different structural characteristics have thus been developed. However, evaluations of DR projections can become biased if highly correlated metrics--those measuring similar structural characteristics--are inadvertently selected, favoring DR techniques that emphasize those characteristics. To address this issue, we propose a novel workflow that reduces bias in the selection of evaluation metrics by clustering metrics based on their empirical correlations rather than on their intended design characteristics alone. Our workflow works by computing metric similarity using pairwise correlations, clustering metrics to minimize overlap, and selecting a representative metric from each cluster. Quantitative experiments demonstrate that our approach improves the stability of DR evaluation, which indicates that our workflow contributes to mitigating evaluation bias.

Updated: 2025-07-03 01:07:02

标题: 度量设计 != 度量行为:改进度量选择以实现对降维的无偏评估

摘要: 评估降维(DR)投影在保持高维数据结构方面的准确性对于可靠的视觉分析至关重要。因此,已经开发了针对不同结构特征的多样化评估指标。然而,如果不慎选择高度相关的度量标准(即测量相似结构特征的度量标准),则对DR投影的评估可能会变得有偏,从而偏好强调这些特征的DR技术。为了解决这个问题,我们提出了一种新颖的工作流程,通过基于它们的经验相关性而不仅仅是它们的设计特征来对度量标准进行聚类,从而减少评估度量标准选择中的偏见。我们的工作流程通过计算度量标准之间的成对相关性来确定度量标准的相似性,对度量标准进行聚类以最小化重叠,并从每个聚类中选择代表性度量标准。定量实验表明,我们的方法提高了DR评估的稳定性,这表明我们的工作流程有助于减轻评估偏见。

更新时间: 2025-07-03 01:07:02

领域: cs.LG

下载: http://arxiv.org/abs/2507.02225v1
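The proposed workflow (correlate metrics empirically, cluster them, keep one representative per cluster) can be sketched with a simple greedy variant. The correlation threshold and greedy selection here are assumptions; the paper's exact clustering procedure may differ:

```python
import numpy as np

def select_representative_metrics(scores, threshold=0.8):
    """Cluster metrics by empirical correlation; keep one per cluster.

    scores: (n_projections, n_metrics) array of metric values measured on
    the same set of DR projections. Metrics whose pairwise |Pearson r|
    exceeds `threshold` are treated as redundant.
    """
    corr = np.corrcoef(scores, rowvar=False)
    chosen = []
    for m in range(corr.shape[0]):
        # Keep metric m only if it is not highly correlated with any
        # already-selected representative.
        if all(abs(corr[m, c]) < threshold for c in chosen):
            chosen.append(m)
    return chosen

rng = np.random.default_rng(1)
base = rng.normal(size=100)
scores = np.column_stack([
    base,                                  # metric 0
    base + 0.01 * rng.normal(size=100),    # metric 1: near-duplicate of 0
    rng.normal(size=100),                  # metric 2: independent
])
reps = select_representative_metrics(scores)
assert reps == [0, 2]                      # the near-duplicate is dropped
```

Selecting one metric per empirical cluster prevents a DR technique from being rewarded several times for preserving the same structural characteristic.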

On the Mathematical Impossibility of Safe Universal Approximators

We establish fundamental mathematical limits on universal approximation theorem (UAT) system alignment by proving that catastrophic failures are an inescapable feature of any useful computational system. Our central thesis is that for any universal approximator, the expressive power required for useful computation is inextricably linked to a dense set of instabilities that make perfect, reliable control a mathematical impossibility. We prove this through a three-level argument that leaves no escape routes for any class of universal approximator architecture. i) Combinatorial Necessity: For the vast majority of practical universal approximators (e.g., those using ReLU activations), we prove that the density of catastrophic failure points is directly proportional to the network's expressive power. ii) Topological Necessity: For any theoretical universal approximator, we use singularity theory to prove that the ability to approximate generic functions requires the ability to implement the dense, catastrophic singularities that characterize them. iii) Empirical Necessity: We prove that the universal existence of adversarial examples is empirical evidence that real-world tasks are themselves catastrophic, forcing any successful model to learn and replicate these instabilities. These results, combined with a quantitative "Impossibility Sandwich" showing that the minimum complexity for usefulness exceeds the maximum complexity for safety, demonstrate that perfect alignment is not an engineering challenge but a mathematical impossibility. This foundational result reframes UAT safety from a problem of "how to achieve perfect control" to one of "how to operate safely in the presence of irreducible uncontrollability," with profound implications for the future of UAT development and governance.

Updated: 2025-07-03 01:05:24

标题: 关于安全通用逼近器的数学不可能性

摘要: 我们通过证明灾难性失败是任何有用的计算系统不可避免的特征,建立了通用逼近定理(UAT)系统对齐的基本数学限制。我们的核心论点是:对于任何通用逼近器,有用计算所需的表达能力与一组密集的不稳定性密不可分,这使得完美、可靠的控制在数学上成为不可能。我们通过一个三级论证来证明这一点,使任何类别的通用逼近器架构都没有逃脱路线。i)组合必要性:对于绝大多数实际通用逼近器(例如使用ReLU激活的逼近器),我们证明灾难性失败点的密度与网络的表达能力直接成正比。ii)拓扑必要性:对于任何理论上的通用逼近器,我们利用奇点理论证明,逼近一般函数的能力要求实现表征这些函数的密集灾难性奇点。iii)经验必要性:我们证明对抗样本的普遍存在是现实世界任务本身具有灾难性的经验证据,迫使任何成功的模型学习并复制这些不稳定性。这些结果,结合定量的“不可能性三明治”(表明有用性所需的最小复杂度超过了安全性所允许的最大复杂度),表明完美对齐不是工程挑战,而是数学上的不可能。这一基础性结果将UAT安全从“如何实现完美控制”的问题重新表述为“如何在不可消除的不可控性存在下安全运行”的问题,对UAT的发展和治理的未来具有深远影响。

更新时间: 2025-07-03 01:05:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.03031v1

Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Grokking, i.e., test performance keeps improving long after training loss converged, has been recently witnessed in neural network training, making the mechanism of generalization and other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly-specific tasks for thousands of epochs, we conduct the first study of grokking on checkpoints during one-pass pretraining of a 7B large language model (LLM), i.e., OLMoE. We compute the training loss and evaluate generalization on diverse benchmark tasks, including math reasoning, code generation, and commonsense/domain-specific knowledge retrieval tasks. Our study, for the first time, verifies that grokking still happens in the pretraining of large-scale foundation models, though different data may enter grokking stages asynchronously. We further demystify grokking's "emergence of generalization" by investigating LLM internal dynamics. Specifically, we find that training samples' pathways (i.e., expert choices across layers) evolve from random, instance-specific to more structured and shareable between samples. Also, the complexity of a sample's pathway reduces despite the converged loss. These indicate a memorization-to-generalization "knowledge digestion", providing a mechanistic explanation of delayed generalization. In the study, we develop two novel metrics to quantify pathway distance and the complexity of a single pathway. We show their ability to predict the generalization improvement on diverse downstream tasks. They are efficient, simple to compute and solely dependent on training data. Hence, they have practical value for pretraining, enabling us to monitor the generalization performance without finetuning and test. Theoretically, we show that more structured pathways reduce model complexity and improve the generalization bound.

Updated: 2025-07-03 01:00:21

标题: 在LLM预训练中何处寻找Grokking?无需测试即可监控从记忆到泛化

摘要: Grokking,即测试性能在训练损失收敛后仍长期持续改善,最近在神经网络训练中被观察到,使得泛化机制以及推理等其他新兴能力显得神秘。以往的研究通常在少数玩具任务或高度特定的任务上将小模型训练数千个epoch,而我们首次在一个7B大型语言模型(LLM,即OLMoE)单次预训练过程中的检查点上研究grokking。我们计算训练损失,并在多样的基准任务上评估泛化能力,包括数学推理、代码生成以及常识/领域特定知识检索任务。我们的研究首次验证了grokking在大规模基础模型的预训练中仍会发生,尽管不同数据可能异步进入grokking阶段。我们进一步通过研究LLM内部动态来揭开grokking“泛化涌现”的神秘面纱。具体来说,我们发现训练样本的路径(即跨层的专家选择)从随机的、实例特定的演变为更结构化且可在样本间共享的。此外,尽管损失已经收敛,单个样本路径的复杂度却在降低。这些现象表明存在一种从记忆到泛化的“知识消化”过程,为延迟泛化提供了机制性解释。在研究中,我们开发了两个新的度量来量化路径间距离和单条路径的复杂度,并展示了它们在多样的下游任务上预测泛化改善的能力。它们高效、易于计算且仅依赖于训练数据,因此对预训练具有实际价值,使我们无需微调和测试即可监控泛化性能。在理论上,我们证明更结构化的路径会降低模型复杂度并改进泛化界。

更新时间: 2025-07-03 01:00:21

领域: cs.LG

下载: http://arxiv.org/abs/2506.21551v2
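The abstract mentions two metrics over expert-choice pathways but does not define them; as a purely hypothetical illustration of what quantifying pathway distance and pathway complexity could look like for a mixture-of-experts model, consider:

```python
import numpy as np

def pathway_distance(p1, p2):
    """Fraction of layers where two samples route to different experts.

    A "pathway" here is the tuple of expert indices a sample takes through
    the model, one per layer. (A hypothetical stand-in for the paper's
    metric, which the abstract does not fully specify.)
    """
    p1, p2 = np.asarray(p1), np.asarray(p2)
    return float(np.mean(p1 != p2))

def pathway_complexity(pathways):
    """Entropy (bits) of the empirical expert-choice distribution per layer,
    averaged over layers; lower means more structured, shareable routing."""
    pathways = np.asarray(pathways)          # (n_samples, n_layers)
    ents = []
    for layer in pathways.T:
        _, counts = np.unique(layer, return_counts=True)
        p = counts / counts.sum()
        ents.append(-(p * np.log2(p)).sum())
    return float(np.mean(ents))

random_routing = [[0, 1, 2, 3], [3, 2, 1, 0], [1, 3, 0, 2], [2, 0, 3, 1]]
shared_routing = [[0, 1, 1, 2], [0, 1, 1, 2], [0, 1, 1, 2], [0, 1, 2, 2]]
assert pathway_distance([0, 1, 2], [0, 1, 3]) == 1 / 3
# Structured, shared pathways have lower per-layer routing entropy.
assert pathway_complexity(shared_routing) < pathway_complexity(random_routing)
```

Both quantities depend only on training-time routing decisions, which is the property that lets such metrics be monitored without any held-out test set.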

Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting

Time series forecasting remains a challenging task for foundation models due to temporal heterogeneity, high dimensionality, and the lack of inherent symbolic structure. In this work, we propose DRAGON (Discrete Representation and Augmented Graph encoding Over de BruijN Graphs), a novel encoder that introduces Multivariate de Bruijn Graphs (MdBGs) to bridge the gap between symbolic representations and neural modeling. DRAGON discretizes continuous input sequences and maps them onto a fixed graph structure, enabling dynamic context recovery via graph-based attention. Integrated as an auxiliary module within a dual-branch architecture, DRAGON augments conventional CNN-based encoders with symbolic, structure-aware representations. All code developed for this study is available at: https://github.com/KurbanIntelligenceLab/MultdBG-Time-Series-Library

Updated: 2025-07-03 00:58:49

标题: 多元de Bruijn图:一种用于时间序列预测的符号图框架

摘要: 由于时间异质性、高维度以及缺乏固有的符号结构,时间序列预测对基础模型而言仍是一项具有挑战性的任务。在这项工作中,我们提出了DRAGON(Discrete Representation and Augmented Graph encoding Over de BruijN Graphs),一种新颖的编码器,引入多元de Bruijn图(MdBGs)来弥合符号表示与神经建模之间的差距。DRAGON将连续输入序列离散化并映射到一个固定的图结构上,通过基于图的注意力实现动态上下文恢复。作为双分支架构中的辅助模块,DRAGON以符号化、结构感知的表示增强了传统的基于CNN的编码器。本研究开发的所有代码均可在以下网址获取:https://github.com/KurbanIntelligenceLab/MultdBG-Time-Series-Library

更新时间: 2025-07-03 00:58:49

领域: cs.LG

下载: http://arxiv.org/abs/2505.22768v3
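The discretize-then-graph idea behind MdBGs can be sketched in the univariate case (the multivariate construction and graph attention in DRAGON go well beyond this toy; the bin count and de Bruijn order `k` are assumptions):

```python
import numpy as np
from collections import defaultdict

def discretize(series, n_bins=4):
    """Quantile-based symbolization of a continuous series (SAX-like sketch)."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(series, edges)        # symbols in {0, ..., n_bins-1}

def de_bruijn_edges(symbols, k=3):
    """Map a symbol sequence onto de Bruijn graph edges of order k.

    Nodes are (k-1)-grams; each k-gram contributes one edge from its
    prefix node to its suffix node, with counts as weights.
    """
    edges = defaultdict(int)
    for i in range(len(symbols) - k + 1):
        gram = tuple(symbols[i:i + k])
        edges[(gram[:-1], gram[1:])] += 1
    return dict(edges)

series = np.sin(np.linspace(0, 4 * np.pi, 200))
symbols = discretize(series, n_bins=4)
graph = de_bruijn_edges(list(symbols), k=3)
# Every k-gram in the sequence lands on a node of a FIXED graph, which is
# what lets a downstream encoder attend over a stable structure.
assert sum(graph.values()) == len(symbols) - 3 + 1
```

Because the node set is fixed by the alphabet and order `k`, arbitrarily long series map onto the same bounded graph, which is the bridge between symbolic structure and neural modeling that the abstract describes.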

Understanding Trade offs When Conditioning Synthetic Data

Learning robust object detectors from only a handful of images is a critical challenge in industrial vision systems, where collecting high quality training data can take months. Synthetic data has emerged as a key solution for data efficient visual inspection and pick and place robotics. Current pipelines rely on 3D engines such as Blender or Unreal, which offer fine control but still require weeks to render a small dataset, and the resulting images often suffer from a large gap between simulation and reality. Diffusion models promise a step change because they can generate high quality images in minutes, yet precise control, especially in low data regimes, remains difficult. Although many adapters now extend diffusion beyond plain text prompts, the effect of different conditioning schemes on synthetic data quality is poorly understood. We study eighty diverse visual concepts drawn from four standard object detection benchmarks and compare two conditioning strategies: prompt based and layout based. When the set of conditioning cues is narrow, prompt conditioning yields higher quality synthetic data; as diversity grows, layout conditioning becomes superior. When layout cues match the full training distribution, synthetic data raises mean average precision by an average of thirty four percent and by as much as one hundred seventy seven percent compared with using real data alone.

Updated: 2025-07-03 00:44:31

标题: 理解在条件合成数据时的权衡取舍

摘要: 仅凭少量图像学习鲁棒的目标检测器是工业视觉系统中的关键挑战,在这些系统中,收集高质量训练数据可能耗时数月。合成数据已成为数据高效视觉检测和取放机器人领域的关键解决方案。当前的流水线依赖于Blender或Unreal等3D引擎,它们提供精细的控制,但渲染一个小数据集仍需数周时间,且生成的图像往往存在较大的仿真与现实差距。扩散模型有望带来质的飞跃,因为它们可以在几分钟内生成高质量图像,但精确控制(尤其是在低数据情形下)仍然困难。尽管现在已有许多适配器将扩散模型扩展到纯文本提示之外,但不同条件化方案对合成数据质量的影响仍知之甚少。我们研究了来自四个标准目标检测基准的80个多样化视觉概念,并比较了两种条件化策略:基于提示的和基于布局的。当条件化线索集较窄时,基于提示的条件化产生更高质量的合成数据;随着多样性的增长,基于布局的条件化变得更优。当布局线索匹配完整训练分布时,与仅使用真实数据相比,合成数据将平均精度均值平均提高34%,最高提高177%。

更新时间: 2025-07-03 00:44:31

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.02217v1

Hybrid least squares for learning functions from highly noisy data

Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are powerful in the small noise regime are suboptimal when large noise is present. We propose a hybrid approach that combines Christoffel sampling with certain types of optimal experimental design to address this issue. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convex-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and on a more challenging stochastic simulation problem in computational finance.

Updated: 2025-07-03 00:31:29

标题: 混合最小二乘法用于从高噪声数据中学习函数

摘要: 出于对条件期望高效估计的需求,我们考虑一个数据受到严重污染的最小二乘函数逼近问题。现有方法在小噪声情形下表现强大,但在存在大噪声时并非最优。我们提出一种混合方法,将Christoffel采样与某些类型的最优实验设计相结合以解决这一问题。我们证明所提算法在样本点生成和噪声平滑化两方面都具有适当的最优性,与现有方法相比提高了计算效率并改善了样本复杂度。我们还将该算法推广到凸约束设置,并给出类似的理论保证。当目标函数定义为随机场的期望时,我们进一步扩展方法以利用自适应随机子空间,并建立了该自适应过程逼近能力的结果。我们的理论发现得到了合成数据以及计算金融中一个更具挑战性的随机模拟问题上数值研究的支持。

更新时间: 2025-07-03 00:31:29

领域: stat.ML,cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2507.02215v1
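The Christoffel-sampling ingredient mentioned above can be sketched for polynomial least squares on [-1, 1]. This shows only the classical sampling-and-reweighting step under the asymptotic (arcsine) sampling density; the paper's hybrid design and noise-mollification step are not reproduced, and the test function is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(0)

def christoffel_ls_fit(f, degree, n_samples, noise_std, rng):
    """Least-squares polynomial fit with Christoffel-style sampling on [-1, 1].

    Samples are drawn from the Chebyshev (arcsine) density, the limiting
    optimal sampling measure for polynomial spaces, and the least-squares
    system is reweighted by the reciprocal density.
    """
    u = rng.uniform(size=n_samples)
    x = np.cos(np.pi * u)                    # draws from the arcsine density
    w = np.pi * np.sqrt(1 - x**2)            # 1 / density(x)
    y = f(x) + noise_std * rng.normal(size=n_samples)
    V = np.polynomial.chebyshev.chebvander(x, degree)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(V * sw[:, None], y * sw, rcond=None)
    return coef

f = lambda x: 1 - 2 * x + 0.5 * x**2
coef = christoffel_ls_fit(f, degree=2, n_samples=4000, noise_std=0.5, rng=rng)
xs = np.linspace(-0.9, 0.9, 50)
fit = np.polynomial.chebyshev.chebval(xs, coef)
# Heavy noise (std 0.5) is averaged out by the over-determined weighted fit.
assert np.max(np.abs(fit - f(xs))) < 0.1
```

The over-determined weighted system is what damps large noise; the paper's contribution is choosing the design and weights so this damping is near-optimal even when the noise dominates.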

SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers

Graphical Abstracts (GAs) play a crucial role in visually conveying the key findings of scientific papers. While recent research has increasingly incorporated visual materials such as Figure 1 as de facto GAs, their potential to enhance scientific communication remains largely unexplored. Moreover, designing effective GAs requires advanced visualization skills, creating a barrier to their widespread adoption. To tackle these challenges, we introduce SciGA-145k, a large-scale dataset comprising approximately 145,000 scientific papers and 1.14 million figures, explicitly designed for supporting GA selection and recommendation as well as facilitating research in automated GA generation. As a preliminary step toward GA design support, we define two tasks: 1) Intra-GA recommendation, which identifies figures within a given paper that are well-suited to serve as GAs, and 2) Inter-GA recommendation, which retrieves GAs from other papers to inspire the creation of new GAs. We provide reasonable baseline models for these tasks. Furthermore, we propose Confidence Adjusted top-1 ground truth Ratio (CAR), a novel recommendation metric that offers a fine-grained analysis of model behavior. CAR addresses limitations in traditional ranking-based metrics by considering cases where multiple figures within a paper, beyond the explicitly labeled GA, may also serve as GAs. By unifying these tasks and metrics, our SciGA-145k establishes a foundation for advancing visual scientific communication while contributing to the development of AI for Science.

Updated: 2025-07-03 00:21:38

标题: SciGA:用于设计学术论文图形摘要的全面数据集

摘要: 图形摘要(GAs)在以视觉方式传达科学论文的关键发现方面起着至关重要的作用。尽管最近的研究越来越多地将图1之类的视觉材料作为事实上的GA使用,但它们提升科学交流的潜力在很大程度上仍未被探索。此外,设计有效的GA需要高级的可视化技能,这对其广泛采用构成了障碍。为了应对这些挑战,我们推出了SciGA-145k,一个包含约145,000篇科学论文和114万幅图的大规模数据集,专门用于支持GA的选择与推荐,并促进自动GA生成的研究。作为支持GA设计的初步步骤,我们定义了两个任务:1)Intra-GA推荐,识别给定论文中适合作为GA的图;2)Inter-GA推荐,从其他论文中检索GA以启发新GA的创作。我们为这些任务提供了合理的基线模型。此外,我们提出了置信度调整的top-1真实标签比率(Confidence Adjusted top-1 ground truth Ratio,CAR),一种新颖的推荐度量,可对模型行为进行细粒度分析。CAR考虑了论文中除明确标注的GA之外还可能有多幅图可作为GA的情况,从而解决了传统基于排名的度量的局限性。通过统一这些任务和度量,SciGA-145k为推进视觉科学交流奠定了基础,同时促进了AI for Science的发展。

更新时间: 2025-07-03 00:21:38

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.02212v1

EIM-TRNG: Obfuscating Deep Neural Network Weights with Encoding-in-Memory True Random Number Generator via RowHammer

True Random Number Generators (TRNGs) play a fundamental role in hardware security, cryptographic systems, and data protection. In the context of Deep NeuralNetworks (DNNs), safeguarding model parameters, particularly weights, is critical to ensure the integrity, privacy, and intel-lectual property of AI systems. While software-based pseudo-random number generators are widely used, they lack the unpredictability and resilience offered by hardware-based TRNGs. In this work, we propose a novel and robust Encoding-in-Memory TRNG called EIM-TRNG that leverages the inherent physical randomness in DRAM cell behavior, particularly under RowHammer-induced disturbances, for the first time. We demonstrate how the unpredictable bit-flips generated through carefully controlled RowHammer operations can be harnessed as a reliable entropy source. Furthermore, we apply this TRNG framework to secure DNN weight data by encoding via a combination of fixed and unpredictable bit-flips. The encrypted data is later decrypted using a key derived from the probabilistic flip behavior, ensuring both data confidentiality and model authenticity. Our results validate the effectiveness of DRAM-based entropy extraction for robust, low-cost hardware security and offer a promising direction for protecting machine learning models at the hardware level.

Updated: 2025-07-03 00:01:33

标题: EIM-TRNG:使用Encoding-in-Memory真随机数生成器通过RowHammer混淆深度神经网络权重

摘要: 真随机数发生器(TRNGs)在硬件安全、密码系统和数据保护中起着基础性作用。在深度神经网络(DNNs)的背景下,保护模型参数(特别是权重)对确保人工智能系统的完整性、隐私和知识产权至关重要。虽然基于软件的伪随机数发生器被广泛使用,但它们缺乏基于硬件的TRNGs所提供的不可预测性和弹性。在这项工作中,我们首次提出一种新颖且鲁棒的内存内编码(Encoding-in-Memory)TRNG,称为EIM-TRNG,它利用DRAM单元行为中固有的物理随机性,特别是RowHammer诱导的干扰下的随机性。我们展示了如何将通过精心控制的RowHammer操作产生的不可预测位翻转用作可靠的熵源。此外,我们将该TRNG框架应用于保护DNN权重数据,通过固定位翻转与不可预测位翻转相结合的方式进行编码。加密数据随后使用从概率性翻转行为导出的密钥进行解密,从而同时确保数据机密性和模型真实性。我们的结果验证了基于DRAM的熵提取对于鲁棒、低成本硬件安全的有效性,并为在硬件层面保护机器学习模型提供了一个有前途的方向。

更新时间: 2025-07-03 00:01:33

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.02206v1

By Xinhai (Sean) Zou.