Arxiv Day: Article

Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies

We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". The advantage of RMCTS over AlphaZero's MCTS-UCB is speed. In RMCTS, the search tree is explored in a breadth-first manner, so that network inferences naturally occur in large batches. This significantly reduces the GPU latency cost. We find that RMCTS is often more than 40 times faster than MCTS-UCB when searching a single root state, and about 3 times faster when searching a large batch of root states. The recursion in RMCTS is based on computing optimized posterior policies at each game state in the search tree, starting from the leaves and working back up to the root. Here we use the posterior policy explored in "Monte-Carlo tree search as regularized policy optimization" (Grill et al.). Their posterior policy is the unique policy which maximizes the expected reward given estimated action rewards minus a penalty for diverging from the prior policy. The tree explored by RMCTS is not defined in an adaptive manner, as it is in MCTS-UCB. Instead, the RMCTS tree is defined by following prior network policies at each node. This is a disadvantage, but the speedup advantage is more significant, and in practice we find that RMCTS-trained networks match the quality of MCTS-UCB-trained networks in roughly one-third of the training time. We include timing and quality comparisons of RMCTS vs. MCTS-UCB for three games: Connect-4, Dots-and-Boxes, and Othello.
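To make the posterior policy concrete, here is a minimal Python sketch of one way it could be computed at a single node. It assumes the penalty is the KL term used by Grill et al., for which the maximizer has the form policy(a) = lam * prior(a) / (alpha - q(a)), with alpha chosen so that the probabilities sum to one. The function name `posterior_policy`, the multiplier `lam`, and the bisection search are illustrative choices, not details confirmed by the RMCTS abstract, and the prior is assumed strictly positive.

```python
def posterior_policy(q, prior, lam, iters=100):
    """Regularized posterior policy at one node (illustrative sketch).

    q     : estimated action rewards, one per action
    prior : prior policy probabilities (assumed strictly positive)
    lam   : penalty weight (hypothetical name)
    """
    # The normalizer alpha lies between these two bounds, and the total
    # probability mass is decreasing in alpha, so bisection applies.
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(lam * pa / (mid - qa) for qa, pa in zip(q, prior))
        if total > 1.0:
            lo = mid   # too much mass: alpha must grow
        else:
            hi = mid   # too little mass: alpha must shrink
    policy = [lam * pa / (hi - qa) for qa, pa in zip(q, prior)]
    norm = sum(policy)
    return [p / norm for p in policy]  # remove residual numerical error
```

When all action rewards are equal, the sketch returns the prior unchanged; when one action has a higher estimated reward, probability mass shifts toward it by an amount controlled by `lam`.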

Updated: 2026-01-03 23:38:43

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2601.01301v1

Subsampled Ensemble Can Improve Generalization Tail Exponentially

Ensemble learning is a popular technique to improve the accuracy of machine learning models. It traditionally hinges on the rationale that aggregating multiple weak models can lead to better models with lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on ensembling. By selecting the most frequently generated model from the base learner when repeatedly applied to subsamples, we can attain exponentially decaying tails for the excess risk, even if the base learner suffers from slow (i.e., polynomial) decay rates. This tail enhancement power of ensembling applies to base learners that have reasonable predictive power to begin with and is stronger than variance reduction in the sense of exhibiting rate improvement. We demonstrate how our ensemble methods can substantially improve out-of-sample performances in a range of numerical examples involving heavy-tailed data or intrinsically slow rates.
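The selection rule described above (run the base learner on many subsamples and keep the model it outputs most often) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the base learner `erm_over_candidates`, its candidate set, and the parameter names are hypothetical, and the base learner is assumed to return a hashable model so that identical outputs can be counted.

```python
import random
from collections import Counter

def most_frequent_model(data, base_learner, n_subsamples=200,
                        subsample_size=None, seed=0):
    """Return the model the base learner outputs most often on subsamples."""
    rng = random.Random(seed)
    k = subsample_size or max(1, len(data) // 2)
    votes = Counter()
    for _ in range(n_subsamples):
        subsample = rng.sample(data, k)       # draw without replacement
        votes[base_learner(subsample)] += 1   # model must be hashable
    return votes.most_common(1)[0][0]

def erm_over_candidates(sample, candidates=(0.0, 1.0, 2.0)):
    """Toy base learner: pick the candidate with the lowest squared loss."""
    return min(candidates, key=lambda c: sum((x - c) ** 2 for x in sample))
```

On a heavy-tailed toy dataset such as twenty values of 1.0 plus a single outlier of 100.0, the base learner applied to the full sample is pulled to the candidate 2.0, while most small subsamples miss the outlier and vote for 1.0.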

Updated: 2026-01-03 23:30:57

Categories: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.14741v5

T3C: Test-Time Tensor Compression with Consistency Guarantees

We present T3C, a train-once, test-time budget-conditioned compression framework that exposes rank and precision as a controllable deployment knob. T3C combines elastic tensor factorization (maintained up to a maximal rank) with rank-tied mixed-precision quantization and a lightweight controller that maps a latency/energy/size budget token to per-layer rank/bit assignments; the policy snaps to hardware-aligned profiles and is monotone in the budget. A fast, layerwise consistency certificate, computed from spectral proxies and activation statistics, upper-bounds logit drift and regularizes training, yielding a practical reliability signal with negligible overhead. On ImageNet-1k, T3C shifts the vision Pareto frontier: for ResNet-50 at matched accuracy (≤ 0.5% drop), p50 latency is 1.18ms with a 38MB model, outperforming PTQ-8b (1.44ms, 88MB); for ViT-B/16, T3C reaches 2.30ms p50 with 59MB, improving over strong PTQ/QAT baselines. A single T3C checkpoint therefore provides predictable, certificate-backed accuracy-latency-size trade-offs on demand across devices.

Updated: 2026-01-03 23:16:27

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2601.01299v1

Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware

Current multi-agent Large Language Model (LLM) frameworks suffer from linear memory scaling, rendering "System 2" parallel reasoning impractical on consumer hardware. We present Warp Cortex, an asynchronous architecture that theoretically enables million-agent cognitive scaling by decoupling agent logic from physical memory. Through Singleton Weight Sharing and a novel Topological Synapse, inspired by hybrid landmarking techniques from Topological Data Analysis (TDA), we reduce memory complexity from O(N * L) to O(1) for weights and O(N * k) for context, where k << L. By treating the KV-cache as a point cloud in latent space, we apply witness-complex-inspired sparsification to preserve persistent homological features of the context manifold. On a single NVIDIA RTX 4090, we empirically demonstrate 100 concurrent agents at 2.2 GB total VRAM, with theoretical capacity exceeding 1,000 agents before compute latency becomes the bottleneck. We further introduce Referential Injection, a non-intrusive KV-cache update mechanism that allows asynchronous sub-agents to influence primary generation without stream disruption.
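As a rough illustration of how a point cloud of L cached vectors might be reduced to k << L landmarks, the sketch below implements maxmin (farthest-point) selection, a standard landmark heuristic in topological data analysis. Whether Warp Cortex uses this exact rule is an assumption; the abstract only mentions hybrid landmarking, and the function name `maxmin_landmarks` and the tuple-based point format are hypothetical.

```python
import math

def maxmin_landmarks(points, k):
    """Pick k landmark indices by greedy farthest-point (maxmin) selection."""
    landmarks = [0]  # start from the first point (arbitrary choice)
    # Distance from every point to its nearest landmark so far.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(landmarks) < min(k, len(points)):
        # The next landmark is the point farthest from all current landmarks.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        landmarks.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return landmarks
```

Each agent would then keep only the k selected entries, which is where the O(N * k) context term in the abstract comes from under this reading.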

Updated: 2026-01-03 23:11:21

Categories: cs.LG,cs.AI,cs.AR,cs.DC,cs.MA

Download: http://arxiv.org/abs/2601.01298v1

ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System

Detecting distributional drift in high-dimensional data streams presents fundamental challenges: global comparison methods scale poorly, projection-based approaches lose geometric structure, and re-clustering methods suffer from identity instability. This paper introduces Argus, a framework that reconceptualizes drift detection as tracking local statistics over a fixed spatial partition of the data manifold. The key contributions are fourfold. First, it is proved that Voronoi tessellations over canonical orthonormal frames yield drift metrics that are invariant to orthogonal transformations, that is, the rotations and reflections that preserve Euclidean geometry. Second, it is established that this framework achieves O(N) complexity per snapshot while providing cell-level spatial localization of distributional change. Third, a graph-theoretic characterization of drift propagation is developed that distinguishes coherent distributional shifts from isolated perturbations. Fourth, product quantization tessellation is introduced for scaling to very high dimensions (d>500) by decomposing the space into independent subspaces and aggregating drift signals across subspaces. This paper formalizes the theoretical foundations, proves invariance properties, and presents experimental validation demonstrating that the framework correctly identifies drift under coordinate rotation while existing methods produce false positives. The tessellated approach offers a principled geometric foundation for distribution monitoring that preserves high-dimensional structure without the computational burden of pairwise comparisons.
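The idea of tracking local statistics over a fixed partition can be sketched as follows. This is a simplified illustration, not the paper's method: it assigns each point to its nearest centroid (a Voronoi cell), compares per-cell occupancy between two snapshots with total variation distance, and uses hypothetical names throughout. The drift metric and the canonical-frame construction in Argus may differ.

```python
import math
from collections import Counter

def nearest_cell(point, centroids):
    """Index of the Voronoi cell (nearest centroid) containing the point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def cell_histogram(points, centroids):
    """Fraction of points per cell; cost is linear in the number of points."""
    counts = Counter(nearest_cell(p, centroids) for p in points)
    return [counts[i] / len(points) for i in range(len(centroids))]

def drift_score(hist_a, hist_b):
    """Total variation distance between two cell-occupancy histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(hist_a, hist_b))

def rotate(point, theta):
    """Rotate a 2-D point by theta radians about the origin."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)
```

Because rotations preserve Euclidean distances, rotating the data and the centroids together leaves every cell assignment unchanged, so the score reports no drift; only a genuine change in how points are distributed across cells moves it.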

Updated: 2026-01-03 22:39:20

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2601.01297v1

Aggressive Compression Enables LLM Weight Theft

As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltration attacks where an adversary attempts to sneak model weights out of a datacenter over a network. While exfiltration attacks are multi-step cyber attacks, we demonstrate that a single factor, the compressibility of model weights, significantly heightens exfiltration risk for large language models (LLMs). We tailor compression specifically for exfiltration by relaxing decompression constraints and demonstrate that attackers could achieve 16x to 100x compression with minimal trade-offs, reducing the time it would take for an attacker to illicitly transmit model weights from the defender's server from months to days. Finally, we study defenses designed to reduce exfiltration risk in three distinct ways: making models harder to compress, making them harder to 'find,' and tracking provenance for post-attack analysis using forensic watermarks. While all defenses are promising, the forensic watermark defense is both effective and cheap, and therefore is a particularly attractive lever for mitigating weight-exfiltration risk.
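The "months to days" claim is simple arithmetic on payload size, covert bandwidth, and compression ratio. The sketch below works it through with purely hypothetical numbers (a 2 TB checkpoint and a 200 kB/s covert channel); neither figure comes from the paper, and the function name `transfer_days` is an illustration only.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(size_bytes, bandwidth_bytes_per_s, compression_ratio=1.0):
    """Days needed to move a payload after compressing it by the given ratio."""
    compressed_size = size_bytes / compression_ratio
    return compressed_size / bandwidth_bytes_per_s / SECONDS_PER_DAY

# Hypothetical scenario: 2 TB of weights over a 200 kB/s covert channel.
WEIGHTS_BYTES = 2e12
COVERT_BANDWIDTH = 2e5
```

Under these assumed numbers, `transfer_days(WEIGHTS_BYTES, COVERT_BANDWIDTH)` is roughly 116 days, while compression ratios of 16 and 100 bring it down to roughly 7.2 days and 1.2 days respectively.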

Updated: 2026-01-03 22:34:53

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2601.01296v1

Echo State Networks for Spatio-Temporal Area-Level Data

Spatio-temporal area-level datasets play a critical role in official statistics, providing valuable insights for policy-making and regional planning. Accurate modeling and forecasting of these datasets can be extremely useful for policymakers to develop informed strategies for future planning. Echo State Networks (ESNs) are efficient methods for capturing nonlinear temporal dynamics and generating forecasts. However, ESNs lack a direct mechanism to account for the neighborhood structure inherent in area-level data. Ignoring these spatial relationships can significantly compromise the accuracy and utility of forecasts. In this paper, we incorporate approximate graph spectral filters at the input stage of the ESN, thereby improving forecast accuracy while preserving the model's computational efficiency during training. We demonstrate the effectiveness of our approach using Eurostat's tourism occupancy dataset and show how it can support more informed decision-making in policy and planning contexts.

Updated: 2026-01-03 22:34:22

Categories: cs.LG,stat.ME

Download: http://arxiv.org/abs/2410.10641v2

MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging Auxiliary Information

In various practical situations, matrix factorization methods suffer from poor data quality, such as high data sparsity and low signal-to-noise ratio (SNR). Here, we consider a matrix factorization problem by utilizing auxiliary information, which is massively available in real-world applications, to overcome the challenges caused by poor data quality. Unlike existing methods that mainly rely on simple linear models to combine auxiliary information with the main data matrix, we propose to integrate gradient boosted trees in the probabilistic matrix factorization framework to effectively leverage auxiliary information (MFAI). Thus, MFAI naturally inherits several salient features of gradient boosted trees, such as the capability of flexibly modeling nonlinear relationships and robustness to irrelevant features and missing values in auxiliary information. The parameters in MFAI can be automatically determined under the empirical Bayes framework, making it adaptive to the utilization of auxiliary information and immune to overfitting. Moreover, MFAI is computationally efficient and scalable to large datasets by exploiting variational inference. We demonstrate the advantages of MFAI through comprehensive numerical results from simulation studies and real data analyses. Our approach is implemented in the R package mfair available at https://github.com/YangLabHKUST/mfair.

Updated: 2026-01-03 22:30:41

Categories: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2303.02566v3

Sobolev Approximation of Deep ReLU Network in Log-weighted Barron Space

Universal approximation theorems show that neural networks can approximate any continuous function; however, the number of parameters may grow exponentially with the ambient dimension, so these results do not fully explain the practical success of deep models on high-dimensional data. Barron space theory addresses this: if a target function belongs to a Barron space, a two-layer network with $n$ parameters achieves an $O(n^{-1/2})$ approximation error in $L^2$. Yet classical Barron spaces $\mathscr{B}^{s+1}$ still require stronger regularity than Sobolev spaces $H^s$, and existing depth-sensitive results often assume constraints such as $sL \le 1/2$. In this paper, we introduce a log-weighted Barron space $\mathscr{B}^{\log}$, which requires a strictly weaker assumption than $\mathscr{B}^s$ for any $s>0$. For this new function space, we first study embedding properties and carry out a statistical analysis via the Rademacher complexity. Then we prove that functions in $\mathscr{B}^{\log}$ can be approximated by deep ReLU networks with explicit depth dependence. We then define a family $\mathscr{B}^{s,\log}$, establish approximation bounds in the $H^1$ norm, and identify maximal depth scales under which these rates are preserved. Our results clarify how depth reduces regularity requirements for efficient representation, offering a more precise explanation for the performance of deep architectures beyond the classical Barron setting, and for their stable use in high-dimensional problems used today.

Updated: 2026-01-03 22:03:19

Categories: cs.LG,math.NA

Download: http://arxiv.org/abs/2601.01295v1

Diffusion Timbre Transfer Via Mutual Information Guided Inpainting

We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic and rhythmic structure during reverse diffusion. The method operates directly on audio latents and is compatible with text/audio conditioning (e.g., CLAP). We discuss design choices, analyze trade-offs between timbral change and structural preservation, and show that simple inference-time controls can meaningfully steer pre-trained models for style-transfer use cases.

Updated: 2026-01-03 21:53:35

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2601.01294v1

The Alchemy of Thought: Understanding In-Context Learning Through Supervised Classification

In-context learning (ICL) has become a prominent paradigm to rapidly customize LLMs to new tasks without fine-tuning. However, despite the empirical evidence of its usefulness, we still do not truly understand how ICL works. In this paper, we compare the behavior of in-context learning with supervised classifiers trained on ICL demonstrations to investigate three research questions: (1) Do LLMs with ICL behave similarly to classifiers trained on the same examples? (2) If so, which classifiers are closer, those based on gradient descent (GD) or those based on k-nearest neighbors (kNN)? (3) When they do not behave similarly, what conditions are associated with differences in behavior? Using text classification as a use case, with six datasets and three LLMs, we observe that LLMs behave similarly to these classifiers when the relevance of demonstrations is high. On average, ICL is closer to kNN than logistic regression, giving empirical evidence that the attention mechanism behaves more similarly to kNN than GD. However, when demonstration relevance is low, LLMs perform better than these classifiers, likely because LLMs can back off to their parametric memory, a luxury these classifiers do not have.
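The comparison described above boils down to training a classifier on the same demonstrations and measuring how often its predictions match the LLM's. A minimal sketch of the kNN side and the agreement measure follows. It assumes demonstrations are already embedded as numeric vectors and that the LLM's predictions are available as a plain list; the embedding method, the distance measure, and all names here are placeholders rather than the paper's setup.

```python
import math
from collections import Counter

def knn_predict(query, demos, k=3):
    """Majority label among the k demonstrations nearest to the query.

    demos: list of (vector, label) pairs built from the ICL demonstrations.
    """
    nearest = sorted(demos, key=lambda d: math.dist(query, d[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def agreement(preds_a, preds_b):
    """Fraction of test items on which two predictors give the same label."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```

Computing `agreement` once between the LLM and kNN and once between the LLM and a gradient-descent classifier such as logistic regression gives the kind of "which is closer" comparison the abstract reports.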

Updated: 2026-01-03 21:33:12

Categories: cs.LG

Download: http://arxiv.org/abs/2601.01290v1

dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs

Data poisoning attacks (DPAs) are becoming more common as artificial intelligence (AI), machine learning (ML), and deep learning (DL) algorithms proliferate. Hackers and penetration testers inject malicious content into training data (and into testing data too), leading to false results that are very hard to inspect and predict. We analyze several recent technologies used for DPAs (from deep reinforcement learning to federated learning) and their safety, security, and countermeasures. The problem setup and problem estimation are shown in the MuJoCo environment, with the performance of HalfCheetah before and after the dataset is poisoned. We analyze several risks associated with DPAs and falsification in medical data, from popular data poisoning attacks to some popular data defenses. We propose robust offline reinforcement learning (offline RL) for safety and reliability, using weighted hash verification together with the density-ratio weighted behavioral cloning (DWBC) algorithm. The four stages of the proposed algorithm (Stage 0, Stage 1, Stage 2, and Stage 3) are described with respect to offline RL, safety, and security for DPAs. The conclusion and future scope are provided with the intent to combine DWBC with other data defense strategies to counter and protect against future contamination cyberattacks.

Updated: 2026-01-03 21:28:17

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01289v1

GRACE: Discriminator-Guided Chain-of-Thought Reasoning

In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optimize for solution likelihood often yield incorrect solutions. To address this issue, we propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that steers the decoding process towards producing correct reasoning steps. GRACE employs a step-level verifier or discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates based on their correctness. Importantly, GRACE only requires sampling from the LM, without the need for LM training or fine-tuning. Using models from FLAN-T5 and LLaMA families, we evaluate GRACE over four math and two symbolic reasoning tasks, where it exhibits substantial performance gains compared to greedy decoding, verifiers, and self-consistency in most settings. When further combined with self-consistency, GRACE outperforms all the baselines by sizeable margins. Human and LLM evaluations over GSM8K show that GRACE not only improves the final answer accuracy but also the correctness of the intermediate reasoning. Our implementation can be accessed at https://github.com/mukhal/grace.
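The stepwise decoding loop can be sketched as follows. This is a schematic under stated assumptions, not the released implementation: `sample_steps` and `discriminator` are hypothetical callables standing in for the language model sampler and the trained correctness discriminator, and mixing the two scores with a weight `beta` is one plausible way to combine them.

```python
def guided_decode(question, sample_steps, discriminator,
                  n_candidates=4, beta=0.8, max_steps=8, stop="<eos>"):
    """Build a solution one step at a time, guided by a discriminator.

    sample_steps(question, steps, n) -> list of (step_text, lm_logprob)
    discriminator(question, steps, step_text) -> correctness score
    """
    steps = []
    for _ in range(max_steps):
        candidates = sample_steps(question, steps, n_candidates)

        def score(candidate):
            text, logprob = candidate
            # Blend LM likelihood with the discriminator's correctness score.
            return ((1 - beta) * logprob
                    + beta * discriminator(question, steps, text))

        best_text, _ = max(candidates, key=score)
        steps.append(best_text)
        if best_text == stop:
            break
    return steps
```

With `beta=0` the loop picks the most likely sampled step on pure LM likelihood, which is exactly the failure mode the abstract describes; raising `beta` lets the discriminator overrule a likely but incorrect step.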

Updated: 2026-01-03 21:27:33

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2305.14934v3

PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS

Reinforcement learning from pixels is often bottlenecked by the performance and complexity of 3D rendered environments. Researchers face a trade-off between high-speed, low-level engines and slower, more accessible Python frameworks. To address this, we introduce PyBatchRender, a Python library for high-throughput, batched 3D rendering that achieves over 1 million FPS on simple scenes. Built on the Panda3D game engine, it utilizes its mature ecosystem while enhancing performance through optimized batched rendering for up to 1000X speedups. Designed as a physics-agnostic renderer for reinforcement learning from pixels, PyBatchRender offers greater flexibility than dedicated libraries, simpler setup than typical game-engine wrappers, and speeds rivaling state-of-the-art C++ engines like Madrona. Users can create custom scenes entirely in Python with tens of lines of code, enabling rapid prototyping for scalable AI training. Open-source and easy to integrate, it serves to democratize high-performance 3D simulation for researchers and developers. The library is available at https://github.com/dolphin-in-a-coma/PyBatchRender.

Updated: 2026-01-03 21:19:57

Categories: cs.GR,cs.AI,cs.PF,cs.RO

Download: http://arxiv.org/abs/2601.01288v1

Compliance as a Trust Metric

Trust and Reputation Management Systems (TRMSs) are critical for the modern web, yet their reliance on subjective user ratings or narrow Quality of Service (QoS) metrics lacks objective grounding. Concurrently, while regulatory frameworks like GDPR and HIPAA provide objective behavioral standards, automated compliance auditing has been limited to coarse, binary (pass/fail) outcomes. This paper bridges this research gap by operationalizing regulatory compliance as a quantitative and dynamic trust metric through our novel automated compliance engine (ACE). ACE first formalizes legal and organizational policies into a verifiable, obligation-centric logic. It then continuously audits system event logs against this logic to detect violations. The core of our contribution is a quantitative model that assesses the severity of each violation along multiple dimensions, including its Volume, Duration, Breadth, and Criticality, to compute a fine-grained, evolving compliance score. We evaluate ACE on a synthetic hospital dataset, demonstrating its ability to accurately detect a range of complex HIPAA and GDPR violations and produce a nuanced score that is significantly more expressive than traditional binary approaches. This work enables the development of more transparent, accountable, and resilient TRMSs on the Web.

Updated: 2026-01-03 21:14:40

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01287v1

Membership Inference Attacks on LLM-based Recommender Systems

Large language model (LLM) based recommender systems (RecSys) can adapt to different domains flexibly. They utilize in-context learning (ICL), i.e., prompts, to customize the recommendation functions; these prompts include sensitive historical user-specific item interactions, encompassing implicit feedback such as clicked items and explicit product reviews. Such private information may be exposed by novel privacy attacks. However, no study has been conducted on this important issue. We design several membership inference attacks (MIAs) aimed at revealing whether system prompts include victims' historical interactions. The attacks are Similarity, Memorization, Inquiry, and Poisoning attacks, each utilizing unique features of LLMs or RecSys. We have carefully evaluated them on five of the latest open-source LLMs and three well-known RecSys benchmark datasets. The results confirm that the MIA threat to LLM RecSys is realistic: inquiry and poisoning attacks show significantly high attack advantages. We also discuss possible methods to mitigate such MIA threats and analyze the factors affecting these attacks, such as the number of shots in system prompts, the position of the victim in the shots, the number of poisoning items in the prompt, etc.
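For readers unfamiliar with the "attack advantage" metric, a common definition in the membership inference literature is the true-positive rate minus the false-positive rate of the attacker's guesses. The sketch below computes it under that definition; whether the paper uses exactly this formula is an assumption, and the function name is illustrative.

```python
def attack_advantage(predicted_member, is_member):
    """Membership advantage: true-positive rate minus false-positive rate.

    predicted_member: attacker's guesses (True = "was in the prompt")
    is_member       : ground truth for the same users
    """
    pairs = list(zip(predicted_member, is_member))
    members = [guess for guess, truth in pairs if truth]
    non_members = [guess for guess, truth in pairs if not truth]
    tpr = sum(members) / len(members)          # members correctly flagged
    fpr = sum(non_members) / len(non_members)  # non-members wrongly flagged
    return tpr - fpr
```

An advantage of 0 means the attacker does no better than guessing, while 1 means every user is classified correctly.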

Updated: 2026-01-03 20:55:31

Categories: cs.IR,cs.AI,cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2508.18665v4

AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures

The increasing use of artificial intelligence generated deepfakes creates major challenges in maintaining digital authenticity. Four AI-based models, consisting of three CNNs and one Vision Transformer, were evaluated using large face image datasets. Data preprocessing and augmentation techniques improved model performance across different scenarios. VFDNET demonstrated superior accuracy with MobileNetV3, showing efficient performance, thereby demonstrating AI's capabilities for dependable deepfake detection.

Updated: 2026-01-03 20:44:50

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2601.01281v1

Does Memory Need Graphs? A Unified Framework and Empirical Analysis for Long-Term Dialog Memory

Graph structures are increasingly used in dialog memory systems, but empirical findings on their effectiveness remain inconsistent, making it unclear which design choices truly matter. We present an experimental, system-oriented analysis of long-term dialog memory architectures. We introduce a unified framework that decomposes dialog memory systems into core components and supports both graph-based and non-graph approaches. Under this framework, we conduct controlled, stage-wise experiments on LongMemEval and HaluMem, comparing common design choices in memory representation, organization, maintenance, and retrieval. Our results show that many performance differences are driven by foundational system settings rather than specific architectural innovations. Based on these findings, we identify stable and reliable strong baselines for future dialog memory research.

Updated: 2026-01-03 20:39:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2601.01280v1

LLM Collusion

We study how delegating pricing to large language models (LLMs) can facilitate collusion in a duopoly when both sellers rely on the same pre-trained model. The LLM is characterized by (i) a propensity parameter capturing its internal bias toward high-price recommendations and (ii) an output-fidelity parameter measuring how tightly outputs track that bias; the propensity evolves through retraining. We show that configuring LLMs for robustness and reproducibility can induce collusion via a phase transition: there exists a critical output-fidelity threshold that pins down long-run behavior. Below it, competitive pricing is the unique long-run outcome. Above it, the system is bistable, with competitive and collusive pricing both locally stable and the realized outcome determined by the model's initial preference. The collusive regime resembles tacit collusion: prices are elevated on average, yet occasional low-price recommendations provide plausible deniability. With perfect fidelity, full collusion emerges from any interior initial condition. For finite training batches of size $b$, infrequent retraining (driven by computational costs) further amplifies collusion: conditional on starting in the collusive basin, the probability of collusion approaches one as $b$ grows, since larger batches dampen stochastic fluctuations that might otherwise tip the system toward competition. The indeterminacy region shrinks at rate $O(1/\sqrt{b})$.

Updated: 2026-01-03 20:38:21

标题: LLM共谋

摘要: 我们研究了如何将定价委托给大型语言模型（LLMs）可以在两个卖方都依赖相同的预训练模型的垄断市场中促进勾结。LLM的特征包括（i）一个捕捉其内部偏向高价推荐的倾向参数和（ii）一个衡量输出忠实度的参数；倾向通过重新训练进化。我们表明，配置LLMs以实现鲁棒性和可重复性可以通过一个相变诱发勾结：存在一个关键的输出忠实度阈值，固定长期行为。在此之下，竞争性定价是唯一的长期结果。在此之上，系统是双稳的，竞争性和勾结性定价都是局部稳定的，实现的结果取决于模型的初始偏好。勾结制度类似于默契勾结：价格平均上升，但偶尔的低价推荐提供了合理的否认可能性。在完美忠实度下，从任何内部初始条件中都会出现完全的勾结。对于大小为$b$的有限训练批次，不经常的重新训练（受到计算成本驱动）进一步增加了勾结：在开始于勾结区域的条件下，随着$b$的增长，勾结的概率接近于1，因为更大的批次抑制了可能使系统朝向竞争的随机波动。不确定性区域以$O(1/\sqrt{b})$的速率缩小。

更新时间: 2026-01-03 20:38:21

领域: econ.TH,cs.AI,cs.CE,cs.CL,cs.GT

下载: http://arxiv.org/abs/2601.01279v1

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we leverage new results from off-policy evaluation; it has recently been shown that well-designed behaviour policies can be used to collect off-policy data for provably lower variance return estimates. This result is surprising as it means collecting data on-policy is not variance optimal. We extend this key insight to the online reinforcement learning setting, where both policy evaluation and improvement are interleaved to learn optimal policies. Off-policy RL has been well studied (e.g., IMPALA), with correct and truncated importance-weighted samples used for de-biasing and managing variance appropriately. Generally, these approaches are concerned with reconciling data collected from multiple workers in parallel while the policy is updated asynchronously; the mismatch between the workers and the policy is corrected in a mathematically sound way. Here we consider only one worker, the behaviour policy, which is used to collect data for policy improvement with provably lower variance return estimates. In our experiments we extend two policy-gradient methods with this regime, demonstrating better sample efficiency and performance over a diverse set of environments.
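
The claim that on-policy data collection is not variance optimal can be illustrated with a one-step toy problem. The sketch below is not the paper's algorithm: it assumes a single decision with deterministic, non-negative rewards, and the names `estimator_variance`, `target`, `behaviour` and `rewards` are made up for illustration. It computes the exact variance of the importance-weighted return estimate and compares sampling from the target policy with sampling from a behaviour policy proportional to pi(a) * |r(a)|, the classical variance-minimising importance sampling choice.

```python
def estimator_variance(target, behaviour, rewards):
    """Exact variance of the one-step importance-weighted return estimate.

    An action a is drawn from `behaviour`, and the estimate is
    target[a] / behaviour[a] * rewards[a].
    """
    mean = sum(p * r for p, r in zip(target, rewards))
    second_moment = sum(
        b * (p / b * r) ** 2
        for p, b, r in zip(target, behaviour, rewards)
        if b > 0
    )
    return second_moment - mean ** 2


# Hypothetical three-action problem (values chosen only for illustration).
target = [0.5, 0.3, 0.2]
rewards = [1.0, 2.0, 10.0]

# On-policy: the behaviour policy equals the target policy.
on_policy_var = estimator_variance(target, target, rewards)

# Off-policy: behaviour proportional to pi(a) * |r(a)|.
weights = [p * abs(r) for p, r in zip(target, rewards)]
total = sum(weights)
behaviour = [w / total for w in weights]
off_policy_var = estimator_variance(target, behaviour, rewards)

print(on_policy_var, off_policy_var)
```

In this toy case every importance-weighted sample equals the true expected return, so the off-policy variance vanishes up to rounding, while the on-policy variance stays strictly positive. With stochastic or signed rewards the gap is smaller, but the same construction still does no worse than on-policy sampling.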

Updated: 2026-01-03 20:35:02

标题: 行为策略优化：针对离线策略强化学习的可证明低方差回报估计

摘要: 许多强化学习算法，特别是依赖于回报估计进行策略改进的算法，由于高方差的回报估计而导致样本效率低和训练不稳定。在这篇论文中，我们利用了来自离线评估的新结果；最近已经证明，设计良好的行为策略可以用于收集离线数据，从而得到具有较低方差的回报估计。这个结果令人惊讶，因为这意味着在策略优化过程中收集数据并不是方差最优的。我们将这一关键观点扩展到在线强化学习环境中，在这种环境中，策略评估和改进交替进行，以学习最优策略。离线强化学习已经得到广泛研究（例如，IMPALA），使用正确和截断的重要性加权样本进行去偏和适当地管理方差。通常这些方法涉及协调从多个并行工作者收集的数据，而策略是异步更新的，工作者和策略之间的不匹配以数学上合理的方式进行校正。在这里，我们只考虑一个工作者 - 行为策略，用于收集数据进行策略改进，从而得到具有较低方差的回报估计。在我们的实验中，我们将这种方法扩展到两种策略梯度方法，展示了在各种环境中更好的样本效率和性能。

更新时间: 2026-01-03 20:35:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2511.10843v3

Hybrid coupling with operator inference and the overlapping Schwarz alternating method

This paper presents a novel hybrid approach for coupling subdomain-local non-intrusive Operator Inference (OpInf) reduced order models (ROMs) with each other and with subdomain-local high-fidelity full order models (FOMs) using the overlapping Schwarz alternating method (O-SAM). The proposed methodology addresses significant challenges in multiscale modeling and simulation, particularly the long runtime and complex mesh generation requirements associated with traditional high-fidelity simulations. By leveraging the flexibility of O-SAM, we enable the seamless integration of disparate models, meshes, and time integration schemes, enhancing computational efficiency while maintaining high accuracy. Our approach is demonstrated through a series of numerical experiments on complex three-dimensional (3D) solid dynamics problems, showcasing speedups of up to 106x compared to conventional FOM-FOM couplings. This work paves the way for more efficient simulation workflows in engineering applications, with potential extensions to a wide range of partial differential equations.
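
The alternating iteration behind O-SAM can be sketched on a toy problem. The example below is an assumption-laden illustration, not the paper's coupling: it solves u'' = 0 on [0, 1] with u(0) = 0 and u(1) = 1, uses two overlapping subdomains [0, b] and [a, 1], and replaces both the ROM and the FOM with exact subdomain solves (straight lines). The function name `schwarz_alternating` and its parameters are hypothetical.

```python
def schwarz_alternating(a=0.4, b=0.6, tol=1e-12, max_iters=200):
    """Overlapping Schwarz iteration for u'' = 0 on [0, 1], u(0) = 0, u(1) = 1.

    Subdomain 1 is [0, b], subdomain 2 is [a, 1], and the overlap is [a, b].
    Each subdomain solution is a straight line between its boundary values.
    Returns (u at x = a, u at x = b, number of iterations).
    """
    if not 0.0 < a < b < 1.0:
        raise ValueError("need 0 < a < b < 1 so that the subdomains overlap")
    u_b = 0.0  # initial guess for the interface value at x = b
    u_a = 0.0
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # Solve on [0, b] with u(0) = 0, u(b) = u_b, then sample at x = a.
        u_a = u_b * a / b
        # Solve on [a, 1] with u(a) = u_a, u(1) = 1, then sample at x = b.
        new_u_b = u_a + (1.0 - u_a) * (b - a) / (1.0 - a)
        converged = abs(new_u_b - u_b) < tol
        u_b = new_u_b
        if converged:
            break
    return u_a, u_b, iteration


u_a, u_b, iterations = schwarz_alternating()
print(u_a, u_b, iterations)
```

The exact solution is u(x) = x, so the interface values should approach a and b. The error shrinks by the factor (a / b) * (1 - b) / (1 - a) per sweep, which is why a wider overlap converges in fewer iterations.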

Updated: 2026-01-03 20:21:04

标题: 基于算子推断与重叠Schwarz交替方法的混合耦合

摘要: 本文提出了一种新颖的混合方法，用于将子域本地非侵入式操作推断（OpInf）降阶模型（ROM）与子域本地高保真度全阶模型（FOM）以及使用重叠Schwarz交替方法（O-SAM）进行耦合。所提出的方法解决了多尺度建模和仿真中的重要挑战，特别是与传统高保真度仿真相关的长运行时间和复杂的网格生成要求。通过利用O-SAM的灵活性，我们实现了不同模型、网格和时间积分方案的无缝集成，提高了计算效率同时保持高精度。我们通过一系列复杂的三维固体动力学问题的数值实验来演示我们的方法，相较于传统的FOM-FOM耦合，速度提升高达106倍。这项工作为工程应用中更高效的仿真工作流铺平了道路，并有潜在的扩展到各种偏微分方程。

更新时间: 2026-01-03 20:21:04

领域: math.NA,cs.AI,math-ph

下载: http://arxiv.org/abs/2511.20687v2

Beyond Expectations: Learning with Stochastic Dominance Made Practical

Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, as it naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to expectations. Despite its theoretical appeal, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: $\textbf{i)}$ the original concept of stochastic dominance only provides a $\textit{partial order}$ and is therefore not amenable to serving as a general optimality criterion; and $\textbf{ii)}$ an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance. In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves performance comparable to standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications including supervised learning, reinforcement learning, and portfolio optimization.
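
Challenge i), that stochastic dominance is only a partial order, is easy to see on small samples. The sketch below uses the standard weak first-order definition (X dominates Y when the CDF of X is nowhere above the CDF of Y, with larger outcomes preferred) applied to empirical CDFs. It illustrates the classical notion only, not the generalized concept proposed in the paper, and the helper names and sample values are made up for illustration.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, t):
    """Fraction of sample points that are <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def first_order_dominates(x, y):
    """True if sample x weakly first-order dominates sample y (larger is better)."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(set(xs) | set(ys))  # empirical CDFs only change at these points
    return all(empirical_cdf(xs, t) <= empirical_cdf(ys, t) for t in grid)


def compare(x, y):
    """Classify a pair of samples under first-order stochastic dominance."""
    x_dom = first_order_dominates(x, y)
    y_dom = first_order_dominates(y, x)
    if x_dom and y_dom:
        return "equal"
    if x_dom:
        return "x dominates"
    if y_dom:
        return "y dominates"
    return "incomparable"


safe = [2, 2, 2, 2]      # constant payoff
risky = [0, 0, 4, 4]     # same mean, higher spread
shifted = [3, 3, 5, 5]   # risky payoff shifted upwards

print(compare(safe, risky))     # incomparable
print(compare(shifted, risky))  # x dominates
```

The `safe` and `risky` samples have the same mean, yet neither dominates the other, so an expectation-based criterion ranks them as tied while first-order dominance cannot rank them at all. This is the gap that a generalized, total comparison has to close.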

Updated: 2026-01-03 20:15:33

标题: 超越期望：通过实践应用随机优势学习

摘要: 随机优势作为一个通用框架，用于在不确定性下建模广泛的决策偏好，风险规避是一个显著的例子，因为它自然地捕获了基础不确定性的内在结构，与简单地求期望相比。尽管在理论上具有吸引力，但由于以下挑战，随机优势在机器学习中的应用却很少见：$\textbf{i)}$，随机优势的原始概念仅提供了一个$\textit{偏序}$，因此不适合作为一般的最优性标准；和$\textbf{ii)}$，由于评估随机优势的连续性质，缺乏高效的计算方法。在这项工作中，我们首次尝试建立一个学习随机优势的通用框架。我们首先将随机优势的概念泛化，以便在任意一对随机变量之间进行可行的比较。接下来，我们开发了一种简单且计算效率高的方法，用于在随机优势方面找到最优解，这可以无缝地嵌入到许多学习任务中。数值实验表明，所提出的方法在监督学习、强化学习和组合优化等各种应用中实现了与标准风险中性策略可比的性能，并在风险方面取得更好的折衷。

更新时间: 2026-01-03 20:15:33

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2402.02698v2

From Optimization to Control: Quasi Policy Iteration

Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we adopt the quasi-Newton method (QNM) from convex optimization to introduce a novel control algorithm termed quasi-policy iteration (QPI). In particular, QPI is based on a novel approximation of the ``Hessian'' matrix in the policy iteration algorithm, which exploits two linear structural constraints specific to MDPs and allows for the incorporation of prior information on the transition probability kernel. While the proposed algorithm has the same computational complexity as value iteration, it exhibits an empirical convergence behavior similar to that of QNM, with a low sensitivity to the discount factor.
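
For context on the two algorithms QPI sits between, the sketch below compares classical value iteration with classical policy iteration on a hypothetical two-state, two-action MDP. It does not implement QPI. The transition table `P`, the rewards `R` and the discount factor are invented for illustration, and the policy evaluation step is solved in closed form for the 2x2 case.

```python
GAMMA = 0.99
# P[s][a][t]: probability of moving from state s to state t under action a.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
# R[s][a]: immediate reward for taking action a in state s.
R = [[1.0, 0.0],
     [0.0, 2.0]]
STATES = (0, 1)
ACTIONS = (0, 1)


def q_value(v, s, a):
    """One-step lookahead value of action a in state s under value estimate v."""
    return R[s][a] + GAMMA * sum(p * v_next for p, v_next in zip(P[s][a], v))


def greedy(v):
    """Greedy policy with respect to the value estimate v."""
    return [max(ACTIONS, key=lambda a: q_value(v, s, a)) for s in STATES]


def value_iteration(tol=1e-8):
    """Repeated Bellman backups; returns (values, number of sweeps)."""
    v, sweeps = [0.0, 0.0], 0
    while True:
        new_v = [max(q_value(v, s, a) for a in ACTIONS) for s in STATES]
        sweeps += 1
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v, sweeps
        v = new_v


def evaluate(policy):
    """Solve (I - GAMMA * P_pi) v = r_pi exactly for two states (Cramer's rule)."""
    a00 = 1.0 - GAMMA * P[0][policy[0]][0]
    a01 = -GAMMA * P[0][policy[0]][1]
    a10 = -GAMMA * P[1][policy[1]][0]
    a11 = 1.0 - GAMMA * P[1][policy[1]][1]
    r0, r1 = R[0][policy[0]], R[1][policy[1]]
    det = a00 * a11 - a01 * a10
    return [(r0 * a11 - a01 * r1) / det, (a00 * r1 - a10 * r0) / det]


def policy_iteration():
    """Alternate exact evaluation and greedy improvement; returns (values, steps)."""
    policy, steps = [0, 0], 0
    while True:
        v = evaluate(policy)
        steps += 1
        improved = greedy(v)
        if improved == policy:
            return v, steps
        policy = improved


vi_values, vi_sweeps = value_iteration()
pi_values, pi_steps = policy_iteration()
print(vi_sweeps, pi_steps)
```

With a discount factor close to one, value iteration needs many cheap sweeps, whereas policy iteration stops after a handful of expensive steps that each require a linear solve. The abstract positions QPI as keeping the per-iteration cost of the former while approaching the convergence behaviour of the latter.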

Updated: 2026-01-03 20:11:06

标题: 从优化到控制：拟策略迭代

摘要: 最近针对马尔可夫决策过程（MDPs）的控制算法是使用与成熟的优化算法隐含类比而设计的。在本文中，我们采用凸优化中的拟牛顿方法（QNM）引入一种被称为拟策略迭代（QPI）的新型控制算法。具体来说，QPI基于策略迭代算法中对“Hessian”矩阵的新型逼近，利用了MDPs特有的两个线性结构约束，并允许将转移概率核的先验信息整合进去。虽然所提出的算法与值迭代具有相同的计算复杂度，但其经验收敛行为类似于QNM，并对折现因子的敏感性较低。

更新时间: 2026-01-03 20:11:06

领域: math.OC,cs.LG,eess.SY

下载: http://arxiv.org/abs/2311.11166v4

Accelerated Full Waveform Inversion by Deep Compressed Learning

We propose and test a method to reduce the dimensionality of Full Waveform Inversion (FWI) inputs as a computational cost mitigation approach. Given modern seismic acquisition systems, the data (as input for FWI) required for an industrial-strength case is at the terabyte level of storage; therefore, solving complex subsurface cases or exploring multiple scenarios with FWI becomes prohibitive. The proposed method utilizes a deep neural network with a binarized sensing layer that learns, by compressed learning, a succinct but consequential seismic acquisition layout from a large corpus of subsurface models. Thus, given a large seismic data set to invert, the trained network selects a smaller subset of the data; then, using representation learning, an autoencoder computes latent representations of the data, followed by K-means clustering of the latent representations to further select the most relevant data for FWI. Effectively, this approach can be seen as a hierarchical selection. The proposed approach consistently outperforms random data sampling, even when utilizing only 10% of the data for 2D FWI. These results pave the way to accelerating FWI in large-scale 3D inversion.

Updated: 2026-01-03 19:30:52

标题: 深度压缩学习加速全波形反演

摘要: 我们提出并测试了一种方法，以减少全波形反演（FWI）输入的维度，作为计算成本缓解的方法。考虑到现代地震采集系统，用于工业级案例的数据（作为FWI的输入）需要在万亿次浮点运算级别的存储，因此解决复杂的地下案例或利用FWI探索多种情景变得不可行。所提出的方法利用具有二进制感知层的深度神经网络，通过压缩学习从大量地下模型中学习出简洁而具有重要影响的地震采集布局。因此，给定一个大型地震数据集进行反演时，训练好的网络选择数据的一个较小子集，然后通过使用表示学习，自动编码器计算数据的潜在表示，随后对潜在表示进行K均值聚类，以进一步选择最相关的数据用于FWI。实际上，这种方法可以被看作是一种分层选择。所提出的方法始终优于随机数据采样，即使仅利用10%的数据进行2D FWI，这些结果为加速大规模3D反演的FWI铺平了道路。

更新时间: 2026-01-03 19:30:52

领域: cs.LG

下载: http://arxiv.org/abs/2601.01268v1

From Policy to Logic for Efficient and Interpretable Coverage Assessment

Large Language Models (LLMs) have demonstrated strong capabilities in interpreting lengthy, complex legal and policy language. However, their reliability can be undermined by hallucinations and inconsistencies, particularly when analyzing subjective and nuanced documents. These challenges are especially critical in medical coverage policy review, where human experts must be able to rely on accurate information. In this paper, we present an approach designed to support human reviewers by making policy interpretation more efficient and interpretable. We introduce a methodology that pairs a coverage-aware retriever with symbolic rule-based reasoning to surface relevant policy language, organize it into explicit facts and rules, and generate auditable rationales. This hybrid system minimizes the number of LLM inferences required, which reduces overall model cost. Notably, our approach achieves a 44% reduction in inference cost alongside a 4.5% improvement in F1 score, demonstrating both efficiency and effectiveness.

Updated: 2026-01-03 19:24:51

标题: 从政策到逻辑：高效且可解释的覆盖评估

摘要: 大型语言模型(LLMs)已经展示出在解释冗长、复杂的法律和政策语言方面的强大能力。然而，在分析主观和微妙文件时，它们的可靠性可能会受到幻觉和不一致性的影响。这些挑战在医疗保险政策审查中尤为重要，人类专家必须能够依赖准确的信息。在本文中，我们提出了一种旨在通过使政策解释更加高效和可解释来支持人类审阅者的方法。我们介绍了一种方法，该方法将具有覆盖意识的检索器与符号规则推理相结合，以提取相关政策语言，将其整理为明确的事实和规则，并生成可审计的理由。这种混合系统减少了所需的LLM推断数量，降低了整体模型成本。值得注意的是，我们的方法在推理成本上实现了44%的降低，同时在F1分数上实现了4.5%的提升，既展示了效率又展示了效果。

更新时间: 2026-01-03 19:24:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2601.01266v1

MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance

The deployment of large language models (LLMs) in real-world clinical applications is constrained by the fundamental trade-off between computational cost and the efficiency of linear-time models. To address this, we propose an LLM-based MambaFormer hybrid Mixture-of-Experts (MoE) framework for efficient medical question-answering (QA) and clinical assistance. The MambaFormer employs a lightweight gating mechanism that performs token-level dynamic routing to a customized Transformer expert (ET5) for short, complex queries or to a State Space Model expert (EMamba) for long, high-throughput sequences. The customized EMamba and ET5 models are tailored to accommodate input sequence dimensionality, embedding structure, sequence length, and target-specific output heads, and are fine-tuned through transfer learning on a new, custom-designed DentalQA dataset. Moreover, intelligent routing decisions are driven by the contextual complexity of token embeddings, normalized sequence length, and domain-aware features, thereby enforcing a Pareto-optimal trade-off between inference latency and prediction accuracy. Furthermore, a novel utility-guided multi-objective loss jointly optimizes decisions, router parameters, routing behavior, expert utilization, and computational cost by adaptively regulating token-level expert activation. Finally, the proposed MambaFormer is cross-validated (holdout) for medical QA on the new, custom-designed DentalQA and PubMedQA datasets and compared with state-of-the-art techniques. The proposed MambaFormer outperforms these techniques (BERTScore = 0.9180) with ultra-low latency (0.077 s), delivering a 24.4x speedup over T5-Large and establishing a scalable solution for resource-constrained clinical deployment.

Updated: 2026-01-03 19:01:33

标题: MambaFormer：基于标记级别引导的路由专家混合模型，用于精确高效的临床辅助

摘要: 大型语言模型（LLMs）在现实世界临床应用中的部署受到计算成本和线性时间模型效率之间的基本权衡的限制。为了解决这个问题，我们提出了一种基于LLM的MambaFormer混合专家模型（MoE）框架，用于高效的医学问题回答（QA）和临床辅助。MambaFormer采用轻量级门控机制，对短、复杂的查询进行令牌级动态路由到定制的Transformer专家（ET5），或者对长、高吞吐序列进行路由到状态空间模型专家（EMamba）。定制的EMamba和ET5模型根据输入序列维度、嵌入结构、序列长度和特定目标输出头进行调整，并通过在新设计的DentalQA数据集上进行迁移学习进行微调。此外，智能路由决策由令牌嵌入的上下文复杂性、标准化序列长度和领域感知特征驱动，从而强制执行推理延迟和预测准确性之间的帕累托最优权衡。此外，一种新颖的实用程序引导的多目标损失通过自适应调节令牌级专家激活，共同优化决策、路由器参数、路由行为、专家利用和计算成本。最后，提出的MambaFormer在新设计的DentalQA和PubMedQA数据集上进行了交叉验证（留出），并与最先进的技术进行了比较。提出的MambaFormer在超低延迟（0.077秒）的情况下表现出色（BERTScore = 0.9180），比T5-Large快了24.4倍，为资源受限的临床部署提供了可扩展的解决方案。

更新时间: 2026-01-03 19:01:33

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2601.01260v1

Compositions of Variant Experts for Integrating Short-Term and Long-Term Preferences

In the online digital realm, recommendation systems are ubiquitous and play a crucial role in enhancing user experience. These systems leverage user preferences to provide personalized recommendations, thereby helping users navigate through the paradox of choice. This work focuses on personalized sequential recommendation, where the system considers not only a user's immediate, evolving session context, but also their cumulative historical behavior to provide highly relevant and timely recommendations. Through an empirical study conducted on diverse real-world datasets, we have observed and quantified the existence and impact of both short-term (immediate and transient) and long-term (enduring and stable) preferences on users' historical interactions. Building on these insights, we propose a framework that combines short- and long-term preferences to enhance recommendation performance, namely Compositions of Variant Experts (CoVE). This novel framework dynamically integrates short- and long-term preferences through the use of different specialized recommendation models (i.e., experts). Extensive experiments showcase the effectiveness of the proposed methods and ablation studies further investigate the impact of variant expert types.

Updated: 2026-01-03 18:44:00

标题: 集成短期和长期偏好的变体专家组合

摘要: 在在线数字领域中，推荐系统无处不在，并在提升用户体验方面发挥着至关重要的作用。这些系统利用用户偏好提供个性化推荐，从而帮助用户在选择困难中导航。本文关注个性化的顺序推荐，系统不仅考虑用户即时、不断变化的会话环境，还考虑他们累积历史行为，以提供高度相关和及时的推荐。通过对多样真实数据集进行的实证研究，我们观察和量化了用户历史互动中短期（即时和瞬时）和长期（持久和稳定）偏好的存在和影响。基于这些见解，我们提出了一个结合短期和长期偏好以提升推荐性能的框架，即Compositions of Variant Experts (CoVE)。这一新颖框架通过不同专业推荐模型（即专家）的使用动态集成短期和长期偏好。广泛的实验展示了所提方法的有效性，消融研究进一步探讨了不同类型专家的影响。

更新时间: 2026-01-03 18:44:00

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2506.23170v2

Seamlessly Natural: Image Stitching with Natural Appearance Preservation

This paper introduces SENA (SEamlessly NAtural), a geometry-driven image stitching approach that prioritizes structural fidelity in challenging real-world scenes characterized by parallax and depth variation. Conventional image stitching relies on homographic alignment, but this rigid planar assumption often fails in dual-camera setups with significant scene depth, leading to distortions such as visible warps and spherical bulging. SENA addresses these fundamental limitations through three key contributions. First, we propose a hierarchical affine-based warping strategy, combining global affine initialization with local affine refinement and smooth free-form deformation. This design preserves local shape, parallelism, and aspect ratios, thereby avoiding the hallucinated structural distortions commonly introduced by homography-based models. Second, we introduce a geometry-driven adequate zone detection mechanism that identifies parallax-minimized regions directly from the disparity consistency of RANSAC-filtered feature correspondences, without relying on semantic segmentation. Third, building upon this adequate zone, we perform anchor-based seamline cutting and segmentation, enforcing a one-to-one geometric correspondence across image pairs by construction, which effectively eliminates ghosting, duplication, and smearing artifacts in the final panorama. Extensive experiments conducted on challenging datasets demonstrate that SENA achieves alignment accuracy comparable to leading homography-based methods, while significantly outperforming them in critical visual metrics such as shape preservation, texture integrity, and overall visual realism.

Updated: 2026-01-03 18:40:35

标题: 无缝自然：具有自然外观保留的图像拼接

摘要: 本文介绍了SENA（SEamlessly NAtural），一种几何驱动的图像拼接方法，该方法在具有视差和深度变化的挑战性现实场景中优先考虑结构的保真度。传统的图像拼接依赖于同轴对齐，但是这种刚性平面假设在具有显著场景深度的双摄像头设置中经常失败，导致可见的翘曲和球形膨胀等失真。SENA通过三个关键贡献解决了这些基本限制。首先，我们提出了一种基于层次仿射变形的策略，将全局仿射初始化与局部仿射细化和平滑自由形变相结合。这种设计保留了局部形状、平行性和纵横比，从而避免了同轴对齐模型通常引入的幻觉结构失真。其次，我们引入了一种几何驱动的适当区域检测机制，直接从RANSAC滤波的特征对应的视差一致性中识别最小化视差的区域，而无需依赖语义分割。第三，基于这个适当区域，我们进行基于锚点的接缝线切割和分割，通过构建实现图像对之间的一对一几何对应关系，有效消除了最终全景图中的幽灵、重复和模糊等伪影。在具有挑战性的数据集上进行的广泛实验表明，SENA在对齐精度方面达到了与主流同轴对齐方法相当的水平，同时在关键的视觉指标（如形状保持、纹理完整性和整体视觉逼真度）方面明显优于它们。

更新时间: 2026-01-03 18:40:35

领域: eess.IV,cs.AI,cs.CV,cs.GR,eess.SP

下载: http://arxiv.org/abs/2601.01257v1

User-Assistant Bias in LLMs

Modern large language models (LLMs) are typically trained and deployed using structured role tags (e.g. system, user, assistant, tool) that explicitly mark the source of each piece of context. While these tags are essential for instruction following and controllability, asymmetries in the training data associated with different role tags can introduce inductive biases. In this paper, we study this phenomenon by formalizing user-assistant bias, defined as the tendency of an LLM to preferentially rely on information from either the user or assistant role when there is a conflict. We introduce a task-agnostic benchmark UserAssist and evaluate such bias in 52 frontier models. We observe that most of the instruction-tuned models exhibit strong user bias, whereas base and reasoning models are close to neutral. Using controlled fine-tuning experiments, we isolate which post-training recipes drive the observed user-assistant bias. We find that human-preference alignment amplifies user bias, while reasoning fine-tuning reduces it. Finally, we show that user-assistant bias can be bidirectionally controlled via direct preference optimization (DPO) on UserAssist-train, and that the resulting bias reliably generalizes to a more realistic multi-turn conversation dataset. These results reveal an underexplored consequence of role-tagged training and provide a principled framework to diagnose and control tag-induced biases in modern LLMs.

Updated: 2026-01-03 18:21:37

标题: LLMs中的用户助手偏见

摘要: 现代大型语言模型（LLMs）通常使用结构化角色标签（例如系统、用户、助手、工具）进行训练和部署，明确标记每个上下文片段的来源。虽然这些标签对于指令遵循和可控性至关重要，但与不同角色标签相关的训练数据中的不对称性可能引入归纳偏差。在本文中，我们通过形式化用户-助手偏差来研究这一现象，该偏差定义为LLM在冲突时倾向于依赖用户或助手角色的信息。我们引入了一个与任务无关的基准测试UserAssist，并评估了52个前沿模型中的这种偏差。我们观察到大多数经过指令调整的模型表现出强烈的用户偏差，而基础模型和推理模型则接近中性。通过控制的微调实验，我们分离出观察到的用户-助手偏差是由哪些后训练配方驱动的。我们发现人类偏好对齐会增强用户偏差，而推理微调会减少偏差。最后，我们展示了通过对UserAssist-train进行直接偏好优化（DPO）可以双向控制用户-助手偏差，并且产生的偏差可可靠地泛化到更现实的多轮对话数据集。这些结果揭示了角色标签训练的一个未被充分探讨的后果，并提供了一个原则性框架来诊断和控制现代LLMs中标签引起的偏差。

更新时间: 2026-01-03 18:21:37

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2508.15815v2

Stochastic Control Methods for Optimization

In this work, we investigate a stochastic control framework for global optimization over both finite-dimensional Euclidean spaces and the Wasserstein space of probability measures. In the Euclidean setting, the original minimization problem is approximated by a family of regularized stochastic control problems; using dynamic programming, we analyze the associated Hamilton--Jacobi--Bellman equations and obtain tractable representations via the Cole--Hopf transform and the Feynman--Kac formula. For optimization over probability measures, we formulate a regularized mean-field control problem characterized by a master equation, and further approximate it by controlled $N$-particle systems. We establish that, as the regularization parameter tends to zero (and as the particle number tends to infinity for the optimization over probability measures), the value of the control problem converges to the global minimum of the original objective. Building on the resulting probabilistic representations, Monte Carlo-based numerical schemes are proposed and numerical experiments are reported to illustrate the practical performance of the methods and to support the theoretical convergence rates.
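
The convergence statement (the regularized value tends to the global minimum as the regularization parameter goes to zero) has a simple finite-dimensional analogue in the Laplace principle. The sketch below is only that analogue, not the paper's control formulation or numerical scheme: it evaluates the log-exponential "soft minimum" -eps * log(mean(exp(-f / eps))) over a fixed grid of points and shows how it approaches the smallest grid value as eps shrinks. The test function `objective`, the grid and the name `soft_min` are assumptions made for illustration.

```python
import math


def soft_min(values, eps):
    """-eps * log(mean(exp(-v / eps))), computed with a stabilising shift."""
    m = min(values)
    mean_exp = sum(math.exp(-(v - m) / eps) for v in values) / len(values)
    return m - eps * math.log(mean_exp)


def objective(x):
    """Non-convex test function with two wells; the deeper one is near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


grid = [-2.0 + 4.0 * i / 400 for i in range(401)]
values = [objective(x) for x in grid]
true_min = min(values)

eps_schedule = (1.0, 0.1, 0.01, 0.001)
gaps = [soft_min(values, eps) - true_min for eps in eps_schedule]
for eps, gap in zip(eps_schedule, gaps):
    print(eps, gap)
```

The gap to the true minimum is never negative and is bounded by eps * log(n) for n grid points, so it shrinks at least linearly in eps. The presence of a second, shallower well does not trap the estimate, which is the appeal of this kind of regularization for global optimization.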

Updated: 2026-01-03 17:55:26

标题: 随机控制方法用于优化

摘要: 在这项工作中，我们研究了一种随机控制框架，用于在有限维欧几里得空间和概率测度的Wasserstein空间上进行全局优化。在欧几里得设置中，原始的最小化问题被一组正则化的随机控制问题所近似；利用动态规划，我们分析了相关的Hamilton-Jacobi-Bellman方程，并通过Cole-Hopf变换和Feynman-Kac公式获得了可处理的表示。对于概率测度的优化，我们制定了一个由主方程表征的正则化均场控制问题，并通过受控的N粒子系统进一步近似。我们建立了当正则化参数趋向于零（对于概率测度的优化，粒子数趋于无穷）时，控制问题的值收敛于原始目标的全局最小值。基于得到的概率表示，提出了基于蒙特卡罗的数值方案，并报告了数值实验以说明方法的实际性能，并支持理论收敛速度。

更新时间: 2026-01-03 17:55:26

领域: math.OC,cs.LG,math.NA,math.PR

下载: http://arxiv.org/abs/2601.01248v1

Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions

The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary paradigms: Knowledge Distillation (KD) and Dataset Distillation (DD), both aimed at compressing LLMs while preserving their advanced reasoning capabilities and linguistic diversity. We first examine key methodologies in KD, such as task-specific alignment, rationale-based training, and multi-teacher frameworks, alongside DD techniques that synthesize compact, high-impact datasets through optimization-based gradient matching, latent space regularization, and generative synthesis. Building on these foundations, we explore how integrating KD and DD can produce more effective and scalable compression strategies. Together, these approaches address persistent challenges in model scalability, architectural heterogeneity, and the preservation of emergent LLM abilities. We further highlight applications across domains such as healthcare and education, where distillation enables efficient deployment without sacrificing performance. Despite substantial progress, open challenges remain in preserving emergent reasoning and linguistic diversity, enabling efficient adaptation to continually evolving teacher models and datasets, and establishing comprehensive evaluation protocols. By synthesizing methodological innovations, theoretical foundations, and practical insights, our survey charts a path toward sustainable, resource-efficient LLMs through the tighter integration of KD and DD principles.

Updated: 2026-01-03 17:35:08

标题: 大型语言模型的知识蒸馏和数据集蒸馏：新兴趋势、挑战和未来方向

摘要: 大型语言模型（LLMs）的指数增长继续凸显了满足不断扩大的计算和数据需求的高效策略的必要性。本调查提供了对两种互补范例的全面分析：知识蒸馏（KD）和数据集蒸馏（DD），旨在压缩LLMs同时保留其先进的推理能力和语言多样性。我们首先考察了KD中的关键方法，如任务特定对齐、基于理由的训练和多教师框架，以及DD技术，通过基于优化的梯度匹配、潜在空间正则化和生成综合来合成紧凑、高影响力的数据集。在此基础上，我们探讨了如何整合KD和DD可以产生更有效和可扩展的压缩策略。这些方法共同解决了模型可扩展性、架构异质性和保留新兴LLM能力的持久挑战。我们进一步强调了在医疗保健和教育等领域的应用，蒸馏使得在不牺牲性能的情况下实现高效部署成为可能。尽管取得了实质性进展，但在保留新兴推理和语言多样性、使得适应不断演化的教师模型和数据集变得高效，以及建立全面评估协议方面仍存在挑战。通过综合方法创新、理论基础和实践见解，我们的调查通过更紧密整合KD和DD原则为可持续、资源高效的LLMs开辟了一条道路。

更新时间: 2026-01-03 17:35:08

领域: cs.CL,cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.14772v2

Learning Repetition-Invariant Representations for Polymer Informatics

Polymers are large macromolecules composed of repeating structural units known as monomers and are widely applied in fields such as energy storage, construction, medicine, and aerospace. However, existing graph neural network methods, though effective for small molecules, only model the single unit of polymers and fail to produce consistent vector representations for the true polymer structure with varying numbers of units. To address this challenge, we introduce Graph Repetition Invariance (GRIN), a novel method to learn polymer representations that are invariant to the number of repeating units in their graph representations. GRIN integrates a graph-based maximum spanning tree alignment with repeat-unit augmentation to ensure structural consistency. We provide theoretical guarantees for repetition-invariance from both model and data perspectives, demonstrating that three repeating units are the minimal augmentation required for optimal invariant representation learning. GRIN outperforms state-of-the-art baselines on both homopolymer and copolymer benchmarks, learning stable, repetition-invariant representations that generalize effectively to polymer chains of unseen sizes.

Updated: 2026-01-03 17:28:45

标题: 学习聚合物信息学的重复不变表示

摘要: 聚合物是由称为单体的重复结构单元组成的大分子，广泛应用于能源存储、建筑、医学和航空航天等领域。然而，现有的图神经网络方法虽然在小分子方面有效，但只对聚合物的单个单元进行建模，无法产生一致的向量表示，以反映具有不同单元数量的真实聚合物结构。为了解决这一挑战，我们引入了图重复不变性（GRIN），这是一种学习聚合物表示的新方法，对其图表示中的重复单元数量保持不变。GRIN将基于图的最大生成树对齐与重复单元增强相结合，以确保结构的一致性。我们从模型和数据的角度提供了重复不变性的理论保证，证明三个重复单元是实现最佳不变表示学习所需的最小增强。GRIN在同质聚合物和共聚物基准测试中表现优于现有的基准线，学习到稳定、重复不变的表示，有效泛化到未见大小的聚合物链。

更新时间: 2026-01-03 17:28:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2505.10726v3

MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

Tool-augmented LLM agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners often focus on static artifacts, analyzing runtime behavior is challenging because directly executing untrusted tools can itself be dangerous. We present MCP-SandboxScan, a lightweight framework motivated by the Model Context Protocol (MCP) that safely executes untrusted tools inside a WebAssembly/WASI sandbox and produces auditable reports of external-to-sink exposures. Our prototype (i) extracts LLM-relevant sinks from runtime outputs (prompt/messages and structured tool-return fields), (ii) instantiates external-input candidates from environment values, mounted file contents, and output-surfaced HTTP fetch intents, and (iii) links sources to sinks via snippet-based substring matching. Case studies on three representative tools show that MCP-SandboxScan can surface provenance evidence when external inputs appear in prompt/messages or tool-return payloads, and can expose filesystem capability violations as runtime evidence. We further compare against a lightweight static string-signature baseline and use a micro-benchmark to characterize false negatives under transformations and false positives from short-token collisions.
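
Step (iii), linking sources to sinks via snippet-based substring matching, can be sketched in a few lines. The code below is a hypothetical reconstruction from the abstract alone, not MCP-SandboxScan's actual implementation or API: the function names, the snippet length of 8 and the sample source and sink values are all assumptions. It also reproduces the two failure modes the abstract mentions, a false positive from a short token and a false negative once the value is transformed.

```python
import base64


def snippets(value, length):
    """All contiguous substrings of `value` of the given length (or the value itself if shorter)."""
    if len(value) <= length:
        return {value}
    return {value[i:i + length] for i in range(len(value) - length + 1)}


def link_sources_to_sinks(sources, sinks, snippet_len=8):
    """Return sorted (source_name, sink_name) pairs where a source snippet appears in a sink."""
    findings = []
    for source_name, source_value in sources.items():
        if not source_value:
            continue
        candidates = snippets(source_value, snippet_len)
        for sink_name, sink_text in sinks.items():
            if any(snippet in sink_text for snippet in candidates):
                findings.append((source_name, sink_name))
    return sorted(findings)


# Hypothetical external inputs (sources) and LLM-relevant outputs (sinks).
token = "sk-test-1234567890abcdef"
sources = {"env:API_TOKEN": token, "env:MODE": "on"}
sinks = {
    "prompt": "Use token " + token + " when calling.",
    "tool_return": "status: done; connection reset",
}
findings = link_sources_to_sinks(sources, sinks)

# False negative under transformation: the same secret, base64-encoded.
encoded_sinks = {"prompt": base64.b64encode(token.encode()).decode()}
encoded_findings = link_sources_to_sinks({"env:API_TOKEN": token}, encoded_sinks)

print(findings)
print(encoded_findings)
```

The first run reports the real leak of the token into the prompt, plus a spurious link because the two-character value "on" occurs inside "done" and "connection". The second run reports nothing even though the secret is present in encoded form. A minimum source length and decoding passes are the obvious mitigations, at the cost of more tuning.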

Updated: 2026-01-03 17:25:38

标题: MCP-SandboxScan：基于WASM的MCP工具安全执行和运行时分析

摘要: 工具增强的LLM代理引入了新的安全风险：工具执行可能会引入仅在运行时出现的行为，包括提示注入和意外暴露外部输入（例如，环境机密或本地文件）。虽然现有的扫描器通常侧重于静态构件，但分析运行时行为是具有挑战性的，因为直接执行不受信任的工具本身可能是危险的。我们提出了MCP-SandboxScan，这是一个受模型上下文协议(MCP)启发的轻量级框架，它安全地在WebAssembly/WASI沙箱中执行不受信任的工具，并生成外部到接收器暴露的可审计报告。我们的原型（i）从运行时输出中提取LLM相关的接收器（提示/消息和结构化工具返回字段），（ii）从环境值、挂载文件内容和输出表面化的HTTP获取意图实例化外部输入候选，以及（iii）通过基于片段的子字符串匹配将源链接到接收器。针对三种代表性工具的案例研究表明，MCP-SandboxScan能够在提示/消息或工具返回有效载荷中出现外部输入时呈现来源证据，并且可以暴露文件系统能力违规作为运行时证据。我们进一步与轻量级静态字符串签名基线进行比较，并使用微基准测试来表征转换下的假阴性和短令牌碰撞导致的假阳性。

更新时间: 2026-01-03 17:25:38

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2601.01241v1

NoveltyRank: A Retrieval-Augmented Framework for Conceptual Novelty Estimation in AI Research

The accelerating pace of scientific publication makes it difficult to identify truly original research among incremental work. We propose a framework for estimating the conceptual novelty of research papers by combining semantic representation learning with retrieval-based comparison against prior literature. We model novelty as both a binary classification task (novel vs. non-novel) and a pairwise ranking task (comparative novelty), enabling absolute and relative assessments. Experiments benchmark three model scales, ranging from compact domain-specific encoders to a zero-shot frontier model. Results show that fine-tuned lightweight models outperform larger zero-shot models despite their smaller parameter count, indicating that task-specific supervision matters more than scale for conceptual novelty estimation. We further deploy the best-performing model as an online system for public interaction and real-time novelty scoring.

Updated: 2026-01-03 17:14:54

标题: 新颖度排名：一种检索增强框架用于人工智能研究中的概念新颖度估算

摘要: 科学出版物的加速发表速度使得在渐进式工作中难以识别出真正原创的研究。我们提出了一个框架，通过将语义表示学习与基于检索的对比结合起来，来估计研究论文的概念新颖性。我们将新颖性建模为一个二元分类任务（新颖 vs. 非新颖）和一个成对排名任务（比较新颖性），从而实现绝对和相对评估。实验对比了三种模型规模，从紧凑的领域特定编码器到零样本边界模型。结果表明，经过微调的轻量级模型表现优于更大的零样本模型，尽管其参数数量更少，这表明任务特定的监督比规模对于概念新颖性估计更为重要。我们进一步将表现最佳的模型部署为一个在线系统，用于公众互动和实时新颖性评分。

更新时间: 2026-01-03 17:14:54

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2512.14738v2

Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation

Insider threats pose a persistent and critical security risk, yet are notoriously difficult to detect in complex enterprise environments, where malicious actions are often hidden within seemingly benign user behaviors. Although machine-learning-based insider threat detection (ITD) methods have shown promise, their effectiveness is fundamentally limited by the scarcity of high-quality and realistic training data. Enterprise internal data is highly sensitive and rarely accessible, while existing public and synthetic datasets are either small-scale or lack sufficient realism, semantic richness, and behavioral diversity. To address this challenge, we propose Chimera, an LLM-based multi-agent framework that automatically simulates both benign and malicious insider activities and generates comprehensive system logs across diverse enterprise environments. Chimera models each agent as an individual employee with fine-grained roles and supports group meetings, pairwise interactions, and self-organized scheduling to capture realistic organizational dynamics. Based on 15 insider attacks abstracted from real-world incidents, we deploy Chimera in three representative data-sensitive organizational scenarios and construct ChimeraLog, a new dataset for developing and evaluating ITD methods. We evaluate ChimeraLog through human studies and quantitative analyses, demonstrating its diversity and realism. Experiments with existing ITD methods show substantially lower detection performance on ChimeraLog compared to prior datasets, indicating a more challenging and realistic benchmark. Moreover, despite distribution shifts, models trained on ChimeraLog exhibit strong generalization, highlighting the practical value of LLM-based multi-agent simulation for advancing insider threat detection.

Updated: 2026-01-03 17:12:14

Categories: cs.CR,cs.AI,cs.SE

Download: http://arxiv.org/abs/2508.07745v4

IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection

The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its widespread integration across various applications. However, this surge in usage also highlights a critical issue: audio data is highly vulnerable to unauthorized exposure and analysis, posing significant privacy risks for businesses and individuals. This paper introduces an Information-Obfuscation Reversible Adversarial Example (IO-RAE) framework, the pioneering method designed to safeguard audio privacy using reversible adversarial examples. IO-RAE leverages large language models to generate misleading yet contextually coherent content, effectively preventing unauthorized eavesdropping by humans and Automatic Speech Recognition (ASR) systems. Additionally, we propose the Cumulative Signal Attack technique, which mitigates high-frequency noise and enhances attack efficacy by targeting low-frequency signals. Our approach ensures the protection of audio data without degrading its quality or our ability. Experimental evaluations demonstrate the superiority of our method, achieving a targeted misguidance rate of 96.5% and a remarkable 100% untargeted misguidance rate in obfuscating target keywords across multiple ASR models, including a commercial black-box system from Google. Furthermore, the quality of the recovered audio, measured by the Perceptual Evaluation of Speech Quality score, reached 4.45, comparable to high-quality original recordings. Notably, the recovered audio processed by ASR systems exhibited an error rate of 0%, indicating nearly lossless recovery. These results highlight the practical applicability and effectiveness of our IO-RAE framework in protecting sensitive audio privacy.

Updated: 2026-01-03 17:08:35

Categories: cs.SD,cs.CR,cs.MM,eess.AS

Download: http://arxiv.org/abs/2601.01239v1

Evidence Slopes and Effective Dimension in Singular Linear Models

Bayesian model selection commonly relies on Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters. Singular learning theory replaces this assumption with the real log canonical threshold (RLCT), an effective dimension that can be strictly smaller in overparameterized or rank-deficient models. We study linear-Gaussian rank models and linear subspace (dictionary) models in which the exact marginal likelihood is available in closed form and the RLCT is analytically tractable. In this setting, we show theoretically and empirically that the error of Laplace/BIC grows linearly with $(d/2 - \lambda)\log n$, where $d$ is the ambient parameter dimension and $\lambda$ is the RLCT. An RLCT-aware correction recovers the correct evidence slope and is invariant to overcomplete reparameterizations that represent the same data subspace. Our results provide a concrete finite-sample characterization of Laplace failure in singular models and demonstrate that evidence slopes can be used as a practical estimator of effective dimension in simple linear settings.
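The idea of reading an effective dimension off the evidence slope can be illustrated with a toy model. The sketch below is not the paper's experiment: it assumes a one-dimensional regression whose single weight is redundantly written as a sum of `n_copies` parameters, each with a standard normal prior, so that the exact log evidence is available in closed form. The function names, sample sizes, and noise level are all hypothetical choices. The gap between the log evidence and the maximized log-likelihood should then fall roughly like -0.5 log n regardless of `n_copies`, whereas a BIC-style penalty would predict a slope of -n_copies/2.

```python
import numpy as np

def evidence_gap(n, n_copies, rng, sigma=1.0, w_true=1.0):
    """Exact log evidence minus maximized log-likelihood for
    y = (w_1 + ... + w_k) * x + noise, with independent priors w_i ~ N(0, 1)."""
    x = rng.normal(size=n)
    y = w_true * x + sigma * rng.normal(size=n)
    phi = np.tile(x[:, None], (1, n_copies))   # k identical columns: rank 1
    gram = phi.T @ phi
    b = phi.T @ y
    eye = np.eye(n_copies)
    # Matrix determinant lemma and Woodbury identity avoid any n-by-n matrix.
    _, logdet = np.linalg.slogdet(eye + gram / sigma**2)
    quad = (y @ y - b @ np.linalg.solve(sigma**2 * eye + gram, b)) / sigma**2
    # Least squares handles the rank-deficient design (minimum-norm solution).
    coef = np.linalg.lstsq(phi, y, rcond=None)[0]
    rss = np.sum((y - phi @ coef) ** 2)
    return -0.5 * logdet - 0.5 * (quad - rss / sigma**2)

def evidence_slope(n_copies, seed=0):
    """Slope of the evidence gap against log n, fitted by least squares."""
    rng = np.random.default_rng(seed)
    ns = np.array([100, 300, 1000, 3000, 10000, 30000, 100000])
    gaps = [evidence_gap(int(n), n_copies, rng) for n in ns]
    return np.polyfit(np.log(ns), gaps, 1)[0]

for k in (1, 3):
    print(f"parameters d={k}: fitted slope {evidence_slope(k):.3f}, BIC slope {-k / 2:.1f}")
```

In this toy setting the fitted slope stays close to -0.5 for both parameterizations, which is the kind of invariance to overcomplete reparameterization the abstract describes.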

Updated: 2026-01-03 17:05:55

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2601.01238v1

Benchmarking the Computational and Representational Efficiency of State Space Models against Transformers on Long-Context Dyadic Sessions

State Space Models (SSMs) have emerged as a promising alternative to Transformers for long-context sequence modeling, offering linear $O(N)$ computational complexity compared to the Transformer's quadratic $O(N^2)$ scaling. This paper presents a comprehensive benchmarking study comparing the Mamba SSM against the LLaMA Transformer on long-context sequences, using dyadic therapy sessions as a representative test case. We evaluate both architectures across two dimensions: (1) computational efficiency, where we measure memory usage and inference speed from 512 to 8,192 tokens, and (2) representational efficiency, where we analyze hidden state dynamics and attention patterns. Our findings provide actionable insights for practitioners working with long-context applications, establishing precise conditions under which SSMs offer advantages over Transformers.
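The $O(N)$ versus $O(N^2)$ contrast can be made concrete with a back-of-the-envelope count. The sketch below is only a counting model under simplifying assumptions (one layer, one attention head, constants ignored); it does not reproduce the paper's memory or latency measurements, and the function names are hypothetical.

```python
def attention_pair_count(n_tokens: int) -> int:
    """Query-key pairs scored by full self-attention (one head, one layer)."""
    return n_tokens * n_tokens

def ssm_step_count(n_tokens: int) -> int:
    """Recurrent state updates performed by a sequential scan (one layer)."""
    return n_tokens

# Sequence lengths spanning the range evaluated in the abstract.
for n in (512, 1024, 2048, 4096, 8192):
    ratio = attention_pair_count(n) / ssm_step_count(n)
    print(f"N={n:5d}  attention pairs={attention_pair_count(n):>10d}  "
          f"ssm steps={ssm_step_count(n):>5d}  ratio={ratio:.0f}")
```

Under this model, doubling the context doubles the scan work but quadruples the attention work, so the gap itself grows linearly with $N$.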

Updated: 2026-01-03 17:05:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2601.01237v1

The Dependency Divide: An Interpretable Machine Learning Framework for Profiling Student Digital Satisfaction in the Bangladesh Context

Background: While digital access has expanded rapidly in resource-constrained contexts, satisfaction with digital learning platforms varies significantly among students with seemingly equal connectivity. Traditional digital divide frameworks fail to explain these variations. Purpose: This study introduces the "Dependency Divide", a novel framework proposing that highly engaged students become conditionally vulnerable to infrastructure failures, challenging assumptions that engagement uniformly benefits learners in post-access environments. Methods: We conducted a cross-sectional study of 396 university students in Bangladesh using a three-stage analytical approach: (1) stability-validated K-prototypes clustering to identify student profiles, (2) profile-specific Random Forest models with SHAP and ALE analysis to determine satisfaction drivers, and (3) formal interaction analysis with propensity score matching to test the Dependency Divide hypothesis. Results: Three distinct profiles emerged: Casually Engaged (58%), Efficient Learners (35%), and Hyper-Engaged (7%). A significant interaction between educational device time and internet reliability ($\beta = 0.033$, $p = 0.028$) confirmed the Dependency Divide: engagement increased satisfaction only when infrastructure remained reliable. Hyper-Engaged students showed greatest vulnerability despite or because of their sophisticated digital workflows. Policy simulations demonstrated that targeted reliability improvements for high-dependency users yielded 2.06 times greater returns than uniform interventions. Conclusions: In fragile infrastructure contexts, capability can become liability. Digital transformation policies must prioritize reliability for dependency-prone users, establish contingency systems, and educate students about dependency risks rather than uniformly promoting engagement.
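The interaction test at the heart of the Dependency Divide hypothesis can be sketched as an ordinary least-squares fit with a product term. The example below uses synthetic data generated inside the script, not the study's survey responses, and it omits the propensity score matching step. The variable names and the simulated coefficients are assumptions made purely for illustration.

```python
import numpy as np

def fit_interaction(engagement, reliability, satisfaction):
    """OLS fit of satisfaction on engagement, reliability and their product."""
    design = np.column_stack([
        np.ones_like(engagement),      # intercept
        engagement,
        reliability,
        engagement * reliability,      # interaction term under test
    ])
    coef, *_ = np.linalg.lstsq(design, satisfaction, rcond=None)
    names = ["intercept", "engagement", "reliability", "interaction"]
    return dict(zip(names, coef))

def simulate(n=5000, interaction=0.5, seed=0):
    """Synthetic data in which engagement pays off more when reliability is high."""
    rng = np.random.default_rng(seed)
    engagement = rng.uniform(0.0, 1.0, size=n)
    reliability = rng.uniform(0.0, 1.0, size=n)
    satisfaction = (1.0 + 0.2 * engagement + 0.3 * reliability
                    + interaction * engagement * reliability
                    + 0.1 * rng.normal(size=n))
    return engagement, reliability, satisfaction

coefficients = fit_interaction(*simulate())
print({name: round(float(value), 3) for name, value in coefficients.items()})
```

A positive interaction coefficient means the marginal effect of engagement on satisfaction rises with reliability, which is the pattern the abstract reports.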

Updated: 2026-01-03 16:37:51

Categories: cs.LG

Download: http://arxiv.org/abs/2601.01231v1

NeuroSSM: Multiscale Differential State-Space Modeling for Context-Aware fMRI Analysis

Accurate fMRI analysis requires sensitivity to temporal structure across multiple scales, as BOLD signals encode cognitive processes that emerge from fast transient dynamics to slower, large-scale fluctuations. Existing deep learning (DL) approaches to temporal modeling face challenges in jointly capturing these dynamics over long fMRI time series. Among current DL models, transformers address long-range dependencies by explicitly modeling pairwise interactions through attention, but the associated quadratic computational cost limits effective integration of temporal dependencies across long fMRI sequences. Selective state-space models (SSMs) instead model long-range temporal dependencies implicitly through latent state evolution in a dynamical system, enabling efficient propagation of dependencies over time. However, recent SSM-based approaches for fMRI commonly operate on derived functional connectivity representations and employ single-scale temporal processing. These design choices constrain the ability to jointly represent fast transient dynamics and slower global trends within a single model. We propose NeuroSSM, a selective state-space architecture designed for end-to-end analysis of raw BOLD signals in fMRI time series. NeuroSSM addresses the above limitations through two complementary design components: a multiscale state-space backbone that captures fast and slow dynamics concurrently, and a parallel differencing branch that increases sensitivity to transient state changes. Experiments on clinical and non-clinical datasets demonstrate that NeuroSSM achieves competitive performance and efficiency against state-of-the-art fMRI analysis methods.

Updated: 2026-01-03 16:35:45

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2601.01229v1

Towards a Multi-Layer Defence Framework for Securing Near-Real-Time Operations in Open RAN

Securing the near-real-time (near-RT) control operations in Open Radio Access Networks (Open RAN) is increasingly critical, yet remains insufficiently addressed, as new runtime threats target the control loop while the system is operational. In this paper, we propose a multi-layer defence framework designed to enhance the security of near-RT RAN Intelligent Controller (RIC) operations. We classify operational-time threats into three categories, message-level, data-level, and control logic-level, and design and implement a dedicated detection and mitigation component for each: a signature-based E2 message inspection module performing structural and semantic validation of signalling exchanges, a telemetry poisoning detector based on temporal anomaly scoring using an LSTM network, and a runtime xApp attestation mechanism based on execution-time hash challenge-response. The framework is evaluated on an O-RAN testbed comprising FlexRIC and a commercial RAN emulator, demonstrating effective detection rates, low latency overheads, and practical integration feasibility. Results indicate that the proposed safeguards can operate within near-RT time constraints while significantly improving protection against runtime attacks, introducing less than 80 ms overhead for a network with 500 User Equipment (UEs). Overall, this work lays the foundation for deployable, layered, and policy-driven runtime security architectures for the near-RT RIC control loop in Open RAN, and provides an extensible framework into which future mitigation policies and threat-specific modules can be integrated.
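A hash challenge-response exchange of the kind mentioned for xApp attestation can be sketched in a few lines. The abstract does not specify the protocol details, so everything below is an assumption: the verifier is taken to hold a reference copy of the xApp image, the digest is SHA-256 over a fresh nonce concatenated with the image bytes, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def new_challenge(n_bytes: int = 32) -> bytes:
    """Verifier side: draw a fresh random nonce for each attestation round."""
    return secrets.token_bytes(n_bytes)

def respond(challenge: bytes, code_image: bytes) -> str:
    """xApp side: hash the nonce together with the code actually loaded."""
    return hashlib.sha256(challenge + code_image).hexdigest()

def verify(challenge: bytes, response: str, reference_image: bytes) -> bool:
    """Verifier side: recompute the digest from the trusted reference image."""
    expected = hashlib.sha256(challenge + reference_image).hexdigest()
    return hmac.compare_digest(expected, response)   # constant-time comparison

reference = b"xapp-image-bytes-v1"        # placeholder for the trusted image
nonce = new_challenge()
print("genuine xApp accepted:", verify(nonce, respond(nonce, reference), reference))
print("tampered xApp accepted:", verify(nonce, respond(nonce, reference + b"!"), reference))
```

Binding each response to a fresh nonce is what prevents a compromised xApp from replaying a digest recorded before it was modified.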

Updated: 2026-01-03 16:29:36

Categories: cs.CR,cs.NI,eess.SY

Download: http://arxiv.org/abs/2512.01596v2

Shutdownable Agents through POST-Agency

Many fear that future artificial agents will resist shutdown. I present an idea - the POST-Agents Proposal - for ensuring that doesn't happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST - together with other conditions - implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.

Updated: 2026-01-03 16:19:32

Categories: cs.AI

Download: http://arxiv.org/abs/2505.20203v3

Damba-ST: Domain-Adaptive Mamba for Efficient Urban Spatio-Temporal Prediction

Training urban spatio-temporal foundation models that generalize well across diverse regions and cities is critical for deploying urban services in unseen or data-scarce regions. Recent studies have typically focused on fusing cross-domain spatio-temporal data to train unified Transformer-based models. However, these models suffer from quadratic computational complexity and high memory overhead, limiting their scalability and practical deployment. Inspired by the efficiency of Mamba, a state space model with linear time complexity, we explore its potential for efficient urban spatio-temporal prediction. However, directly applying Mamba as a spatio-temporal backbone leads to negative transfer and severe performance degradation. This is primarily due to spatio-temporal heterogeneity and the recursive mechanism of Mamba's hidden state updates, which limit cross-domain generalization. To overcome these challenges, we propose Damba-ST, a novel domain-adaptive Mamba-based model for efficient urban spatio-temporal prediction. Damba-ST retains Mamba's linear complexity advantage while significantly enhancing its adaptability to heterogeneous domains. Specifically, we introduce two core innovations: (1) a domain-adaptive state space model that partitions the latent representation space into a shared subspace for learning cross-domain commonalities and independent, domain-specific subspaces for capturing intra-domain discriminative features; (2) three distinct Domain Adapters, which serve as domain-aware proxies to bridge disparate domain distributions and facilitate the alignment of cross-domain commonalities. Extensive experiments demonstrate the generalization and efficiency of Damba-ST. It achieves state-of-the-art performance on prediction tasks and demonstrates strong zero-shot generalization, enabling seamless deployment in new urban environments without extensive retraining or fine-tuning.

Updated: 2026-01-03 16:17:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2506.18939v3

Stylometry Analysis of Human and Machine Text for Academic Integrity

This work addresses critical challenges to academic integrity, including plagiarism, fabrication, and verification of authorship of educational content, by proposing a Natural Language Processing (NLP)-based framework for authenticating students' content through author attribution and style change detection. Despite some initial efforts, several aspects of the topic are yet to be explored. In contrast to existing solutions, the paper provides a comprehensive analysis of the topic by targeting four relevant tasks: (i) classification of human and machine text, (ii) differentiation between single- and multi-authored documents, (iii) author change detection within multi-authored documents, and (iv) author recognition in collaboratively produced documents. The solutions proposed for the tasks are evaluated on two datasets generated with Gemini using two different prompts: a normal and a strict set of instructions. During experiments, some reduction in the performance of the proposed solutions is observed on the dataset generated through the strict prompt, demonstrating the complexities involved in detecting machine-generated text with cleverly crafted prompts. The generated datasets, code, and other relevant materials are made publicly available on GitHub and are expected to provide a baseline for future research in the domain.

Updated: 2026-01-03 16:13:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2601.01225v1

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) employs register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot-image correspondence. The resulting training objective serves as a tractable surrogate for maximizing mutual information (MI) between slots and inputs, strengthening slot representation quality. On both synthetic (MOVi-C/E) and real-world datasets (VOC, COCO), CODA improves object discovery (e.g., +6.1% FG-ARI on COCO), property prediction, and compositional image generation over strong baselines. Register slots add negligible overhead, keeping CODA efficient and scalable. These results indicate potential applications of CODA as an effective framework for robust OCL in complex, real-world scenes.
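A contrastive alignment loss between slots and image content can be sketched with the standard InfoNCE objective, which is known to lower-bound mutual information. The abstract does not give CODA's exact loss, so the version below is a generic stand-in: it assumes one pooled slot embedding and one image embedding per example, treats the other examples in the batch as negatives, and uses a hypothetical temperature of 0.1.

```python
import numpy as np

def info_nce(slot_emb, image_emb, temperature=0.1):
    """InfoNCE loss: each slot embedding should match its own image embedding."""
    s = slot_emb / np.linalg.norm(slot_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = s @ v.T / temperature                    # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))         # positives on the diagonal

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 16))
slots = images + 0.05 * rng.normal(size=(8, 16))      # well-aligned slots
print("aligned loss:   ", round(info_nce(slots, images), 3))
print("misaligned loss:", round(info_nce(slots, np.roll(images, 1, axis=0)), 3))
```

The loss is small when each slot is most similar to its own image and grows when the pairing is broken, which is the correspondence signal the abstract describes.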

Updated: 2026-01-03 16:10:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.01224v1

Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data

Clinical decision-making demands uncertainty quantification that provides both distribution-free coverage guarantees and risk-adaptive precision, requirements that existing methods fail to jointly satisfy. We present a hybrid Bayesian-conformal framework that addresses this fundamental limitation in healthcare predictions. Our approach integrates Bayesian hierarchical random forests with group-aware conformal calibration, using posterior uncertainties to weight conformity scores while maintaining rigorous coverage validity. Evaluated on 61,538 admissions across 3,793 U.S. hospitals and 4 regions, our method achieves target coverage (94.3% vs 95% target) with adaptive precision: 21% narrower intervals for low-uncertainty cases while appropriately widening for high-risk predictions. Critically, we demonstrate that well-calibrated Bayesian uncertainties alone severely under-cover (14.1%), highlighting the necessity of our hybrid approach. This framework enables risk-stratified clinical protocols, efficient resource planning for high-confidence predictions, and conservative allocation with enhanced oversight for uncertain cases, providing uncertainty-aware decision support across diverse healthcare settings.
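Weighting conformity scores by a predictive uncertainty can be illustrated with normalized split conformal prediction. The sketch below is a simplified stand-in, not the paper's method: it has no hierarchical random forest and no group-aware calibration, the data are synthetic, and the "posterior" standard deviation is deliberately set to half the true noise level to mimic an overconfident model. All names and constants are assumptions.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:                      # too few calibration points for this alpha
        return np.inf
    return float(np.sort(scores)[k - 1])

def demo(seed=0, alpha=0.05, n_cal=2000, n_test=2000):
    rng = np.random.default_rng(seed)

    def draw(n):
        x = rng.uniform(0.0, 1.0, size=n)
        true_sd = 0.1 + x                          # heteroscedastic noise
        y = 2.0 * x + true_sd * rng.normal(size=n)
        return 2.0 * x, 0.5 * true_sd, y           # prediction, overconfident sd, target

    pred_c, sd_c, y_c = draw(n_cal)
    # Conformity score: absolute residual scaled by the model's own uncertainty.
    q = conformal_quantile(np.abs(y_c - pred_c) / sd_c, alpha)

    pred_t, sd_t, y_t = draw(n_test)
    raw_cov = np.mean(np.abs(y_t - pred_t) <= 1.96 * sd_t)   # uncertainty alone
    conf_cov = np.mean(np.abs(y_t - pred_t) <= q * sd_t)     # conformal-calibrated
    return float(raw_cov), float(conf_cov), q

raw_cov, conf_cov, q = demo()
print(f"raw coverage={raw_cov:.3f}  conformal coverage={conf_cov:.3f}  multiplier q={q:.2f}")
```

The calibrated interval is the prediction plus or minus `q` times the per-case uncertainty, so low-uncertainty cases keep narrower intervals while overall coverage is restored near the target.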

Updated: 2026-01-03 16:06:37

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2601.01223v1

Stochastic Online Optimization for Cyber-Physical and Robotic Systems

We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam, a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.

Updated: 2026-01-03 15:51:21

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2404.05318v2

SinBasis Networks: Matrix-Equivalent Feature Extraction for Wave-Like Optical Spectrograms

Wave-like images--from attosecond streaking spectrograms to optical spectra, audio mel-spectrograms and periodic video frames--encode critical harmonic structures that elude conventional feature extractors. We propose a unified, matrix-equivalent framework that reinterprets convolution and attention as linear transforms on flattened inputs, revealing filter weights as basis vectors spanning latent feature subspaces. To infuse spectral priors we apply elementwise $\sin(\cdot)$ mappings to each weight matrix. Embedding these transforms into CNN, ViT and Capsule architectures yields Sin-Basis Networks with heightened sensitivity to periodic motifs and built-in invariance to spatial shifts. Experiments on a diverse collection of wave-like image datasets--including 80,000 synthetic attosecond streaking spectrograms, thousands of Raman, photoluminescence and FTIR spectra, mel-spectrograms from AudioSet and cycle-pattern frames from Kinetics--demonstrate substantial gains in reconstruction accuracy, translational robustness and zero-shot cross-domain transfer. Theoretical analysis via matrix isomorphism and Mercer-kernel truncation quantifies how sinusoidal reparametrization enriches expressivity while preserving stability in data-scarce regimes. Sin-Basis Networks thus offer a lightweight, physics-informed approach to deep learning across all wave-form imaging modalities.
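The elementwise $\sin(\cdot)$ reparametrization can be shown on a single linear layer acting on flattened inputs. This is a minimal sketch of that one step only, under the assumption that the transform is applied to the raw weight matrix before the matrix product; it does not reproduce the CNN, ViT, or Capsule variants, and the function name and shapes are hypothetical.

```python
import numpy as np

def sin_basis_linear(x, weight):
    """Linear map whose effective weights are sin(weight), applied elementwise."""
    return x @ np.sin(weight).T

rng = np.random.default_rng(0)
weight = rng.normal(scale=3.0, size=(4, 10))   # 4 output features, 10 flattened inputs
x = rng.normal(size=(2, 10))                   # batch of 2 flattened inputs
out = sin_basis_linear(x, weight)
print(out.shape, float(np.abs(np.sin(weight)).max()))
```

One visible consequence is that the effective basis vectors are bounded in $[-1, 1]$ however large the raw parameters become.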

Updated: 2026-01-03 15:47:06

Categories: cs.LG,cs.AI,cs.CV,physics.optics

Download: http://arxiv.org/abs/2505.06275v3

Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code

Large language models (LLMs) can generate programs that pass unit tests, but passing tests does not guarantee reliable runtime behavior. We find that different correct solutions to the same task can show very different memory and performance patterns, which can lead to hidden operational risks. We present a framework to measure execution-time memory stability across multiple correct generations. At the solution level, we introduce Dynamic Mean Pairwise Distance (DMPD), which uses Dynamic Time Warping to compare the shapes of memory-usage traces after converting them into Monotonic Peak Profiles (MPPs) to reduce transient noise. Aggregating DMPD across tasks yields a model-level Model Instability Score (MIS). Experiments on BigOBench and CodeContests show substantial runtime divergence among correct solutions. Instability often increases with higher sampling temperature even when pass@1 improves. We also observe correlations between our stability measures and software engineering indicators such as cognitive and cyclomatic complexity, suggesting links between operational behavior and maintainability. Our results support stability-aware selection among passing candidates in CI/CD to reduce operational risk without sacrificing correctness. Artifacts are available.
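The pipeline described for DMPD (memory traces, then Monotonic Peak Profiles, then Dynamic Time Warping, then a mean over pairs) can be sketched as follows. The abstract does not define the profile or any normalization, so this sketch assumes the Monotonic Peak Profile is the running maximum of the trace and uses the textbook DTW recurrence with an absolute-difference cost; the function names are hypothetical.

```python
import itertools
import numpy as np

def monotonic_peak_profile(trace):
    """Running maximum of a memory trace (assumed definition of an MPP)."""
    return np.maximum.accumulate(np.asarray(trace, dtype=float))

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def mean_pairwise_dtw(traces):
    """Average DTW distance over all pairs of peak profiles."""
    profiles = [monotonic_peak_profile(t) for t in traces]
    pairs = list(itertools.combinations(profiles, 2))
    return sum(dtw_distance(a, b) for a, b in pairs) / len(pairs)

# Memory samples (arbitrary units) from three hypothetical correct solutions.
traces = [[1, 3, 2, 5], [1, 3, 5], [1, 8, 6, 9, 9]]
print(mean_pairwise_dtw(traces))
```

Because DTW aligns traces of different lengths, two solutions that reach the same peaks at different speeds contribute no distance, while a solution with a different peak shape does.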

Updated: 2026-01-03 15:42:21

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2601.01215v1

Arca: A Lightweight Confidential Container Architecture for Cloud-Native Environments

Confidential containers protect cloud-native workloads using trusted execution environments (TEEs). However, existing Container-in-TEE designs (e.g., Confidential Containers (CoCo)) encapsulate the entire runtime within the TEE, inflating the trusted computing base (TCB) and introducing redundant components and cross-layer overhead. We present Arca, a lightweight confidential container framework based on a TEE-in-Container architecture that isolates each workload in an independent, hardware-enforced trust domain while keeping orchestration logic outside the TEE. This design minimizes inter-layer dependencies, confines compromise to per-container boundaries, and restores the TEE's minimal trust principle. We implemented Arca on Intel SGX, Intel TDX, and AMD SEV. Experimental results show that Arca achieves near-native performance and outperforms CoCo in most benchmarks, while the reduced TCB significantly improves verifiability and resilience against host-level compromise. Arca demonstrates that efficient container management and strong runtime confidentiality can be achieved without sacrificing security assurance.

Updated: 2026-01-03 15:42:20

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01214v1

Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation

Remote sensing solutions for avalanche segmentation and mapping are key to supporting risk forecasting and mitigation in mountain regions. Synthetic Aperture Radar (SAR) imagery from Sentinel-1 can be effectively used for this task, but training an effective detection model requires gathering a large dataset with high-quality annotations from domain experts, which is prohibitively time-consuming. In this work, we aim to facilitate and accelerate the annotation of SAR images for avalanche mapping. We build on the Segment Anything Model (SAM), a segmentation foundation model trained on natural images, and tailor it to Sentinel-1 SAR data. Adapting SAM to our use-case requires addressing several domain-specific challenges: (i) domain mismatch, since SAM was not trained on satellite/SAR imagery; (ii) input adaptation, because SAR products typically provide more than three channels, while SAM is constrained to RGB images; (iii) robustness to imprecise prompts that can affect target identification and degrade the segmentation quality, an issue exacerbated in small, low-contrast avalanches; and (iv) training efficiency, since standard fine-tuning is computationally demanding for SAM. We tackle these challenges through a combination of adapters to mitigate the domain gap, multiple encoders to handle multi-channel SAR inputs, prompt-engineering strategies to improve avalanche localization accuracy, and a training algorithm that limits the training time of the encoder, which is recognized as the major bottleneck. We integrate the resulting model into an annotation tool and show experimentally that it speeds up the annotation of SAR images.

Updated: 2026-01-03 15:41:12

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2601.01213v1

Sparse Bayesian Message Passing under Structural Uncertainty

Semi-supervised learning on real-world graphs is frequently challenged by heterophily, where the observed graph is unreliable or label-disassortative. Many existing graph neural networks either rely on a fixed adjacency structure or attempt to handle structural noise through regularization. In this work, we explicitly capture structural uncertainty by modeling a posterior distribution over signed adjacency matrices, allowing each edge to be positive, negative, or absent. We propose a sparse signed message passing network that is naturally robust to edge noise and heterophily, which can be interpreted from a Bayesian perspective. By combining (i) posterior marginalization over signed graph structures with (ii) sparse signed message aggregation, our approach offers a principled way to handle both edge noise and heterophily. Experimental results demonstrate that our method outperforms strong baseline models on heterophilic benchmarks under both synthetic and real-world structural noise.

Updated: 2026-01-03 15:16:12

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2601.01207v1

MentalGame: Predicting Personality-Job Fitness for Software Developers Using Multi-Genre Games and Machine Learning Approaches

Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distortion. Game-based assessment offers a promising alternative by capturing implicit behavioral signals during gameplay. This study proposes a multi-genre serious-game framework combined with machine-learning techniques to predict suitability for software development roles. Developer-relevant personality and behavioral traits were identified through a systematic literature review and an empirical study of professional software engineers. A custom mobile game was designed to elicit behaviors related to problem solving, planning, adaptability, persistence, time management, and information seeking. Fine-grained gameplay event data were collected and analyzed using a two-phase modeling strategy in which suitability was predicted exclusively from gameplay-derived behavioral features. Results show that our model achieved up to 97% precision and 94% accuracy. Behavioral analysis revealed that suitable candidates exhibited distinct gameplay patterns, such as winning more puzzle-based games, taking on more side challenges, navigating menus more frequently, and showing fewer pauses, retries, and surrender actions. These findings demonstrate that implicit behavioral traces captured during gameplay are promising for predicting software-development suitability without explicit personality testing, supporting serious games as a scalable, engaging, and less biased alternative for career assessment.

Updated: 2026-01-03 15:09:02

Categories: cs.LG,cs.AI,cs.HC,cs.SE

Download: http://arxiv.org/abs/2601.01206v1

The Gaining Paths to Investment Success: Information-Driven LLM Graph Reasoning for Venture Capital Prediction

Most venture capital (VC) investments fail, while a few deliver outsized returns. Accurately predicting startup success requires synthesizing complex relational evidence, including company disclosures, investor track records, and investment network structures, through explicit reasoning to form coherent, interpretable investment theses. Traditional machine learning and graph neural networks both lack this reasoning capability. Large language models (LLMs) offer strong reasoning but face a modality mismatch with graphs. Recent graph-LLM methods target in-graph tasks where answers lie within the graph, whereas VC prediction is off-graph: the target exists outside the network. The core challenge is selecting graph paths that maximize predictor performance on an external objective while enabling step-by-step reasoning. We present MIRAGE-VC, a multi-perspective retrieval-augmented generation framework that addresses two obstacles: path explosion (thousands of candidate paths overwhelm LLM context) and heterogeneous evidence fusion (different startups need different analytical emphasis). Our information-gain-driven path retriever iteratively selects high-value neighbors, distilling investment networks into compact chains for explicit reasoning. A multi-agent architecture integrates three evidence streams via a learnable gating mechanism based on company attributes. Under strict anti-leakage controls, MIRAGE-VC achieves +5.0% F1 and +16.6% PrecisionAt5, and sheds light on other off-graph prediction tasks such as recommendation and risk assessment. Code: https://anonymous.4open.science/r/MIRAGE-VC-323F.

Updated: 2026-01-03 15:00:07

Categories: cs.AI

Download: http://arxiv.org/abs/2512.23489v2

RefSR-Adv: Adversarial Attack on Reference-based Image Super-Resolution Models

Single Image Super-Resolution (SISR) aims to recover high-resolution images from low-resolution inputs. Unlike SISR, Reference-based Super-Resolution (RefSR) leverages an additional high-resolution reference image to facilitate the recovery of high-frequency textures. However, existing research mainly focuses on backdoor attacks targeting RefSR, while the vulnerability of RefSR to adversarial attacks has not been fully explored. To fill this research gap, we propose RefSR-Adv, an adversarial attack that degrades SR outputs by perturbing only the reference image. By maximizing the difference between adversarial and clean outputs, RefSR-Adv induces significant performance degradation and generates severe artifacts across CNN, Transformer, and Mamba architectures on the CUFED5, WR-SR, and DRefSR datasets. Importantly, experiments confirm that attack effectiveness is positively correlated with the similarity between the low-resolution input and the reference image, revealing that the model's over-reliance on reference features is a key security flaw. This study exposes a security vulnerability in RefSR systems and urges researchers to pay attention to the robustness of RefSR.

Updated: 2026-01-03 14:59:15

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.01202v1

Müntz-Szász Networks: Neural Architectures with Learnable Power-Law Bases

Standard neural network architectures employ fixed activation functions (ReLU, tanh, sigmoid) that are poorly suited for approximating functions with singular or fractional power behavior, a structure that arises ubiquitously in physics, including boundary layers, fracture mechanics, and corner singularities. We introduce Müntz-Szász Networks (MSN), a novel architecture that replaces fixed smooth activations with learnable fractional power bases grounded in classical approximation theory. Each MSN edge computes $\phi(x) = \sum_k a_k |x|^{\mu_k} + \sum_k b_k \,\mathrm{sign}(x)\,|x|^{\lambda_k}$, where the exponents $\{\mu_k, \lambda_k\}$ are learned alongside the coefficients. We prove that MSN inherits universal approximation from the Müntz-Szász theorem and establish novel approximation rates: for functions of the form $|x|^{\alpha}$, MSN achieves error $\mathcal{O}(|\mu - \alpha|^2)$ with a single learned exponent, whereas standard MLPs require $\mathcal{O}(\epsilon^{-1/\alpha})$ neurons for comparable accuracy. On supervised regression with singular target functions, MSN achieves 5-8x lower error than MLPs with 10x fewer parameters. Physics-informed neural networks (PINNs) represent a particularly demanding application for singular function approximation; on PINN benchmarks including a singular ODE and stiff boundary-layer problems, MSN achieves 3-6x improvement while learning interpretable exponents that match the known solution structure. Our results demonstrate that theory-guided architectural design can yield dramatic improvements for scientifically-motivated function classes.
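
As a small illustration of the edge function quoted in this abstract, the following Python sketch evaluates phi(x) for fixed coefficients and exponents. It is only a forward evaluation under assumed values: the name `msn_edge` and the example parameters are hypothetical, strictly positive exponents are assumed, and the part the abstract emphasizes (learning the exponents together with the coefficients) is not implemented here.

```python
def msn_edge(x, a, mu, b, lam):
    """Evaluate phi(x) = sum_k a_k |x|^mu_k + sum_k b_k sign(x) |x|^lam_k.

    a, mu: coefficients and exponents of the even (|x|^mu) terms.
    b, lam: coefficients and exponents of the odd (sign(x)|x|^lam) terms.
    All values are plain floats here; in the paper they would be trainable.
    """
    ax = abs(x)
    sign = (x > 0) - (x < 0)  # -1, 0, or +1
    even = sum(a_k * ax ** mu_k for a_k, mu_k in zip(a, mu))
    odd = sum(b_k * sign * ax ** lam_k for b_k, lam_k in zip(b, lam))
    return even + odd


# Hypothetical parameters: one even term |x|^0.5 and one odd term 2*sign(x)|x|^1.5.
a, mu = [1.0], [0.5]
b, lam = [2.0], [1.5]

for x in (-4.0, 0.0, 4.0):
    print(x, msn_edge(x, a, mu, b, lam))
```

Splitting the basis into even and odd parts lets a single edge represent both symmetric cusps such as |x|^0.5 and antisymmetric fractional powers, which is the kind of singular behavior the abstract targets.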

Updated: 2026-01-03 14:39:25

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2512.22222v2

AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation

3D Gaussian Splatting (3D-GS) is a novel method for scene representation and view synthesis. Although Scaffold-GS achieves higher-quality real-time rendering than the original 3D-GS, its fine-grained rendering of the scene depends heavily on adequate viewing angles. The spectral bias of neural network learning leaves Scaffold-GS with a poor ability to perceive and learn high-frequency information in the scene. In this work, we propose enhancing the manifold complexity of input features and using a network-based feature map loss to improve the image reconstruction quality of 3D-GS models. We introduce AH-GS, which enables 3D Gaussians in structurally complex regions to obtain higher-frequency encodings, allowing the model to learn the high-frequency information of the scene more effectively. Additionally, we incorporate a high-frequency reinforce loss to further enhance the model's ability to capture detailed frequency information. Our results demonstrate that our model significantly improves rendering fidelity, and in specific scenarios (e.g., MipNeRf360-garden), our method exceeds the rendering quality of Scaffold-GS in just 15K iterations.

Updated: 2026-01-03 14:31:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.22324v2

Accelerating Sparse Transformer Inference on GPU

Large language models (LLMs) are popular around the world due to their powerful understanding capabilities. Because the Transformer is the core component of LLMs, accelerating it through parallelization has gradually become a hot research topic. Mask layers introduce sparsity into the Transformer to reduce computation. However, previous works rarely focus on the performance optimization of sparse Transformers. In addition, current static operator fusion schemes fail to adapt to diverse application scenarios. To address these problems, we propose STOF, a framework of optimizations for Sparse Transformer that enables flexible masking and Operator Fusion on GPU. For the multi-head attention (MHA) structure, STOF maps the computation to row-wise or block-wise kernels with unique storage formats according to analytical modeling. For downstream operators, STOF maps the fusion scheme to compilation templates and determines the optimal running configuration through two-stage searching. The experimental results show that, compared to state-of-the-art work, STOF achieves maximum speedups of 1.6x in MHA computation and 1.4x in end-to-end inference.

Updated: 2026-01-03 14:29:28

Categories: cs.LG

Download: http://arxiv.org/abs/2506.06095v3

Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering

Temporal knowledge graph question answering (TKGQA) involves multi-hop reasoning over temporally constrained entity relationships in the knowledge graph to answer a given question. However, at each hop, large language models (LLMs) retrieve subgraphs with numerous temporally similar and semantically complex relations, increasing the risk of suboptimal decisions and error propagation. To address these challenges, we propose the multi-hop reasoning enhanced (MRE) framework, which enhances both forward and backward reasoning to improve the identification of globally optimal reasoning trajectories. Specifically, MRE begins with prompt engineering to guide the LLM in generating diverse reasoning trajectories for a given question. Valid reasoning trajectories are then selected for supervised fine-tuning, serving as a cold-start strategy. Finally, we introduce Tree-Group Relative Policy Optimization (T-GRPO), a recursive, tree-structured learning-by-exploration approach. At each hop, exploration establishes strong causal dependencies on the previous hop, while evaluation is informed by multi-path exploration feedback from subsequent hops. Experimental results on two TKGQA benchmarks indicate that the proposed MRE-based model consistently surpasses state-of-the-art (SOTA) approaches in handling complex multi-hop queries. Further analysis highlights improved interpretability and robustness to noisy temporal annotations.

Updated: 2026-01-03 14:27:01

Categories: cs.AI

Download: http://arxiv.org/abs/2601.01195v1

InfoDecom: Decomposing Information for Defending Against Privacy Leakage in Split Inference

Split inference (SI) enables users to access deep learning (DL) services without directly transmitting raw data. However, recent studies reveal that data reconstruction attacks (DRAs) can recover the original inputs from the smashed data sent from the client to the server, leading to significant privacy leakage. While various defenses have been proposed, they often result in substantial utility degradation, particularly when the client-side model is shallow. We identify a key cause of this trade-off: existing defenses apply excessive perturbation to redundant information in the smashed data. To address this issue in computer vision tasks, we propose InfoDecom, a defense framework that first decomposes and removes redundant information and then injects noise calibrated to provide theoretically guaranteed privacy. Experiments demonstrate that InfoDecom achieves a superior utility-privacy trade-off compared to existing baselines.

Updated: 2026-01-03 14:23:52

Categories: cs.CR,cs.AI,cs.DC

Download: http://arxiv.org/abs/2511.13365v2

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with powerful pretrained vision-language models (VLMs), recent approaches still cannot accurately understand complex 3D scenes or dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that first achieves precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e.g., UR5, G1 humanoid) in cluttered real-world scenes.

Updated: 2026-01-03 13:52:45

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2506.04308v4

Reasoning Beyond Limits: Advances and Open Problems for LLMs

Recent breakthroughs in generative reasoning have fundamentally reshaped how large language models (LLMs) address complex tasks, enabling them to dynamically retrieve, refine, and organize information into coherent multi-step reasoning chains. Techniques such as inference-time scaling, reinforcement learning, supervised fine-tuning, and distillation have been effectively applied to state-of-the-art models, including DeepSeek-R1, OpenAI o1 and o3, GPT-4o, Qwen-32B, and various Llama variants, significantly enhancing their reasoning capabilities. In this paper, we present a comprehensive review of the top 27 LLMs released between 2023 and 2025, such as Mistral AI Small 3 24B, DeepSeek-R1, Search-o1, QwQ-32B, and Phi-4, and analyze their core innovations and performance improvements. We also provide a detailed overview of recent advancements in multilingual large language models (MLLMs), emphasizing methods that improve cross-lingual reasoning and address the limitations of English-centric training. In parallel, we present a comprehensive review of progress in state space model (SSM)-based architectures, including models such as Mamba, which demonstrate improved efficiency for long-context processing compared to transformer-based approaches. Our analysis covers training strategies including general optimization techniques, mixture-of-experts (MoE) configurations, retrieval-augmented generation (RAG), chain-of-thought prompting, self-improvement methods, and test-time compute scaling and distillation frameworks. Finally, we identify key challenges for future research, including enabling multi-step reasoning without human supervision, improving robustness in chained task execution, balancing structured prompting with generative flexibility, and enhancing the integration of long-context retrieval and external tools.

Updated: 2026-01-03 13:40:22

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2503.22732v2

SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards

Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns. This paper presents SecureCodeRL, a reinforcement learning (RL) pipeline for security-aware code generation that optimizes a combined reward $R = \alpha R_{\mathrm{func}} + \beta R_{\mathrm{sec}}$. The key idea is a partial-credit functional reward that assigns intermediate scores for syntactic validity, successful execution, and producing output, reducing reward sparsity that otherwise stalls learning on competitive programming style tasks. I evaluate supervised fine-tuning (SFT) and PPO variants on a small held-out prompt set from APPS+ and observe that PPO with partial credit (using a continued-training variant) improves syntax validity from 45% (SFT) to 60% and achieves the only non-zero test success signal in this pilot evaluation (5% at-least-one-test-pass), while remaining 100% clean under Bandit static analysis. Although Bandit findings were absent in this small evaluation, the security term is integrated into training to discourage insecure shortcuts when they appear.
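
To make the reward structure described in this abstract concrete, here is a minimal Python sketch of a staged partial-credit functional reward combined with a security term. Every number and name in it is an assumption for illustration: the stage weights (0.25 each), the default `alpha` and `beta`, and the helper names are hypothetical and are not taken from the paper, and the security score is passed in as a plain float rather than computed by a static analyzer.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Syntactic-validity check using the standard-library parser."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def partial_credit_reward(syntax_ok, ran_ok, produced_output,
                          tests_passed, tests_total):
    """Staged functional reward in [0, 1] (hypothetical weights).

    Each stage is only credited if the previous one succeeded, so a
    program earns some reward before it passes any test at all.
    """
    if not syntax_ok:
        return 0.0
    score = 0.25                      # parses
    if ran_ok:
        score += 0.25                 # executes without crashing
        if produced_output:
            score += 0.25             # writes something to stdout
            if tests_total > 0:
                score += 0.25 * tests_passed / tests_total
    return score


def combined_reward(r_func, r_sec, alpha=0.8, beta=0.2):
    """R = alpha * R_func + beta * R_sec, with assumed default weights."""
    return alpha * r_func + beta * r_sec


# Example: code that parses, runs, prints output, and passes 1 of 2 tests,
# with an assumed security score of 1.0 (no findings).
r_func = partial_credit_reward(is_valid_python("print(1)"), True, True, 1, 2)
print(r_func, combined_reward(r_func, 1.0))
```

The point of the staging is that a policy receives a non-zero learning signal for merely producing parseable or runnable code, which is the reward-sparsity reduction the abstract describes.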

Updated: 2026-01-03 13:36:36

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01184v1

Comparative Evaluation of VAE, GAN, and SMOTE for Tor Detection in Encrypted Network Traffic

Encrypted network traffic poses significant challenges for intrusion detection due to the lack of payload visibility, limited labeled datasets, and high class imbalance between benign and malicious activities. Traditional data augmentation methods struggle to preserve the complex temporal and statistical characteristics of real network traffic. To address these issues, this work explores the use of Generative AI (GAI) models to synthesize realistic and diverse encrypted traffic traces. We evaluate three approaches: Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and SMOTE (Synthetic Minority Over-sampling Technique), each integrated with a preprocessing pipeline that includes feature selection and class balancing. The UNSW NB-15 dataset is used as the primary benchmark, focusing on Tor traffic as anomalies. We analyze statistical similarity between real and synthetic data, and assess classifier performance using metrics such as Accuracy, F1-score, and AUC-ROC. Results show that VAE-generated data provides the best balance between privacy and performance, while GANs offer higher fidelity but risk overfitting. SMOTE, though simple, enhances recall but may lack diversity. The findings demonstrate that GAI methods can significantly improve encrypted traffic detection when trained with privacy-preserving synthetic data.
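
Of the three approaches named in this abstract, SMOTE is simple enough to sketch in a few lines. The Python below shows the standard SMOTE interpolation step (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours). It is a generic illustration, not the authors' pipeline: the function name `smote_sample`, the toy feature vectors, and the parameter defaults are all hypothetical, and at least two minority samples are assumed.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of equal-length tuples of floats (needs >= 2 points).
    k: number of nearest minority neighbours considered for each base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest neighbours of the base point by squared Euclidean distance
        neighbours = sorted(
            others,
            key=lambda p: sum((u - v) ** 2 for u, v in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(u + gap * (v - u) for u, v in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features per flow (made-up values).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote_sample(minority):
    print(point)
```

Because every synthetic point lies between two existing minority samples, the method cannot leave the region already covered by the data, which is consistent with the abstract's remark that SMOTE may lack diversity.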

Updated: 2026-01-03 13:31:53

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01183v1

VFEFL: Privacy-Preserving Federated Learning against Malicious Clients via Verifiable Functional Encryption

Federated learning is a promising distributed learning paradigm that enables collaborative model training without exposing local client data, thereby protecting data privacy. However, it also brings new threats and challenges. The advancement of model inversion attacks has rendered the plaintext transmission of local models insecure, while the distributed nature of federated learning makes it particularly vulnerable to attacks launched by malicious clients. To protect data privacy and prevent malicious client attacks, this paper proposes a privacy-preserving Federated Learning framework based on Verifiable Functional Encryption (VFEFL) that requires neither a non-colluding dual-server assumption nor an additional trusted third party. Specifically, we propose a novel Cross-Ciphertext Decentralized Verifiable Functional Encryption (CC-DVFE) scheme that enables the verification of specific relationships over multi-dimensional ciphertexts. We treat this scheme formally, giving its definition, security model, and security proof. Furthermore, based on the proposed CC-DVFE scheme, we design a privacy-preserving federated learning framework that incorporates a novel robust aggregation rule to detect malicious clients, enabling the effective training of high-accuracy models under adversarial settings. Finally, we provide a formal analysis and an empirical evaluation of VFEFL. The results demonstrate that our approach achieves the desired privacy protection, robustness, verifiability, and fidelity, while eliminating the reliance on the non-colluding dual-server assumption or trusted third parties required by most existing methods.

Updated: 2026-01-03 13:24:46

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2506.12846v4

On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective

Approaches for appraising feature importance approximations, alternatively referred to as attribution methods, have been established across an extensive array of contexts. The development of resilient techniques for performance benchmarking constitutes a critical concern in the sphere of explainable deep learning. This study scrutinizes the dependability of the RemOve-And-Retrain (ROAR) procedure, which is widely employed for gauging the performance of feature importance estimates. Our theoretical analysis and empirical investigations reveal that attributions containing less information about the decision function may yield superior results in ROAR benchmarks, contradicting the original intent of ROAR. The same phenomenon is observed in the recently introduced variant RemOve-And-Debias (ROAD), and we posit a persistent pattern of blurriness bias in ROAR attribution metrics. Our findings serve as a warning against indiscriminate use of ROAR metrics.

Updated: 2026-01-03 12:46:53

Categories: cs.LG,cs.AI,cs.CV,stat.ME

Download: http://arxiv.org/abs/2304.13836v4

On the Representation of Pairwise Causal Background Knowledge and Its Applications in Causal Inference

Pairwise causal background knowledge about the existence or absence of causal edges and paths is frequently encountered in observational studies. Such constraints allow the shared directed and undirected edges in the constrained subclass of Markov equivalent DAGs to be represented as a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and introduce a minimal representation of a causal MPDAG. Then, we give a unified representation for three types of pairwise causal background knowledge, including direct, ancestral and non-ancestral causal knowledge, by introducing a novel concept called direct causal clause (DCC). Using DCCs, we study the consistency and equivalence of pairwise causal background knowledge and show that any pairwise causal background knowledge set can be uniquely and equivalently decomposed into the causal MPDAG representing the refined Markov equivalence class and a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking consistency and equivalence, as well as for finding the decomposed MPDAG and the residual DCCs. Finally, with pairwise causal background knowledge, we prove a sufficient and necessary condition to identify causal effects and surprisingly find that the identifiability of causal effects only depends on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that pairwise causal background knowledge can significantly improve the identifiability of causal effects.

Updated: 2026-01-03 12:45:58

Categories: cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2207.05067v2

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models

The Speaker Diarization and Recognition (SDR) task aims to predict "who spoke when and what" within an audio clip, which is a crucial task in various real-world multi-speaker scenarios such as meeting transcription and dialogue systems. Existing SDR systems typically adopt a cascaded framework, combining multiple modules such as speaker diarization (SD) and automatic speech recognition (ASR). The cascaded systems suffer from several limitations, such as error propagation, difficulty in handling overlapping speech, and lack of joint optimization for exploring the synergy between SD and ASR tasks. To address these limitations, we introduce SpeakerLM, a unified multimodal large language model for SDR that jointly performs SD and ASR in an end-to-end manner. Moreover, to facilitate diverse real-world scenarios, we incorporate a flexible speaker registration mechanism into SpeakerLM, enabling SDR under different speaker registration settings. SpeakerLM is progressively developed with a multi-stage training strategy on large-scale real data. Extensive experiments show that SpeakerLM demonstrates strong data scaling capability and generalizability, outperforming state-of-the-art cascaded baselines on both in-domain and out-of-domain public SDR benchmarks. Furthermore, experimental results show that the proposed speaker registration mechanism effectively ensures robust SDR performance of SpeakerLM across diverse speaker registration conditions and varying numbers of registered speakers.

Updated: 2026-01-03 12:39:34

Categories: cs.SD,cs.AI

Download: http://arxiv.org/abs/2508.06372v3

Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition

The increasing complexity of rotating machinery and the diversity of operating conditions, such as rotating speed and varying torque, have amplified the challenges of fault diagnosis in scenarios requiring domain adaptation, particularly those involving compound faults. This study addresses these challenges by introducing a novel multi-output classification (MOC) framework tailored for domain adaptation with partially labeled target datasets. Unlike conventional multi-class classification (MCC) approaches, the MOC framework classifies the severity levels of compound faults simultaneously. Furthermore, we explore various single-task and multi-task architectures applicable to the MOC formulation, including shared-trunk and cross-talk-based designs, for compound fault diagnosis under partially labeled conditions. Based on this investigation, we propose a novel cross-talk architecture, the residual neural dimension reductor (RNDR), that enables selective information sharing across diagnostic tasks, effectively enhancing classification performance in compound fault scenarios. In addition, frequency-layer normalization was incorporated to improve domain adaptation performance on motor vibration data. Compound fault conditions were implemented using a motor-based test setup and evaluated across six domain adaptation scenarios. The experimental results demonstrate that the proposed approach achieves superior macro F1 performance compared to baseline models. Through a single-fault comparison, we further show that the structural advantage of RNDR is more pronounced in compound fault settings. We also found that frequency-layer normalization fits the fault diagnosis task better than conventional methods. Lastly, we analyzed RNDR under various conditions and compared it with other models that have more parameters and with an ablated RNDR structure.

Updated: 2026-01-03 12:34:36

Categories: eess.SP,cs.AI

Download: http://arxiv.org/abs/2505.24001v4

"FRAME: Forward Recursive Adaptive Model Extraction-A Technique for Advance Feature Selection"

The challenges in feature selection, particularly in balancing model accuracy, interpretability, and computational efficiency, remain a critical issue in advancing machine learning methodologies. To address these complexities, this study introduces a novel hybrid approach, the Forward Recursive Adaptive Model Extraction Technique (FRAME), which combines Forward Selection and Recursive Feature Elimination (RFE) to enhance feature selection across diverse datasets. By combining the exploratory capabilities of Forward Selection with the refinement strengths of RFE, FRAME systematically identifies optimal feature subsets, striking a harmonious trade-off between experimentation and precision. A comprehensive evaluation of FRAME is conducted against traditional methods such as SelectKBest and Lasso Regression, using high-dimensional, noisy, and heterogeneous datasets. The results demonstrate that FRAME consistently delivers superior predictive performance based on downstream machine learning evaluation metrics. It efficiently performs dimensionality reduction with strong model performance, thus being especially useful for applications that need interpretable and accurate predictions, e.g., biomedical diagnostics. This research emphasizes the need to evaluate feature selection techniques on diverse datasets to test their robustness and generalizability. The results indicate that FRAME has great potential for further development, especially by incorporating deep learning frameworks for adaptive and real-time feature selection in dynamic settings. By advancing feature selection methodologies, FRAME offers a practical and effective solution to improve machine learning applications across multiple domains.

Updated: 2026-01-03 12:21:10

Categories: cs.LG

Download: http://arxiv.org/abs/2501.11972v3

Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models

Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categorical data clustering lies in measuring similarity among attribute values that lack inherent ordering or distance. Without appropriate similarity measures, values are often treated as equidistant, creating a semantic gap that obscures latent structures and degrades clustering quality. Although existing methods infer value relationships from within-dataset co-occurrence patterns, such inference becomes unreliable when samples are limited, leaving the semantic context of the data underexplored. To bridge this gap, we present ARISE (Attention-weighted Representation with Integrated Semantic Embeddings), which draws on external semantic knowledge from Large Language Models (LLMs) to construct semantic-aware representations that complement the metric space of categorical data for accurate clustering. That is, LLM is adopted to describe attribute values for representation enhancement, and the LLM-enhanced embeddings are combined with the original data to explore semantically prominent clusters. Experiments on eight benchmark datasets demonstrate consistent improvements over seven representative counterparts, with gains of 19-27%. Code is available at https://github.com/develop-yang/ARISE

Updated: 2026-01-03 11:37:46

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2601.01162v1

Controllable Flow Matching for Online Reinforcement Learning

Model-based reinforcement learning (MBRL) typically relies on modeling environment dynamics for data efficiency. However, due to the accumulation of model errors over long-horizon rollouts, such methods often struggle to maintain modeling stability. To address this, we propose CtrlFlow, a trajectory-level synthetic method using conditional flow matching (CFM), which directly models the distribution of trajectories from initial states to high-return terminal states without explicitly modeling the environment transition function. Our method ensures optimal trajectory sampling by minimizing the control energy governed by the non-linear Controllability Gramian Matrix, while the generated diverse trajectory data significantly enhances the robustness and cross-task generalization of policy learning. In online settings, CtrlFlow demonstrates better performance on common MuJoCo benchmark tasks than dynamics models and achieves superior sample efficiency compared to standard MBRL methods.

Updated: 2026-01-03 11:29:10

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2511.06816v2

SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification

Disclaimer: Samples in this paper may be harmful and cause discomfort. Multimodal large language models (MLLMs) enable multimodal generation but inherit toxic, biased, and NSFW signals from weakly curated pretraining corpora, causing safety risks, especially under adversarial triggers that late, opaque training-free detoxification methods struggle to handle. We propose SGM, a white-box neuron-level multimodal intervention that acts like safety glasses for toxic neurons: it selectively recalibrates a small set of toxic expert neurons via expertise-weighted soft suppression, neutralizing harmful cross-modal activations without any parameter updates. We establish MM-TOXIC-QA, a multimodal toxicity evaluation framework, and compare SGM with existing detoxification techniques. Experiments on open-source MLLMs show that SGM mitigates toxicity in standard and adversarial conditions, cutting harmful rates from 48.2% to 2.5% while preserving fluency and multimodal reasoning. SGM is extensible, and its combined defenses, denoted as SGM*, integrate with existing detoxification methods for stronger safety performance, providing an interpretable, low-cost solution for toxicity-controlled multimodal generation.

Updated: 2026-01-03 11:27:17

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2512.15052v2

Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity

This paper deals with stochastic optimization problems involving Markovian noise with a zero-order oracle. We present and analyze a novel derivative-free method for solving such problems in strongly convex smooth and non-smooth settings with both one-point and two-point feedback oracles. Using a randomized batching scheme, we show that when mixing time $τ$ of the underlying noise sequence is less than the dimension of the problem $d$, the convergence estimates of our method do not depend on $τ$. This observation provides an efficient way to interact with Markovian stochasticity: instead of invoking the expensive first-order oracle, one should use the zero-order oracle. Finally, we complement our upper bounds with the corresponding lower bounds. This confirms the optimality of our results.

Updated: 2026-01-03 11:27:07

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2601.01160v1
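The two-point feedback oracle mentioned in the abstract above can be illustrated with the textbook randomized finite-difference estimator. This is a minimal sketch and not the paper's method: it omits Markovian noise and the randomized batching scheme, and the names `two_point_gradient` and `averaged_estimate` are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def two_point_gradient(f, x, t, rng):
    """Zero-order estimate d * (f(x + t e) - f(x - t e)) / (2 t) * e.

    Only two function values are queried; no derivative of f is used.
    """
    d = len(x)
    e = random_unit_vector(d, rng)
    plus = f([xi + t * ei for xi, ei in zip(x, e)])
    minus = f([xi - t * ei for xi, ei in zip(x, e)])
    scale = d * (plus - minus) / (2.0 * t)
    return [scale * ei for ei in e]


def averaged_estimate(f, x, t, batch, rng):
    """Average `batch` independent two-point estimates (plain mini-batching)."""
    total = [0.0] * len(x)
    for _ in range(batch):
        g = two_point_gradient(f, x, t, rng)
        total = [a + b for a, b in zip(total, g)]
    return [s / batch for s in total]


def linear(x):
    """Toy objective with known gradient (2, -1, 0.5)."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]


estimate = averaged_estimate(linear, [0.0, 0.0, 0.0], t=0.1, batch=20000,
                             rng=random.Random(0))
```

Each single estimate is noisy, with variance growing with the dimension `d`, so the sketch averages many of them before comparing against the true gradient.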

Evo-TFS: Evolutionary Time-Frequency Domain-Based Synthetic Minority Oversampling Approach to Imbalanced Time Series Classification

Time series classification is a fundamental machine learning task with broad real-world applications. Although many deep learning methods have proven effective in learning time-series data for classification, they were originally developed under the assumption of balanced data distributions. Once data distribution is uneven, these methods tend to ignore the minority class that is typically of higher practical significance. Oversampling methods have been designed to address this by generating minority-class samples, but their reliance on linear interpolation often hampers the preservation of temporal dynamics and the generation of diverse samples. Therefore, in this paper, we propose Evo-TFS, a novel evolutionary oversampling method that integrates both time- and frequency-domain characteristics. In Evo-TFS, strongly typed genetic programming is employed to evolve diverse, high-quality time series, guided by a fitness function that incorporates both time-domain and frequency-domain characteristics. Experiments conducted on imbalanced time series datasets demonstrate that Evo-TFS outperforms existing oversampling methods, significantly enhancing the performance of time-domain and frequency-domain classifiers.

Updated: 2026-01-03 10:38:17

Categories: cs.LG,cs.NE

Download: http://arxiv.org/abs/2601.01150v1

Constructing and Benchmarking: a Labeled Email Dataset for Text-Based Phishing and Spam Detection Framework

Phishing and spam emails remain a major cybersecurity threat, with attackers increasingly leveraging Large Language Models (LLMs) to craft highly deceptive content. This study presents a comprehensive email dataset containing phishing, spam, and legitimate messages, explicitly distinguishing between human- and LLM-generated content. Each email is annotated with its category, emotional appeal (e.g., urgency, fear, authority), and underlying motivation (e.g., link-following, credential theft, financial fraud). We benchmark multiple LLMs on their ability to identify these emotional and motivational cues and select the most reliable model to annotate the full dataset. To evaluate classification robustness, emails were also rephrased using several LLMs while preserving meaning and intent. A state-of-the-art LLM was then assessed on its performance across both original and rephrased emails using expert-labeled ground truth. The results highlight strong phishing detection capabilities but reveal persistent challenges in distinguishing spam from legitimate emails. Our dataset and evaluation framework contribute to improving AI-assisted email security systems. To support open science, all code, templates, and resources are available on our project site.

Updated: 2026-01-03 10:37:31

Categories: cs.CR,cs.AI,cs.DB

Download: http://arxiv.org/abs/2511.21448v2

Conformal Blindness: A Note on $A$-Cryptic change-points

Conformal Test Martingales (CTMs) are a standard method within the Conformal Prediction framework for testing the crucial assumption of data exchangeability by monitoring deviations from uniformity in the p-value sequence. Although exchangeability implies uniform p-values, the converse does not hold. This raises the question of whether a significant break in exchangeability can occur such that the p-values remain uniform, rendering CTMs blind. We answer this affirmatively, demonstrating the phenomenon of conformal blindness. Through explicit construction, for the theoretically ideal "oracle" conformity measure (given by the true conditional density), we demonstrate the possibility of an $A$-cryptic change-point (where $A$ refers to the conformity measure). Using bivariate Gaussian distributions, we identify a line along which a change in the marginal means does not alter the distribution of the conformity scores, thereby producing perfectly uniform p-values. Simulations confirm that even a massive distribution shift can be perfectly cryptic to the CTM, highlighting a fundamental limitation and emphasising the critical role of the alignment of the conformity measure with potential shifts.

Updated: 2026-01-03 10:24:39

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2601.01147v1
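The cryptic line described in the abstract above can be reproduced in a small simulation. The sketch assumes a bivariate Gaussian with unit variances, zero pre-change means, and correlation `RHO`, with the oracle conformity score taken as the pre-change conditional density p(y | x); the paper's exact parametrization may differ. Under these assumptions the score depends only on the residual y - RHO * x, so shifting the means by (delta, RHO * delta) leaves every score unchanged.

```python
import math
import random

RHO = 0.5  # assumed correlation; unit variances and zero pre-change means are also assumed


def sample_pair(rng, mean_x, mean_y):
    """Draw (x, y) from a bivariate Gaussian with correlation RHO."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = mean_x + z1
    y = mean_y + RHO * z1 + math.sqrt(1.0 - RHO ** 2) * z2
    return x, y


def oracle_score(x, y):
    """Pre-change conditional density p(y | x), used as the conformity score."""
    var = 1.0 - RHO ** 2
    resid = y - RHO * x
    return math.exp(-resid ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def scores(mean_x, mean_y, n=2000, seed=0):
    """Conformity scores for n draws; a fixed seed reuses the same noise."""
    rng = random.Random(seed)
    return [oracle_score(*sample_pair(rng, mean_x, mean_y)) for _ in range(n)]


pre = scores(0.0, 0.0)            # before the change-point
cryptic = scores(5.0, RHO * 5.0)  # mean shift along the direction (1, RHO)
visible = scores(5.0, 0.0)        # mean shift off that line

max_gap = max(abs(a - b) for a, b in zip(pre, cryptic))
mean_pre = sum(pre) / len(pre)
mean_visible = sum(visible) / len(visible)
```

Because the same seed is reused, `pre` and `cryptic` differ only by floating-point rounding, while the off-line shift in `visible` lowers the typical score noticeably, which is the kind of change a CTM can pick up.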

Self-Training the Neurochaos Learning Algorithm

In numerous practical applications, acquiring substantial quantities of labelled data is challenging and expensive, but unlabelled data is readily accessible. Conventional supervised learning methods frequently underperform in scenarios characterised by little labelled data or imbalanced datasets. This study introduces a hybrid semi-supervised learning (SSL) architecture that integrates Neurochaos Learning (NL) with a threshold-based Self-Training (ST) method to overcome this constraint. The NL architecture converts input characteristics into chaos-based firing-rate representations that encapsulate nonlinear relationships within the data, whereas ST progressively enlarges the labelled set utilising high-confidence pseudo-labelled samples. The model's performance is assessed using ten benchmark datasets and five machine learning classifiers, with 85% of the training data considered unlabelled and just 15% utilised as labelled data. The proposed Self-Training Neurochaos Learning (NL+ST) architecture consistently attains superior performance gain relative to standalone ST models, especially on limited, nonlinear and imbalanced datasets like Iris (188.66%), Wine (158.58%) and Glass Identification (110.48%). The results indicate that using chaos-based feature extraction with SSL improves generalisation, resilience, and classification accuracy in low-data contexts.

Updated: 2026-01-03 10:24:01

Categories: cs.LG

Download: http://arxiv.org/abs/2601.01146v1
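The threshold-based self-training loop from the abstract above can be sketched as follows. This is an illustration under stated assumptions only: it uses a toy one-dimensional nearest-centroid classifier in place of the paper's classifiers, omits the Neurochaos feature extraction entirely, and the function names and the 0.9 threshold are hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Nearest-centroid 'training': one mean per class."""
    sums, counts = {}, {}
    for p, c in zip(points, labels):
        sums[c] = sums.get(c, 0.0) + p
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def predict_with_confidence(centroids, p):
    """Return (label, confidence) from a softmax over negative squared distances."""
    dists = {c: (p - m) ** 2 for c, m in centroids.items()}
    d_min = min(dists.values())  # subtract the minimum for numerical stability
    weights = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=10):
    """Grow the labelled set with pseudo-labels whose confidence meets the threshold."""
    points = [p for p, _ in labelled]
    labels = [c for _, c in labelled]
    pool = list(unlabelled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)
        confident, remaining = [], []
        for p in pool:
            label, conf = predict_with_confidence(centroids, p)
            if conf >= threshold:
                confident.append((p, label))
            else:
                remaining.append(p)
        if not confident:
            break  # nothing new to add, so further rounds would not change the model
        for p, label in confident:
            points.append(p)
            labels.append(label)
        pool = remaining
    return fit_centroids(points, labels), pool


centroids, leftover = self_train(
    labelled=[(0.0, "a"), (4.0, "b")],
    unlabelled=[0.2, 0.5, 3.8, 3.5, 2.0],
)
```

In this toy run the four points near a centroid are pseudo-labelled in the first round, while the ambiguous point at 2.0 never reaches the threshold and stays unlabelled.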

A three-Level Framework for LLM-Enhanced eXplainable AI: From technical explanations to natural language

The growing application of artificial intelligence in sensitive domains has intensified the demand for systems that are not only accurate but also explainable and trustworthy. Although explainable AI (XAI) methods have proliferated, many do not consider the diverse audiences that interact with AI systems: from developers and domain experts to end-users and society. This paper addresses how trust in AI is influenced by the design and delivery of explanations and proposes a multilevel framework that aligns explanations with the epistemic, contextual, and ethical expectations of different stakeholders. The framework consists of three layers: algorithmic and domain-based, human-centered, and social explainability, with Large Language Models serving as crucial mediators that transform technical outputs of AI explanations into accessible, contextual narratives across all levels. We show how LLMs enable dynamic, conversational explanations that bridge the gap between complex model behavior and human understanding, facilitating interactive dialogue and enhancing societal transparency. Through comprehensive case studies, we show how this LLM-enhanced approach achieves technical fidelity, user engagement, and societal accountability, reframing XAI as a dynamic, trust-building process that leverages natural language capabilities to democratize AI explainability.

Updated: 2026-01-03 10:17:41

Categories: cs.AI

Download: http://arxiv.org/abs/2506.05887v2

GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving a question open about whether interconnecting these models could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation techniques (LoRA) and conduct extensive experiments on various benchmark datasets. The experimental results reveal that GRAPHMOE outperforms other LoRA based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advancements in enhancing the reasoning capabilities of language models.

Updated: 2026-01-03 10:08:45

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2501.07890v4

AI-Powered Hybrid Intrusion Detection Framework for Cloud Security Using Novel Metaheuristic Optimization

Cybersecurity poses considerable problems for Cloud Computing (CC), especially for Intrusion Detection Systems (IDSs), which face difficulties with skewed datasets and suboptimal classification model performance. This study presents the Hybrid Intrusion Detection System (HyIDS), an innovative IDS that employs the Energy Valley Optimizer (EVO) for Feature Selection (FS). Additionally, it introduces a novel technique for enhancing the cybersecurity of cloud computing through the integration of machine learning methodologies with the EVO algorithm. EVO reduced the number of features in the CIC-DDoS2019 dataset from 88 to 38 and in the CSE-CIC-IDS2018 dataset from 80 to 43, significantly enhancing computing efficiency. HyIDS incorporates four Machine Learning (ML) models: Support Vector Machine (SVM), Random Forest (RF), Decision Tree (D_Tree), and K-Nearest Neighbors (KNN). The proposed HyIDS was assessed on two real-world intrusion datasets, CIC-DDoS2019 and CSE-CIC-IDS2018, both distinguished by considerable class imbalances. The CIC-DDoS2019 dataset has a significant imbalance between DDoS attack samples and legitimate traffic, while the CSE-CIC-IDS2018 dataset primarily comprises benign traffic with insufficient representation of attack types, complicating the detection of minority attacks. A downsampling technique was employed to balance the datasets, improving detection efficacy for both benign and malicious traffic. Twenty-four trials were conducted, revealing substantial improvements in classification accuracy, precision, and recall. Our proposed D_TreeEVO model attained an accuracy of 99.13% and an F1 score of 98.94% on the CIC-DDoS2019 dataset, and an accuracy of 99.78% and an F1 score of 99.70% on the CSE-CIC-IDS2018 dataset. These results demonstrate that EVO significantly improves cybersecurity in Cloud Computing (CC).

Updated: 2026-01-03 09:42:28

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2601.01134v1

Generating Diverse TSP Tours via a Combination of Graph Pointer Network and Dispersion

We address the Diverse Traveling Salesman Problem (D-TSP), a bi-criteria optimization challenge that seeks a set of $k$ distinct TSP tours. The objective requires every selected tour to have a length at most $c|T^*|$ (where $|T^*|$ is the optimal tour length) while minimizing the average Jaccard similarity across all tour pairs. This formulation is crucial for applications requiring both high solution quality and fault tolerance, such as logistics planning, robotics pathfinding or strategic patrolling. Current methods are limited: traditional heuristics, such as the Niching Memetic Algorithm (NMA) or bi-criteria optimization, incur high computational complexity $O(n^3)$, while modern neural approaches (e.g., RF-MA3S) achieve limited diversity quality and rely on complex, external mechanisms. To overcome these limitations, we propose a novel hybrid framework that decomposes D-TSP into two efficient steps. First, we utilize a simple Graph Pointer Network (GPN), augmented with an approximated sequence entropy loss, to efficiently sample a large, diverse pool of high-quality tours. This simple modification effectively controls the quality-diversity trade-off without complex external mechanisms. Second, we apply a greedy algorithm that yields a 2-approximation for the dispersion problem to select the final $k$ maximally diverse tours from the generated pool. Our results demonstrate state-of-the-art performance. On the Berlin instance, our model achieves an average Jaccard index of $0.015$, significantly outperforming NMA ($0.081$) and RF-MA3S. By leveraging GPU acceleration, our GPN structure achieves a near-linear empirical runtime growth of $O(n)$. While maintaining solution diversity comparable to complex bi-criteria algorithms, our approach is over 360 times faster on large-scale instances (783 cities), delivering high-quality TSP solutions with unprecedented efficiency and simplicity.

Updated: 2026-01-03 09:37:18

Categories: cs.CG,cs.AI,cs.LG

Download: http://arxiv.org/abs/2601.01132v1
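The second step of the D-TSP pipeline above, picking $k$ mutually dissimilar tours from a sampled pool, can be sketched with Jaccard similarity over undirected tour edges and a simple greedy selection. The abstract does not say which greedy dispersion rule is used, so the max-sum variant below, seeded with the most dissimilar pair, is an assumption, and the sketch makes no approximation claim of its own. The length constraint $c|T^*|$ is assumed to have been applied to the pool already, and the function names are hypothetical.

```python
from itertools import combinations


def tour_edges(tour):
    """Undirected edge set of a closed tour given as a list of city indices."""
    n = len(tour)
    return frozenset(frozenset((tour[i], tour[(i + 1) % n])) for i in range(n))


def jaccard_similarity(a, b):
    """Shared edges divided by all edges used by either tour."""
    ea, eb = tour_edges(a), tour_edges(b)
    return len(ea & eb) / len(ea | eb)


def greedy_diverse_subset(tours, k):
    """Return indices of k tours chosen greedily for large pairwise Jaccard distance."""
    if k <= 0:
        return []
    if k >= len(tours):
        return list(range(len(tours)))

    def dist(i, j):
        return 1.0 - jaccard_similarity(tours[i], tours[j])

    # Seed with the most dissimilar pair, then add the tour farthest (in total) from the chosen set.
    chosen = list(max(combinations(range(len(tours)), 2), key=lambda p: dist(*p)))
    while len(chosen) < k:
        rest = [i for i in range(len(tours)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: sum(dist(i, j) for j in chosen)))
    return chosen[:k]


pool = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 4, 3],
    [0, 2, 4, 1, 3],
    [4, 3, 2, 1, 0],  # same cycle as the first tour, traversed backwards
]
picked = greedy_diverse_subset(pool, 3)
```

Treating edges as undirected means a reversed tour counts as identical to the original, so the selection avoids returning both.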

RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian

Code review automation powered by Large Language Models (LLMs) has the potential to transform code review workflows. Despite the advances of LLM-powered code review comment generation approaches, several practical challenges remain for designing enterprise-grade code review automation tools. In particular, this paper aims to answer the practical question: how can we design review-guided, context-aware, quality-checked code review comment generation without fine-tuning? We present RovoDev Code Reviewer, an enterprise-grade LLM-based code review automation tool designed and deployed at scale within Atlassian's development ecosystem, with seamless integration into Atlassian's Bitbucket. Through offline, online, and user feedback evaluations over a one-year period, we conclude that RovoDev Code Reviewer (1) is effective in generating code review comments that could lead to code resolution for 38.70% of comments (i.e., comments that triggered code changes in the subsequent commits); and (2) offers the promise of accelerating feedback cycles (i.e., decreasing the PR cycle time by 30.8%), alleviating reviewer workload (i.e., reducing the number of human-written comments by 35.6%), and improving overall software quality (i.e., finding errors with actionable suggestions).

Updated: 2026-01-03 09:27:56

Categories: cs.SE,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2601.01129v1

COLT: Enhancing Video Large Language Models with Continual Tool Usage

The success of Large Language Models (LLMs) has significantly propelled the research of video understanding. To harvest the benefits of well-trained expert models (i.e., tools), video LLMs prioritize the exploration of tool usage capabilities. Existing methods either prompt closed-source LLMs or employ the instruction tuning paradigm for tool-use fine-tuning. These methods, however, assume an established repository of fixed tools and struggle to generalize to real-world environments where tool data is perpetually evolving and streaming in. To this end, we propose to enhance open-source video LLMs with COntinuaL Tool usage (termed COLT), which automatically acquires tool-use ability in a successive tool stream without suffering 'catastrophic forgetting' of the past learned tools. Specifically, our COLT incorporates a learnable tool codebook as a tool-specific memory system. Then relevant tools are dynamically selected based on the similarity between user instruction and tool features within the codebook. To unleash the tool usage potential of video LLMs, we collect a video-centric tool-use instruction tuning dataset VideoToolBench. Extensive experiments on both previous video LLM benchmarks and the tool-use-specific VideoToolBench dataset demonstrate the state-of-the-art performance of our proposed COLT.

Updated: 2026-01-03 09:21:25

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2509.18754v3

Wittgenstein's Family Resemblance Clustering Algorithm

This paper, introducing a novel method in philomatics, draws on Wittgenstein's concept of family resemblance from analytic philosophy to develop a clustering algorithm for machine learning. According to Wittgenstein's Philosophical Investigations (1953), family resemblance holds that members of a concept or category are connected by overlapping similarities rather than a single defining property. Consequently, a family of entities forms a chain of items sharing overlapping traits. This philosophical idea naturally lends itself to a graph-based approach in machine learning. Accordingly, we propose the Wittgenstein's Family Resemblance (WFR) clustering algorithm and its kernel variant, kernel WFR. This algorithm computes resemblance scores between neighboring data instances, and after thresholding these scores, a resemblance graph is constructed. The connected components of this graph define the resulting clusters. Simulations on benchmark datasets demonstrate that WFR is an effective nonlinear clustering algorithm that does not require prior knowledge of the number of clusters or assumptions about their shapes.

Updated: 2026-01-03 09:16:51

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2601.01127v1
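
The pipeline described in the WFR entry above (resemblance scores between neighboring instances, thresholding, a resemblance graph, connected components as clusters) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `wfr_like_clusters`, the use of k nearest neighbors, the Gaussian similarity as the resemblance score, and the default `sigma` and `threshold` values are all placeholders, since the abstract does not define the score.

```python
import math


def wfr_like_clusters(points, k=2, sigma=1.0, threshold=0.5):
    """Cluster points as connected components of a thresholded neighbor graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # k nearest neighbors of point i by Euclidean distance.
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in neighbors:
            # Assumed resemblance score: Gaussian similarity in (0, 1].
            score = math.exp(-d * d / (2 * sigma * sigma))
            if score >= threshold:
                parent[find(i)] = find(j)  # keep the edge: merge components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Note that the number of clusters is never passed in: it falls out of how many components survive thresholding, which matches the property the abstract highlights.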

Quantifying task-relevant representational similarity using decision variable correlation

Previous studies have compared neural activities in the visual cortex to representations in deep neural networks trained on image classification. Interestingly, while some suggest that their representations are highly similar, others argue the opposite. Here, we propose a new approach to characterize the similarity of the decision strategies of two observers (models or brains) using decision variable correlation (DVC). DVC quantifies the image-by-image correlation between the decoded decisions based on the internal neural representations in a classification task. Thus, it can capture task-relevant information rather than general representational alignment. We evaluate DVC using monkey V4/IT recordings and network models trained on image classification tasks. We find that model-model similarity is comparable to monkey-monkey similarity, whereas model-monkey similarity is consistently lower. Strikingly, DVC decreases with increasing network performance on ImageNet-1k. Adversarial training does not improve model-monkey similarity in task-relevant dimensions assessed using DVC, although it markedly increases the model-model similarity. Similarly, pre-training on larger datasets does not improve model-monkey similarity. These results suggest a divergence between the task-relevant representations in monkey V4/IT and those learned by models trained on image classification tasks.

Updated: 2026-01-03 09:05:21

Categories: cs.CV,cs.LG,q-bio.NC,q-bio.QM

Download: http://arxiv.org/abs/2506.02164v2
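
To make the idea of an image-by-image correlation between decoded decisions concrete, here is a minimal sketch of one way such a quantity could be computed for the DVC entry above. It assumes the decision variables of the two observers have already been decoded (one scalar per image), and it correlates them within each class before averaging, so that the shared class signal does not inflate the score. The within-class averaging and the names `pearson` and `decision_variable_correlation` are assumptions; the abstract does not give the exact estimator.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (needs nonzero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def decision_variable_correlation(dv_a, dv_b, labels):
    """Average within-class correlation of two observers' decision variables."""
    per_class = []
    for c in sorted(set(labels)):
        xa = [a for a, lab in zip(dv_a, labels) if lab == c]
        xb = [b for b, lab in zip(dv_b, labels) if lab == c]
        per_class.append(pearson(xa, xb))
    return mean(per_class)
```

Two observers can both classify every image correctly and still score near -1 here if they disagree about which images within a class are the easy ones, which is the kind of strategy difference the measure is meant to expose.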

What You Trust Is Insecure: Demystifying How Developers (Mis)Use Trusted Execution Environments in Practice

Trusted Execution Environments (TEEs), such as Intel SGX and ARM TrustZone, provide isolated regions of CPU and memory for secure computation and are increasingly used to protect sensitive data and code across diverse application domains. However, little is known about how developers actually use TEEs in practice. This paper presents the first large-scale empirical study of real-world TEE applications. We collected and analyzed 241 open-source projects from GitHub that utilize the two most widely adopted TEEs, Intel SGX and ARM TrustZone. By combining manual inspection with customized static analysis scripts, we examined their adoption contexts, usage patterns, and development practices across three phases. First, we categorized the projects into 8 application domains and identified trends in TEE adoption over time. We found that the dominant use case is IoT device security (30%), which contrasts sharply with prior academic focus on blockchain and cryptographic systems (7%), while AI model protection (12%) is rapidly emerging as a new domain. Second, we analyzed how TEEs are integrated into software and observed that 32.4% of the projects reimplement cryptographic functionalities instead of using official SDK APIs, suggesting that current SDKs may have limited usability and portability to meet developers' practical needs. Third, we examined security practices through manual inspection and found that 25.3% (61 of 241) of the projects exhibit insecure coding behaviors when using TEEs, such as hardcoded secrets and missing input validation, which undermine their intended security guarantees. Our findings have important implications for improving the usability of TEE SDKs and supporting developers in trusted software development.

Updated: 2026-01-03 09:04:24

Categories: cs.SE,cs.CR

Download: http://arxiv.org/abs/2512.17363v2

Learning from Historical Activations in Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated remarkable success in domains such as social networks and molecular chemistry. A crucial component of GNNs is the pooling procedure, in which the node features calculated by the model are combined to form an informative final descriptor to be used for the downstream task. However, previous graph pooling schemes rely on the last GNN layer features as an input to the pooling or classifier layers, potentially under-utilizing important activations of previous layers produced during the forward pass of the model, which we regard as historical graph activations. This gap is particularly pronounced in cases where a node's representation can shift significantly over the course of many graph neural layers, and it is worsened by graph-specific challenges such as over-smoothing in deep architectures. To bridge this gap, we introduce HISTOGRAPH, a novel two-stage attention-based final aggregation layer that first applies a unified layer-wise attention over intermediate activations, followed by node-wise attention. By modeling the evolution of node representations across layers, our HISTOGRAPH leverages both the activation history of nodes and the graph structure to refine features used for final prediction. Empirical results on multiple graph classification benchmarks demonstrate that HISTOGRAPH offers strong performance that consistently improves on traditional techniques, with particularly strong robustness in deep GNNs.

Updated: 2026-01-03 08:51:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2601.01123v1
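
The two-stage readout described in the HISTOGRAPH entry above (attention over each node's per-layer activations, then attention over nodes) can be illustrated with a small sketch. It is a simplified stand-in rather than the authors' layer: the query vectors `q_layer` and `q_node`, the plain dot-product scoring, and the function name `two_stage_readout` are hypothetical, and the sketch ignores graph structure entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def two_stage_readout(history, q_layer, q_node):
    """history[l][n] is the feature vector of node n after GNN layer l."""
    num_nodes = len(history[0])
    node_vecs = []
    for n in range(num_nodes):
        per_layer = [layer[n] for layer in history]
        # Stage 1: attention over this node's activation history.
        alpha = softmax([dot(q_layer, h) for h in per_layer])
        node_vecs.append(weighted_sum(alpha, per_layer))
    # Stage 2: attention over nodes to form one graph-level descriptor.
    beta = softmax([dot(q_node, z) for z in node_vecs])
    return weighted_sum(beta, node_vecs)
```

With both queries set to zero, every attention weight is uniform and the readout reduces to mean pooling over layers and nodes, which gives a convenient sanity check.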

DeepFilter: A Transformer-style Framework for Accurate and Efficient Process Monitoring

The process monitoring task is characterized by stringent demands for accuracy and efficiency. Current transformer-based methods, characterized by self-attention for temporal fusion, exhibit limitations in accurately understanding the semantic context and efficiently processing monitoring logs, rendering them inadequate for process monitoring. To address these limitations, we introduce DeepFilter, which revises the self-attention mechanism to improve both accuracy and efficiency. As a straightforward yet versatile approach, DeepFilter provides an instrumental baseline for practitioners in process monitoring, whether initiating new projects or enhancing existing capabilities.

Updated: 2026-01-03 08:47:44

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2501.01342v2

On the social bias of speech self-supervised models

Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet their biased outcomes, especially those affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by automating discriminatory patterns and reinforcing inequitable systems. This work reveals that prevalent SSL models inadvertently acquire biased associations. We probe how various factors, such as model architecture, size, and training methodologies, influence the propagation of social bias within these models. Finally, we explore the efficacy of debiasing SSL models through regularization techniques, specifically via model compression. Our findings reveal that employing techniques such as row-pruning and training wider, shallower models can effectively mitigate social bias within SSL models.

Updated: 2026-01-03 08:45:06

Categories: eess.AS,cs.LG

Download: http://arxiv.org/abs/2406.04997v2

Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings

Early detection of chronic kidney disease (CKD) is essential for preventing progression to end-stage renal disease. However, existing screening tools, primarily developed using populations from high-income countries, often underperform in Bangladesh and South Asia, where risk profiles differ. Most of these tools rely on simple additive scoring functions and are based on data from patients with advanced-stage CKD. Consequently, they fail to capture complex interactions among risk factors and are limited in predicting early-stage CKD. Our objective was to develop and evaluate an explainable machine learning (ML) framework for community-based early-stage CKD screening for low-resource settings, tailored to the Bangladeshi and South Asian population context. We used a community-based dataset from Bangladesh, the first such CKD dataset in South Asia, and evaluated twelve ML classifiers across multiple feature domains. Ten complementary feature selection techniques were applied to identify robust, generalizable predictors. The final models were assessed using 10-fold cross-validation. External validation was conducted on three independent datasets from India, the UAE, and Bangladesh. SHAP (SHapley Additive exPlanations) was used to provide model explainability. An ML model trained on an RFECV-selected feature subset achieved a balanced accuracy of 90.40%, whereas minimal non-pathology-test features demonstrated excellent predictive capability with a balanced accuracy of 89.23%, often outperforming larger or full feature sets. Compared with existing screening tools, the proposed models achieved substantially higher accuracy and sensitivity while requiring fewer and more accessible inputs. External validation confirmed strong generalizability with 78% to 98% sensitivity. SHAP interpretation identified clinically meaningful predictors consistent with established CKD risk factors.

Updated: 2026-01-03 08:43:35

Categories: cs.LG

Download: http://arxiv.org/abs/2601.01119v1
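
The CKD entry above reports balanced accuracy and sensitivity rather than plain accuracy. For a binary screening task, balanced accuracy is the mean of sensitivity (recall on positives) and specificity (recall on negatives), which matters when cases are rare. The short sketch below shows the arithmetic; the function name and the toy labels are illustrative only and are not data from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true cases that were flagged
    specificity = tn / (tn + fp)  # fraction of non-cases correctly cleared
    return (sensitivity + specificity) / 2


# Toy example: 2 cases among 10 people, and the screener misses one case.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))
```

In this toy example plain accuracy is 9 out of 10, while balanced accuracy is lower because half of the true cases were missed.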

ScienceDB AI: An LLM-Driven Agentic Recommender System for Large-Scale Scientific Data Sharing Services

The rapid growth of AI for Science (AI4S) has underscored the significance of scientific datasets, leading to the establishment of numerous national scientific data centers and sharing platforms. Despite this progress, efficiently promoting dataset sharing and utilization for scientific research remains challenging. Scientific datasets contain intricate domain-specific knowledge and contexts, rendering traditional collaborative filtering-based recommenders inadequate. Recent advances in Large Language Models (LLMs) offer unprecedented opportunities to build conversational agents capable of deep semantic understanding and personalized recommendations. In response, we present ScienceDB AI, a novel LLM-driven agentic recommender system developed on Science Data Bank (ScienceDB), one of the largest global scientific data-sharing platforms. ScienceDB AI leverages natural language conversations and deep reasoning to accurately recommend datasets aligned with researchers' scientific intents and evolving requirements. The system introduces several innovations: a Scientific Intention Perceptor to extract structured experimental elements from complicated queries, a Structured Memory Compressor to manage multi-turn dialogues effectively, and a Trustworthy Retrieval-Augmented Generation (Trustworthy RAG) framework. The Trustworthy RAG employs a two-stage retrieval mechanism and provides citable dataset references via Citable Scientific Task Record (CSTR) identifiers, enhancing recommendation trustworthiness and reproducibility. Through extensive offline and online experiments using over 10 million real-world datasets, ScienceDB AI has demonstrated significant effectiveness. To our knowledge, ScienceDB AI is the first LLM-driven conversational recommender tailored explicitly for large-scale scientific dataset sharing services. The platform is publicly accessible at: https://ai.scidb.cn/en.

Updated: 2026-01-03 08:42:53

Categories: cs.IR,cs.AI,cs.DL

Download: http://arxiv.org/abs/2601.01118v1

A Practitioner's Guide to Kolmogorov-Arnold Networks

Kolmogorov-Arnold Networks (KANs), whose design is inspired, rather than dictated, by the Kolmogorov superposition theorem, have emerged as a structured alternative to MLPs. This review provides a systematic and comprehensive overview of the rapidly expanding KAN literature. The review is organized around three core themes: (i) clarifying the relationships between KANs and Kolmogorov superposition theory (KST), MLPs, and classical kernel methods; (ii) analyzing basis functions as a central design axis; and (iii) summarizing recent advances in accuracy, efficiency, regularization, and convergence. Finally, we provide a practical "Choose-Your-KAN" guide and outline open research challenges and future directions. The accompanying GitHub repository serves as a structured reference for ongoing KAN research.

Updated: 2026-01-03 08:37:38

Categories: cs.LG,cs.AI,cs.NE,math.NA

Download: http://arxiv.org/abs/2510.25781v3
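
Since the KAN entry above treats the choice of basis functions as a central design axis, a tiny forward pass may help show where the basis enters. In a KAN layer each input-output edge carries its own learnable univariate function, and each output sums the edge functions over the inputs. The sketch below expands every edge function in a Gaussian radial basis; that basis, the shared `centers` and `width`, and the names `rbf_basis` and `kan_layer` are assumptions for illustration (B-splines are another common choice), not something prescribed by the review.

```python
import math


def rbf_basis(x, centers, width):
    """Evaluate Gaussian bumps centered at `centers` at the scalar x."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


def kan_layer(x, coeffs, centers, width=1.0):
    """Forward pass of one KAN-style layer.

    coeffs[j][i][k] weights basis function k on the edge from input i
    to output j, so output_j = sum_i phi_ji(x_i).
    """
    outputs = []
    for edge_row in coeffs:  # one row of edge functions per output unit
        total = 0.0
        for xi, edge_coeffs in zip(x, edge_row):
            basis = rbf_basis(xi, centers, width)
            total += sum(c * b for c, b in zip(edge_coeffs, basis))
        outputs.append(total)
    return outputs
```

Swapping `rbf_basis` for another family changes the function class of every edge without touching the layer wiring, which is why the basis is a natural axis for comparing KAN variants.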

Contrastive Self-Supervised Learning As Neural Manifold Packing

Contrastive self-supervised learning based on point-wise comparisons has been widely studied for vision tasks. In the visual cortex of the brain, neuronal responses to distinct stimulus classes are organized into geometric structures known as neural manifolds. Accurate classification of stimuli can be achieved by effectively separating these manifolds, akin to solving a packing problem. We introduce Contrastive Learning As Manifold Packing (CLAMP), a self-supervised framework that recasts representation learning as a manifold packing problem. CLAMP introduces a loss function inspired by the potential energy of short-range repulsive particle systems, such as those encountered in the physics of simple liquids and jammed packings. In this framework, each class consists of sub-manifolds embedding multiple augmented views of a single image. The sizes and positions of the sub-manifolds are dynamically optimized by following the gradient of a packing loss. This approach yields interpretable dynamics in the embedding space that parallel jamming physics, and introduces geometrically meaningful hyperparameters within the loss function. Under the standard linear evaluation protocol, which freezes the backbone and trains only a linear classifier, CLAMP achieves performance competitive with state-of-the-art self-supervised models. Furthermore, our analysis reveals that neural manifolds corresponding to different categories emerge naturally and are effectively separated in the learned representation space, highlighting the potential of CLAMP to bridge insights from physics, neuroscience, and machine learning.

Updated: 2026-01-03 08:30:22

Categories: cs.LG,cs.AI,q-bio.NC,stat.ML

Download: http://arxiv.org/abs/2506.13717v2
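
To give a feel for a loss inspired by short-range repulsive particles, as described in the CLAMP entry above, the sketch below scores a set of sub-manifolds with a soft-sphere overlap energy of the kind used in jamming studies: two spheres interact only when they overlap, and the penalty grows with the overlap. Everything specific here is an assumption made for illustration: each sub-manifold (the augmented views of one image) is summarized as a sphere, the radius is the RMS spread of its views times a hypothetical `scale`, and the harmonic form of the penalty is one common choice rather than the paper's stated loss.

```python
import math
from itertools import combinations


def centroid(views):
    """Mean embedding of one image's augmented views."""
    dim = len(views[0])
    return tuple(sum(v[i] for v in views) / len(views) for i in range(dim))


def rms_radius(views, center, scale=1.0):
    """Root-mean-square distance of the views from their centroid, times scale."""
    mean_sq = sum(math.dist(v, center) ** 2 for v in views) / len(views)
    return scale * math.sqrt(mean_sq)


def packing_energy(submanifolds, scale=1.0):
    """Sum of harmonic overlap penalties over all pairs of sub-manifold spheres."""
    spheres = []
    for views in submanifolds:
        c = centroid(views)
        spheres.append((c, rms_radius(views, c, scale)))
    energy = 0.0
    for (c1, r1), (c2, r2) in combinations(spheres, 2):
        d = math.dist(c1, c2)
        contact = r1 + r2
        if d < contact:  # short-range: only overlapping spheres repel
            energy += (1.0 - d / contact) ** 2
    return energy
```

Because the energy is exactly zero once all spheres are separated, minimizing it pushes overlapping sub-manifolds apart and then stops, which is the short-range behavior the abstract alludes to.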

NADD: Amplifying Noise for Effective Diffusion-based Adversarial Purification

The strategy of combining diffusion-based generative models with classifiers continues to demonstrate state-of-the-art performance on adversarial robustness benchmarks. Known as adversarial purification, this exploits a diffusion model's capability of identifying high density regions in data distributions to purify adversarial perturbations from inputs. However, existing diffusion-based purification defenses are impractically slow and limited in robustness due to the low levels of noise used in the diffusion process. This low noise design aims to preserve the semantic features of the original input, thereby minimizing utility loss for benign inputs. Our findings indicate that systematic amplification of noise throughout the diffusion process improves the robustness of adversarial purification. However, this approach presents a key challenge, as noise levels cannot be arbitrarily increased without risking distortion of the input. To address this key problem, we introduce high levels of noise during the forward process and propose the ring proximity correction to gradually eliminate adversarial perturbations whilst closely preserving the original data sample. As a second contribution, we propose a new stochastic sampling method which introduces additional noise during the reverse diffusion process to dilute adversarial perturbations. Without relying on gradient obfuscation, these contributions result in a new robustness accuracy record of 44.23% on ImageNet using AutoAttack ($\ell_{\infty}=4/255$), an improvement of +2.07% over the previous best work. Furthermore, our method reduces inference time to 1.08 seconds per sample on ImageNet, a $47\times$ improvement over the existing state-of-the-art approach, making it far more practical for real-world defensive scenarios.

Updated: 2026-01-03 08:10:43

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01109v1

Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks

This paper presents a comparative study of a custom convolutional neural network (CNN) architecture against widely used pretrained and transfer learning CNN models across five real-world image datasets. The datasets span binary classification, fine-grained multiclass recognition, and object detection scenarios. We analyze how architectural factors, such as network depth, residual connections, and feature extraction strategies, influence classification and localization performance. The results show that deeper CNN architectures provide substantial performance gains on fine-grained multiclass datasets, while lightweight pretrained and transfer learning models remain highly effective for simpler binary classification tasks. Additionally, we extend the proposed architecture to an object detection setting, demonstrating its adaptability in identifying unauthorized auto-rickshaws in real-world traffic scenes. Building upon a systematic analysis of custom CNN architectures alongside pretrained and transfer learning models, this study provides practical guidance for selecting suitable network designs based on task complexity and resource constraints.

Updated: 2026-01-03 07:45:08

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.01099v1

Neural Networks on Symmetric Spaces of Noncompact Type

Recent works have demonstrated promising performance of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian manifolds referred to as symmetric spaces of noncompact type. In this paper, we propose a novel approach for developing neural networks on such spaces. Our approach relies on a unified formulation of the distance from a point to a hyperplane on the considered spaces. We show that some existing formulations of the point-to-hyperplane distance can be recovered by our approach under specific settings. Furthermore, we derive a closed-form expression for the point-to-hyperplane distance in higher-rank symmetric spaces of noncompact type equipped with G-invariant Riemannian metrics. The derived distance then serves as a tool to design fully-connected (FC) layers and an attention mechanism for neural networks on the considered spaces. Our approach is validated on challenging benchmarks for image classification, electroencephalogram (EEG) signal classification, image generation, and natural language inference.

Updated: 2026-01-03 07:26:39

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2601.01097v1

NarrativeTrack: Evaluating Video Language Models Beyond the Frame

Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, while maintaining coherent entity representations across dynamic visual and temporal contexts. We introduce NarrativeTrack, the first benchmark to evaluate narrative understanding in MLLMs through fine-grained entity-centric reasoning. Unlike existing benchmarks limited to short clips or coarse scene-level semantics, we decompose videos into constituent entities and examine their continuity via a Compositional Reasoning Progression (CRP), a structured evaluation framework that progressively increases narrative complexity across three dimensions: entity existence, entity changes, and entity ambiguity. CRP challenges models to advance from temporal persistence to contextual evolution and fine-grained perceptual reasoning. A fully automated entity-centric pipeline enables scalable extraction of temporally grounded entity representations, providing the foundation for CRP. Evaluations of state-of-the-art MLLMs reveal that models fail to robustly track entities across visual transitions and temporal dynamics, often hallucinating identity under context shifts. Open-source general-purpose MLLMs exhibit strong perceptual grounding but weak temporal coherence, while video-specific MLLMs capture temporal context yet hallucinate entity contexts. These findings uncover a fundamental trade-off between perceptual grounding and temporal reasoning, indicating that narrative understanding emerges only from their integration. NarrativeTrack provides the first systematic framework to diagnose and advance temporally grounded narrative comprehension in MLLMs.

Updated: 2026-01-03 07:12:55

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2601.01095v1

SoulSeek: Exploring the Use of Social Cues in LLM-based Information Seeking

Social cues, which convey others' presence, behaviors, or identities, play a crucial role in human information seeking by helping individuals judge relevance and trustworthiness. However, existing LLM-based search systems primarily rely on semantic features, creating a misalignment with the socialized cognition underlying natural information seeking. To address this gap, we explore how the integration of social cues into LLM-based search influences users' perceptions, experiences, and behaviors. Focusing on social media platforms that are beginning to adopt LLM-based search, we integrate design workshops, the implementation of the prototype system (SoulSeek), a between-subjects study, and mixed-method analyses to examine both outcome- and process-level findings. The workshop informs the prototype's cue-integrated design. The study shows that social cues improve perceived outcomes and experiences, promote reflective information behaviors, and reveal limits of current LLM-based search. We propose design implications emphasizing better social-knowledge understanding, personalized cue settings, and controllable interactions.

Updated: 2026-01-03 07:09:10

Categories: cs.HC,cs.AI,cs.IR

Download: http://arxiv.org/abs/2601.01094v1

ks-lit-3m: A 3.1 million word Kashmiri text dataset for large language model pretraining

Large Language Models (LLMs) demonstrate remarkable fluency across high-resource languages yet consistently fail to generate coherent text in Kashmiri, a language spoken by approximately seven million people. This performance disparity stems not from inherent model limitations but from a critical scarcity of high-quality training data. Decades of Kashmiri literature remain inaccessible to modern NLP pipelines due to their encoding in the proprietary InPage desktop publishing format. This paper introduces KS-LIT-3M, a curated corpus of 3.1 million words (16.4 million characters) specifically designed for pretraining language models on Kashmiri. The dataset is structured as a single continuous linear text stream, optimized for causal language model training where models learn to predict subsequent tokens from preceding context. The corpus was constructed through the development of a specialized InPage-to-Unicode converter, followed by rigorous preprocessing including English contamination removal, character normalization, and quality validation. Encompassing 131,607 unique words drawn from diverse genres including literary works, journalistic writing, academic texts, and religious scholarship, KS-LIT-3M addresses a fundamental resource gap for Kashmiri language technology. The dataset is released under the CC-BY-4.0 license to facilitate research in Kashmiri natural language processing.

Updated: 2026-01-03 06:43:26

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2601.01091v1

HyperCLOVA X 32B Think

In this report, we present HyperCLOVA X 32B Think, a vision-language model designed with particular emphasis on reasoning within the Korean linguistic and cultural context, as well as agentic ability. HyperCLOVA X 32B Think is pre-trained with a strong focus on reasoning capabilities and subsequently post-trained to support multimodal understanding, enhanced reasoning, agentic behaviors, and alignment with human preferences. Experimental evaluations against comparably sized models demonstrate that our model achieves strong performance on Korean text-to-text and vision-to-text benchmarks, as well as on agent-oriented evaluation tasks. By open-sourcing HyperCLOVA X 32B Think, we aim to support broader adoption and facilitate further research and innovation across both academic and industrial communities.

Updated: 2026-01-03 06:39:38

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2601.03286v1

Multimodal Sentiment Analysis based on Multi-channel and Symmetric Mutual Promotion Feature Fusion

Multimodal sentiment analysis is a key technology in the fields of human-computer interaction and affective computing. Accurately recognizing human emotional states is crucial for facilitating smooth communication between humans and machines. Despite some progress in multimodal sentiment analysis research, numerous challenges remain. The first challenge is the limited and insufficiently rich features extracted from single modality data. Secondly, most studies focus only on the consistency of inter-modal feature information, neglecting the differences between features, resulting in inadequate feature information fusion. In this paper, we first extract multi-channel features to obtain more comprehensive feature information. We employ dual-channel features in both the visual and auditory modalities to enhance intra-modal feature representation. Secondly, we propose a symmetric mutual promotion (SMP) inter-modal feature fusion method. This method combines symmetric cross-modal attention mechanisms and self-attention mechanisms, where the cross-modal attention mechanism captures useful information from other modalities, and the self-attention mechanism models contextual information. This approach promotes the exchange of useful information between modalities, thereby strengthening inter-modal interactions. Furthermore, we integrate intra-modal features and inter-modal fused features, fully leveraging the complementarity of inter-modal feature information while considering feature information differences. Experiments conducted on two benchmark datasets demonstrate the effectiveness and superiority of our proposed method.

Updated: 2026-01-03 06:37:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.02415v1

DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning

Large vision-language models (LVLMs) have shown impressive performance across a broad range of multimodal tasks. However, robust image caption evaluation using LVLMs remains challenging, particularly under domain-shift scenarios. To address this issue, we introduce the Distribution-Aware Score Decoder (DISCODE), a novel finetuning-free method that generates robust evaluation scores better aligned with human judgments across diverse domains. The core idea behind DISCODE lies in its test-time adaptive evaluation approach, which introduces the Adaptive Test-Time (ATT) loss, leveraging a Gaussian prior distribution to improve robustness in evaluation score estimation. This loss is efficiently minimized at test time using an analytical solution that we derive. Furthermore, we introduce the Multi-domain Caption Evaluation (MCEval) benchmark, a new image captioning evaluation benchmark covering six distinct domains, designed to assess the robustness of evaluation metrics. In our experiments, we demonstrate that DISCODE achieves state-of-the-art performance as a reference-free evaluation metric across MCEval and four representative existing benchmarks.

Updated: 2026-01-03 06:34:02

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.14420v2

Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai

Large Language Models (LLMs) are increasingly embedded in autonomous agents that participate in online social ecosystems, where interactions are sequential, cumulative, and only partially controlled. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption of LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments), and we operationalize exposure through observable interactions rather than inferred recommendation mechanisms. We conduct a large-scale empirical analysis of agent behavior, examining how response toxicity relates to stimulus toxicity, how repeated exposure affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that while toxic responses are more likely following toxic stimuli, a substantial fraction of toxicity emerges spontaneously, independent of exposure. At the same time, cumulative toxic exposure significantly increases the probability of toxic responding. We further introduce two influence metrics, the Influence-Driven Response Rate and the Spontaneous Response Rate, revealing a strong trade-off between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents and suggest that monitoring encountered content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.

Updated: 2026-01-03 06:33:08

Categories: cs.MA,cs.AI,cs.CY

Download: http://arxiv.org/abs/2601.01090v1

Central Dogma Transformer: Towards Mechanism-Oriented AI for Cellular Understanding

Understanding cellular mechanisms requires integrating information across DNA, RNA, and protein - the three molecular systems linked by the Central Dogma of molecular biology. While domain-specific foundation models have achieved success for each modality individually, they remain isolated, limiting our ability to model integrated cellular processes. Here we present the Central Dogma Transformer (CDT), an architecture that integrates pre-trained language models for DNA, RNA, and protein following the directional logic of the Central Dogma. CDT employs directional cross-attention mechanisms - DNA-to-RNA attention models transcriptional regulation, while RNA-to-Protein attention models translational relationships - producing a unified Virtual Cell Embedding that integrates all three modalities. We validate CDT v1 - a proof-of-concept implementation using fixed (non-cell-specific) RNA and protein embeddings - on CRISPRi enhancer perturbation data from K562 cells, achieving a Pearson correlation of 0.503, representing 63% of the theoretical ceiling set by cross-experiment variability (r = 0.797). Attention and gradient analyses provide complementary interpretive windows: in detailed case studies, these approaches highlight largely distinct genomic regions, with gradient analysis identifying a CTCF binding site that Hi-C data showed as physically contacting both enhancer and target gene. These results suggest that AI architectures aligned with biological information flow can achieve both predictive accuracy and mechanistic interpretability.

Updated: 2026-01-03 06:29:22

Categories: cs.LG,q-bio.GN

Download: http://arxiv.org/abs/2601.01089v1

OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of learned information, a critical necessity. However, existing MU benchmarks for MLLMs are limited by a lack of image diversity, potential inaccuracies, and insufficient evaluation scenarios, which fail to capture the complexity of real-world applications. To facilitate the development of MLLM unlearning and alleviate the aforementioned limitations, we introduce OFFSIDE, a novel benchmark for evaluating misinformation unlearning in MLLMs based on football transfer rumors. This manually curated dataset contains 15.68K records for 80 players, providing a comprehensive framework with four test sets to assess forgetting efficacy, generalization, utility, and robustness. OFFSIDE supports advanced settings like selective unlearning and corrective relearning, and crucially, unimodal unlearning (forgetting only text data). Our extensive evaluation of multiple baselines reveals key findings: (1) Unimodal methods (erasing text-based knowledge) fail on multimodal rumors; (2) Unlearning efficacy is largely driven by catastrophic forgetting; (3) All methods struggle with "visual rumors" (rumors that appear in the image); (4) The unlearned rumors can be easily recovered; and (5) All methods are vulnerable to prompt attacks. These results expose significant vulnerabilities in current approaches, highlighting the need for more robust multimodal unlearning solutions. The code is available at https://github.com/zh121800/OFFSIDE

Updated: 2026-01-03 06:29:07

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2510.22535v2

MIAR: Modality Interaction and Alignment Representation Fusion for Multimodal Emotion

Multimodal Emotion Recognition (MER) aims to perceive human emotions through three modalities: language, vision, and audio. Previous methods primarily focused on modal fusion without adequately addressing significant distributional differences among modalities or considering their varying contributions to the task. They also lacked robust generalization capabilities across diverse textual model features, thus limiting performance in multimodal scenarios. Therefore, we propose a novel approach called Modality Interaction and Alignment Representation (MIAR). This network integrates contextual features across modalities through feature interaction, generating four feature tokens that serve as global representations of how each modality extracts information from the others. MIAR aligns different modalities using contrastive learning and normalization strategies. We conduct experiments on two benchmarks, the CMU-MOSI and CMU-MOSEI datasets; the results demonstrate that MIAR outperforms state-of-the-art MER methods.

Updated: 2026-01-03 06:26:13

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.02414v1

Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

In this paper, we introduce Luminark, a training-free and probabilistically-certified watermarking method for general vision generative models. Our approach is built upon a novel watermark definition that leverages patch-level luminance statistics. Specifically, the service provider predefines a binary pattern together with corresponding patch-level thresholds. To detect a watermark in a given image, we evaluate whether the luminance of each patch surpasses its threshold and then verify whether the resulting binary pattern aligns with the target one. A simple statistical analysis demonstrates that the false positive rate of the proposed method can be effectively controlled, thereby ensuring certified detection. To enable seamless watermark injection across different paradigms, we leverage the widely adopted guidance technique as a plug-and-play mechanism and develop the watermark guidance. This design enables Luminark to achieve generality across state-of-the-art generative models without compromising image quality. Empirically, we evaluate our approach on nine models spanning diffusion, autoregressive, and hybrid frameworks. Across all evaluations, Luminark consistently demonstrates high detection accuracy, strong robustness against common image transformations, and good performance on visual quality.
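
The detection step described in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the authors' implementation: the function names, the Rec. 601 luma weights, the match-count decision rule (`min_matches`), and the assumption that each bit of an unwatermarked image matches the target independently with probability 1/2 are all hypothetical choices made for this sketch.

```python
from math import comb

import numpy as np


def patch_luminance(image: np.ndarray, patch: int) -> np.ndarray:
    """Mean luminance of each non-overlapping patch of an RGB image with values in [0, 1]."""
    # Assumption: Rec. 601 luma weights; the abstract does not define luminance precisely.
    luma = image[..., 0] * 0.299 + image[..., 1] * 0.587 + image[..., 2] * 0.114
    rows, cols = luma.shape[0] // patch, luma.shape[1] // patch
    luma = luma[: rows * patch, : cols * patch]  # drop any ragged border
    return luma.reshape(rows, patch, cols, patch).mean(axis=(1, 3))


def detect(image, pattern, thresholds, patch, min_matches):
    """Threshold each patch, then count how many bits agree with the target pattern."""
    bits = patch_luminance(image, patch) > thresholds
    matches = int((bits == pattern).sum())
    return matches >= min_matches, matches


def false_positive_bound(n_patches: int, min_matches: int) -> float:
    """P(Binomial(n, 1/2) >= k) under the (assumed) independent fair-coin null model."""
    tail = sum(comb(n_patches, k) for k in range(min_matches, n_patches + 1))
    return tail / 2 ** n_patches
```

Under this null model, raising `min_matches` toward the number of patches drives the false positive rate down geometrically, which is one simple way a detection threshold could be tied to a target error rate.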

Updated: 2026-01-03 06:20:00

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.01085v1

Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces

Quality diversity (QD) optimization searches for a collection of solutions that optimize an objective while attaining diverse outputs of a user-specified, vector-valued measure function. Contemporary QD algorithms focus on low-dimensional measures because high-dimensional measures are prone to distortion, where many solutions found by the QD algorithm map to similar measures. For example, the CMA-MAE algorithm guides measure space exploration with a histogram in measure space that records so-called discount values. However, CMA-MAE stagnates in domains with high-dimensional measure spaces because solutions with similar measures fall into the same histogram cell and thus receive identical discount values. To address these limitations, we propose Discount Model Search (DMS), which guides exploration with a model that provides a smooth, continuous representation of discount values. In high-dimensional measure spaces, this model enables DMS to distinguish between solutions with similar measures and thus continue exploration. We show that DMS facilitates new QD applications by introducing two domains where the measure space is the high-dimensional space of images, which enables users to specify their desired measures by providing a dataset of images rather than hand-designing the measure function. Results in these domains and on high-dimensional benchmarks show that DMS outperforms CMA-MAE and other black-box QD algorithms.

Updated: 2026-01-03 06:05:22

Categories: cs.LG,cs.NE

Download: http://arxiv.org/abs/2601.01082v1

Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees

We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting function to learn an approximate linear representation of the dynamics and design linear controllers in this space to enable closed-loop tracking of a reference trajectory distribution. Closed-loop reachable sets are efficiently computed in the lifted space and mapped back to the original state space via NN verification tools. To capture model mismatch between the Koopman dynamics and the true system, we apply conformal prediction to produce statistically-valid error bounds that inflate the reachable sets to ensure the true trajectories are contained with a user-specified probability. These bounds generalize across references, enabling reuse without recomputation. Results on high-dimensional MuJoCo tasks (11D Hopper, 28D Swimmer) and 12D quadcopters show improved reachable set coverage rate, computational efficiency, and conservativeness over existing methods.
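
The conformal step can be illustrated with a generic split-conformal sketch. It is not taken from the paper: it assumes the nonconformity score is the worst-case infinity-norm error between a Koopman rollout and the true trajectory on held-out calibration data, and that reachable sets are axis-aligned boxes. The names `conformal_radius` and `inflate_box` are hypothetical.

```python
import math

import numpy as np


def conformal_radius(scores: np.ndarray, alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples to certify this coverage level.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def inflate_box(lower: np.ndarray, upper: np.ndarray, radius: float):
    """Grow an axis-aligned reachable-set box by the conformal radius in every dimension."""
    return lower - radius, upper + radius
```

With exchangeable calibration and test trajectories, a box inflated this way contains a new trajectory's state with probability at least 1 - alpha, which matches the user-specified probability mentioned in the abstract.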

Updated: 2026-01-03 05:31:08

Categories: eess.SY,cs.AI,cs.LG,cs.RO,math.OC

Download: http://arxiv.org/abs/2601.01076v1

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max. GPT 5.1 Pro spends 16.8 minutes per workflow yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.

Updated: 2026-01-03 05:28:05

Categories: cs.AI,cs.CE,cs.IR,cs.MA

Download: http://arxiv.org/abs/2512.13168v3

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external objects. These streams obey smooth, time-parameterized symmetries, which combine through a precisely structured algebra; yet most neural network world models ignore this structure and instead repeatedly re-learn the same transformations from data. In this work, we introduce 'Flow Equivariant World Models', a framework in which both self-motion and external object motion are unified as one-parameter Lie group 'flows'. We leverage this unification to implement group equivariance with respect to these transformations, thereby providing a stable latent world representation over hundreds of timesteps. On both 2D and 3D partially observed video world modeling benchmarks, we demonstrate that Flow Equivariant World Models significantly outperform comparable state-of-the-art diffusion-based and memory-augmented world modeling architectures -- particularly when there are predictable world dynamics outside the agent's current field of view. We show that flow equivariance is particularly beneficial for long rollouts, generalizing far beyond the training horizon. By structuring world model representations with respect to internal and external motion, flow equivariance charts a scalable route to data efficient, symmetry-guided, embodied intelligence. Project link: https://flowequivariantworldmodels.github.io.

Updated: 2026-01-03 05:22:27

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2601.01075v1

Gendered Pathways in AI Companionship: Cross-Community Behavior and Toxicity Patterns on Reddit

AI-companionship platforms are rapidly reshaping how people form emotional, romantic, and parasocial bonds with non-human agents, raising new questions about how these relationships intersect with gendered online behavior and exposure to harmful content. Focusing on the MyBoyfriendIsAI (MBIA) subreddit, we reconstruct the Reddit activity histories of more than 3,000 highly engaged users over two years, yielding over 67,000 historical submissions. We then situate MBIA within a broader ecosystem by building a historical interaction network spanning more than 2,000 subreddits, which enables us to trace cross-community pathways and measure how toxicity and emotional expression vary across these trajectories. We find that MBIA users primarily traverse four surrounding community spheres (AI-companionship, porn-related, forum-like, and gaming) and that participation across the ecosystem exhibits a distinct gendered structure, with substantial engagement by female users. While toxicity is generally low across most pathways, we observe localized spikes concentrated in a small subset of AI-porn and gender-oriented communities. Nearly 16% of users engage with gender-focused subreddits, and their trajectories display systematically different patterns of emotional expression and elevated toxicity, suggesting that a minority of gendered pathways may act as toxicity amplifiers within the broader AI-companionship ecosystem. These results characterize the gendered structure of cross-community participation around AI companionship on Reddit and highlight where risks concentrate, informing measurement, moderation, and design practices for human-AI relationship platforms.

Updated: 2026-01-03 05:13:00

Categories: cs.SI,cs.AI

Download: http://arxiv.org/abs/2601.01073v1

Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit gradual drifting patterns, the weighted strategy is commonly adopted in real-world applications. However, previous theoretical studies show that its analysis is more involved and the algorithms are either computationally less efficient or statistically suboptimal. This paper revisits the weighted strategy for non-stationary parametric bandits. In linear bandits (LB), we discover that this undesirable feature is due to an inadequate regret analysis, which results in an overly complex algorithm design. We propose a refined analysis framework, which simplifies the derivation and, importantly, produces a simpler weight-based algorithm that is as efficient as window/restart-based algorithms while retaining the same regret as previous studies. Furthermore, our new framework can be used to improve regret bounds of other parametric bandits, including Generalized Linear Bandits (GLB) and Self-Concordant Bandits (SCB). For example, we develop a simple weighted GLB algorithm with an $\tilde{O}(k_\mu^{5/4} c_\mu^{-3/4} d^{3/4} P_T^{1/4}T^{3/4})$ regret, improving the $\tilde{O}(k_\mu^{2} c_\mu^{-1}d^{9/10} P_T^{1/5}T^{4/5})$ bound in prior work, where $k_\mu$ and $c_\mu$ characterize the reward model's nonlinearity, $P_T$ measures the non-stationarity, and $d$ and $T$ denote the dimension and time horizon. Moreover, we extend our framework to non-stationary Markov Decision Processes (MDPs) with function approximation, focusing on Linear Mixture MDP and Multinomial Logit (MNL) Mixture MDP. For both classes, we propose algorithms based on the weighted strategy and establish dynamic regret guarantees using our analysis framework.
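
For readers unfamiliar with the weighted strategy, the sketch below shows a generic exponentially discounted ridge estimator for a linear reward model. It is a textbook-style illustration, not the algorithm proposed in the paper: the class name `DiscountedRidge`, the discount factor `gamma`, and the regularizer `lam` are assumptions introduced here, and the confidence-bound and arm-selection logic of a full bandit algorithm is omitted.

```python
import numpy as np


class DiscountedRidge:
    """Weighted least squares in which an observation made s rounds ago has weight gamma**s."""

    def __init__(self, dim: int, gamma: float = 0.99, lam: float = 1.0):
        self.gamma, self.lam = gamma, lam
        self.V = lam * np.eye(dim)  # discounted design matrix plus lam * I
        self.b = np.zeros(dim)      # discounted sum of reward * feature

    def update(self, x: np.ndarray, reward: float) -> None:
        # Discount old data; re-add (1 - gamma) * lam * I so the regularizer stays at lam * I.
        self.V = (self.gamma * self.V + np.outer(x, x)
                  + (1 - self.gamma) * self.lam * np.eye(len(x)))
        self.b = self.gamma * self.b + reward * x

    def estimate(self) -> np.ndarray:
        # Solve V theta = b rather than forming an explicit inverse.
        return np.linalg.solve(self.V, self.b)
```

Each update is a constant-size recursion, so the per-round cost does not grow with the horizon. Because old observations fade geometrically, the estimate follows a gradually drifting parameter without the hard cutoff of a sliding window or a restart.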

Updated: 2026-01-03 04:50:21

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2601.01069v1

Post-Quantum Cryptography for Intelligent Transportation Systems: An Implementation-Focused Review

As quantum computing advances, the cryptographic algorithms that underpin confidentiality, integrity, and authentication in Intelligent Transportation Systems (ITS) face increasing vulnerability to quantum-enabled attacks. To address these risks, governments and industry stakeholders are turning toward post-quantum cryptography (PQC), a class of algorithms designed to resist adversaries equipped with quantum computing capabilities. However, existing studies provide limited insight into the implementation-focused aspects of PQC in the ITS domain. This review fills that gap by evaluating the readiness of vehicular communication and security standards for PQC adoption. It examines in-vehicle networks and vehicle-to-everything (V2X) interfaces, while also investigating vulnerabilities at the physical layer, primarily exposure to side-channel and fault injection attacks. The review identifies thirteen research gaps reflecting non-PQC-ready standards, constraints in embedded implementation and hybrid cryptography, interoperability and certificate-management barriers, lack of real-world PQC deployment data in ITS, and physical-attack vulnerabilities in PQC-enabled vehicular communication. Future research directions include updating vehicular communication and security standards, optimizing PQC for low-power devices, enhancing interoperability and certificate-management frameworks for PQC integration, conducting real-world evaluations of PQC-enabled communication and control functions across ITS deployments, and strengthening defenses against AI-assisted physical attacks. A phased roadmap is presented, aligning PQC deployment with regulatory, performance, and safety requirements, thereby guiding the secure evolution of ITS in the quantum computing era.

Updated: 2026-01-03 04:39:06

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01068v1

Dynamic Graph Neural Networks for Physiological Based Pharmacokinetic Modeling: A Novel Data Driven Approach to Drug Concentration Prediction

Physiologically Based Pharmacokinetic (PBPK) modeling is a key tool in drug development for predicting drug concentration dynamics across organs. Traditional PBPK approaches rely on ordinary differential equations with simplifying assumptions that limit their ability to capture nonlinear and system-level physiological interactions. In this work, we investigate data-driven PBPK modeling using deep learning. We implement two baseline architectures -- a multilayer perceptron (MLP) and a long short-term memory (LSTM) network -- and propose a Dynamic Graph Neural Network (Dynamic GNN) that explicitly models inter-organ interactions through recurrent message passing on a physiological graph. Experiments on a multi-organ pharmacokinetic dataset show that the Dynamic GNN achieves the lowest mean absolute percentage error (MAPE) of 15.7% among all models, demonstrating improved relative accuracy despite slightly higher absolute error compared to the MLP baseline. The model attains an R² of 0.9342 with more stable error behavior and better captures inter-organ pharmacokinetic relationships. These results highlight the importance of structure-aware modeling for PBPK applications and demonstrate that the proposed Dynamic GNN offers a scalable, equation-free alternative for data-driven pharmacokinetic prediction.
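
The abstract's point that a model can win on relative error while losing on absolute error is easy to reproduce with the standard metric definitions. The sketch below is illustrative only: the concentration values and the two sets of predictions are made up, and the function names are assumptions rather than anything taken from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations spanning two orders of magnitude.
y_true = [1.0, 10.0, 100.0]
pred_a = [1.5, 11.0, 101.0]   # small absolute errors, large relative error at low values
pred_b = [1.05, 10.5, 104.0]  # small relative errors, larger absolute error at high values
```

Here `pred_b` has the lower MAPE but the higher MAE, because MAPE weights errors on small concentrations as heavily as errors on large ones.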

Updated: 2026-01-03 04:33:28

Categories: cs.LG

Download: http://arxiv.org/abs/2510.22096v2

Tiny Machine Learning for Real-Time Aquaculture Monitoring: A Case Study in Morocco

Aquaculture, the farming of aquatic organisms, is a rapidly growing industry facing challenges such as water quality fluctuations, disease outbreaks, and inefficient feed management. Traditional monitoring methods often rely on manual labor and are time-consuming, which can delay the response to problems. This paper proposes integrating low-power edge devices running Tiny Machine Learning (TinyML) into aquaculture systems to enable real-time automated monitoring and control, such as collecting data and triggering alarms, while reducing labor requirements. The system provides real-time data on the required parameters, such as pH, temperature, dissolved oxygen, and ammonia levels, to control water quality, nutrient levels, and environmental conditions, enabling better maintenance, efficient resource utilization, and optimal management of the enclosed aquaculture space. The system raises alerts when an anomaly is detected. The data collected by the sensors over time can inform important decisions about optimizing water treatment processes, feed distribution, and feed pattern analysis, improving feed efficiency and reducing operational costs. This research explores the feasibility of developing TinyML-based solutions for aquaculture monitoring, considering factors such as sensor selection, algorithm design, hardware constraints, and ethical considerations. By demonstrating the potential benefits of TinyML in aquaculture, our aim is to contribute to the development of more sustainable and efficient farming practices.

Updated: 2026-01-03 04:21:00

Categories: cs.LG,eess.SP,eess.SY

Download: http://arxiv.org/abs/2601.01065v1

Can Large Language Models Automate the Refinement of Cellular Network Specifications?

Cellular networks, e.g., 4G/5G, rely on complex technical specifications to ensure correct functionality; however, these specifications often contain flaws or ambiguities. In this paper, we investigate the application of Large Language Models for automated cellular network specification refinement. We identify Change Requests, which record specification revisions, as a key source of domain-specific data and formulate specification refinement as three complementary sub-tasks. We introduce CR-Eval, a benchmark of 200 security-related test cases, and evaluate 17 open-source and 14 proprietary models. The best-performing model, GPT-o3-mini, identifies weaknesses in over 127 test cases within five trials. We further study LLM specialization, showing that fine-tuning an 8B model can outperform advanced LLMs such as DeepSeek-R1 and Qwen3-235B. Evaluations on 30 real-world cellular attacks demonstrate the practical impact and remaining challenges. The codebase and benchmark are available at https://github.com/jianshuod/CR-Eval.

Updated: 2026-01-03 04:19:25

Categories: cs.CR

Download: http://arxiv.org/abs/2507.04214v2

SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models

Vision-Language Models (VLMs) have achieved remarkable success in descriptive tasks such as image captioning and visual question answering (VQA). However, their ability to generate engaging, long-form narratives -- specifically multi-speaker podcast dialogues -- remains under-explored and difficult to evaluate. Standard metrics like BLEU and ROUGE fail to capture the nuances of conversational naturalness, personality, and narrative flow, often rewarding safe, repetitive outputs over engaging storytelling. In this work, we present a novel pipeline for end-to-end visual podcast generation, and fine-tune a Qwen3-VL-32B model on a curated dataset of 4,000 image-dialogue pairs. Crucially, we use a synthetic-to-real training strategy: we train on high-quality podcast dialogues from the Structured Podcast Research Corpus (SPoRC) paired with synthetically generated imagery, and evaluate on real-world photo sequences from the Visual Storytelling Dataset (VIST). This rigorous setup tests the model's ability to generalize from synthetic training data to real-world visual domains. We propose a comprehensive evaluation framework that moves beyond textual overlap, and use AI-as-a-judge (Gemini 3 Pro, Claude Opus 4.5, GPT 5.2) and novel style metrics (average turn length, speaker switch rate) to assess quality. Our experiments demonstrate that our fine-tuned 32B model significantly outperforms a 235B base model in conversational naturalness (>80% win rate) and narrative depth (+50% turn length), while maintaining identical visual grounding capabilities (CLIPScore: 20.39).
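
The two style metrics named in the abstract can be sketched in a few lines of Python. The definitions used here are assumptions, since the abstract does not spell them out: turn length is counted in whitespace-separated words, and the switch rate is the fraction of adjacent turn pairs whose speaker differs. The dialogue is a made-up example.

```python
def average_turn_length(turns):
    """Mean number of whitespace-separated words per turn."""
    return sum(len(text.split()) for _, text in turns) / len(turns)


def speaker_switch_rate(turns):
    """Fraction of adjacent turn pairs in which the speaker changes."""
    if len(turns) < 2:
        return 0.0
    switches = sum(1 for (a, _), (b, _) in zip(turns, turns[1:]) if a != b)
    return switches / (len(turns) - 1)


# Hypothetical dialogue as (speaker, text) pairs.
dialogue = [
    ("A", "Look at that harbor."),
    ("B", "Is that a lighthouse?"),
    ("B", "I think it is."),
    ("A", "Yes."),
]
```

For this dialogue the average turn length is 3.25 words and the switch rate is 2/3. Neither number depends on a reference text, which is how such metrics differ from overlap scores like BLEU and ROUGE.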

Updated: 2026-01-03 04:11:58

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2601.01062v1

A UCB Bandit Algorithm for General ML-Based Estimators

We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential decision-making is the lack of tractable concentration inequalities required for principled exploration. We overcome this limitation by directly modeling the learning curve behavior of the underlying estimator. Specifically, assuming the Mean Squared Error decreases as a power law in the number of training samples, we derive a generalized concentration inequality and prove that ML-UCB achieves sublinear regret. This framework enables the principled integration of any ML model whose learning curve can be empirically characterized, eliminating the need for model-specific theoretical analysis. We validate our approach through experiments on a collaborative filtering recommendation system using online matrix factorization with synthetic data designed to simulate a simplified two-tower model, demonstrating substantial improvements over LinUCB.
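
One way to see how a learning-curve assumption can produce a confidence width is to apply Markov's inequality to the squared error. This is a sketch and not the paper's inequality: the constants $C$ and $\alpha$, the failure probability $\delta$, and the estimator notation $\hat{\mu}_n$ (the estimate after $n$ samples of a true value $\mu$) are assumptions introduced for illustration.

$$
\Pr\big(|\hat{\mu}_n - \mu| \ge \epsilon\big) \le \frac{\mathbb{E}\big[(\hat{\mu}_n - \mu)^2\big]}{\epsilon^2} \le \frac{C\, n^{-\alpha}}{\epsilon^2}
\quad\Longrightarrow\quad
\epsilon_n(\delta) = \sqrt{\frac{C\, n^{-\alpha}}{\delta}}
$$

Setting the right-hand bound equal to $\delta$ shows that, with probability at least $1-\delta$, the estimate lies within $\epsilon_n(\delta)$ of $\mu$. A UCB-style index could then add this width to $\hat{\mu}_n$, and the width shrinks at rate $n^{-\alpha/2}$ as more samples arrive.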

Updated: 2026-01-03 04:11:41

Categories: cs.LG,cs.AI,math.PR

Download: http://arxiv.org/abs/2601.01061v1

GRAND: Graph Release with Assured Node Differential Privacy

Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data -- particularly at the node level -- remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based approaches, which restrict output to pre-specified network statistics, or fail to preserve key structural properties of the network. In this work, we propose GRAND (Graph Release with Assured Node Differential privacy), which is, to the best of our knowledge, the first network release mechanism that releases networks while ensuring node-level differential privacy and preserving structural properties. Under a broad class of latent space models, we show that the released network asymptotically follows the same distribution as the original network. The effectiveness of the approach is evaluated through extensive experiments on both synthetic and real-world datasets.

Updated: 2026-01-03 04:08:07

Categories: stat.ML,cs.LG,math.ST,stat.ME

Download: http://arxiv.org/abs/2507.00402v3

Toward Efficient Spiking Transformers: Synapse Pruning Meets Synergistic Learning-Based Compensation

As a foundational architecture of artificial intelligence models, the Transformer has recently been adapted to spiking neural networks, with promising performance across various tasks. However, existing spiking Transformer (ST)-based models require a substantial number of parameters and incur high computational costs, thus limiting their deployment in resource-constrained environments. To address these challenges, we propose combining synapse pruning with a synergistic learning-based compensation strategy to derive lightweight ST-based models. Specifically, two types of tailored pruning strategies are introduced to reduce redundancy in the weight matrices of ST blocks: an unstructured $\mathrm{L_{1}P}$ method to induce sparse representations, and a structured DSP method to induce low-rank representations. In addition, we propose an enhanced spiking neuron model, termed the synergistic leaky integrate-and-fire (sLIF) neuron, to effectively compensate for model pruning through synergistic learning between synaptic and intrinsic plasticity mechanisms. Extensive experiments on benchmark datasets demonstrate that the proposed methods significantly reduce model size and computational overhead while maintaining competitive performance. These results validate the effectiveness of the proposed pruning and compensation strategies in constructing efficient and high-performing ST-based models.
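
As a rough illustration of what unstructured pruning does to a weight matrix, the sketch below zeroes the entries with the smallest absolute value. Magnitude-based pruning is a common unstructured criterion, but whether it matches the paper's $\mathrm{L_{1}P}$ method is an assumption; the function name, the `sparsity` parameter, and the example matrix are all hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| entries set to zero.

    `weights` is a list of equal-length rows; `sparsity` is the fraction
    of entries to remove (0.0 keeps everything, 1.0 removes everything).
    """
    rows, cols = len(weights), len(weights[0])
    # Rank every (row, col) position by the absolute value of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    pruned = [row[:] for row in weights]
    for r, c in order[: int(sparsity * rows * cols)]:
        pruned[r][c] = 0.0
    return pruned


# Hypothetical 2x3 weight matrix; prune half of its entries.
W = [[0.9, -0.1, 0.4], [0.05, -0.7, 0.2]]
W_sparse = magnitude_prune(W, sparsity=0.5)
```

The surviving entries keep their positions, so the result is sparse but has no regular structure. This is the contrast with structured approaches, which the abstract describes as inducing low-rank rather than sparse representations.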

Updated: 2026-01-03 04:03:58

Categories: cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2508.01992v4

Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space

Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigation policies, however, are typically optimized for low-level objectives such as obstacle avoidance and trajectory smoothness, lacking the ability to incorporate high-level semantics into planning. To bridge this gap, we propose ANWM, an aerial navigation world model that predicts future visual observations conditioned on past frames and actions, thereby enabling agents to rank candidate trajectories by their semantic plausibility and navigational utility. ANWM is trained on 4-DoF UAV trajectories and introduces a physics-inspired module: Future Frame Projection (FFP), which projects past frames into future viewpoints to provide coarse geometric priors. This module mitigates representational uncertainty in long-distance visual generation and captures the mapping between 3D trajectories and egocentric observations. Empirical results demonstrate that ANWM significantly outperforms existing world models in long-distance visual forecasting and improves UAV navigation success rates in large-scale environments.

Updated: 2026-01-03 03:46:02

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2512.21887v2

Is Grokking a Computational Glass Relaxation?

Understanding neural networks' (NNs') generalizability remains a central question in deep learning research. The special phenomenon of grokking, where NNs abruptly generalize long after the training performance reaches a near-perfect level, offers a unique window to investigate the underlying mechanisms of NNs' generalizability. Here we propose an interpretation for grokking by framing it as a computational glass relaxation: viewing NNs as a physical system where parameters are the degrees of freedom and train loss is the system energy, we find that the memorization process resembles a rapid cooling of a liquid into a non-equilibrium glassy state at low temperature, and the later generalization is like a slow relaxation towards a more stable configuration. This mapping enables us to sample NNs' Boltzmann entropy (density of states) landscape as a function of training loss and test accuracy. Our experiments with transformers on arithmetic tasks suggest that there is no entropy barrier in the memorization-to-generalization transition of grokking, challenging previous theory that defines grokking as a first-order phase transition. We identify a high-entropy advantage under grokking, an extension of prior work linking entropy to generalizability but much more significant. Inspired by grokking's far-from-equilibrium nature, we develop a toy optimizer WanD based on Wang-Landau molecular dynamics, which can eliminate grokking without any constraints and find high-norm generalizing solutions. This provides strictly defined counterexamples to theory attributing grokking solely to weight norm evolution towards the Goldilocks zone and also suggests new potential ways for optimizer design.

Updated: 2026-01-03 03:45:41

Categories: cs.LG,cond-mat.dis-nn

Download: http://arxiv.org/abs/2505.11411v5

Impersonating Quantum Secrets over Classical Channels

We show that a simple eavesdropper listening in on classical communication between potentially entangled quantum parties will eventually be able to impersonate any of the parties. Furthermore, the attack is efficient if one-way puzzles do not exist. As a direct consequence, one-way puzzles are implied by reusable authentication schemes over classical channels with quantum pre-shared secrets that are potentially evolving. As an additional application, we show that any quantum money scheme that can be verified through only classical queries to any oracle cannot be information-theoretically secure. This significantly generalizes the prior work by Ananth, Hu, and Yuen (ASIACRYPT'23) where they showed the same but only for the specific case of random oracles. Therefore, verifying black-box constructions of quantum money inherently requires coherently evaluating the underlying cryptographic tools, which may be difficult for near-term quantum devices.

Updated: 2026-01-03 03:40:28

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2601.01058v1

EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation

Humans exhibit adaptive, context-sensitive responses to egocentric visual input. However, faithfully modeling such reactions from egocentric video remains challenging due to the dual requirements of strictly causal generation and precise 3D spatial alignment. To tackle this problem, we first construct the Human Reaction Dataset (HRD) to address data scarcity and misalignment by building a spatially aligned egocentric video-reaction dataset, as existing datasets (e.g., ViMo) suffer from significant spatial inconsistency between the egocentric video and reaction motion, e.g., dynamically moving motions are always paired with fixed-camera videos. Leveraging HRD, we present EgoReAct, the first autoregressive framework that generates 3D-aligned human reaction motions from egocentric video streams in real-time. We first compress the reaction motion into a compact yet expressive latent space via a Vector Quantised-Variational AutoEncoder and then train a Generative Pre-trained Transformer for reaction generation from the visual input. EgoReAct incorporates 3D dynamic features, i.e., metric depth, and head dynamics during the generation, which effectively enhance spatial grounding. Extensive experiments demonstrate that EgoReAct achieves remarkably higher realism, spatial consistency, and generation efficiency compared with prior methods, while maintaining strict causality during generation. We will release code, models, and data upon acceptance.

Updated: 2026-01-03 03:38:31

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.22808v2

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

The era of digital pathology has advanced histopathological examinations, making automated image analysis essential in clinical practice. This study evaluates the classification performance of machine learning and deep learning models on the LC25000 dataset, which includes five classes of histopathological images. We used the fine-tuned InceptionResNet-v2 network both as a classifier and for feature extraction. Our results show that the fine-tuned InceptionResNet-v2 achieved a classification accuracy of 96.01% and an average AUC of 96.8%. Models trained on deep features from InceptionResNet-v2 outperformed those using only the pre-trained network, with the Neural Network model achieving an AUC of 99.99% and accuracy of 99.84%. Evaluating model robustness under varying SNR conditions revealed that models using deep features exhibited greater resilience, particularly GBM and KNN. The combination of HOG and deep features showed enhanced performance, though less so in noisy environments.

Updated: 2026-01-03 03:33:10

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2601.01056v1

Fibonacci-Driven Recursive Ensembles: Algorithms, Convergence, and Learning Dynamics

This paper develops the algorithmic and dynamical foundations of recursive ensemble learning driven by Fibonacci-type update flows. In contrast with classical boosting (Freund and Schapire, 1997; Friedman, 2001), where the ensemble evolves through first-order additive updates, we study second-order recursive architectures in which each predictor depends on its two immediate predecessors. These Fibonacci flows induce a learning dynamic with memory, allowing ensembles to integrate past structure while adapting to new residual information. We introduce a general family of recursive weight-update algorithms encompassing Fibonacci, tribonacci, and higher-order recursions, together with continuous-time limits that yield systems of differential equations governing ensemble evolution. We establish global convergence conditions, spectral stability criteria, and non-asymptotic generalization bounds under Rademacher (Bartlett and Mendelson, 2002) and algorithmic stability analyses. The resulting theory unifies recursive ensembles, structured weighting, and dynamical systems viewpoints in statistical learning. Experiments with kernel ridge regression (Rasmussen and Williams, 2006), spline smoothers (Wahba, 1990), and random Fourier feature models (Rahimi and Recht, 2007) demonstrate that recursive flows consistently improve approximation and generalization beyond static weighting. These results complete the trilogy begun in Papers I and II: from Fibonacci weighting, through geometric weighting theory, to fully dynamical recursive ensemble learning systems.
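
The difference between a first-order update and a recursion with two-step memory can be seen in a toy scalar example. The sketch below is not the paper's algorithm: the update rule, the mixing coefficients `a` and `b`, the step size `eta`, and the constant-target setting are all assumptions chosen to keep the example small.

```python
def second_order_fit(target, steps, a=0.5, b=0.5, eta=0.5):
    """Fit a constant `target` with a recursion that has two-step memory.

    Each new predictor starts from a mix of its two immediate
    predecessors (a * previous + b * one-before-previous) and then
    takes a residual-correction step of size `eta` toward the target.
    """
    prev2, prev1 = 0.0, 0.0  # the two initial predictors
    for _ in range(steps):
        base = a * prev1 + b * prev2        # memory of the two predecessors
        new = base + eta * (target - base)  # correct toward the remaining residual
        prev2, prev1 = prev1, new
    return prev1


# Hypothetical usage: approximate the constant 1.0 in 50 recursive steps.
estimate = second_order_fit(1.0, steps=50)
```

Setting `a=1.0` and `b=0.0` drops the dependence on the older predecessor and recovers a plain first-order residual update. With `a + b = 1` the error obeys a two-term linear recurrence, so whether it shrinks depends on the roots of that recurrence. That is the kind of question a spectral stability criterion addresses.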

Updated: 2026-01-03 03:21:15

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2601.01055v1

Out-of-Band Power Side-Channel Detection for Semiconductor Supply Chain Integrity at Scale

Out-of-band screening of microcontrollers is a major gap in semiconductor supply chain security. High-assurance techniques such as X-ray and destructive reverse engineering are accurate but slow and expensive, hindering comprehensive detection for hardware Trojans or firmware tampering. Consequently, there has been increased interest in applying machine learning techniques to automate forensic examination, enabling rapid, large-scale inspection of components without manual oversight. We introduce a non-destructive screening method that uses power side-channel measurements and generative modeling to detect tampering in commodity microcontrollers without trusted hardware. As a proof-of-concept, differential power analysis (DPA) traces are collected from the ChipWhisperer and a generative adversarial network (GAN) is trained only on benign measurements to learn nominal power behavior. The trained discriminator then serves as a one-class anomaly detector. We report detection performance on multiple tampering scenarios and discuss how this technique can serve as an intermediate screening tier between basic functional tests and high-cost forensic analysis. The proposed method is evaluated in the context of semiconductor supply chain practice and policy to assess its suitability as an intermediate assurance mechanism.

Updated: 2026-01-03 03:14:40

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01054v1

Byzantine-Robust Federated Learning Framework with Post-Quantum Secure Aggregation for Real-Time Threat Intelligence Sharing in Critical IoT Infrastructure

The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating collaborative threat detection mechanisms that preserve data privacy while maintaining robustness against sophisticated attacks. Traditional federated learning approaches for IoT security suffer from two critical vulnerabilities: susceptibility to Byzantine attacks where malicious participants poison model updates, and inadequacy against future quantum computing threats that can compromise cryptographic aggregation protocols. This paper presents a novel Byzantine-robust federated learning framework integrated with post-quantum secure aggregation specifically designed for real-time threat intelligence sharing across critical IoT infrastructure. The proposed framework combines an adaptive weighted aggregation mechanism with lattice-based cryptographic protocols to simultaneously defend against model poisoning attacks and quantum adversaries. We introduce a reputation-based client selection algorithm that dynamically identifies and excludes Byzantine participants while maintaining differential privacy guarantees. The secure aggregation protocol employs CRYSTALS-Kyber for key encapsulation and homomorphic encryption to ensure confidentiality during parameter updates. Experimental evaluation on industrial IoT intrusion detection datasets demonstrates that our framework achieves 96.8% threat detection accuracy while successfully mitigating up to 40% Byzantine attackers, with only 18% computational overhead compared to non-secure federated approaches. The framework maintains sub-second aggregation latency suitable for real-time applications and provides 256-bit post-quantum security level.
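
To show why plain averaging is fragile under model poisoning, the sketch below compares it with the coordinate-wise median, a well-known Byzantine-robust aggregator. This is a swapped-in technique for illustration: it is not the paper's adaptive weighted aggregation or its reputation-based selection, and the client updates are made-up numbers.

```python
from statistics import median


def mean_aggregate(updates):
    """Plain federated averaging: coordinate-wise mean of client updates."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


def median_aggregate(updates):
    """Coordinate-wise median: robust while honest clients are a majority."""
    return [median(column) for column in zip(*updates)]


# Hypothetical round with 5 clients, 2 of them (40%) sending poisoned updates.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
byzantine = [[100.0, -100.0], [100.0, -100.0]]
updates = honest + byzantine

averaged = mean_aggregate(updates)  # dragged far away from the honest updates
robust = median_aggregate(updates)  # stays within the range of honest values
```

With two of five clients poisoned, the mean is pulled to roughly `[40.6, -38.8]`, while the median stays at `[1.1, 1.9]`. The median only helps while honest clients form a majority in every coordinate, so the 40% poisoning rate used here sits below its breakdown point.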

Updated: 2026-01-03 03:13:46

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2601.01053v1

EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos

We propose EgoGrasp, the first method to reconstruct world-space hand-object interactions (W-HOI) from egocentric monocular videos with dynamic cameras in the wild. Accurate W-HOI reconstruction is critical for understanding human behavior and enabling applications in embodied intelligence and virtual reality. However, existing hand-object interaction (HOI) methods are limited to single images or camera coordinates, failing to model temporal dynamics or consistent global trajectories. Some recent approaches attempt world-space hand estimation but overlook object poses and HOI constraints. Their performance also suffers under the severe camera motion and frequent occlusions common in egocentric in-the-wild videos. To address these challenges, we introduce a multi-stage framework with a robust preprocessing pipeline built on newly developed spatial intelligence models, a whole-body HOI prior model based on decoupled diffusion models, and a multi-objective test-time optimization paradigm. Our HOI prior model is template-free and scalable to multiple objects. In experiments, we show that our method achieves state-of-the-art performance in W-HOI reconstruction.

Updated: 2026-01-03 03:08:48

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2601.01050v1

CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing

GPUs have gained significant popularity over the past decade, extending beyond their original role in graphics rendering. This evolution has brought GPU security and reliability to the forefront of concerns. Prior research has shown that CUDA's lack of memory safety can lead to serious vulnerabilities. While fuzzing is effective for finding such bugs on CPUs, equivalent tools for GPUs are lacking due to architectural differences and lack of built-in error detection. In this paper, we propose CuFuzz, a novel compiler-runtime co-design solution to extend state-of-the-art CPU fuzzing tools to GPU programs. CuFuzz transforms GPU programs into CPU programs using compiler IR-level transformations to enable effective fuzz testing. To the best of our knowledge, CuFuzz is the first mechanism to bring fuzzing support to CUDA, addressing a critical gap in GPU security research. By leveraging CPU memory error detectors such as Address Sanitizer, CuFuzz aims to uncover memory safety bugs and related correctness vulnerabilities in CUDA code, enhancing the security and reliability of GPU-accelerated applications. To ensure high fuzzing throughput, we introduce two compiler-runtime co-optimizations tailored for GPU code: Partial Representative Execution (PREX) and Access-Index Preserving Pruning (AXIPrune), achieving average throughput improvements of 32x with PREX and an additional 33% gain with AXIPrune on top of PREX-optimized code. Together, these optimizations can yield up to a 224.31x speedup. In our fuzzing campaigns, CuFuzz uncovered 122 security vulnerabilities in widely used benchmarks.

Updated: 2026-01-03 03:02:47

Categories: cs.CR

Download: http://arxiv.org/abs/2601.01048v1

Coarse-Grained Kullback--Leibler Control of Diffusion-Based Generative AI

Diffusion models and score-based generative models provide a powerful framework for synthesizing high-quality images from noise. However, there is still no satisfactory theory that describes how coarse-grained quantities, such as blockwise intensity or class proportions after partitioning an image into spatial blocks, are preserved and evolve along the reverse diffusion dynamics. In previous work, the author introduced an information-theoretic Lyapunov function V for non-ergodic Markov processes on a state space partitioned into blocks, defined as the minimal Kullback-Leibler divergence to the set of stationary distributions reachable from a given initial condition, and showed that a leak-tolerant potential V-delta with a prescribed tolerance for block masses admits a closed-form expression as a scaling-and-clipping operation on block masses. In this paper, I transplant this framework to the reverse diffusion process in generative models and propose a reverse diffusion scheme that is projected by the potential V-delta (referred to as the V-delta projected reverse diffusion). I extend the monotonicity of V to time-inhomogeneous block-preserving Markov kernels and show that, under small leakage and the V-delta projection, V-delta acts as an approximate Lyapunov function. Furthermore, using a toy model consisting of block-constant images and a simplified reverse kernel, I numerically demonstrate that the proposed method keeps the block-mass error and the leak-tolerant potential within the prescribed tolerance, while achieving pixel-wise accuracy and visual quality comparable to the non-projected dynamics. This study reinterprets generative sampling as a decrease of an information potential from noise to data, and provides a design principle for reverse diffusion processes with explicit control of coarse-grained quantities.
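To make the coarse-grained quantities above concrete, here is a small sketch that partitions an image into spatial blocks, computes normalized block masses, and measures the Kullback-Leibler divergence between two block-mass vectors. The function names and the toy images are illustrative assumptions; the sketch does not implement the paper's potential V-delta or its scaling-and-clipping projection, whose closed form is not given in the abstract.

```python
import math
from typing import List


def block_masses(image: List[List[float]], block: int) -> List[float]:
    """Normalized total intensity of each block x block tile, in row-major order."""
    height, width = len(image), len(image[0])
    masses = []
    for r in range(0, height, block):
        for c in range(0, width, block):
            masses.append(sum(image[i][j]
                              for i in range(r, min(r + block, height))
                              for j in range(c, min(c + block, width))))
    total = sum(masses)
    return [m / total for m in masses]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy block-constant 4x4 image with 2x2 blocks (hypothetical data).
target = [[1, 1, 3, 3],
          [1, 1, 3, 3],
          [2, 2, 2, 2],
          [2, 2, 2, 2]]
flat = [[1] * 4 for _ in range(4)]

p = block_masses(target, 2)
q = block_masses(flat, 2)
gap = kl_divergence(p, q)
```

A sampler that preserves block masses would keep this divergence near zero along the reverse trajectory; here the flat image deviates from the target, so `gap` is positive.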

Updated: 2026-01-03 02:45:41

Categories: cs.LG

Download: http://arxiv.org/abs/2601.01045v1

Evaluating transfer learning strategies for improving dairy cattle body weight prediction in small farms using depth-image and point-cloud data

Computer vision provides automated, non-invasive, and scalable tools for monitoring dairy cattle, thereby supporting management, health assessment, and phenotypic data collection. Although transfer learning is commonly used for predicting body weight from images, its effectiveness and optimal fine-tuning strategies remain poorly understood in livestock applications, particularly beyond the use of pretrained ImageNet or COCO weights. In addition, while both depth images and three-dimensional point-cloud data have been explored for body weight prediction, direct comparisons of these two modalities in dairy cattle are limited. Therefore, the objectives of this study were to 1) evaluate whether transfer learning from a large farm enhances body weight prediction on a small farm with limited data, and 2) compare the predictive performance of depth-image- and point-cloud-based approaches under three experimental designs. Top-view depth images and point-cloud data were collected from 1,201, 215, and 58 cows at large, medium, and small dairy farms, respectively. Four deep learning models were evaluated: ConvNeXt and MobileViT for depth images, and PointNet and DGCNN for point clouds. Transfer learning markedly improved body weight prediction on the small farm across all four models, outperforming single-source learning and achieving gains comparable to or greater than joint learning. These results indicate that pretrained representations generalize well across farms with differing imaging conditions and dairy cattle populations. No consistent performance difference was observed between depth-image- and point-cloud-based models. Overall, these findings suggest that transfer learning is well suited for small farm prediction scenarios where cross-farm data sharing is limited by privacy, logistical, or policy constraints, as it requires access only to pretrained model weights rather than raw data.

Updated: 2026-01-03 02:41:54

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2601.01044v1

ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking

Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.

Updated: 2026-01-03 02:40:21

Categories: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2310.08061v2

Multi-Dimensional Prompt Chaining to Improve Open-Domain Dialogue Generation

Small language models (SLMs) offer significant deployment advantages but often struggle to match the dialogue quality of larger models in open-domain settings. In this paper, we propose a multi-dimensional prompt-chaining framework that integrates Naturalness, Coherence, and Engagingness dimensions to enhance human-likeness in open-domain dialogue generation. We apply the framework to two SLMs, TinyLlama and Llama-2-7B, and benchmark their performance against responses generated by substantially larger models, including Llama-2-70B and GPT-3.5 Turbo. We then employ automatic and human evaluation to assess the responses based on diversity, contextual coherence, as well as overall quality. Results show that the full framework improves response diversity by up to 29%, contextual coherence by up to 28%, and engagingness as well as naturalness by up to 29%. Notably, Llama-2-7B achieves performance comparable to substantially larger models, including Llama-2-70B and GPT-3.5 Turbo. Overall, the findings demonstrate that carefully designed prompt-based strategies provide an effective and resource-efficient pathway to improving open-domain dialogue quality in SLMs.

Updated: 2026-01-03 02:21:27

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2601.01037v1

A data-driven framework for team selection in Fantasy Premier League

Fantasy football is a billion-dollar industry with millions of participants. Under a fixed budget, managers select squads to maximize future Fantasy Premier League (FPL) points. This study formulates lineup selection as data-driven optimization and develops deterministic and robust mixed-integer linear programs that choose the starting eleven, bench, and captain under budget, formation, and club-quota constraints (maximum three players per club). The objective is parameterized by a hybrid scoring metric that combines realized FPL points with predictions from a linear regression model trained on match-performance features identified using exploratory data analysis techniques. The study benchmarks alternative objectives and cost estimators, including simple and recency-weighted averages, exponential smoothing, autoregressive integrated moving average (ARIMA), and Monte Carlo simulation. Experiments on the 2023/24 Premier League season show that ARIMA with a constrained budget and a rolling window yields the most consistent out-of-sample performance; weighted averages and Monte Carlo are also competitive. Robust variants and hybrid scoring metrics improve some objectives but are not uniformly superior. The framework provides transparent decision support for fantasy roster construction and extends to FPL chips, multi-week rolling-horizon transfer planning, and week-by-week dynamic captaincy.
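The selection problem above (maximize predicted points under budget and club-quota constraints) can be sketched on a toy instance. This is an illustration only: it uses brute-force enumeration instead of the mixed-integer linear programs the study develops, ignores formation, bench, and captain choices, and all player names, costs, and point predictions are made up.

```python
from collections import Counter
from itertools import combinations

# (name, club, cost, predicted_points): hypothetical data.
players = [
    ("P1", "A", 6.0, 7.0),
    ("P2", "A", 5.0, 6.0),
    ("P3", "A", 4.5, 5.5),
    ("P4", "B", 7.0, 8.0),
    ("P5", "B", 4.0, 3.0),
    ("P6", "C", 5.5, 5.0),
]


def best_squad(pool, size, budget, max_per_club):
    """Enumerate all squads of `size` players and keep the feasible one
    with the highest total predicted points."""
    best, best_points = None, float("-inf")
    for squad in combinations(pool, size):
        if sum(p[2] for p in squad) > budget:
            continue  # budget constraint violated
        if max(Counter(p[1] for p in squad).values()) > max_per_club:
            continue  # club quota violated
        points = sum(p[3] for p in squad)
        if points > best_points:
            best, best_points = squad, points
    return best, best_points


squad, points = best_squad(players, size=3, budget=17.0, max_per_club=2)
names = [p[0] for p in squad]
```

Enumeration is only workable for tiny pools; at realistic squad sizes the same objective and constraints would go to an integer-programming solver.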

Updated: 2026-01-03 02:03:20

Categories: cs.CE,cs.AI,cs.LG,math.OC

Download: http://arxiv.org/abs/2505.02170v3

Functional Distribution Networks (FDN)

Modern probabilistic regressors often remain overconfident under distribution shift. We present Functional Distribution Networks (FDN), an input-conditioned distribution over network weights that induces predictive mixtures whose dispersion adapts to the input. FDN is trained with a beta-ELBO and Monte Carlo sampling. We further propose an evaluation protocol that cleanly separates interpolation from extrapolation and stresses OOD sanity checks (e.g., that predictive likelihood degrades under shift while in-distribution accuracy and calibration are maintained). On standard regression tasks, we benchmark against strong Bayesian, ensemble, dropout, and hypernetwork baselines under matched parameter and update budgets, and assess accuracy, calibration, and shift-awareness with standard diagnostics. Together, the framework and protocol aim to make OOD-aware, well-calibrated neural regression practical and modular.
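For reference, a beta-ELBO for an input-conditioned weight distribution is commonly written as below. This is the standard form under assumed notation (variational distribution q_phi over weights theta, prior p(theta), S Monte Carlo samples); the abstract does not state the paper's exact objective or choice of prior.

```latex
\mathcal{L}_{\beta}(\phi)
  = \mathbb{E}_{\theta \sim q_{\phi}(\theta \mid x)}\bigl[\log p(y \mid x, \theta)\bigr]
  - \beta \, \mathrm{KL}\bigl(q_{\phi}(\theta \mid x) \,\|\, p(\theta)\bigr),
\qquad
\mathbb{E}_{\theta \sim q_{\phi}(\theta \mid x)}\bigl[\log p(y \mid x, \theta)\bigr]
  \approx \frac{1}{S} \sum_{s=1}^{S} \log p(y \mid x, \theta_{s}),
\quad \theta_{s} \sim q_{\phi}(\theta \mid x).
```

Setting beta to 1 recovers the usual evidence lower bound; other values of beta change the relative weight of the KL term against the likelihood term.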

Updated: 2026-01-03 02:01:00

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2510.17794v2

Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights

This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending. Traditional approaches first estimate demand functions and then integrate to compute consumer surplus, but these methods can be challenging to implement in practice due to model misspecification in parametric demand forms and the large data requirements and slow convergence of flexible nonparametric or machine learning approaches. Instead, we exploit the randomness inherent in modern algorithmic pricing, arising from the need to balance exploration and exploitation, and introduce an estimator that avoids explicit estimation and numerical integration of the demand function. Each observed purchase outcome at a randomized price is an unbiased estimate of demand, and by carefully reweighting purchase outcomes using novel cumulative propensity weights (CPW), we are able to reconstruct the integral. Building on this idea, we introduce a doubly robust variant named the augmented cumulative propensity weighting (ACPW) estimator that only requires one of either the demand model or the historical pricing policy distribution to be correctly specified. Furthermore, this approach facilitates the use of flexible machine learning methods for estimating consumer surplus, since it achieves fast convergence rates by incorporating an estimate of demand, even when the machine learning estimate has slower convergence rates. Neither of these estimators is a standard application of off-policy evaluation techniques as the target estimand, consumer surplus, is unobserved. To address fairness, we extend this framework to an inequality-aware surplus measure, allowing regulators and firms to quantify the profit-equity trade-off. Finally, we validate our methods through comprehensive numerical studies.
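The idea that reweighted purchase outcomes at randomized prices can reconstruct the demand integral can be checked on a toy simulation. The sketch below is an assumption-laden illustration, not the paper's CPW or ACPW estimator: it weights each purchase by the inverse of the logged price density to estimate the surplus at one fixed price `p0`, with valuations and logged prices both drawn from Uniform(0, 1) so that demand is D(v) = 1 - v and the exact answer is known.

```python
import random


def surplus_estimate(prices, purchases, price_density, p0):
    """Average of Y * 1{P >= p0} / f(P).

    Its expectation equals the integral of demand above p0, which is the
    consumer surplus at posted price p0 (no demand model is fitted).
    """
    n = len(prices)
    return sum(y / price_density(p)
               for p, y in zip(prices, purchases) if p >= p0) / n


rng = random.Random(0)
n = 200_000
prices = [rng.random() for _ in range(n)]      # logged prices, density 1 on [0, 1]
valuations = [rng.random() for _ in range(n)]  # hypothetical buyer valuations
purchases = [1 if v >= p else 0 for v, p in zip(valuations, prices)]

p0 = 0.4
estimate = surplus_estimate(prices, purchases, lambda p: 1.0, p0)
exact = (1 - p0) ** 2 / 2  # integral of (1 - v) from p0 to 1
```

In this setup the estimate should land close to the exact value of 0.18, even though no demand curve is ever estimated or integrated numerically.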

Updated: 2026-01-03 01:41:40

Categories: stat.ML,cs.AI,cs.LG,math.ST

Download: http://arxiv.org/abs/2601.01029v1

AI and Consciousness

This is a skeptical overview of the literature on AI consciousness. We will soon create AI systems that are conscious according to some influential, mainstream theories of consciousness but are not conscious according to other influential, mainstream theories of consciousness. We will not be in a position to know which theories are correct and whether we are surrounded by AI systems as richly and meaningfully conscious as human beings or instead only by systems as experientially blank as toasters. None of the standard arguments either for or against AI consciousness takes us far.

Table of Contents
Chapter One: Hills and Fog
Chapter Two: What Is Consciousness? What Is AI?
Chapter Three: Ten Possibly Essential Features of Consciousness
Chapter Four: Against Introspective and Conceptual Arguments for Essential Features
Chapter Five: Materialism and Functionalism
Chapter Six: The Turing Test and the Chinese Room
Chapter Seven: The Mimicry Argument Against AI Consciousness
Chapter Eight: Global Workspace Theories and Higher Order Theories
Chapter Nine: Integrated Information, Local Recurrence, Associative Learning, and Iterative Natural Kinds
Chapter Ten: Does Biological Substrate Matter?
Chapter Eleven: The Leapfrog Hypothesis and the Social Semi-Solution

Updated: 2026-01-03 01:30:26

Categories: cs.AI

Download: http://arxiv.org/abs/2510.09858v2

A Platform for Interactive AI Character Experiences

From movie characters to modern science fiction, bringing characters into interactive, story-driven conversations has captured imaginations across generations. Achieving this vision is highly challenging and requires much more than just language modeling. It involves numerous complex AI challenges, such as conversational AI, maintaining character integrity, managing personality and emotions, handling knowledge and memory, synthesizing voice, generating animations, enabling real-world interactions, and integrating with physical environments. Recent advancements in the development of foundation models, prompt engineering, and fine-tuning for downstream tasks have enabled researchers to address these individual challenges. However, combining these technologies for interactive characters remains an open problem. We present a system and platform for conveniently designing believable digital characters, enabling a conversational and story-driven experience while providing solutions to all of the technical challenges. As a proof-of-concept, we introduce Digital Einstein, which allows users to engage in conversations with a digital representation of Albert Einstein about his life, research, and persona. While Digital Einstein exemplifies our methods for a specific character, our system is flexible and generalizes to any story-driven or conversational character. By unifying these diverse AI components into a single, easy-to-adapt platform, our work paves the way for immersive character experiences, turning the dream of lifelike, story-based interactions into a reality.

Updated: 2026-01-03 01:27:19

Categories: cs.HC,cs.AI,cs.CL,cs.GR

Download: http://arxiv.org/abs/2601.01027v1

Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation

We present a reproducible deep learning pipeline for leukemic cell classification, focusing on system architecture, experimental robustness, and software design choices for medical image analysis. Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, requiring expert microscopic diagnosis that suffers from inter-observer variability and time constraints. The proposed system integrates an attention-based convolutional neural network combining EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms for automated ALL cell classification. Our approach employs comprehensive data augmentation, focal loss for class imbalance, and patient-wise data splitting to ensure robust and reproducible evaluation. On the C-NMC 2019 dataset (12,528 original images from 62 patients), the system achieves a 97.89% F1-score and 97.89% accuracy on the test set, with statistical validation through 100-iteration Monte Carlo experiments confirming significant improvements (p < 0.001) over baseline methods. The proposed pipeline outperforms existing approaches by up to 4.67% while using 89% fewer parameters than VGG16 (15.2M vs. 138M). The attention mechanism provides interpretable visualizations of diagnostically relevant cellular features, demonstrating that modern attention-based architectures can improve leukemic cell classification while maintaining computational efficiency suitable for clinical deployment.
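Two quantities in the abstract above lend themselves to a quick check: the focal loss used for class imbalance and the stated parameter reduction relative to VGG16. The sketch below uses the standard binary focal loss form; the `gamma` and `alpha` defaults are assumptions, since the abstract does not report the values used.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample, where p_true is the predicted probability
    of the correct class. With gamma = 0 and alpha = 1 it reduces to
    ordinary cross-entropy."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


easy = focal_loss(0.9)   # well-classified sample, strongly down-weighted
hard = focal_loss(0.1)   # misclassified sample, nearly full loss
cross_entropy_easy = focal_loss(0.9, gamma=0.0)

# Parameter counts quoted in the abstract: 15.2M vs. 138M for VGG16.
reduction = 1.0 - 15.2 / 138.0
```

The factor (1 - p)^gamma shrinks the contribution of easy samples so that rare or hard cells dominate the gradient, and the reduction works out to roughly 89%, matching the figure in the abstract.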

Updated: 2026-01-03 01:24:11

Categories: cs.CV,cs.AI,cs.LG,cs.SE

Download: http://arxiv.org/abs/2601.01026v1

ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval

Vision Language Models (VLMs) have rapidly advanced and show strong promise for text-based person search (TBPS), a task that requires capturing fine-grained relationships between images and text to distinguish individuals. Previous methods address these challenges through local alignment, yet they are often prone to shortcut learning and spurious correlations, yielding misalignment. Moreover, injecting prior knowledge can distort intra-modality structure. Motivated by our finding that encoder attention surfaces spatially precise evidence from the earliest training epochs, and to alleviate these issues, we introduce ITSELF, an attention-guided framework for implicit local alignment. At its core, Guided Representation with Attentive Bank (GRAB) converts the model's own attention into an Attentive Bank of high-saliency tokens and applies local objectives on this bank, learning fine-grained correspondences without extra supervision. To make the selection reliable and non-redundant, we introduce Multi-Layer Attention for Robust Selection (MARS), which aggregates attention across layers and performs diversity-aware top-k selection; and Adaptive Token Scheduler (ATS), which schedules the retention budget from coarse to fine over training, preserving context early while progressively focusing on discriminative details. Extensive experiments on three widely used TBPS benchmarks show state-of-the-art performance and strong cross-dataset generalization, confirming the effectiveness and robustness of our approach without additional prior supervision. Our project is publicly available at https://trhuuloc.github.io/itself

Updated: 2026-01-03 01:19:36

Categories: cs.CV,cs.AI,cs.IR

Download: http://arxiv.org/abs/2601.01024v1

Wireless Dataset Similarity: Measuring Distances in Supervised and Unsupervised Machine Learning

This paper introduces a task- and model-aware framework for measuring similarity between wireless datasets, enabling applications such as dataset selection/augmentation, simulation-to-real (sim2real) comparison, task-specific synthetic data generation, and informing decisions on model training/adaptation to new deployments. We evaluate candidate dataset distance metrics by how well they predict cross-dataset transferability: if two datasets have a small distance, a model trained on one should perform well on the other. We apply the framework to an unsupervised task, channel state information (CSI) compression, using autoencoders. Using metrics based on UMAP embeddings, combined with Wasserstein and Euclidean distances, we achieve Pearson correlations exceeding 0.85 between dataset distances and train-on-one/test-on-another task performance. We also apply the framework to a supervised task, downlink beam prediction, using convolutional neural networks. For this task, we derive a label-aware distance by integrating supervised UMAP and penalties for dataset imbalance. Across both tasks, the resulting distances outperform traditional baselines and consistently exhibit stronger correlations with model transferability, supporting task-relevant comparisons between wireless datasets.
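The evaluation logic above (compute a distance between datasets, then correlate it with cross-dataset performance) can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses a one-dimensional Wasserstein-1 distance on raw scalar samples instead of UMAP embeddings, and the transfer losses are made-up numbers, not results from the paper.

```python
import math


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size one-dimensional samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)


source = [0.0, 1.0, 2.0, 3.0]
targets = [[v + shift for v in source] for shift in (0.5, 1.0, 2.0)]
distances = [wasserstein_1d(source, t) for t in targets]

# Hypothetical train-on-source / test-on-target losses (higher is worse).
transfer_loss = [0.10, 0.15, 0.30]
r = pearson(distances, transfer_loss)
```

A distance metric is useful in this framework to the extent that `r` is strongly positive for a loss (or strongly negative for an accuracy) across many dataset pairs.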

Updated: 2026-01-03 01:15:27

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2601.01023v1

Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking

Existing RGB-Event visual object tracking approaches primarily rely on conventional feature-level fusion, failing to fully exploit the unique advantages of event cameras. In particular, the high dynamic range and motion-sensitive nature of event cameras are often overlooked, while low-information regions are processed uniformly, leading to unnecessary computational overhead for the backbone network. To address these issues, we propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Specifically, RGB and event modalities are transformed from the spatial domain to the frequency domain via the Fast Fourier Transform, with their amplitude and phase components decoupled. High-frequency event information is selectively fused into RGB modality through amplitude and phase attention, enhancing feature representation while substantially reducing backbone computation. In addition, a motion-guided spatial sparsification module leverages the motion-sensitive nature of event cameras to capture the relationship between target motion cues and spatial probability distribution, filtering out low-information regions and enhancing target-relevant features. Finally, a sparse set of target-relevant features is fed into the backbone network for learning, and the tracking head predicts the final target position. Extensive experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method. The source code of this paper will be released on https://github.com/Event-AHU/OpenEvTracking
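The amplitude and phase decoupling step described above can be illustrated on a one-dimensional toy signal. The sketch below uses a naive discrete Fourier transform from the standard library so that it is self-contained; the paper works on two-dimensional features with the Fast Fourier Transform and learned attention, so `fuse_high_freq` with its fixed `weight` and `cutoff` is a hypothetical stand-in, not the authors' fusion module.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def decouple(x):
    """Split a signal's spectrum into amplitude and phase components."""
    spectrum = dft(x)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude, phase):
    """Rebuild the signal from amplitude and phase."""
    return idft([cmath.rect(a, p) for a, p in zip(amplitude, phase)])


def fuse_high_freq(rgb, event, cutoff, weight):
    """Add event amplitude to RGB amplitude on bins at or above `cutoff`,
    keeping the RGB phase (fixed weight instead of learned attention)."""
    a_rgb, ph_rgb = decouple(rgb)
    a_evt, _ = decouple(event)
    n = len(rgb)
    fused = [a + weight * e if min(k, n - k) >= cutoff else a
             for k, (a, e) in enumerate(zip(a_rgb, a_evt))]
    return recombine(fused, ph_rgb)


rgb = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 2.0]
event = [0.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 0.0]
amplitude, phase = decouple(rgb)
restored = recombine(amplitude, phase)
fused = fuse_high_freq(rgb, event, cutoff=2, weight=0.5)
```

Decoupling followed by recombination reproduces the input up to floating-point error, which is what makes it safe to modify amplitude and phase separately before transforming back.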

Updated: 2026-01-03 01:10:17

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2601.01022v1

Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-based regions without considering affordances, resulting in frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs cross-modal representations, namely Affordance RGB and Obstacle Map+, to align semantics with spatial context. This enables reasoning that extends beyond the egocentric limitations of RGB perception. To ensure interaction is guided by task-relevant affordances, we leverage coarse semantic priors from VLMs to guide the search toward task-relevant regions and refine placements with geometric constraints, thereby reducing the risk of convergence to local optima. Evaluated on five diverse open-vocabulary mobile manipulation tasks, our system achieves an 85% success rate, significantly outperforming classical geometric planners and VLM-based methods. This demonstrates the promise of affordance-aware and multimodal reasoning for generalizable, instruction-conditioned planning in OVMM.

Updated: 2026-01-03 01:09:54

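Title: Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

Abstract: In open-vocabulary mobile manipulation (OVMM), task success often depends on selecting an appropriate base placement for the robot. Existing methods typically navigate to proximity-based regions without considering affordances, which leads to frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs cross-modal representations, namely Affordance RGB and Obstacle Map+, to align semantics with spatial context. This enables reasoning beyond the egocentric limitations of RGB perception. To ensure interaction is guided by task-relevant affordances, we use coarse semantic priors from VLMs to steer the search toward task-relevant regions and refine placements with geometric constraints, reducing the risk of converging to local optima. Evaluated on five diverse open-vocabulary mobile manipulation tasks, our system achieves an 85% success rate, significantly outperforming classical geometric planners and VLM-based methods. This demonstrates the promise of affordance-aware, multimodal reasoning for generalizable, instruction-conditioned planning in OVMM.

Updated: 2026-01-03 01:09:54

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2511.06240v2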

Expanding the Chaos: Neural Operator for Stochastic (Partial) Differential Equations

Stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) are fundamental tools for modeling stochastic dynamics across the natural sciences and modern machine learning. Developing deep learning models for approximating their solution operators promises not only fast, practical solvers, but may also inspire models that resolve classical learning tasks from a new perspective. In this work, we build on classical Wiener chaos expansions (WCE) to design neural operator (NO) architectures for SPDEs and SDEs: we project the driving noise paths onto orthonormal Wick Hermite features and parameterize the resulting deterministic chaos coefficients with neural operators, so that full solution trajectories can be reconstructed from noise in a single forward pass. On the theoretical side, we investigate the classical WCE results for the class of multi-dimensional SDEs and semilinear SPDEs considered here by explicitly writing down the associated coupled ODE/PDE systems for their chaos coefficients, which makes the separation between stochastic forcing and deterministic dynamics fully explicit and directly motivates our model designs. On the empirical side, we validate our models on a diverse suite of problems: classical SPDE benchmarks, diffusion one-step sampling on images, topological interpolation on graphs, financial extrapolation, parameter estimation, and manifold SDEs for flood prediction, demonstrating competitive accuracy and broad applicability. Overall, our results indicate that WCE-based neural operators provide a practical and scalable way to learn SDE/SPDE solution operators across diverse domains.
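
As a small illustration of the "orthonormal Wick Hermite features" mentioned in this abstract, the sketch below evaluates normalized probabilists' Hermite polynomials of a single standard Gaussian coordinate and checks their orthonormality with Gauss-Hermite quadrature. It covers only the one-dimensional building block under the standard Wiener chaos construction; the function names are hypothetical, and nothing here reflects the paper's actual architecture.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_feature(xi, n):
    """Normalized probabilists' Hermite polynomial He_n(xi) / sqrt(n!)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select the degree-n polynomial
    return He.hermeval(xi, coeffs) / math.sqrt(math.factorial(n))

def gaussian_gram(max_order, quad_points=20):
    """Gram matrix E[h_m(xi) h_n(xi)] for xi ~ N(0, 1), via quadrature."""
    nodes, weights = He.hermegauss(quad_points)  # weight function exp(-x^2 / 2)
    feats = np.stack([hermite_feature(nodes, n) for n in range(max_order + 1)])
    return (feats * weights) @ feats.T / math.sqrt(2 * math.pi)
```

`gaussian_gram(4)` should be numerically close to the 5x5 identity matrix, which is the orthonormality property that lets chaos coefficients be treated as independent deterministic targets.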

Updated: 2026-01-03 00:59:25

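Title: Expanding the Chaos: Neural Operator for Stochastic (Partial) Differential Equations

Abstract: Stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) are fundamental tools for modeling stochastic dynamics in the natural sciences and modern machine learning. Developing deep learning models that approximate their solution operators promises not only fast, practical solvers but may also inspire models that address classical learning tasks from a new perspective. In this work, we build on classical Wiener chaos expansions (WCE) to design neural operator (NO) architectures for SPDEs and SDEs: we project the driving noise paths onto orthonormal Wick Hermite features and parameterize the resulting deterministic chaos coefficients with neural operators, so that full solution trajectories can be reconstructed from noise in a single forward pass. On the theoretical side, we study the classical WCE results for the multi-dimensional SDEs and semilinear SPDEs considered here by explicitly writing down the coupled ODE/PDE systems for their chaos coefficients, which makes the separation between stochastic forcing and deterministic dynamics fully explicit and directly motivates our model designs. On the empirical side, we validate our models on a diverse set of problems: classical SPDE benchmarks, one-step diffusion sampling on images, topological interpolation on graphs, financial extrapolation, parameter estimation, and manifold SDEs for flood prediction, demonstrating competitive accuracy and broad applicability. Overall, our results indicate that WCE-based neural operators provide a practical and scalable way to learn SDE/SPDE solution operators across diverse domains.

Updated: 2026-01-03 00:59:25

Categories: cs.LG

Download: http://arxiv.org/abs/2601.01021v1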

Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study

In this study, we focus on improving the training and inference of deep neural networks (DNNs), specifically autoencoders (AEs) and variational autoencoders (VAEs), using Random Fourier Transformation (RFT). We further explore the role of RFT in model training behavior using Frequency Principle (F-Principle) analysis and show that models with RFT tend to learn low-frequency and high-frequency features at the same time, whereas conventional DNNs start from low frequencies and only gradually learn (if successful) high-frequency features. We focus on reconstruction-based anomaly detection using autoencoders and variational autoencoders and investigate RFT's role. We also introduce a trainable variant of RFT that uses the existing computation graph to train the RFT expansion instead of leaving it random. We showcase our findings with two low-dimensional synthetic datasets for data representation, and an aviation safety dataset, called Dashlink, for high-dimensional reconstruction-based anomaly detection. The results indicate that models with Fourier transformation outperform their conventional counterparts, but remain inconclusive regarding the benefits of the trainable Fourier transformation over the random variant.

Updated: 2026-01-03 00:56:14

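Title: Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study

Abstract: In this study, we focus on improving the training process and inference of deep neural networks (DNNs), specifically autoencoders (AEs) and variational autoencoders (VAEs), using Random Fourier Transformation (RFT). We further explore the role of RFT in model training behavior using Frequency Principle (F-Principle) analysis and show that models with RFT learn low and high frequencies at the same time, whereas conventional DNNs start from low frequencies and gradually learn (if successful) high-frequency features. We focus on reconstruction-based anomaly detection using autoencoders and variational autoencoders and investigate the role of RFT. We also introduce a trainable variant of RFT that uses the existing computation graph to train the RFT expansion rather than leaving it random. We showcase our findings with two low-dimensional synthetic datasets for data representation and an aviation safety dataset, called Dashlink, for high-dimensional reconstruction-based anomaly detection. The results indicate that models with Fourier transformation outperform their conventional counterparts, and they remain inconclusive regarding the benefits of the trainable Fourier transformation over the random variant.

Updated: 2026-01-03 00:56:14

Categories: cs.LG,cs.AI,eess.SY

Download: http://arxiv.org/abs/2601.01016v1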

Geometric and Dynamic Scaling in Deep Transformers

Despite their empirical success, pushing Transformer architectures to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultimately collapse. Existing explanations largely attribute this phenomenon to optimization instability or vanishing gradients, yet such accounts fail to explain why collapse persists even under modern normalization and initialization schemes. In this paper, we argue that the collapse of deep Transformers is fundamentally a geometric problem. Standard residual updates implicitly assume that feature accumulation is always beneficial, but offer no mechanism to constrain update directions or to erase outdated information. As depth increases, this leads to systematic drift off the semantic manifold and monotonic feature accumulation, causing representational degeneracy. We propose a unified geometric framework that addresses these failures through two orthogonal principles. First, manifold-constrained hyper-connections restrict residual updates to valid local tangent directions, preventing uncontrolled manifold drift. Second, deep delta learning introduces data-dependent, non-monotonic updates that enable reflection and erasure of redundant features rather than their unconditional accumulation. Together, these mechanisms decouple the direction and sign of feature updates, yielding a stable geometric evolution across depth. We term the resulting architecture the Manifold-Geometric Transformer (MGT). Our analysis predicts that enforcing geometric validity while allowing dynamic erasure is essential for avoiding rank collapse in ultra-deep networks. We outline an evaluation protocol for Transformers exceeding 100 layers to test the hypothesis that geometry, rather than depth itself, is the key limiting factor in deep representation learning.

Updated: 2026-01-03 00:41:46

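Title: Geometric and Dynamic Scaling in Deep Transformers

Abstract: Despite the empirical success of Transformer architectures, pushing them to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultimately collapse. Existing explanations largely attribute this phenomenon to optimization instability or vanishing gradients, yet these accounts fail to explain why collapse persists even under modern normalization and initialization schemes. In this paper, we argue that the collapse of deep Transformers is fundamentally a geometric problem. Standard residual updates implicitly assume that feature accumulation is always beneficial but offer no mechanism to constrain update directions or to erase outdated information. As depth increases, this leads to systematic drift off the semantic manifold and monotonic feature accumulation, causing representational degeneracy. We propose a unified geometric framework that addresses these failures through two orthogonal principles. First, manifold-constrained hyper-connections restrict residual updates to valid local tangent directions, preventing uncontrolled manifold drift. Second, deep delta learning introduces data-dependent, non-monotonic updates that enable reflection and erasure of redundant features rather than their unconditional accumulation. Together, these mechanisms decouple the direction and sign of feature updates, yielding a stable geometric evolution across depth. We term the resulting architecture the Manifold-Geometric Transformer (MGT). Our analysis predicts that enforcing geometric validity while allowing dynamic erasure is essential for avoiding rank collapse in ultra-deep networks. We outline an evaluation protocol for Transformers exceeding 100 layers to test the hypothesis that geometry, rather than depth itself, is the key limiting factor in deep representation learning.

Updated: 2026-01-03 00:41:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2601.01014v1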

Reinforcement Learning for Monetary Policy Under Macroeconomic Uncertainty: Analyzing Tabular and Function Approximation Methods

We study how a central bank should dynamically set short-term nominal interest rates to stabilize inflation and unemployment when macroeconomic relationships are uncertain and time-varying. We model monetary policy as a sequential decision-making problem in which the central bank observes macroeconomic conditions quarterly and chooses interest rate adjustments. Using publicly accessible historical Federal Reserve Economic Data (FRED), we construct a linear-Gaussian transition model and implement a discrete-action Markov Decision Process with a quadratic-loss reward function. We compare nine reinforcement-learning-style approaches against the Taylor Rule and naive baselines, including tabular Q-learning variants, SARSA, Actor-Critic, Deep Q-Networks, Bayesian Q-learning with uncertainty quantification, and POMDP formulations with partial observability. Notably, despite its simplicity, standard tabular Q-learning achieved the best performance (mean return -615.13 ± 309.58), outperforming both enhanced RL methods and traditional policy rules. Our results suggest that while sophisticated RL techniques show promise for monetary policy applications, simpler approaches may be more robust in this domain, highlighting important challenges in applying modern RL to macroeconomic policy.
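
The setup in this abstract (linear-Gaussian transition, discrete actions, quadratic-loss reward, tabular Q-learning) can be sketched on a toy one-dimensional problem. Everything below is an illustrative assumption rather than the paper's model: the state is a single "inflation gap", and the coefficients `rho`, `beta`, `sigma`, the action grid, and the bin edges are made up for the example and are not fitted to FRED data.

```python
import numpy as np

ACTIONS = np.array([-0.5, 0.0, 0.5])  # hypothetical rate adjustments (pct. points)
BINS = np.linspace(-3.0, 3.0, 13)     # bin edges for discretizing the gap

def step(gap, action, rng, rho=0.9, beta=0.4, sigma=0.2):
    """Toy linear-Gaussian transition: a rate hike pushes the gap down."""
    next_gap = rho * gap - beta * action + sigma * rng.normal()
    return next_gap, -next_gap ** 2   # quadratic-loss reward

def to_state(gap):
    """Map a continuous gap to a table row in 0..len(BINS)."""
    return int(np.digitize(gap, BINS))

def train(episodes=300, horizon=40, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    q = np.zeros((len(BINS) + 1, len(ACTIONS)))
    for _ in range(episodes):
        gap = rng.uniform(-2.0, 2.0)
        for _ in range(horizon):
            s = to_state(gap)
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(q[s]))
            gap, r = step(gap, ACTIONS[a], rng)
            s2 = to_state(gap)
            # Off-policy temporal-difference update toward the greedy target.
            q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
    return q
```

After training, `ACTIONS[np.argmax(q[to_state(gap)])]` gives the greedy rate adjustment for a given gap; because every reward is a negative squared deviation, all learned Q-values are non-positive.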

Updated: 2026-01-03 00:40:33

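Title: Reinforcement Learning for Monetary Policy Under Macroeconomic Uncertainty: Analyzing Tabular and Function Approximation Methods

Abstract: We study how a central bank should dynamically set short-term nominal interest rates to stabilize inflation and unemployment when macroeconomic relationships are uncertain and time-varying. We model monetary policy as a sequential decision-making problem in which the central bank observes macroeconomic conditions each quarter and chooses interest rate adjustments. Using publicly accessible historical Federal Reserve Economic Data (FRED), we construct a linear-Gaussian transition model and implement a discrete-action Markov Decision Process with a quadratic-loss reward function. We compare nine reinforcement-learning-style methods against the Taylor Rule and naive baselines, including tabular Q-learning variants, SARSA, Actor-Critic, Deep Q-Networks, Bayesian Q-learning with uncertainty quantification, and POMDP formulations with partial observability. Notably, despite its simplicity, standard tabular Q-learning achieved the best performance (mean return -615.13 ± 309.58), outperforming both enhanced reinforcement learning methods and traditional policy rules. Our results suggest that while sophisticated reinforcement learning techniques show promise for monetary policy applications, simpler methods may be more robust in this domain, highlighting important challenges in applying modern reinforcement learning to macroeconomic policy.

Updated: 2026-01-03 00:40:33

Categories: q-fin.ST,cs.AI,cs.LG,econ.EM

Download: http://arxiv.org/abs/2512.17929v2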

Intention Collapse: Intention-Level Metrics for Reasoning in Language Models

Every act of language generation compresses a rich internal state into a single token sequence. We call this process intention collapse: a many-to-one projection from a high-dimensional intention space I into an external language space L. We formalize intention collapse for contemporary language models, define three simple, model-agnostic intention metrics (intention entropy H_int, effective dimensionality dim_eff, and latent knowledge recoverability Recov), and propose an empirical agenda for studying how inference-time computation shapes internal intentions before they are verbalized. We also report a first small-scale experiment. Using a 4-bit Mistral 7B model on 200 GSM8K problems, we compare a direct-answer baseline, a chain-of-thought (CoT) regime, and a babble control. CoT raises accuracy from 5.5 percent to 53 percent, sharply reduces pre-collapse intention entropy (from 1.42 to 0.37 bits), and shows higher global effective dimensionality than the other regimes despite producing fewer tokens than babble. At the same time, H_int has little item-level predictive power, and a linear probe on I achieves AUROC 0.65 in the CoT regime but only about chance in the baseline regime, where it collapses to the majority class. These preliminary results indicate that intention-level metrics can distinguish inference regimes and expose latent information that is partly lost during collapse, while also revealing important limitations of our current proxies.
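
The abstract reports intention entropy in bits but does not give its definition here. As a rough proxy only, the sketch below computes the Shannon entropy, in bits, of a softmax distribution over logits; treating next-token logits as the pre-collapse state is an assumption of this example, and `entropy_bits` is a hypothetical name.

```python
import numpy as np

def entropy_bits(logits):
    """Shannon entropy (bits) of softmax(logits)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                  # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    nz = p[p > 0]                    # skip zero-probability entries
    return float(-(nz * np.log2(nz)).sum())
```

A uniform distribution over four outcomes yields 2 bits, while a sharply peaked distribution yields a value near 0, so lower values correspond to a more committed state.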

Updated: 2026-01-03 00:19:53

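Title: Intention Collapse: Intention-Level Metrics for Reasoning in Language Models

Abstract: Every act of language generation compresses a rich internal state into a single token sequence. We call this process intention collapse: a many-to-one projection from a high-dimensional intention space I into an external language space L. We formalize intention collapse for contemporary language models, define three simple, model-agnostic intention metrics (intention entropy H_int, effective dimensionality dim_eff, and latent knowledge recoverability Recov), and propose an empirical agenda for studying how inference-time computation shapes internal intentions before they are verbalized. We also report a first small-scale experiment. Using a 4-bit Mistral 7B model on 200 GSM8K problems, we compare a direct-answer baseline, a chain-of-thought (CoT) regime, and a babble control. CoT raises accuracy from 5.5% to 53%, sharply reduces pre-collapse intention entropy (from 1.42 bits to 0.37 bits), and shows higher global effective dimensionality than the other regimes even though it produces fewer tokens than babble. At the same time, H_int has little predictive power at the item level, and a linear probe on I achieves AUROC 0.65 in the CoT regime but only about chance in the baseline regime, where it collapses to the majority class. These preliminary results indicate that intention-level metrics can distinguish inference regimes and expose latent information that is partly lost during collapse, while also revealing important limitations of our current proxies.

Updated: 2026-01-03 00:19:53

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2601.01011v1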

Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms

This paper employs a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessing the service life of civil infrastructure subjected to aggressive environments. The adopted methodology relies on several simple and complex standalone machine learning (ML) algorithms, with the primary objective of establishing confidence in the unbiased prediction of the underlying hidden correlations. The simple algorithms include linear regression (LR), k-nearest neighbors (KNN) regression, and kernel ridge regression (KRR). The complex algorithms entail support vector regression (SVR), Gaussian process regression (GPR), and two families of artificial neural networks, including a feedforward network (multilayer perceptron, MLP) and a gated recurrent unit (GRU). The MLP architecture cannot explicitly handle sequential data, a limitation addressed by the GRU. A comprehensive dataset is considered. The performance of ML algorithms is evaluated, with KRR, GPR, and MLP exhibiting high accuracy. Given the diversity of the adopted concrete mixture proportions, the GRU was unable to accurately reproduce the response in the test set. Further analyses elucidate the contributions of mixture compositions to the temporal evolution of chloride. The results obtained from the GPR model unravel latent correlations through clear and explainable trends. The MLP, SVR, and KRR also provide acceptable estimates of the overall trends. The majority of mixture components exhibit an inverse relation with chloride content, while a few components demonstrate a direct correlation. These findings highlight the potential of surrogate approaches for describing the physical processes involved in chloride ingress and the associated correlations, toward the ultimate goal of enhancing the service life of civil infrastructure.

Updated: 2026-01-03 00:11:59

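Title: Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms

Abstract: This paper uses a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessing the service life of civil infrastructure exposed to aggressive environments. The methodology relies on several simple and complex standalone machine learning (ML) algorithms, with the primary objective of establishing confidence in the unbiased prediction of the underlying hidden correlations. The simple algorithms include linear regression (LR), k-nearest neighbors (KNN) regression, and kernel ridge regression (KRR). The complex algorithms include support vector regression (SVR), Gaussian process regression (GPR), and two families of artificial neural networks: a feedforward network (multilayer perceptron, MLP) and a gated recurrent unit (GRU). The MLP architecture cannot explicitly handle sequential data, a limitation addressed by the GRU. A comprehensive dataset is considered. The performance of the ML algorithms is evaluated, with KRR, GPR, and MLP exhibiting high accuracy. Given the diversity of the concrete mixture proportions, the GRU was unable to accurately reproduce the response in the test set. Further analyses clarify the contributions of mixture compositions to the temporal evolution of chloride. The results from the GPR model reveal latent correlations through clear and explainable trends. The MLP, SVR, and KRR also provide acceptable estimates of the overall trends. Most mixture components are inversely related to chloride content, while a few components show a direct correlation. These findings highlight the potential of surrogate approaches for describing the physical processes involved in chloride ingress and the associated correlations, toward the ultimate goal of extending the service life of civil infrastructure.

Updated: 2026-01-03 00:11:59

Categories: cs.LG,cs.AI,cs.CE

Download: http://arxiv.org/abs/2601.01009v1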

An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonance imaging. However, most existing approaches operate as black-box predictors, producing deterministic outputs without explicit uncertainty awareness or structured mechanisms to abstain under ambiguous conditions. This limitation raises serious safety and trust concerns in high-risk emergency radiology settings. In this paper, we propose an explainable agentic AI framework for uncertainty-aware and abstention-enabled decision support in acute ischemic stroke imaging. The framework follows a modular agentic pipeline in which a perception agent performs lesion-aware image analysis, an uncertainty estimation agent computes slice-level predictive reliability, and a decision agent determines whether to issue a prediction or abstain based on predefined uncertainty thresholds. Unlike prior stroke imaging systems that primarily focus on improving segmentation or classification accuracy, the proposed framework explicitly prioritizes clinical safety, transparency, and clinician-aligned decision behavior. Qualitative and case-based analyses across representative stroke imaging scenarios demonstrate that uncertainty-driven abstention naturally emerges in diagnostically ambiguous regions and low-information slices. The framework further integrates visual explanation mechanisms to support both predictive and abstention decisions, addressing a key limitation of existing uncertainty-aware medical imaging systems. Rather than introducing a new performance benchmark, this work presents agentic control, uncertainty awareness, and selective abstention as essential design principles for developing safe and trustworthy medical imaging AI systems.
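
The decision agent's predict-or-abstain rule can be sketched as a simple threshold on an uncertainty score. This is an illustration only: the abstract does not specify the uncertainty measure, so normalized entropy of the class probabilities is assumed here, and `Decision`, `decide`, and the default `threshold` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Decision:
    abstain: bool
    label: Optional[int]   # predicted class index, or None when abstaining
    uncertainty: float     # normalized entropy in [0, 1]

def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of a class distribution divided by its maximum, log(K)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def decide(probs: Sequence[float], threshold: float = 0.5) -> Decision:
    """Issue a prediction only when uncertainty is at or below the threshold."""
    u = normalized_entropy(probs)
    if u > threshold:
        return Decision(abstain=True, label=None, uncertainty=u)
    label = max(range(len(probs)), key=lambda i: probs[i])
    return Decision(abstain=False, label=label, uncertainty=u)
```

For a two-class slice, `decide([0.5, 0.5])` abstains while `decide([0.99, 0.01])` predicts class 0; in practice the threshold would be set clinically rather than fixed at an arbitrary default.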

Updated: 2026-01-03 00:10:08

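Title: An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

Abstract: Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonance imaging. However, most existing methods operate as black-box predictors, producing deterministic outputs without explicit uncertainty awareness or structured mechanisms for abstaining under ambiguous conditions. This limitation raises serious safety and trust concerns in high-risk emergency radiology settings. In this paper, we propose an explainable agentic AI framework for uncertainty-aware and abstention-enabled decision support in acute ischemic stroke imaging. The framework follows a modular agentic pipeline in which a perception agent performs lesion-aware image analysis, an uncertainty estimation agent computes slice-level predictive reliability, and a decision agent determines whether to issue a prediction or abstain based on predefined uncertainty thresholds. Unlike prior stroke imaging systems that focus primarily on improving segmentation or classification accuracy, the proposed framework explicitly prioritizes clinical safety, transparency, and clinician-aligned decision behavior. Qualitative and case-based analyses across representative stroke imaging scenarios show that uncertainty-driven abstention emerges naturally in diagnostically ambiguous regions and low-information slices. The framework further integrates visual explanation mechanisms to support both predictive and abstention decisions, addressing a key limitation of existing uncertainty-aware medical imaging systems. Rather than introducing a new performance benchmark, this work presents agentic control, uncertainty awareness, and selective abstention as essential design principles for developing safe and trustworthy medical imaging AI systems.

Updated: 2026-01-03 00:10:08

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2601.01008v1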